Sunday, September 23, 2007

Yet Another Reason I Love C#

Sorry for the lack of updates recently. I took over a project in July that had several challenges and consumed my time for the past couple of months. Not that I was spending every waking hour at work, but after putting in 10-12 hour days coding and dealing with the responsibilities of being a team lead (schedules, status meetings, interfacing with the customer, and all of that other fun stuff), the thought of doing more coding for fun wasn't high on my list of priorities.

Despite the workload, it was, for the most part, an interesting project — the kind that makes me enjoy programming for a living. It was challenging and stimulating work that built on my company's existing experience developing embedded sensor networking products. But there was one challenge in particular that I could have done without.

Without going into too much detail, this is our second generation of sensor networking products and this version is still fairly young. The first generation was written in Ada and was very much tied to a specific platform and architecture. Now that there are a number of readily available and inexpensive embedded platforms capable of running a variety of operating systems, including Windows CE and even embedded Linux, our goal with the second generation was to make the product more readily portable. We also needed to keep it very extendible, since we market it to a variety of customers — multiple branches and offices within the military, as well as state and local governments, and even commercial — all of whom have slightly different needs and uses for networking sensors.

So, because of those requirements, we decided to switch from Ada to C++ and to develop a basic framework which provided the core functionality needed by all of our customers. The framework was designed to have a "common" set of code that was completely OS and hardware-independent, along with a platform-specific middleware layer. Individual projects are also intended to be completely platform-independent and merely link in the version of the framework for the platform it runs on.

All of that reflects sound software engineering principles and looks great on paper as a way to get around the many problems with writing truly portable C++ code. However, late in the project we discovered the software crashed whenever one particular type of message was sent from our sensor networking device. I took a look at it and found that in the section of code that was crashing, variables on the stack were getting changed, seemingly at random.

I immediately assumed it was a threading problem, and that another thread had a rogue pointer or perhaps a stack overrun. I asked a couple of our senior engineers to look at those possibilities, but they were unable to track it down. Just a few weeks before delivery, I was concerned we may not be able to deliver a working product. So we had a brainstorming session to think of all of the possibilities. We had proven it wasn't a stack overrun, and it was too regular and repeatable to be a thread timing problem. Because the crash itself happened inside the WinCE STL string library, there was some discussion of it being a problem with that library. However, the fact that we used STL strings throughout our code without seeing this problem anywhere else, suggested that wasn't likely.

We narrowed in on the fact that the function that was crashing seemed to be laying out the message structure in memory differently than the functions that passed in the message, which were all in different compilation units. With that information, it took me only an hour more of stepping through the code with the debugger to find the problem. The crashing function was using a definition of time_t that was only 32-bits. All of the other compilation units correctly used a 64-bit version of time_t since we don't want our product to become obsolete in 2038.

How did this happen? The most immediate reason is that one particular header file in the supposedly portable section of the framework code included a Windows header file that defined time_t as being 32-bit. OS-specific header files were supposed to be relegated to the middleware layer. That said, because of its C legacy, C++ makes it far too easy for problems like this to occur. Conditional compilation and the vagaries of determining exactly what "standard" header is being used in a particular file make it far too easy for different compilation units to use different definitions of supposedly "standard" types.

C# and Java (and Ada), have their own problems, but this kind of pulling-out-your-hair problem simply can't happen in those languages. A significant amount of experienced resources were spent tracking down a problem that shouldn't have been able to happen in the first place.

Anyway, sorry to ramble on about a non-game-programming related problem. The good news is that we had a very successful sell-off demonstration of our product last Friday and the customer is very happy. Which means my working hours will return to normal and I can hopefully get back to finishing off Snowball Fight and participating in the XNA community again.

Thursday, August 9, 2007

XNA: Ready for Prime Time?

CoDe magazine has an in-depth article on XNA titled, Microsoft XNA: Ready for Prime Time? It looks at XNA's strengths and deficiencies, and how it is being used and viewed by professional game studios and programmers. Definitely worth a read.

Friday, August 3, 2007

Congrats to DBP Winners

The DBP results are in, and the only surprise for me is that so many of the entries had been made available to play. I had expected there would be more "surprise" entries that had not been shown publicly. I also expected more pro entries so it was good to see the list dominated by small teams of hobbyists, many of whom are regulars in the XNA forums.

For those who haven't played all of the finalists, here are quick links for the seven finalists with public installers. A few of the others finalists have installers available if you go to the #xna channel on EFnet. If I get permission from the authors, I'll post links to those here as well.

Blazing Birds (Xbox)
Chaos (Windows)
Hippocrate's Dilemma (Windows)
Little Gamers (Xbox)
Magic Crystals (Xbox) (Windows )
Proximity HD (Windows)
Shuggy (Xbox) (Windows )

The following are a few other personal favorites that didn't make the finalist cut. There are a lot of other good games in the full list but if you haven't played these, you should check them out.

Battle for the Seas (Windows)
Dream Sower (Xbox) (Windows)
Headbanger (Xbox) (Windows)
In Your Face Football (Xbox)
Vacuum Ball (Xbox)

Congrats to all of the finalists, but also to everyone who managed to complete a game to enter. This has been a lot of fun for me, both as a participant and to see the results!

Friday, July 13, 2007

Platforming Sample From Creator Of Shuggy

David Johnston (aka Deajay169), the creator of Shuggy (one of the absolute best of the Dream-Build-Play entries), has created a sample project that shows how to create a basic platformer. This is very good stuff and the code is well-written and easy to understand. A must-read for anyone who wants to understand how Mario Brothers (or Shuggy!) works.

Tuesday, July 10, 2007

Pathfinding Sample Using A*

Most game developers these days know of the A* (pronounced A-star) algorithm as the preferred method of finding a path around obstacles on a map. A* is actually a general-purpose graph-search algorithm based on theories presented by Hart, Nilsson, and Raphael in a 1968 paper for the IEEE Transactions on Systems Science and Cybernetics entitled, "A formal basis for the heuristic determination of minimum cost paths."

Whew! Fortunately, although it may seem intimidating at first, the A* algorithm is quite simple and you don't have to be an IEEE fellow to understand it. My first introduction to it was in one of my AI classes in college in Nilsson's book, "Principles of Artificial Intelligence." Interestingly enough, that book doesn't even mention the algorithms use for pathfinding on a map. The example he provides uses the algorithm to traverse a search tree to solve the 8 puzzle.

However, it should be obvious (when you think about it) that pathfinding on a map is merely a specialized case of a generic graph-search. And, as it turns out, one that the A* algorithm is particularly useful for solving, provided you can determine accurate "costs" for moving between adjacent nodes on the map.


The basic idea of the A* algorithm is to divide a map into adjacent nodes. Some nodes are free of obstacles and can be entered. Other nodes can not be entered. Between any adjacent nodes, there is a cost associated with moving from one node to another.

The algorithm starts by looking at all of the nodes adjacent to the starting point. For each one, it determines an estimated total cost of moving from that node to the destination, generally based on distance without regard to possible obstacles (which aren't known yet).

It then chooses the node with the least estimated total cost and looks at all of the nodes adjacent to it. This time it tracks the actual cost of moving from the original node to the current one and then, again, adds an estimated cost to the destination to come up with a revised estimated total cost.

From all of these nodes, it again selects the node with the least estimated total cost and repeats the process. As it does so, it is continually refining the actual cost of moving into each node along various paths, always doing so by searching the lowest estimated costs first. When paths are discovered with actual costs that are worse than actual costs to the same node by other paths, they are eliminated from further consideration. The end result can be proven to be the shortest path between the start and destination nodes (assuming the heuristic for estimating total cost meets certain conditions).

For a more complete, and very readable, description, I highly recommend the article, A* Pathfinding for Beginners, by Patrick Lester. I couldn't hope to explain it in more understandable terms than he does.

An A* Implementation

I have created a sample "game" that demonstrates the implementation of A* that I used in Snowball Fight!. I created this as a separate sample application in order to make it easier to understand how the algorithm works and how to use this specific implementation. I stripped out almost all of the unnecessary code to focus as much as possible on the Pathfinding class.

In the sample, a Bug (er, insect) traverses the shortest available path around obstacles on a map towards a Flower. The Flower can be moved by the player using the keyboard or an Xbox 360 controller. Whenever the Flower is moved, the Bug uses the Pathfinder class to find the shortest path to its new location.

The sample is divided into two namespaces. The SwampLib namespace contains general purpose classes that can be reused in many games. The PathfindingSample namespace contains classes that are specific to the sample "game".


The main classes of interest in the SwampLib namespace are:

Defines an interface between a map and the Pathfinder class. This allows many map implementations to be used with the Pathfinder. The only restriction is that the map must be capable of being divided into "tiles". Note that this does not mean the map must be a traditional tilemap. In fact, the Map implementation in this sample is not a traditional tilemap.

A struct that represents a tile on an IMap.

Any object that wishes to find a path through a map must create an object of this class. The constructor to the Pathfinder object takes an IMap object as a parameter. The map associated with the Pathfinder can later be changed, if desired, with the one restriction that the map must be of the same size as the original map.

The FindPath method is used to find a path from any starting tile to any destination tile. The path found is stored as a stack of destination MapLocation waypoints.

Traversing the path is done by calling Pathfinder's Complete method to make sure that the destination has not already been reached. If Complete returns false, the GetNextWaypoint method can be called to get the MapLocation of the next waypoint.

PathNode is only of interest in understanding the A* implementation. It is not publicly accessible outside of the SwampLib namespace and is used internally by the Pathfinder class to keep track of the status of each node (tile) on the map while finding a path.

The Sprite class is not related to pathfinding, or the A* algorithm, but is provided as a reusable class for managing on-screen sprites. It provides collision detection and handling capabilities via the BoundingCircle and BoundingRectangle classes, although these are not used in this sample.

The XnaGame class is a subclass of Microsoft.Xna.Framework.Game class that is intended as a parent to other Game classes. It provides access to a number of commonly needed resources and methods.


The classes that make up the PathfindingSample itself are:

The game itself, which inherits from SwampLib.XnaGame. This class creates the GameComponents and Sprites, and loads all game content. It also manages updating and drawing the player's target Reticle.
This GameComponent manages the state of the keyboard and gamepad input controls. It is implemented as a singleton to make it easier for all of the other classes to access its properties.

The SetTarget property is true if the user has requested to change the target's position via the A button on the gamepad or the Spacebar on the keyboard.

The ToggleBlockedTiles property is true if the user has requested to toggle whether or not blocked tiles are highlighted on the map.
This DrawableGameComponent handles the majority of the game logic. It maintains a list of Sprites associated with the map (although in this case, that is only the Bug and Flower). It is responsible for updating all active Sprites and drawing the map and all visible Sprites.
The Sprite that uses the Pathfinder class to find the shortest route to the Flower's current position. Look at this class to understand how to use the Pathfinder class.
The Sprite that represents the target the Bug is seeking. Its location is set by the Reticle class when the user changes the target's position via the SetTarget property of the InputComponent class.
The Sprite that represents the player's targetting reticle. It is under the control of the user via the left thumbstick on the gamepad or the arrow keys on the keyboard. When the user selects the SetTarget input, it raises the TargetMovedEvent which notifies the Flower and Bug of the Flower's new location.
The game's implementation of the IMap interface. It is responsible for actually drawing the map background and fixed objects when directed by the MapComponent. It is also responsible for keeping track of which tiles are clear and which are blocked by fixed objects. Finally, it provides a number of helper methods that are used by the other classes that need to query the status of map tiles.


That's it! You can download the full source here:

Pathfinding Sample

I tried to make the code as readable and well-documented as possible. If you have any questions, feel free to leave a comment or email me at SwampThingTom at bayougames dot com.

Monday, July 2, 2007

Dream-Build-Play Entries

The following is a list of Dream-Build-Play entries, with links to web sites, videos, and installers (where available). I'll edit this post as I learn about more.

2D Action
Windows / Xbox

Baby Gamer - Musical Rain
Baltico X

Bathtub Navy
3D Action

Battle for the Seas
3D Turn-based Strategy

Big Sky
Screenshot 1
Screenshot 2
Screenshot 3

Blazing Birds
2D Action

Block Realm


Bullet Hell Tactics

Burning Angels
3D Shooter (3P)
Video 1
Video 2

Butterfly Paradise
3D puzzle
Screenshot 1
Screenshot 2
Screenshot 3

2D Action

3D Puzzle

Crystal Glyder
3D Flyer

3D Sci-Fi/Action Multiplayer Arcade

The Dishwasher: Dead Samarai
2D Brawler

Dog Food
2D Platformer

2D Shooter

Dream Sower
Kyle0654, LoneRockstar, CatalinZima, atlStylin
3D Action
Screenshot 1
Screenshot 2
Screenshot 3


Evil Aliens

2.5D Shooter
Video 1
Video 2
Video 3

The Farthest Land
2D Fighter

Galactic War
3D Action

Ghost Ball
Da Coder
3D Sports
Screenshot 1
Screehshot 2

Give it a Rest
Camp ELM
2D Side Scroller

Gravitron Ultra
Screenshot 1
Screenshot 2
Screenshot 3

2D Multiplayer Shooter

Hasta La Muerte
Screenshot 1
Screenshot 2
Screenshot 3

Head Banger
Ganksoft Entertainment

Hippocrates Dilemma
2D Shooter

3D Action

In Your Face Football
barkers crest
2.5D Arcade Sports


Little Gamers: Teh Game
2D Action

Magic Crystals
2.5D Action Puzzler

Mech Rider Z
3D Mech Shooter

3D Action Adventure

2D Puzzle

Mr. Pluck Adventures
3D Platformer

2.5D Action

The Night of the Puppets
2.5D Action Adventure?

2D Shooter

Proximity HD
2D Strategy / Board Game

Screenshot 1
Screenshot 2
Screenshot 3

Screenshot 1
Screenshot 2
Screenshot 3

The Real Invaders
3D Action

2.5D Party
Screenshot 1
Screenshot 2
Screenshot 3

Roto Motion
2D Puzzle

Samarai Soul Hunters
2.5D Brawler

2D Platformer

Sky Burner
3D Action Flying

2D Action
Screenshot 1
Screenshot 2

The Snoobies
2D Action
Screenshot 1
Screenshot 2
Screenshot 3

Snowball Fight!
2D Party

Solar Defense
2D Shooter
Screenshot 1
Screenshot 2
Screenshot 3

Space Sudoku

Sprockets of Strife
Screenshot 1
Screenshot 2
Screenshot 3


Tower Defense
2D Action Arcade

Truck Defender X
3D Action / Strategy

Screenshot 1
Screenshot 2
Screenshot 3

Vocab Trainer

Vacuum Ball
2.5D Action Puzzler

2.5D Shooter

X Space Cruiser
3D Space Racing
Screenshot 1
Screenshot 2
Screenshot 3

3D Space Shooter

Xtreme Table Soccer

Yo Ho Kablammo!
Screenshot 1
Screenshot 2
Screenshot 3


Uploaded the latest build of Snowball Fight! to the Dream-Build-Play website this morning. There's plenty more I plan to do with the game but it's nice to have this milestone passed. Thanks to everyone who provided feedback on the previous release. And, of course, thanks to Microsoft's XNA Team for providing an excellent product with amazing support (for free!) and to the XNA Community for providing so much help and information.

You can download the latest Windows release at and the latest Xbox 360 release at

Wednesday, June 20, 2007

Snowball Fight!

I'm happy enough with the progress on my Dream-Build-Play entry that I've decided to put a beta out for public release. Snowball Fight is a 1-4 player party game where you try to knock your opponents out by hitting them with snowballs before they knock you out. Of course, you can only hold so many snowballs at a time, so don't wander too far from your snow pile when you are getting low. Also, keep an eye out for random power ups for temporary bonuses.

I've also created a new website / blog for distributing my games — Bayou Games. "Advice from the Swamp" will continue to focus on videogame programming, while will focus on providing playable games to gamers.

Snowball Fight

I'm afraid that, for now, I've decided not to release the source. (But I also didn't obfuscate it, so feel free to Reflect at will.) I'm trying to determine the best way to share the source with the XNA community while keeping my options open for publishing the game in some form. My preference is to turn the source into an on-going tutorial on writing a complete game — in the same vein as Reimer's tutorials, but with a truly complete game as the final product. But that clearly takes a serious commitment and, as I said, I'm keeping my options open for now.

However, since there is clearly interest in an article on the A* pathfinding algorithm, I will be providing that soon. Again, I'm not certain exactly what form it will take, but it will be available before the end of the month.

Tuesday, June 5, 2007

What Next... Again

Thanks for the comments. It looks like there is some interest in an article on implementing the A* pathfinding algorithm. I have an implementation in my Dream-Build-Play game, and I'm trying to figure out the best way to turn that into one or more articles.

The biggest problem is that spending time writing articles and putting together these blog entries takes significant time that otherwise could be used improving my DBP game. With the DBP deadline a mere month away, I'm trying to decide how best to balance that.

I also have plans for an article to demonstrate how to write game control input code that is completely independent of the actual input device. I have a series of classes (also written for my DBP game) that lets the user select pretty much any input device and customize the controls for that device in such a way that the game itself knows nothing about the actual input device being used. It consists of a series of adapter classes that can convert any digital input to an analog input and vice versa, so that the user can configure specific game controls to use arrow keys on a keyboard, the Dpad of a game controller, or an analog joystick without any changes or special treatment by the game code.

Sunday, June 3, 2007

A Generic Pool Collection Class

I originally intended to write this article for Ziggy's XNA Article Contest. Unfortunately, real life got in the way in the form of a few weeks of crunch-time at work followed by a much-needed (and fabulous) vacation in Paris. Between those things and spending the little free time I had last month on my Dream-Build-Play entry (and not as much time on that as I'd have liked), I didn't get the article finished in time for his deadline. He received some great entries, though, so if you haven't already, head over there to read them. My favorites are the two HLSL articles by Jamezilla, one on Bloom/Blur Post-Processing and the other on Shockwave Distortion; and the article on Bezier Curves by Cubed2D. But all of the entries are worth a read.

EDIT: It looks like just as I was posting this entry, Ziggy was announcing the contest winner. Congrats to KrisC for his article on Graphic User Interfaces in XNA!

Resource Pools

Games need to be able to maintain collections of various types of resources, including such things as projectiles, particles, enemies, etc. Most non-game .NET applications would simply allocate such resources dynamically and then let the garbage collector dispose of them when they are no longer needed. Unfortunately, dynamic memory allocation and deallocation can, and will, cause slowdowns at indeterminable times. For games, or any real-time software, this is obviously unacceptable. As a game programmer, you need to be in control of when resource allocation and deallocation occurs and how long it takes.

The solution to this is to create resource pools where all of the memory for each resource type is allocated up-front during the game's initialization. When a resource is needed, an available item can be removed from that resource type's pool. When the resource is no longer needed, it can be returned to the pool.

Because a resource pool is a group of objects of the same type that has a fixed size, they are almost always implemented as an array. However a resource pool also has to keep track of which objects are currently being used, and there are multiple ways to do that. One way is to simply add a field in the resource class that specifies whether that particular object is active. The problem with that approach is that you have to rewrite code to manage each particular resource's pool. That is, you'll need separate code to manage your projectile pool and your particle pool, even though the code is doing basically the same thing. You could get around this "block copy" approach by having a parent class that maintains the active flag and contains the methods used to get a free object from the pool, update the active objects in the pool, reset the pool, etc. But that requires that all of your resource classes inherit from that parent, which isn't always desirable (and sometimes may not even be possible).

The most flexible approach is to have a generic Pool class, similar to the other .NET generic collections, that handles the maintenance of resource pools and can hold any type of object. In this article, we'll look at an efficient implementation of a Pool class and how to use it for a variety of different resource types.

Along the way, I'll describe some of the design decisions and trade-offs that go into writing generic classes. However, this is not intended to be a tutorial on writing C# generics. If that's what you are looking for, I recommend first reading a good book or online tutorial that covers the topic.

Desired Features of a Generic Pool

The following features are required for our Pool class to be useful.

Get — Takes a free item from the pool and marks it as active.

Return — Returns an active item to the pool and marks it as available.

Clear — Returns all active items to the pool and makes them all available.

Capacity — Gets the total number of items in the pool, both available and active.

AvailableCount — Gets the total number of available items in the pool. This is necessary to ensure the user doesn't attempt to Get more than the available number of objects. Therefore any call to Get should be preceeded by a test to ensure there are objects available.

ActiveCount — Gets the total number of active items in the pool.

CopyTo — Copies all of the active objects in the pool to a one-dimensional array. This is needed so that a resource pool can hold vertex data which can be easily copied to an array for passing to the GPU.

GetEnumerator — Gets an enumerator that can iterate over the pool's active items.

AllNodes — Gets an enumerator that can iterate over all of the items in the pool, active and inactive. This can be useful if there is one-time initialization that needs to be performed on all of the resource objects in a pool before they are ever used. Generally, however, after initialization, users should only be accessing active objects.

efficiency — Because resources such as particles and projectiles are generally being allocated and returned many times per frame, the pool class must be as efficient as possible. Where possible, all methods should be constant time — O(1) — and none should be worse than linear — O(n).

encapsulation — The implementation details should be hidden from the user as much as possible. In particular, the user should not have direct access to the active flag which should be modified only by the Pool class itself.

ease-of-use — The class should put as few restrictions on the user as possible so that it can be used as easily as other collection classes. In particular, it should have very little restriction on the types of objects that it can hold.


To meet the efficiency requirements, the Pool class is implemented as an array along with a Queue containing the available (non-active) objects. By maintaining the list of available objects as a queue, the Get and Return methods can be O(1).

To meet the encapsulation requirements, the pool array is not an array of the objects being held by the pool, but an array of a Node struct. Each node contains the item being stored in the Pool along with the index of that item in the array. Because the node has to be given to the user so that it can be returned to the pool when the user is finished with it, the active flag is not stored in the node itself. If it were, the user would have direct access to it and could change it themselves, which we want to avoid. Instead, there is a separate array of booleans whose elements correspond to the elements in the pool array. Thus the NodeIndex field in the Node struct provides the index into both the pool array and the active array.

Finally, the ease-of-use requirements are mostly met by virtue of the class being implemented as a generic. As mentioned earlier, this removes the limitation of having a parent class for all resources that are to be stored in pools. However, since we are storing the actual resource objects inside Node structs, we need to be able to allocate each one when we create the nodes in the array. This means we have to limit resources to types that have a public, parameterless constructor. In other words, you can't store basic data types such as ints, floats, etc., inside a Pool. Since this Pool class is intended for holding resources that are more complicated than the basic data types, that restriction shouldn't be a problem.

The use of nodes when getting and returning pool elements is another design trade-off that slightly limits ease-of-use, in that the user has to deal with another data structure besides the one they want to store in the pool. To remove the slight burden this presents, the iterator returned by the GetEnumerator method provides direct access to the resource objects rather than nodes. Another enumerator is provided by the ActiveNodes property which can be used to iterate through nodes. This gives the user the flexibility to access active resource objects directly, or to access nodes for times when they may need to return an item to the pool. In general, I'd recommend using the ActiveNodes property inside an Update method, where the update may cause the resource to be finished and need to be returned to the pool; and the GetEnumerator property inside a Draw method, where you just want to go through all of the active resources and draw them.

The other disadvantage of storing the resource objects inside Node structs in this way, is that for value types (structs) stored in a pool, the user will only ever get a copy of the object instead of having direct access to it. That means that when using a Pool of a struct type, setting the node's Item field doesn't change the value of the item in the pool. To get around this problem, the SetItemValue method is provided to update the value of a given node in the pool.

Using the Pool Collection

Hopefully the comments provided with the class are fairly self-explanatory. The following subsections provide an overview of how to use the class.


To use a Pool collection, all you have to do is construct it with the number of elements to be stored, generally at program initialization. This number of elements in the pool should be more than you will likely need, but not many more. It may require some tweaking to get that number just right for each specific use but generally you should have a pretty good idea of how many elements you need based on the design of your game and how many of each projectile, particle, etc. are likely to be active.

class Missile { private Vector2 position; private Vector2 velocity; private int configValue; public Missile() { } public void Init(int configValue) { this.configValue = configValue; } public void Fire(Vector2 position, Vector2 heading) { this.position = position; this.heading = heading; } public bool Update() { // Move missile, etc. ... // See if missile is finished if (reachedMaxDistance) { return false; } return true; } public bool Collided(Rectangle otherObject) { if (otherObject.Intersect(position)) { return true; } return false; } public void Draw() { // Draw missile ... } } public Pool missilePool; public void Init() { missilePool = new Pool<Missile>(MaxMissiles); // If there is any one-time initialization of all // resources, it can be done here. Generally, // though, this shouldn't be needed. foreach (Pool<Missile>.Node missile in missilePool.AllNodes) { missile.Init(ConfigValue); } }


During the update portion of each frame, you will get items from the pool if needed, update all active items, and return items that are no longer needed.

public void Update() { // See if the user has fired a missile // (and verify there are some left in the pool) if (MissileFired && missilePool.AvailableCount > 0) { // Get a missile from the pool. Missile missile = missilePool.Get().Item; // Initialize the missile's position and velocity. missile.Fire(player.Position, player.Heading); } foreach (Pool<Missile>.Node node in missilePool.ActiveNodes) { if (!node.Item.Update()) { // If something happened during its Update to // make the missile no longer active, return // it to the pool. missilePool.Return(node); } else { // Note that if Missile was a struct instead // of a class, you would have to do the // following here: // missilePool.SetItemValue(node); // Loop through objects it might hit. foreach (GameObject item in ActiveObjects) { if (node.Item.Collided( item.BoundingRectangle)) { missilePool.Return(node); } } } } }


During the draw portion of each frame, you will get items from the pool if needed, update all active items, and return items that are no longer needed.

public void Draw() { // Draw each active missile foreach (Missile missile in missilePool) { missile.Draw(); } }

XNA Creator's Club Particle Sample

To provide a more complete example of using the Pool collection, I've modified Microsoft's XNA Creator's Club Particle Sample. The only changes from the original are the inclusion of the Pool class itself, in Pool.cs; and modifications to the ParticleSystem class, in ParticleSystem.cs. All changes have been marked with conditional compilation instructions (#if USE_POOL_CLASS) to make it easy to compare the original code with the new code.

Pool Collection Class

You can download the source file here.

Wednesday, April 18, 2007

Software Efficiency And Optimization (Part 3)

In the first two parts of this series, I discussed the difference between optimizing a design versus optimizing implementation, and showed how to use prototyping to benchmark key algorithms. This part will look at how to use a profiler to find performance bottlenecks.

The First Rule of Program Optimization: Don't do it. The Second Rule of Program Optimization (for experts only!): Don't do it yet.

So when do you do it? Only when you know you have a problem, and only when you know what the problem is. Profilers help you find these problems.

Argghhh! Why is my game so slow?

So you've finally gotten your game to the point that it is basically playable with most of the features you want implemented, and now you are hoping to get some friends to help you start beta testing it. But when you run it, the animation is incredibly jerky. Because you had been bragging to your friends about the game and want them to see it as soon as possible, you start going through the code in your mind, frantically thinking about what could be causing the slowdowns.

"I've heard foreach is very slow," you think, so you quickly do a search through your files and replace every one of your foreach statements with equivalent for statements, kicking yourself for not having done this in the first place. After fifteen minutes or so, and several recompiles because you made some mistake or another while translating the code — "Whoops, I declared my index variable to be 'index', but I called it 'i' when I tried to increment it." ... a couple minutes later ... "Dammit, that array only has ten elements but I told it to loop through eleven." — you finally get it running again... only to be dismayed to discover that it made no difference.

"Hmmm... I have an awful lot of matrices I'm passing around by value. Copying all of that data at every function call must take an awful lot of time." So you go through all of the code again looking for every method that takes a matrix and changing them to pass by reference. After many, many more recompiles than were needed during the change from for to foreach"Oh, come on, you mean I have to add the ref keyword both when I declare the method and when I call it?!?" — you finally get it back to the point where you can run it again. Unfortunately, now it doesn't work at all. You run it and objects start jumping around the screen for no apparent reason. Then you remember that some of your methods took advantage of the fact that a matrix was being passed by value and changed its value locally. Now that it is being passed by reference, that change is being made in the original object, which is not at all what you intended.

If you are lucky (or smart), you stop there and take a step back. If you're not lucky (or you're just foolish), you think that you'll be able to fix it and you start going into each method trying to figure out how to kludge your way around the problem on a case-by-case basis. Maybe you'll even succeed and get the game back to its runnable state that you had it in hours earlier. Maybe you're even lucky enough to have not introduced yet more bugs in the process. But in all likelihood, you haven't fixed the problem. And now you have code that does not run any better than when you started, but is harder to understand, harder to debug, and has kludges in it to get around problems that you introduced trying to fix the wrong problem.

Your time will be much better spent finding out where your problems are before you try to fix them.

Using NProf

Execution profilers are tools that help you find where your code is spending most of its time. There are two types of execution profilers. Event-based profilers use actual events, such as function calls, and trace the exact execution path as the program runs. Statistical profilers work by interrupting the program at frequent intervals to determine what section of code is currently running. Statistical profilers are less intrusive and therefore provide more accurate measure of how much time is spent in each function.

There are some very nice professional statistical profilers available for .NET development, including the ANTS Profiler and dotTrace. Both provide lots of great features, but both cost hundreds of dollars. Since one of the great aspects of XNA development is that Microsoft has made the Game Studio Express tools available for free, I'll show you how to keep using free tools by concentrating on the Open Source NProf profiler.

As with many (most?) Open Source tools, the main things missing from NProf are documentation and support. Once you get past that, though, it's a great tool. It doesn't provide the cool graphs and other bells-and-whistles of its $300+ commercial counterparts, but it lets us find out how much time is being spent in each of our functions. Ultimately, that's all we care about.

You can download the latest version of NProf (0.10 as of this writing) here. When you run it, you get the following (rather unhelpful) screen:

You might think that you would want to start by selecting File->New, which is the convention for Windows programs. Sadly, that does nothing. (Actually, it clears the current application so you can load a new one. But since you haven't loaded one at all yet, doing this now doesn't change the screen or give you any feedback.)

To select an application for profiling, press the Browse button. Then find a .NET executable for debugging. Note that when profiling XNA games, you will have to select a Windows executable — currently there is no mechanism for profiling Xbox 360 games.

Your XNA game is unlikely to take command line arguments, but on the off-chance that you have written it to support them, you can enter them in the Arguments text box.

Once you have selected a program to profile and entered any command line arguments, select Project->Start or press F5 to start the program. Your program will start running. Play through your game and then exit the program. When you exit, the NProf window will look something like this:

This is a sample run from my Space Invasion game. The two views shown in the top and bottom panels list every method that was executed when you played the game. The top view lets you expand each method to see what methods it called. The bottom view lets you expand each method to see what methods called it. In both views, the Time column shows what percentage of the program's total execution time was spent in each method. The methods are sorted by the percentage of the program's execution time that was spent inside that method.

Note that that includes the cumulative time spent inside every method that method called. So any time you run NProf on an XNA game, the first method listed will be System.Threading.ExecutionContext.Run and it will be shown as taking 100% of the time. That's because everything else in the game is ultimately called by that method, so every time NProf took an execution sample, it was in System.Threading.ExecutionContext.Run.

Looking down a bit further, you'll notice your main program (in this case, SpaceInvasion.Program.Main) and below that the XNA Framework's Run method (Microsoft.Xna.Framework.Game.Run). Continuing down even further, you'll finally find Microsoft.Xna.Framework.Game.Tick. In a fixed-step game, this will be called every TargetElapsedTime period (defaulting to 60 times per second).

If you expand that method in the Callees panel, you'll see that it calls three methods, Microsoft.Xna.Framework.Game.DrawFrame, SpaceInvasion.InvadersGame.Update, and Microsoft.Xna.Framework.GameClock.Step. The latter took only 0.01% of the total execution time, so we can safely ignore it. The SpaceInvasion.InvadersGame.Update method took only about 2% of the total execution time. And Microsoft.Xna.Framework.Game.DrawFrame took over 90% of the execution time!

Does that mean the program took over 15 milliseconds to draw each frame for a simple Space Invaders game with no more than 60 2D objects on screen at any time? Not at all. The time spent in the DrawFrame method includes the time spent waiting to synchronize with the vertical trace on the GraphicsDeviceManager.

For now, don't worry too much about how much time is spent resolving the back buffer to the GPU. (I'll talk very briefly about GPU issues at the end.) Instead, look at how much time is spent inside your program's Update and Draw methods. In this particular run of Space Invasion, these are approximately 3.4% for SpaceInvasion.InvadersGame.Update and 1.6% for SpaceInvasion.InvadersGame.Draw.

Since we know that Space Invasion is running at the default fixed-step time period of 60 frames per second, then the amount of time spent in Microsoft.Xna.Framework.Game.Tick each frame is 16.7 milliseconds. So we might be tempted to try to calculate the amount of time spent in each method by multiplying the percentage of time spent in each by 16.7.

However, this isn't accurate because, as we noted above, only 93.3% of the program's execution time was spent in the Microsoft.Xna.Framework.Game.Tick method. And, of course, I've already mentioned that even statistical profilers like NProf add some amount of overhead and error. While you could get out your calculator and start dividing the time spent in the Update and Draw methods by 0.933, there is yet another catch. The time shown is the total time for the entire program, including the time spent on the title screen when we weren't actually playing the game.

Of course, there are still ways around this if you are determined to get exact numbers for the average time spent in each method for each frame. But it's really not worth it. Don't make the mistake of thinking you can use a profiler to find out exactly how much time each method takes, or even that it's helpful to do so. Relative time is what matters.

Even looking at relative time, though, we probably don't care about how much time was spent updating and drawing the title screen. We want to know how much time was spent moving our sprites and performing collision detection while playing the game. Fortunately, Space Invasion is set up so that each screen is handled by a different class. I highly recommend you do this in your games too. I've seen some games where the Game's Update method is a huge switch statement, or, even worse, a set of if/then/else statements that determine which game screen is currently displayed. The whole point of object oriented programming is to allow the creation of specialized classes that handle specific functions. Providing a separate class to handle each screen's Update and Draw methods will make your programs easier to understand, easier to debug, and, as we're about to see, easier to profile.

Since Update is (not surprisingly) where we are spending the majority of our time, we'll drill down into it to see how it's spending that time. Expanding the SpaceInvasion.PlayScreen.UpdateGame method, we see the following:

Perhaps not surprisingly, when playing the game, the majority of the time is spent in our inefficient collision detection code. But, as we can see, the percentage of time spent in any of the Space Invasion methods is insignificant compared to the time spent in Microsoft.Xna.Framework.Game.Tick as a whole. Not exactly a shock since the game clearly runs quite smoothly, with no animation or control glitches. But it may be surprising to see just how little time is spent even in the collision detection code, which we knew in advance was the most inefficient algorithm possible. I was certainly surprised.

The important thing here is to see how to use this data to find where your performance problems are when you do have them. If the game was running slowly, knowing that over 75% of SpaceInvasion.PlayScreen.UpdateGame's time was being spent in the HandleCollisions method would tell us immediately where to focus our optimization time. (For those wondering where I got 75%, it's the amount of time spent in HandleCollisions as a percentage of the amount of time spent in UpdateGame.)

What it wouldn't tell us, however, is how best to optimize that code. This is where going back to the lessons in part 2 of this series is helpful. You might be able to optimize the code by making some simple implementation optimizations, such as passing large value objects by reference. As long as such changes are localized to small sections of code, and documented with good comments so you can remember later why you wrote the code that way, this is perfectly acceptable.

However, if you are way out of the bag and need significant code changes that may lead to buggy and hard-to-maintain code, you are better off going back to the drawing board and implementing a completely new algorithm. The only way to know for sure is to go back to prototyping and benchmarking some alternatives, in a small test program that doesn't require changing your main game code until you know what those changes need to be.

Other Potential Performance Bottlenecks

Execution profilers are great for helping find CPU bottlenecks, but there are other potential sources of performance problems. In particular, inefficient use of the GPU and/or memory can cause problems as well. If you are having performance issues but profiling shows that you are not significant amounts of time in your Update and Draw methods, then in all likelihood the GPU or memory is to blame.

GPU issues are caused either by making too many draw calls to the GPU or by sending it too much data each frame. Modern 3D graphics cards are pipelined so they are most efficient when they can perform many operations on a single source of data (i.e., a single texture). Therefore, minimizing the number of times textures are transferred to the GPU and minimizing the number of actual GPU draw calls go a long ways towards making the most efficient use of the GPU. The best tool for investigating GPU issues is PIX, part of the DirectX SDK from Microsoft.

Memory issues are mostly caused by not understanding how the CLR allocates memory and performs garbage collection. (Memory issues can also be caused by simply having too much data in memory at a given time, but most computers have at least 512MB if not a full gigabyte of memory these days, so that's less of an issue than it once was. Of course, that doesn't mean that you should feel free to load 500+MB of game data and expect to be able to keep it all in memory at one time...)

The important thing to keep in mind is that in C#, reference types, meaning all actual classes, are dynamically allocated on the heap. They will remain there until they are no longer referenced by any other object, at which point they become eligible for removal by the garbage collector. Value types, however, which include all of the basic types (int, float, bool, etc.) as well as structs, go on the stack and are removed when they go out of scope. Therefore, they never cause garbage collection issues (unless they are boxed). A key problem for new C# programmers, especially those that come from a C or C++, background, is that the new keyword in C# does not mean "allocate space for a new object on the heap", the way it does in those other languages. It just means "call this object's constructor". So, for example, the statement

Rectangle rect = new Rectangle(0, 0, 100, 200);

does not allocate an object on the heap, because Rectangle is a struct and, therefore, a value type. Variables of type Rectangle will never cause garbage collection issues, nor will any other struct types (again, unless they become boxed), despite the fact that to a C++ programmer the above syntax looks like you just dynamically allocated memory.

Another, even worse, common garbage collection confusion is that certain language constructs can create objects on the heap under the covers. This happens when value types become boxed, which happens whenever you try to use a value type where a reference type is expected. In these cases, the value type has to be "boxed" inside a reference type.

It also happens when the foreach statement creates an iterator for a reference type. This is the reason the foreach statement gets such bad press, because it can create iterators on the heap that will need to be garbage collected without it being obvious from the code that it is doing so (at least, to a programmer unaware of this issue). Using foreach with value types causes no such problems.

Memory and garbage collection issues are best discovered through the use of the CLR Profiler and the Remote Performance Monitor for the Xbox 360. The .NET Compact Framework team put together a great article on garbage collection in the Compact Framework and how to use the Remote Performance Monitor. Eli Tayrien's Cornflower Blue blog has two good articles on using the CLR Profiler and the Remote Performance Monitor tool to dispel the myth that foreach is evil.

The Bottom Line

Part 2 of this series showed how to use benchmarking to find out how long it takes to run a particular section of code. This article showed how to use a profiler to determine what percentage of time is being spent in each method during a full run of a complete program. By combining those two techniques, you can get a good handle on what algorithms are most appropriate while you are designing your game, and what sections of your code (if any) should be considered for optimization once you have written the game.

Hopefully what you have taken away from this series is 1) understanding the importance of optimizing your design before optimizing your implementation; 2) understanding the different types of performance problems that videogames can run into; and 3) understanding how to use various techniques to help you find your performance problems so you spend your time solving the right problem.

What Next

I'm trying to decide where to take this blog from here. I could continue focusing on software engineering as it applies to games. Or I could start taking in-depth looks at particular game algorithms and how to implement them for XNA — for example, I could show how I used the minimax algorithm with alpha-beta pruning for finding the best move in my Othello game, and how to implement it in XNA; or show how to implement the A* algorithm for pathfinding. Or I could start showcasing my Dream-Build-Play entry and how various parts are implemented. If you have any thoughts or preferences, please leave a comment.

Sunday, March 25, 2007

Software Efficiency And Optimization (Part 2)

In Part 1 of this series, I discussed the importance of understanding game design tradeoffs and introduced the concept of Big O notation for comparing the efficiency of different algorithms. However, as I said, Big O notation tells us nothing about how fast an algorithm will actually run, it just tells us how efficiently it scales when operating on greater numbers of objects.

"That's great", I hear you saying, "but how do I know when I'm doing my design whether a particular algorithm will meet my needs?"

Ah, well, there is really only one way — experience.

"Gee, thanks,", you reply. "How am I supposed to get the experience I need to design my game, if you say I shouldn't write it until after I've designed it?!?"

That's actually pretty simple. You need to prototype your approach and see how well it meets your requirements. If it doesn't meet your requirements, you need to either prototype different approaches until you find one that does, or scale back your requirements.

Prototyping and Benchmarking

A prototype is just a simplified version of software (in this case, your game), that focuses on one particular aspect of it. Prototypes are often used as proofs of concept to show that the software will be able to work as expected. Because prototypes don't have to deal with all of the details that the final software will, they can be written quickly so that, if necessary, different approaches can be evaluated.

Prototypes are frequently used to evaluate user interfaces so that customers can provide early feedback. That can be useful for game programming too, since if you can implement a working display and control scheme you may be able to find out what works and doesn't work before you get too far along in the actual implementation of the game. However, the use of prototypes that we are concerned with here is to determine whether an algorithm is fast enough for the game we want to write. To do that, we will want to benchmark it. Benchmarking is just the process of timing how long an algorithm takes to run.

Fortunately, the .NET Framework makes benchmarking very easy by providing the System.Debug.Stopwatch class.The Stopwatch class provides a Start and a Stop method. It keeps track of the total number of clock ticks that occur between calls to Start and Stop. Even better, like a real stopwatch, it keeps a running count of ticks between successive calls to Start and Stop. You can find out how much time has passed by querying its ElapsedTicks or ElapsedMilliseconds properties. A Reset method lets us reset the Stopwatch back to zero.

Lets put this to use in a simple example that merely tests how long it takes to move a sprite. For this, I'll use a simplified version of the Sprite class I wrote for my Space Invasion game. A quick overview of the class is provided below. You can download the full code for this benchmark example here.

public abstract class Sprite { public Vector2 Position { get; set; } public Color Color { get; set; } /// /// Sprite's collision rectangle in screen coordinates. /// public BoundingRectangle BoundingBox { get; } public Sprite( string imageName, BoundingRectangle boundingBox); public virtual void Initialize(); public virtual void LoadGraphicsContent( ContentManager content); public virtual void Update(GameTime time); public virtual void Draw(SpriteBatch spriteBatch); /// /// Tests for collision with another Sprite. If a /// collision occurs, it calls the Collide method for /// both Sprites. /// Returns true if images collide. /// public bool TestCollision(Sprite item); /// /// Called when the TestCollision method detects a /// collision with another Sprite. /// protected virtual void Collide( BoundingRectangle overlap, Sprite item); }

As you can see, it provides methods corresponding to all of the usual game engine methods — Initialize, LoadGraphicsContent, Update, and Draw. Additionally, it provides properties for getting and setting the position and color. For collision detection, it provides a TestCollision method that tests for collision with another Sprite and a virtual method, Collide, that gets called by TestCollision for both sprites if their BoundingBox values intersect.

Since it is abstract, it is intended to be used as a parent to other Sprite classes that will implement its behavior, so we will create our own TestSprite class. The TestSprite will generate a random starting position, directional movement vector, and speed (in pixels per second), as shown here:

public override void Initialize() { // Set starting position. Position = new Vector2( random.Next(screenWidth), random.Next(screenHeight)); // Create a random movement vector. direction.X = (float)random.NextDouble() * 2 - 1; direction.Y = (float)random.NextDouble() * 2 - 1; direction.Normalize(); // Determine random speed in pixels per second. speed = (float)random.NextDouble() * 300 + 150; }

Each frame, it will update its position based on its movement direction, speed, and the amount of time that has elapsed. It will also test to see if it has hit the edge of the screen, and deflect away from it:

public override void Update(GameTime time) { // Reset color back to white. Color = Microsoft.Xna.Framework.Graphics.Color.White; // Calculate movement vector. Vector2 move = (float)time.ElapsedGameTime.TotalSeconds * speed * direction; // Determine new position. UpdatePosition(move); } private void UpdatePosition(Vector2 move) { Position += move; if ((BoundingBox.Left < 0) || (BoundingBox.Right > screenWidth)) { direction.X = -direction.X; Position -= new Vector2(move.X, 0); } if ((BoundingBox.Top < 0) || (BoundingBox.Bottom > screenHeight)) { direction.Y = -direction.Y; Position -= new Vector2(0, move.Y); } }

Note that the screen collision test shown here is quite simple. An actual game may want to determine the actual point of intersection so it could deflect away from that point more realistically. If you did need that level of realism, you would probably want to go ahead and implement your strategy here so you could time it. However, all we are trying to prototype here is basic update time so this version is fine for our needs.

Note that the Update method does not test for collisions. We don't want individual sprites testing for collisions because to do so, our Sprite class would have to know about other game objects and we would be severely limiting our design options for collision testing. Any changes to our collision testing algorithm could, and likely would, impact our Sprite class. We want to avoid anything that limits future design changes, so we'll give our Sprite class the ability to test for collisions but require another part of our code to determine what objects should be tested.

We'll talk more about collision testing next. For now, let's see what it takes to time just moving our TestSprite around the screen. Inside our game, we'll create a TestSprite object and call its Initialize and LoadGraphicsContent methods at the appropriate places. And we'll create a SpriteBatch for our game and pass it to our TestSprite's Draw method inside our own Draw method. Now all we need is for our Update method to call our TestSprite's Update method each frame and use a Stopwatch to time it. To do that, we'll create a couple of helper methods that start and stop the Stopwatch, and have it print out the amount of time it takes for each update:

private Stopwatch updateTimer; private int updates = 0; private int framesPerSecond; private void StartTimer() { updateTimer.Start(); } private void StopTimer() { updateTimer.Stop(); updates++; // Show the results every five seconds. if (updates == 5 * framesPerSecond) { Debug.WriteLine( updates + " updates took " + updateTimer.ElapsedTicks + " ticks (" + updateTimer.ElapsedMilliseconds + " milliseconds)."); int msPerUpdate = (int)updateTimer.ElapsedMilliseconds / updates; Debug.WriteLine( "Each update took " + msPerUpdate + " milliseconds."); // Reset stopwatch. updates = 0; updateTimer.Reset(); } }

By putting calls to StartTimer and StopTimer around the calls to our sprite's Update method, we'll get a report of the average time each call takes.

300 updates took 34931 ticks (9 milliseconds). Each update took 0.03 milliseconds. 300 updates took 24445 ticks (6 milliseconds). Each update took 0.02 milliseconds. 300 updates took 23541 ticks (6 milliseconds). Each update took 0.02 milliseconds. 300 updates took 23583 ticks (6 milliseconds). Each update took 0.02 milliseconds. 300 updates took 23963 ticks (6 milliseconds). Each update took 0.02 milliseconds.

Not bad, each call took on average of 20 microseconds (on my development laptop — your results will vary). But notice that the very first set of updates took almost one and half times as long to run as the others. That's because the first time these methods are called, the JIT compiler is compiling the code and our Stopwatch is timing that as well. It's also possible, since this is a fairly small amount of code that is being called repeatedly, that some or all of it may be fitting in the cache, which will increase the speed of later calls.

These show some of the problems with benchmarking code. Another problem is that we are adding some time by using the Stopwatch itself. Thus benchmark times for prototype code can be used as a general guide, but can not be relied upon for exact values. In fact, exact values of the time it takes for functions to run are very hard to determine. Although intended only to describe quantum phenomenon, a variation of the Heisenberg Uncertainty Principle is at play here — the act of measuring something affects the thing being measured. We'll explore this more, and ways to work around it, in Part 3.

Benchmarking Collision Handling

Now let's expand our prototype to help us determine whether we can get away with a brute-force approach to collision detection. You can download the updated project for this section here.

First, let's look at the collision handling code that I already placed in TestSprite's Collide method. Remember that this gets called, for both sprites, whenever TestCollision determines that a collision between two sprites occurred. All it does is set the sprite's color to red.

protected override void Collide( BoundingRectangle overlap, Sprite item) { // Turn the sprite red to indicate collision. Color = Microsoft.Xna.Framework.Graphics.Color.Red; }

Let's give this a test by replacing our single TestSprite with an array of TestSprites. Every place we referenced TestSprite in the original code, we now have to loop through the array to handle all of our TestSprites. To make this a little easier to manage, we'll refactor our original sprite.Update() call in Update into a new UpdateSprites method that updates every sprite. And we'll add a new HandleCollisions method to our game to test for collisions. Finally, we'll change Update so that it only calls StartTimer and StopTimer around the call to HandleCollisions. The relevant sections look like this:

private TestSprite[] sprites = new TestSprite[10]; protected override void Update(GameTime gameTime) { if (Keyboard.GetState().IsKeyDown(Keys.Escape)) { this.Exit(); } UpdateSprites(gameTime); StartTimer(); HandleCollisions(); StopTimer(); base.Update(gameTime); } private void UpdateSprites(GameTime gameTime) { foreach (Sprite sprite in sprites) { sprite.Update(gameTime); } } private void HandleCollisions() { for (int i = 0; i < sprites.Length; i++) { for (int j = i + 1; j < sprites.Length; j++) { sprites[i].TestCollision(sprites[j]); } } }

Looking at that, you may wonder why I'm not using foreach for the HandleCollisions call. It's simply because with foreach we have no way of knowing what sprites we already tested. This algorithm tests every sprite against every other sprite exactly once.

So, what are the results? On my machine, with 10 sprites, I get the following:

300 updates took 48827 ticks (13 milliseconds). Each update took 0.04333333 milliseconds. 300 updates took 42466 ticks (11 milliseconds). Each update took 0.03666667 milliseconds. 300 updates took 42371 ticks (11 milliseconds). Each update took 0.03666667 milliseconds. 300 updates took 43086 ticks (12 milliseconds). Each update took 0.04 milliseconds. 300 updates took 43449 ticks (12 milliseconds). Each update took 0.04 milliseconds.

Wow. Handling collisions for ten sprites takes only twice as long as it did just to move one sprite. How could that be? It's partly due to the overhead of using the Stopwatch class and making function calls, and partly due to the fact that we are measuring very fast operations. Obviously, the closer you get to the resolution of the underlying timer, the more error you get in trying to time things.

We'll talk more about timer resolution later. Before we go on, notice also that the impact of the JIT compiler during our first set of updates is signicantly less, comparatively, than it was in our last example. This shows how effective JIT compilation is and why we don't need to worry about it impacting the performance of our game. We may take a performance hit the first time a section of code is run, but it's relatively miniscule to our overall performance.

Now let's see what happens when we increase the number of sprites to 100:

300 updates took 2079460 ticks (580 milliseconds). Each update took 1.933333 milliseconds. 300 updates took 2156954 ticks (602 milliseconds). Each update took 2.006667 milliseconds. 300 updates took 2138909 ticks (597 milliseconds). Each update took 1.99 milliseconds. 300 updates took 2150696 ticks (600 milliseconds). Each update took 2 milliseconds. 300 updates took 2169919 ticks (606 milliseconds). Each update took 2.02 milliseconds.

Whether you should be impressed or dismayed depends on how you want to use this collision handling algorithm. On the one hand, averaging 2 milliseconds per frame is still a miniscule part of our 16.7 millisecond frame time. If you are not planning on having more than a hundred sprites or so, this algorithm will suit your needs perfectly. But looking at the relative time difference per sprite gives a completely different perspective. It takes us 50 times as long to handle 10 times the number of sprites.

Hmmmm... what happens when we increase to 500? I urge you to run this code so you can see the results for yourself!

300 updates took 28266113 ticks (7896 milliseconds). Each update took 26.32 milliseconds. 300 updates took 28179606 ticks (7872 milliseconds). Each update took 26.24 milliseconds. 300 updates took 28291296 ticks (7903 milliseconds). Each update took 26.34333 milliseconds. 300 updates took 28199114 ticks (7877 milliseconds). Each update took 26.25667 milliseconds. 300 updates took 28182787 ticks (7873 milliseconds). Each update took 26.24333 milliseconds.

Another "Wow!", and this time there is no way to hide the dismay. The movement is noticeably jerky and we're clearly getting far less than our desired 60 frames per second! In fact, just the HandleCollisions call alone is taking almost twice our allotted 16.7 milliseconds per frame. Multiplying the number of objects by 5 increased our time by 13!

Of course, as I mentioned in Part 1, this is what we would expect from an O(n2) algorithm. The times aren't increasing exactly quadratically, due to overhead, but the rate of increase is clear.


Does this mean we should never consider this algorithm? Hopefully at this point the answer is obvious. Many games can easily get away with only having on the order of a hundred or so objects active at a time, which we have clearly shown can be handled easily. The fact that the algorithm is trivial to implement and maintain makes it a no-brainer for a large number of games.

On the other hand, if you know you will need to have hundreds of objects, you'll need another solution. You have two options — optimize this algorithm or find a new one. Anyone who is experienced with code optimization will see several obvious ways to make both the algorithm and its implementation more efficient.

For starters, most games don't actually need to test every object against every other object. Taking my Space Invasion game as an example, I don't need to test Invaders for collision with other Invaders. In fact, it's almost crazy to do so. (But I do anyway!)

Another obvious optimization is that the Sprite class's BoundingBox property is adding the Sprite's current screen position to its internal BoundingRectangle every time TestCollision is called. This despite the fact that the Position changes only once or twice per frame. TestCollision, on the other hand, is called once for every other Sprite in the game. (Note that I did optimize this in my actual Space Invasion game by updating the BoundingBox in the Position property's setter. As we can see from these numbers, however, I would have gotten equally satisfactory results if I had used the less efficient approach shown here.)

In addition, the Sprite's TestCollision code is computing the actual intersection rectangle even though we're not using it here. We could easily save some time by not computing it. But we give ourselves more flexibility by going ahead and doing it. Remember that this is supposed to be a generic Sprite class that can be used for many games.

These suggestions don't even get into implementation optimizations, such as always passing our BoundingBoxes by reference instead of value; and providing direct access to member variables instead of accessing them through Properties. These are exactly the types of optimizations suggested by many efficiency proponents in the XNA forums. But these also make the code less readable, harder to debug and harder to maintain.

Since Space Invasion never has more than around 60 objects on screen at a time, the unoptimized brute force approach works just fine. And that's undoubtedly true for many other games as well. (In fact, as we'll see in Part 3, the entire Update loop for Space Invasion averages less than a millisecond per frame!) But what if your game does need more than 100 collideable objects? Shouldn't you make those optimizations so you can handle them?

The answer is... maybe. By making some of these optimizations, we can get this same brute force algorithm to handle 500 objects at a far more reasonable 6.4 milliseconds per frame. You can download the optimized project here

300 updates took 6682522 ticks (1866 milliseconds). Each update took 6.22 milliseconds. 300 updates took 7038462 ticks (1966 milliseconds). Each update took 6.553333 milliseconds. 300 updates took 7023610 ticks (1962 milliseconds). Each update took 6.54 milliseconds. 300 updates took 6718281 ticks (1876 milliseconds). Each update took 6.253334 milliseconds. 300 updates took 7136208 ticks (1993 milliseconds). Each update took 6.643333 milliseconds.

That's an impressive improvement and shows how signicantly performance can be optimized through these techniques. But the disadvantages mentioned above — less maintainable and less flexible code — shouldn't be ignored. And even if you do these sorts of implementation optimizations, keep in mind that this algorithm will still degrade exponentially as you add more objects. You may be able to move up from 100 to 500 objects, but it won't get you to 1000. At some point, you need to recognize that you need a different algorithm to efficiently handle more objects — such as one that partitions your game space, like quadtrees.

Finally, remember that 6.4 milliseconds is still 40% of your entire frame time. If you are maintaining on the order of a thousand or more objects at a time, other parts of your code are almost certainly also going to be difficult to manage at a reasonable frame rate. Is optimizing your collision detection the best use of your time? How do you know in advance which ones to optimize? Optimizing all of them as you go will surely take you longer to write, not to mention make your code more difficult to debug and maintain.

If benchmarking shows your algorithm has problems without implementation optimizations, you are probably better off with a different algorithm. Which isn't to say that these sorts of optimizations aren't still useful. But it's better to do these selectively, once you have your code implemented and have profiled it to find specific problem areas. We'll look at how to use a profiler to find these problem areas next week, in Part 3.

Friday, March 23, 2007

Software Efficiency and Optimization (Part 1)

Software efficiency and optimization is a frequently recurring theme in the XNA Forums. It invariably involves questions about the C# foreach construct, garbage collection, and whether value types should be passed by reference.

While those are all valid concerns, and anyone developing XNA games needs to be aware of these issues, it's clear from many of the posts that some XNA developers were blowing them out of proportion. Many posts have claimed that anyone developing XNA games should never use foreach, as well as suggestions to always pass Matrix structures by reference. This led XNA Framework developer Shawn Hargreaves to write a wonderful post about why it is more important to write clean, maintainable code than to worry overly much about performance, especially early in the development lifecycle.

As a professional programmer who has written software for NASA, commercial communication satellites, and the defense industry, I agree with Shawn almost completely. Two favorite sayings that were drilled into my head early in my career are:

Premature optimization is the root of all evil.


The first rule of program optimization: don't do it. The second rule: don't do it yet.

Yet the fact is, game software does have to be efficient. Games are a class of realtime software, meaning that it is not enough for them to produce the correct result, they must also complete within a fixed time window. In general, game developers shoot for a minimum of displaying 30 frames per second in order to produce smooth, glitch-free animations; and most prefer 60 frames per second. That means that all of the game calculations — getting player input, implementing enemy AI, moving objects, collision detection and handling, and drawing each frame — must be completed within 16.7 milliseconds! When you consider that most modern videogames have hundreds, or even thousands, of objects that have to be updated and drawn within that time period, it's no wonder that programmers feel they have to optimize every line of code.

However, from reading the many posts with questions about "the fastest way" to implement something, it is obvious that many XNA programmers aren't familiar with the tools and methods for determining when, where, how, or even if, they should optimize their code. The point of this article is to help you answer those very questions.

Design Versus Implementation

A common response by those who question, or even outright disagree, with the idea that optimizing code early is a bad idea, is to point out that it is far easier to change software early in its lifecycle than after it has been written. That is, of course, very true. But that is why it is important to understand the difference between design optimization and implementation optimization.

While designing a game (or any software), you must take into account the size and complexity of your game, and select the correct data structures and algorithms that can support it. A simple 2D shooter or platformer with no more than a hundred or so objects interacting at any given time can probably get away with a brute force approach for handling movement and collisions. Maintaining a simple list or array of objects and iterating through it each frame will most likely work fine, and will be very simple to implement and debug.

However, a more complex game world, with perhaps thousands of active objects, will need an efficient method of partitioning the game space to minimize the number of object interaction tests each frame. Similarly, games requiring detailed enemy AI will need to rely on algorithms that can produce "intelligent" actions as quickly as possible.

There are a large number of available resources that discuss game programming algorithms, including the use of quadtrees and octrees for partitioning the game world to minimize collision detection tests; the minimax algorithm with alpha-beta pruning for efficiently finding the "best" move in two player strategy games; and the A* algorithm for efficient pathfinding.

Discussion of the specifics of these algorithms is outside the scope of this article. The important thing to take from this is:

The selection of the appropriate data structures and algorithms during the design phase has a far greater impact on the eventual performance of your game than any implementation optimization you will make.

Why? Because your algorithms determine the maximum number of operations your game will have have to perform during each frame.

To demonstrate this point, imagine that for your first game you write a simple 2D shooter that relies on a brute force approach to collision detection. Every frame, you simply test every active object against every other active object to see if they intersect. Because you decide to have only a limited number of enemies active at a time, it works well and easily runs at 60 frames per second.

With that experience under your belt, you now want to write a second, far more ambitious game. This time you decide to write a Zelda-like adventure game with a large scrolling game board and hundreds of objects moving around it simultaneously. Using your existing code as a starting point, you get well into the game's implementation before you discover that the brute force approach that worked very well in your simple game doesn't work so well in this new game. In fact, you may be measuring screen draws in seconds per frame instead of frames per second!

The reason is that comparing every object against every other object is what is known as an O(n2) algorithm. That is, the number of operations that have to be performed is related to the square of the number of objects you are operating on. If you have ten objects in your game, you only have to perform a hundred tests to see if there are any collisions. If you have a hundred objects, you have to perform ten thousand tests. Which may still be possible on a modern PC if each test can be done quickly enough. But if you have five hundred — just five times as many as the last example — you will have to perform 250,000 collision tests. Even if each test took only 67 microseconds, you would stll be using the entire 16.7 milliseconds frame time (at 60 frames per second) just for collision detection. The point is, it doesn't matter how efficiently you implement that algorithm in code, its performance will still devolve exponentially with the number of objects in your game and will therefore be the single greatest limiting factor to the size of your game.

Big O Notation

Big O notation is a mathematical method of comparing algorithmic efficiency. It is sometimes derided by programmers because it says nothing about the actual time a particular implementation of an algorithm will take. However, it can be quite useful if applied appropriately to help compare algorithms at design time.

The following table shows the relative efficiency of the most common types of algorithms, in decreasing order of efficiency. Thus, an O(log n) algorithm is considered more efficient than an O(n) algorithm, which is more efficient than an O(n2) algorithm.

Mathematical Property
The number of operations are always the same no matter how many objects are being operated on.
O(log n)
The number of operations increases proportional to the logarithm of the number of objects being operated on.
The number of operations increases proportional to the number of objects being operated on.
The number of operations increases proportional to the square of the number of objects being operated on.
The number of operations increases proportional to the factorial of the number of objects being operated on.

Remember that the point here is to determine how well the algorithms scale to handle more objects. An O(n2) algorithm where each actual operation is performed very quickly, may be faster than an O(log n) algorithm for many values of n. However, the O(n2) algorithm will always have a lower limit to the maximum number of objects it can handle. The trick is to choose an algorithm that is easy to implement while being efficient enough for the number of objects you want to handle in your game.

For discussions of videogame related algorithms, I highly recommend the Game Programming Gems series. Various websites, including Gamasutra and also have many good articles on algorithms.

It's also important to understand the relative performance of the C# generic collection classes. MSDN has a good article on how to select the appropriate Collection class for your needs, as well as an article on when to use Generic Collections. (My answer to that question is to always prefer Generic Collections to their non-generic counterparts. They are just as easy to use, are almost always more efficient, and have less chance of impacting the garbage collector.)

What Next?

That's an awful lot of background and theory, and I haven't even started to talk about optimizing implementation performance. Of course, that's the point I am trying to make — understanding design tradeoffs in order to select the design that is most appropriate for your game is the single most important task you will face in writing an efficient game.

However, it doesn't end there, and you may well end up needing to optimize certain sections of your code as you go along. Part 2 of this series will look at the importance of prototyping various aspects of your game, as well as showing you tools for profiling and benchmarking your code in order to determine which, if any, sections need to be improved.