Tuesday, July 10, 2007

Pathfinding Sample Using A*

Most game developers these days know of the A* (pronounced A-star) algorithm as the preferred method of finding a path around obstacles on a map. A* is actually a general-purpose graph-search algorithm based on theories presented by Hart, Nilsson, and Raphael in a 1968 paper for the IEEE Transactions on Systems Science and Cybernetics entitled, "A formal basis for the heuristic determination of minimum cost paths."

Whew! Fortunately, although it may seem intimidating at first, the A* algorithm is quite simple and you don't have to be an IEEE fellow to understand it. My first introduction to it was in one of my AI classes in college in Nilsson's book, "Principles of Artificial Intelligence." Interestingly enough, that book doesn't even mention the algorithms use for pathfinding on a map. The example he provides uses the algorithm to traverse a search tree to solve the 8 puzzle.

However, it should be obvious (when you think about it) that pathfinding on a map is merely a specialized case of a generic graph-search. And, as it turns out, one that the A* algorithm is particularly useful for solving, provided you can determine accurate "costs" for moving between adjacent nodes on the map.

Overview

The basic idea of the A* algorithm is to divide a map into adjacent nodes. Some nodes are free of obstacles and can be entered. Other nodes can not be entered. Between any adjacent nodes, there is a cost associated with moving from one node to another.

The algorithm starts by looking at all of the nodes adjacent to the starting point. For each one, it determines an estimated total cost of moving from that node to the destination, generally based on distance without regard to possible obstacles (which aren't known yet).

It then chooses the node with the least estimated total cost and looks at all of the nodes adjacent to it. This time it tracks the actual cost of moving from the original node to the current one and then, again, adds an estimated cost to the destination to come up with a revised estimated total cost.

From all of these nodes, it again selects the node with the least estimated total cost and repeats the process. As it does so, it is continually refining the actual cost of moving into each node along various paths, always doing so by searching the lowest estimated costs first. When paths are discovered with actual costs that are worse than actual costs to the same node by other paths, they are eliminated from further consideration. The end result can be proven to be the shortest path between the start and destination nodes (assuming the heuristic for estimating total cost meets certain conditions).

For a more complete, and very readable, description, I highly recommend the article, A* Pathfinding for Beginners, by Patrick Lester. I couldn't hope to explain it in more understandable terms than he does.

An A* Implementation

I have created a sample "game" that demonstrates the implementation of A* that I used in Snowball Fight!. I created this as a separate sample application in order to make it easier to understand how the algorithm works and how to use this specific implementation. I stripped out almost all of the unnecessary code to focus as much as possible on the Pathfinding class.

In the sample, a Bug (er, insect) traverses the shortest available path around obstacles on a map towards a Flower. The Flower can be moved by the player using the keyboard or an Xbox 360 controller. Whenever the Flower is moved, the Bug uses the Pathfinder class to find the shortest path to its new location.

The sample is divided into two namespaces. The SwampLib namespace contains general purpose classes that can be reused in many games. The PathfindingSample namespace contains classes that are specific to the sample "game".

SwampLib

The main classes of interest in the SwampLib namespace are:

IMap
Defines an interface between a map and the Pathfinder class. This allows many map implementations to be used with the Pathfinder. The only restriction is that the map must be capable of being divided into "tiles". Note that this does not mean the map must be a traditional tilemap. In fact, the Map implementation in this sample is not a traditional tilemap.

MapLocation
A struct that represents a tile on an IMap.

Pathfinder
Any object that wishes to find a path through a map must create an object of this class. The constructor to the Pathfinder object takes an IMap object as a parameter. The map associated with the Pathfinder can later be changed, if desired, with the one restriction that the map must be of the same size as the original map.

The FindPath method is used to find a path from any starting tile to any destination tile. The path found is stored as a stack of destination MapLocation waypoints.

Traversing the path is done by calling Pathfinder's Complete method to make sure that the destination has not already been reached. If Complete returns false, the GetNextWaypoint method can be called to get the MapLocation of the next waypoint.

PathNode
PathNode is only of interest in understanding the A* implementation. It is not publicly accessible outside of the SwampLib namespace and is used internally by the Pathfinder class to keep track of the status of each node (tile) on the map while finding a path.

Sprite
The Sprite class is not related to pathfinding, or the A* algorithm, but is provided as a reusable class for managing on-screen sprites. It provides collision detection and handling capabilities via the BoundingCircle and BoundingRectangle classes, although these are not used in this sample.

XnaGame
The XnaGame class is a subclass of Microsoft.Xna.Framework.Game class that is intended as a parent to other Game classes. It provides access to a number of commonly needed resources and methods.

PathfindingSample

The classes that make up the PathfindingSample itself are:

PathfindingGame
The game itself, which inherits from SwampLib.XnaGame. This class creates the GameComponents and Sprites, and loads all game content. It also manages updating and drawing the player's target Reticle.
InputComponent
This GameComponent manages the state of the keyboard and gamepad input controls. It is implemented as a singleton to make it easier for all of the other classes to access its properties.

The SetTarget property is true if the user has requested to change the target's position via the A button on the gamepad or the Spacebar on the keyboard.

The ToggleBlockedTiles property is true if the user has requested to toggle whether or not blocked tiles are highlighted on the map.
MapComponent
This DrawableGameComponent handles the majority of the game logic. It maintains a list of Sprites associated with the map (although in this case, that is only the Bug and Flower). It is responsible for updating all active Sprites and drawing the map and all visible Sprites.
Bug
The Sprite that uses the Pathfinder class to find the shortest route to the Flower's current position. Look at this class to understand how to use the Pathfinder class.
Flower
The Sprite that represents the target the Bug is seeking. Its location is set by the Reticle class when the user changes the target's position via the SetTarget property of the InputComponent class.
Reticle
The Sprite that represents the player's targetting reticle. It is under the control of the user via the left thumbstick on the gamepad or the arrow keys on the keyboard. When the user selects the SetTarget input, it raises the TargetMovedEvent which notifies the Flower and Bug of the Flower's new location.
Map
The game's implementation of the IMap interface. It is responsible for actually drawing the map background and fixed objects when directed by the MapComponent. It is also responsible for keeping track of which tiles are clear and which are blocked by fixed objects. Finally, it provides a number of helper methods that are used by the other classes that need to query the status of map tiles.

Conclusion

That's it! You can download the full source here:

Pathfinding Sample

I tried to make the code as readable and well-documented as possible. If you have any questions, feel free to leave a comment or email me at SwampThingTom at bayougames dot com.

Monday, July 2, 2007

Dream-Build-Play Entries

The following is a list of Dream-Build-Play entries, with links to web sites, videos, and installers (where available). I'll edit this post as I learn about more.

Aquattack
ZX360
2D Action
Windows / Xbox

Baby Gamer - Musical Rain
Baltico X
Screenshot

Bathtub Navy
Galactysis
3D Action
Website
Video
Screenshot
Windows
Xbox

Battle for the Seas
Joopsy
3D Turn-based Strategy
Video
Windows

Big Sky
Screenshot 1
Screenshot 2
Screenshot 3

Blazing Birds
Dagfari
2D Action
Website
Video
Xbox

Block Realm
Chromatic
3D MMORPG
Website

Botload
jstewart
Xbox

Bullet Hell Tactics
Kobingo
Website
Video
Windows

Burning Angels
Arthur_Frontfree
3D Shooter (3P)
Website
Video 1
Video 2

Butterfly Paradise
CatalinZima
3D puzzle
Screenshot 1
Screenshot 2
Screenshot 3
Xbox

Chaos
hellborg99
2D Action
Website
Windows

Cuber
Taber
3D Puzzle
Website
Xbox
Windows

Crystal Glyder
vRAGmAZTER
3D Flyer
Website
Xbox

Dark'nd
FritzLafferty
3D Sci-Fi/Action Multiplayer Arcade
Website
Xbox

The Dishwasher: Dead Samarai
Jamezila
2D Brawler
Website
Video

Dog Food
CheekyOaf
2D Platformer
Website
Windows

Doppelganger
PumpkinPaul
2D Shooter
Website
Windows

Dream Sower
Kyle0654, LoneRockstar, CatalinZima, atlStylin
3D Action
Website
Screenshot 1
Screenshot 2
Screenshot 3
Video
Xbox
Windows

EleMental
MadGamer7
FPS
Website
Windows

Evil Aliens
CoamIthra
Windows

Exisled
LOSTlogic
2.5D Shooter
Xbox
Windows
Video 1
Video 2
Video 3

The Farthest Land
xVxObliVioNxVx
2D Fighter
Video
Windows

Galactic War
DarthCheesiest
3D Action
Video

Ghost Ball
Da Coder
3D Sports
Screenshot 1
Screehshot 2

Give it a Rest
Camp ELM
2D Side Scroller
Windows

Gravitron Ultra
Screenshot 1
Screenshot 2
Screenshot 3

GunStyle
Alex
2D Multiplayer Shooter
Website

Hasta La Muerte
Screenshot 1
Screenshot 2
Screenshot 3

Head Banger
Ganksoft Entertainment
Website
Windows
Xbox

Hippocrates Dilemma
Handkor
2D Shooter
Website
Windows

HurricaneX
3D Action
kobunketsu
Video

In Your Face Football
barkers crest
2.5D Arcade Sports
Video
Xbox

Lazyville
hnathanh
Video

Little Gamers: Teh Game
Epsicode
2D Action
Website
Video
Xbox

Magic Crystals
qillerneu
2.5D Action Puzzler
Xbox
Windows

Mech Rider Z
DexterZ
3D Mech Shooter
Website

Memories
MarginalWotan
3D Action Adventure
Windows

Minpick
BitChain
2D Puzzle
Website

Mr. Pluck Adventures
rwp QUARK
3D Platformer
Video
Windows

Nanonomi
Duckocide
2.5D Action
Website
Video
Xbox

The Night of the Puppets
MutantPenguins
2.5D Action Adventure?
Website

Odyssey
Ringo
2D Shooter
Windows

Proximity HD
SageClock
2D Strategy / Board Game
Windows

PvP
Chryso
FPS?
Screenshot 1
Screenshot 2
Screenshot 3

Ragu
Screenshot 1
Screenshot 2
Screenshot 3

The Real Invaders
lalolete
3D Action
Video

Rocketball
gameD3V
2.5D Party
Website
Screenshot 1
Screenshot 2
Screenshot 3

Roto Motion
HadesSpaniel
2D Puzzle
Video
Windows

Samarai Soul Hunters
SamaraiForever
2.5D Brawler
Website
Video

Shuggy
deejay169
2D Platformer
Website
Video
Xbox
Windows

Sky Burner
moriarty72
3D Action Flying
Windows

Snapcount
AwkwardGamesLtd
2D Action
Windows
Screenshot 1
Screenshot 2

The Snoobies
chenpwnage
2D Action
Website
Xbox
Video
Screenshot 1
Screenshot 2
Screenshot 3

Snowball Fight!
SwampThingTom
2D Party
Website
Xbox
Windows

Solar Defense
chronos78
2D Shooter
Screenshot 1
Screenshot 2
Screenshot 3

Space Sudoku
Rexorep
Screenshot
Video

Sprockets of Strife
Website
Screenshot 1
Screenshot 2
Screenshot 3

Starcrushers
Zener
Website
Video
Windows

Tower Defense
KriscC
2D Action Arcade
Website
Windows

Truck Defender X
Christopher
3D Action / Strategy
Video

Viduce
Screenshot 1
Screenshot 2
Screenshot 3

Vocab Trainer
Jendrik
Educational
Website
Windows

Vacuum Ball
Greytone8
2.5D Action Puzzler
Website
Xbox

X-Road
DrCosinus
2.5D Shooter
Website
Windows
Xbox

X Space Cruiser
Anonymous
3D Space Racing
Website
Screenshot 1
Screenshot 2
Screenshot 3
Windows

xSpace
ClydeCoulter
3D Space Shooter
Website
Windows

Xtreme Table Soccer
Nivel21
Windows

Yo Ho Kablammo!
Screenshot 1
Screenshot 2
Screenshot 3

Dreamt-Built-Played

Uploaded the latest build of Snowball Fight! to the Dream-Build-Play website this morning. There's plenty more I plan to do with the game but it's nice to have this milestone passed. Thanks to everyone who provided feedback on the previous release. And, of course, thanks to Microsoft's XNA Team for providing an excellent product with amazing support (for free!) and to the XNA Community for providing so much help and information.

You can download the latest Windows release at bayougames.com and the latest Xbox 360 release at gameprojects.com.

Wednesday, June 20, 2007

Snowball Fight!

I'm happy enough with the progress on my Dream-Build-Play entry that I've decided to put a beta out for public release. Snowball Fight is a 1-4 player party game where you try to knock your opponents out by hitting them with snowballs before they knock you out. Of course, you can only hold so many snowballs at a time, so don't wander too far from your snow pile when you are getting low. Also, keep an eye out for random power ups for temporary bonuses.

I've also created a new website / blog for distributing my games — Bayou Games. "Advice from the Swamp" will continue to focus on videogame programming, while bayougames.com will focus on providing playable games to gamers.

Snowball Fight

I'm afraid that, for now, I've decided not to release the source. (But I also didn't obfuscate it, so feel free to Reflect at will.) I'm trying to determine the best way to share the source with the XNA community while keeping my options open for publishing the game in some form. My preference is to turn the source into an on-going tutorial on writing a complete game — in the same vein as Reimer's tutorials, but with a truly complete game as the final product. But that clearly takes a serious commitment and, as I said, I'm keeping my options open for now.

However, since there is clearly interest in an article on the A* pathfinding algorithm, I will be providing that soon. Again, I'm not certain exactly what form it will take, but it will be available before the end of the month.

Tuesday, June 5, 2007

What Next... Again

Thanks for the comments. It looks like there is some interest in an article on implementing the A* pathfinding algorithm. I have an implementation in my Dream-Build-Play game, and I'm trying to figure out the best way to turn that into one or more articles.

The biggest problem is that spending time writing articles and putting together these blog entries takes significant time that otherwise could be used improving my DBP game. With the DBP deadline a mere month away, I'm trying to decide how best to balance that.

I also have plans for an article to demonstrate how to write game control input code that is completely independent of the actual input device. I have a series of classes (also written for my DBP game) that lets the user select pretty much any input device and customize the controls for that device in such a way that the game itself knows nothing about the actual input device being used. It consists of a series of adapter classes that can convert any digital input to an analog input and vice versa, so that the user can configure specific game controls to use arrow keys on a keyboard, the Dpad of a game controller, or an analog joystick without any changes or special treatment by the game code.

Sunday, June 3, 2007

A Generic Pool Collection Class

I originally intended to write this article for Ziggy's XNA Article Contest. Unfortunately, real life got in the way in the form of a few weeks of crunch-time at work followed by a much-needed (and fabulous) vacation in Paris. Between those things and spending the little free time I had last month on my Dream-Build-Play entry (and not as much time on that as I'd have liked), I didn't get the article finished in time for his deadline. He received some great entries, though, so if you haven't already, head over there to read them. My favorites are the two HLSL articles by Jamezilla, one on Bloom/Blur Post-Processing and the other on Shockwave Distortion; and the article on Bezier Curves by Cubed2D. But all of the entries are worth a read.

EDIT: It looks like just as I was posting this entry, Ziggy was announcing the contest winner. Congrats to KrisC for his article on Graphic User Interfaces in XNA!

Resource Pools

Games need to be able to maintain collections of various types of resources, including such things as projectiles, particles, enemies, etc. Most non-game .NET applications would simply allocate such resources dynamically and then let the garbage collector dispose of them when they are no longer needed. Unfortunately, dynamic memory allocation and deallocation can, and will, cause slowdowns at indeterminable times. For games, or any real-time software, this is obviously unacceptable. As a game programmer, you need to be in control of when resource allocation and deallocation occurs and how long it takes.

The solution to this is to create resource pools where all of the memory for each resource type is allocated up-front during the game's initialization. When a resource is needed, an available item can be removed from that resource type's pool. When the resource is no longer needed, it can be returned to the pool.

Because a resource pool is a group of objects of the same type that has a fixed size, they are almost always implemented as an array. However a resource pool also has to keep track of which objects are currently being used, and there are multiple ways to do that. One way is to simply add a field in the resource class that specifies whether that particular object is active. The problem with that approach is that you have to rewrite code to manage each particular resource's pool. That is, you'll need separate code to manage your projectile pool and your particle pool, even though the code is doing basically the same thing. You could get around this "block copy" approach by having a parent class that maintains the active flag and contains the methods used to get a free object from the pool, update the active objects in the pool, reset the pool, etc. But that requires that all of your resource classes inherit from that parent, which isn't always desirable (and sometimes may not even be possible).

The most flexible approach is to have a generic Pool class, similar to the other .NET generic collections, that handles the maintenance of resource pools and can hold any type of object. In this article, we'll look at an efficient implementation of a Pool class and how to use it for a variety of different resource types.

Along the way, I'll describe some of the design decisions and trade-offs that go into writing generic classes. However, this is not intended to be a tutorial on writing C# generics. If that's what you are looking for, I recommend first reading a good book or online tutorial that covers the topic.

Desired Features of a Generic Pool

The following features are required for our Pool class to be useful.

Get — Takes a free item from the pool and marks it as active.

Return — Returns an active item to the pool and marks it as available.

Clear — Returns all active items to the pool and makes them all available.

Capacity — Gets the total number of items in the pool, both available and active.

AvailableCount — Gets the total number of available items in the pool. This is necessary to ensure the user doesn't attempt to Get more than the available number of objects. Therefore any call to Get should be preceeded by a test to ensure there are objects available.

ActiveCount — Gets the total number of active items in the pool.

CopyTo — Copies all of the active objects in the pool to a one-dimensional array. This is needed so that a resource pool can hold vertex data which can be easily copied to an array for passing to the GPU.

GetEnumerator — Gets an enumerator that can iterate over the pool's active items.

AllNodes — Gets an enumerator that can iterate over all of the items in the pool, active and inactive. This can be useful if there is one-time initialization that needs to be performed on all of the resource objects in a pool before they are ever used. Generally, however, after initialization, users should only be accessing active objects.

efficiency — Because resources such as particles and projectiles are generally being allocated and returned many times per frame, the pool class must be as efficient as possible. Where possible, all methods should be constant time — O(1) — and none should be worse than linear — O(n).

encapsulation — The implementation details should be hidden from the user as much as possible. In particular, the user should not have direct access to the active flag which should be modified only by the Pool class itself.

ease-of-use — The class should put as few restrictions on the user as possible so that it can be used as easily as other collection classes. In particular, it should have very little restriction on the types of objects that it can hold.

Implementation

To meet the efficiency requirements, the Pool class is implemented as an array along with a Queue containing the available (non-active) objects. By maintaining the list of available objects as a queue, the Get and Return methods can be O(1).

To meet the encapsulation requirements, the pool array is not an array of the objects being held by the pool, but an array of a Node struct. Each node contains the item being stored in the Pool along with the index of that item in the array. Because the node has to be given to the user so that it can be returned to the pool when the user is finished with it, the active flag is not stored in the node itself. If it were, the user would have direct access to it and could change it themselves, which we want to avoid. Instead, there is a separate array of booleans whose elements correspond to the elements in the pool array. Thus the NodeIndex field in the Node struct provides the index into both the pool array and the active array.

Finally, the ease-of-use requirements are mostly met by virtue of the class being implemented as a generic. As mentioned earlier, this removes the limitation of having a parent class for all resources that are to be stored in pools. However, since we are storing the actual resource objects inside Node structs, we need to be able to allocate each one when we create the nodes in the array. This means we have to limit resources to types that have a public, parameterless constructor. In other words, you can't store basic data types such as ints, floats, etc., inside a Pool. Since this Pool class is intended for holding resources that are more complicated than the basic data types, that restriction shouldn't be a problem.

The use of nodes when getting and returning pool elements is another design trade-off that slightly limits ease-of-use, in that the user has to deal with another data structure besides the one they want to store in the pool. To remove the slight burden this presents, the iterator returned by the GetEnumerator method provides direct access to the resource objects rather than nodes. Another enumerator is provided by the ActiveNodes property which can be used to iterate through nodes. This gives the user the flexibility to access active resource objects directly, or to access nodes for times when they may need to return an item to the pool. In general, I'd recommend using the ActiveNodes property inside an Update method, where the update may cause the resource to be finished and need to be returned to the pool; and the GetEnumerator property inside a Draw method, where you just want to go through all of the active resources and draw them.

The other disadvantage of storing the resource objects inside Node structs in this way, is that for value types (structs) stored in a pool, the user will only ever get a copy of the object instead of having direct access to it. That means that when using a Pool of a struct type, setting the node's Item field doesn't change the value of the item in the pool. To get around this problem, the SetItemValue method is provided to update the value of a given node in the pool.

Using the Pool Collection

Hopefully the comments provided with the class are fairly self-explanatory. The following subsections provide an overview of how to use the class.

Initialization

To use a Pool collection, all you have to do is construct it with the number of elements to be stored, generally at program initialization. This number of elements in the pool should be more than you will likely need, but not many more. It may require some tweaking to get that number just right for each specific use but generally you should have a pretty good idea of how many elements you need based on the design of your game and how many of each projectile, particle, etc. are likely to be active.

class Missile { private Vector2 position; private Vector2 velocity; private int configValue; public Missile() { } public void Init(int configValue) { this.configValue = configValue; } public void Fire(Vector2 position, Vector2 heading) { this.position = position; this.heading = heading; } public bool Update() { // Move missile, etc. ... // See if missile is finished if (reachedMaxDistance) { return false; } return true; } public bool Collided(Rectangle otherObject) { if (otherObject.Intersect(position)) { return true; } return false; } public void Draw() { // Draw missile ... } } public Pool missilePool; public void Init() { missilePool = new Pool<Missile>(MaxMissiles); // If there is any one-time initialization of all // resources, it can be done here. Generally, // though, this shouldn't be needed. foreach (Pool<Missile>.Node missile in missilePool.AllNodes) { missile.Init(ConfigValue); } }

Update

During the update portion of each frame, you will get items from the pool if needed, update all active items, and return items that are no longer needed.

public void Update() { // See if the user has fired a missile // (and verify there are some left in the pool) if (MissileFired && missilePool.AvailableCount > 0) { // Get a missile from the pool. Missile missile = missilePool.Get().Item; // Initialize the missile's position and velocity. missile.Fire(player.Position, player.Heading); } foreach (Pool<Missile>.Node node in missilePool.ActiveNodes) { if (!node.Item.Update()) { // If something happened during its Update to // make the missile no longer active, return // it to the pool. missilePool.Return(node); } else { // Note that if Missile was a struct instead // of a class, you would have to do the // following here: // missilePool.SetItemValue(node); // Loop through objects it might hit. foreach (GameObject item in ActiveObjects) { if (node.Item.Collided( item.BoundingRectangle)) { missilePool.Return(node); } } } } }

Draw

During the draw portion of each frame, you will get items from the pool if needed, update all active items, and return items that are no longer needed.

public void Draw() { // Draw each active missile foreach (Missile missile in missilePool) { missile.Draw(); } }

XNA Creator's Club Particle Sample

To provide a more complete example of using the Pool collection, I've modified Microsoft's XNA Creator's Club Particle Sample. The only changes from the original are the inclusion of the Pool class itself, in Pool.cs; and modifications to the ParticleSystem class, in ParticleSystem.cs. All changes have been marked with conditional compilation instructions (#if USE_POOL_CLASS) to make it easy to compare the original code with the new code.

Pool Collection Class

You can download the source file here.

Wednesday, April 18, 2007

Software Efficiency And Optimization (Part 3)

In the first two parts of this series, I discussed the difference between optimizing a design versus optimizing implementation, and showed how to use prototyping to benchmark key algorithms. This part will look at how to use a profiler to find performance bottlenecks.

The First Rule of Program Optimization: Don't do it. The Second Rule of Program Optimization (for experts only!): Don't do it yet.

So when do you do it? Only when you know you have a problem, and only when you know what the problem is. Profilers help you find these problems.

Argghhh! Why is my game so slow?

So you've finally gotten your game to the point that it is basically playable with most of the features you want implemented, and now you are hoping to get some friends to help you start beta testing it. But when you run it, the animation is incredibly jerky. Because you had been bragging to your friends about the game and want them to see it as soon as possible, you start going through the code in your mind, frantically thinking about what could be causing the slowdowns.

"I've heard foreach is very slow," you think, so you quickly do a search through your files and replace every one of your foreach statements with equivalent for statements, kicking yourself for not having done this in the first place. After fifteen minutes or so, and several recompiles because you made some mistake or another while translating the code — "Whoops, I declared my index variable to be 'index', but I called it 'i' when I tried to increment it." ... a couple minutes later ... "Dammit, that array only has ten elements but I told it to loop through eleven." — you finally get it running again... only to be dismayed to discover that it made no difference.

"Hmmm... I have an awful lot of matrices I'm passing around by value. Copying all of that data at every function call must take an awful lot of time." So you go through all of the code again looking for every method that takes a matrix and changing them to pass by reference. After many, many more recompiles than were needed during the change from for to foreach"Oh, come on, you mean I have to add the ref keyword both when I declare the method and when I call it?!?" — you finally get it back to the point where you can run it again. Unfortunately, now it doesn't work at all. You run it and objects start jumping around the screen for no apparent reason. Then you remember that some of your methods took advantage of the fact that a matrix was being passed by value and changed its value locally. Now that it is being passed by reference, that change is being made in the original object, which is not at all what you intended.

If you are lucky (or smart), you stop there and take a step back. If you're not lucky (or you're just foolish), you think that you'll be able to fix it and you start going into each method trying to figure out how to kludge your way around the problem on a case-by-case basis. Maybe you'll even succeed and get the game back to its runnable state that you had it in hours earlier. Maybe you're even lucky enough to have not introduced yet more bugs in the process. But in all likelihood, you haven't fixed the problem. And now you have code that does not run any better than when you started, but is harder to understand, harder to debug, and has kludges in it to get around problems that you introduced trying to fix the wrong problem.

Your time will be much better spent finding out where your problems are before you try to fix them.

Using NProf

Execution profilers are tools that help you find where your code is spending most of its time. There are two types of execution profilers. Event-based profilers use actual events, such as function calls, and trace the exact execution path as the program runs. Statistical profilers work by interrupting the program at frequent intervals to determine what section of code is currently running. Statistical profilers are less intrusive and therefore provide more accurate measure of how much time is spent in each function.

There are some very nice professional statistical profilers available for .NET development, including the ANTS Profiler and dotTrace. Both provide lots of great features, but both cost hundreds of dollars. Since one of the great aspects of XNA development is that Microsoft has made the Game Studio Express tools available for free, I'll show you how to keep using free tools by concentrating on the Open Source NProf profiler.

As with many (most?) Open Source tools, the main things missing from NProf are documentation and support. Once you get past that, though, it's a great tool. It doesn't provide the cool graphs and other bells-and-whistles of its $300+ commercial counterparts, but it lets us find out how much time is being spent in each of our functions. Ultimately, that's all we care about.

You can download the latest version of NProf (0.10 as of this writing) here. When you run it, you get the following (rather unhelpful) screen:

You might think that you would want to start by selecting File->New, which is the convention for Windows programs. Sadly, that does nothing. (Actually, it clears the current application so you can load a new one. But since you haven't loaded one at all yet, doing this now doesn't change the screen or give you any feedback.)

To select an application for profiling, press the Browse button. Then find a .NET executable for debugging. Note that when profiling XNA games, you will have to select a Windows executable — currently there is no mechanism for profiling Xbox 360 games.

Your XNA game is unlikely to take command line arguments, but on the off-chance that you have written it to support them, you can enter them in the Arguments text box.

Once you have selected a program to profile and entered any command line arguments, select Project->Start or press F5 to start the program. Your program will start running. Play through your game and then exit the program. When you exit, the NProf window will look something like this:

This is a sample run from my Space Invasion game. The two views shown in the top and bottom panels list every method that was executed when you played the game. The top view lets you expand each method to see what methods it called. The bottom view lets you expand each method to see what methods called it. In both views, the Time column shows what percentage of the program's total execution time was spent in each method. The methods are sorted by the percentage of the program's execution time that was spent inside that method.

Note that that includes the cumulative time spent inside every method that method called. So any time you run NProf on an XNA game, the first method listed will be System.Threading.ExecutionContext.Run and it will be shown as taking 100% of the time. That's because everything else in the game is ultimately called by that method, so every time NProf took an execution sample, it was in System.Threading.ExecutionContext.Run.

Looking down a bit further, you'll notice your main program (in this case, SpaceInvasion.Program.Main) and below that the XNA Framework's Run method (Microsoft.Xna.Framework.Game.Run). Continuing down even further, you'll finally find Microsoft.Xna.Framework.Game.Tick. In a fixed-step game, this will be called every TargetElapsedTime period (defaulting to 60 times per second).

If you expand that method in the Callees panel, you'll see that it calls three methods, Microsoft.Xna.Framework.Game.DrawFrame, SpaceInvasion.InvadersGame.Update, and Microsoft.Xna.Framework.GameClock.Step. The latter took only 0.01% of the total execution time, so we can safely ignore it. The SpaceInvasion.InvadersGame.Update method took only about 2% of the total execution time. And Microsoft.Xna.Framework.Game.DrawFrame took over 90% of the execution time!

Does that mean the program took over 15 milliseconds to draw each frame for a simple Space Invaders game with no more than 60 2D objects on screen at any time? Not at all. The time spent in the DrawFrame method includes the time spent waiting to synchronize with the vertical trace on the GraphicsDeviceManager.

For now, don't worry too much about how much time is spent resolving the back buffer to the GPU. (I'll talk very briefly about GPU issues at the end.) Instead, look at how much time is spent inside your program's Update and Draw methods. In this particular run of Space Invasion, these are approximately 3.4% for SpaceInvasion.InvadersGame.Update and 1.6% for SpaceInvasion.InvadersGame.Draw.

Since we know that Space Invasion is running at the default fixed-step time period of 60 frames per second, then the amount of time spent in Microsoft.Xna.Framework.Game.Tick each frame is 16.7 milliseconds. So we might be tempted to try to calculate the amount of time spent in each method by multiplying the percentage of time spent in each by 16.7.

However, this isn't accurate because, as we noted above, only 93.3% of the program's execution time was spent in the Microsoft.Xna.Framework.Game.Tick method. And, of course, I've already mentioned that even statistical profilers like NProf add some amount of overhead and error. While you could get out your calculator and start dividing the time spent in the Update and Draw methods by 0.933, there is yet another catch. The time shown is the total time for the entire program, including the time spent on the title screen when we weren't actually playing the game.

Of course, there are still ways around this if you are determined to get exact numbers for the average time spent in each method for each frame. But it's really not worth it. Don't make the mistake of thinking you can use a profiler to find out exactly how much time each method takes, or even that it's helpful to do so. Relative time is what matters.

Even looking at relative time, though, we probably don't care about how much time was spent updating and drawing the title screen. We want to know how much time was spent moving our sprites and performing collision detection while playing the game. Fortunately, Space Invasion is set up so that each screen is handled by a different class. I highly recommend you do this in your games too. I've seen some games where the Game's Update method is a huge switch statement, or, even worse, a set of if/then/else statements that determine which game screen is currently displayed. The whole point of object oriented programming is to allow the creation of specialized classes that handle specific functions. Providing a separate class to handle each screen's Update and Draw methods will make your programs easier to understand, easier to debug, and, as we're about to see, easier to profile.

Since Update is (not surprisingly) where we are spending the majority of our time, we'll drill down into it to see how it's spending that time. Expanding the SpaceInvasion.PlayScreen.UpdateGame method, we see the following:

Perhaps not surprisingly, when playing the game, the majority of the time is spent in our inefficient collision detection code. But, as we can see, the percentage of time spent in any of the Space Invasion methods is insignificant compared to the time spent in Microsoft.Xna.Framework.Game.Tick as a whole. Not exactly a shock since the game clearly runs quite smoothly, with no animation or control glitches. But it may be surprising to see just how little time is spent even in the collision detection code, which we knew in advance was the most inefficient algorithm possible. I was certainly surprised.

The important thing here is to see how to use this data to find where your performance problems are when you do have them. If the game was running slowly, knowing that over 75% of SpaceInvasion.PlayScreen.UpdateGame's time was being spent in the HandleCollisions method would tell us immediately where to focus our optimization time. (For those wondering where I got 75%, it's the amount of time spent in HandleCollisions as a percentage of the amount of time spent in UpdateGame.)

What it wouldn't tell us, however, is how best to optimize that code. This is where going back to the lessons in part 2 of this series is helpful. You might be able to optimize the code by making some simple implementation optimizations, such as passing large value objects by reference. As long as such changes are localized to small sections of code, and documented with good comments so you can remember later why you wrote the code that way, this is perfectly acceptable.

However, if you are way out of the bag and need significant code changes that may lead to buggy and hard-to-maintain code, you are better off going back to the drawing board and implementing a completely new algorithm. The only way to know for sure is to go back to prototyping and benchmarking some alternatives, in a small test program that doesn't require changing your main game code until you know what those changes need to be.

Other Potential Performance Bottlenecks

Execution profilers are great for helping find CPU bottlenecks, but there are other potential sources of performance problems. In particular, inefficient use of the GPU and/or memory can cause problems as well. If you are having performance issues but profiling shows that you are not significant amounts of time in your Update and Draw methods, then in all likelihood the GPU or memory is to blame.

GPU issues are caused either by making too many draw calls to the GPU or by sending it too much data each frame. Modern 3D graphics cards are pipelined so they are most efficient when they can perform many operations on a single source of data (i.e., a single texture). Therefore, minimizing the number of times textures are transferred to the GPU and minimizing the number of actual GPU draw calls go a long ways towards making the most efficient use of the GPU. The best tool for investigating GPU issues is PIX, part of the DirectX SDK from Microsoft.

Memory issues are mostly caused by not understanding how the CLR allocates memory and performs garbage collection. (Memory issues can also be caused by simply having too much data in memory at a given time, but most computers have at least 512MB if not a full gigabyte of memory these days, so that's less of an issue than it once was. Of course, that doesn't mean that you should feel free to load 500+MB of game data and expect to be able to keep it all in memory at one time...)

The important thing to keep in mind is that in C#, reference types, meaning all actual classes, are dynamically allocated on the heap. They will remain there until they are no longer referenced by any other object, at which point they become eligible for removal by the garbage collector. Value types, however, which include all of the basic types (int, float, bool, etc.) as well as structs, go on the stack and are removed when they go out of scope. Therefore, they never cause garbage collection issues (unless they are boxed). A key problem for new C# programmers, especially those that come from a C or C++, background, is that the new keyword in C# does not mean "allocate space for a new object on the heap", the way it does in those other languages. It just means "call this object's constructor". So, for example, the statement

Rectangle rect = new Rectangle(0, 0, 100, 200);

does not allocate an object on the heap, because Rectangle is a struct and, therefore, a value type. Variables of type Rectangle will never cause garbage collection issues, nor will any other struct types (again, unless they become boxed), despite the fact that to a C++ programmer the above syntax looks like you just dynamically allocated memory.

Another, even worse, common garbage collection confusion is that certain language constructs can create objects on the heap under the covers. This happens when value types become boxed, which happens whenever you try to use a value type where a reference type is expected. In these cases, the value type has to be "boxed" inside a reference type.

It also happens when the foreach statement creates an iterator for a reference type. This is the reason the foreach statement gets such bad press, because it can create iterators on the heap that will need to be garbage collected without it being obvious from the code that it is doing so (at least, to a programmer unaware of this issue). Using foreach with value types causes no such problems.

Memory and garbage collection issues are best discovered through the use of the CLR Profiler and the Remote Performance Monitor for the Xbox 360. The .NET Compact Framework team put together a great article on garbage collection in the Compact Framework and how to use the Remote Performance Monitor. Eli Tayrien's Cornflower Blue blog has two good articles on using the CLR Profiler and the Remote Performance Monitor tool to dispel the myth that foreach is evil.

The Bottom Line

Part 2 of this series showed how to use benchmarking to find out how long it takes to run a particular section of code. This article showed how to use a profiler to determine what percentage of time is being spent in each method during a full run of a complete program. By combining those two techniques, you can get a good handle on what algorithms are most appropriate while you are designing your game, and what sections of your code (if any) should be considered for optimization once you have written the game.

Hopefully what you have taken away from this series is 1) understanding the importance of optimizing your design before optimizing your implementation; 2) understanding the different types of performance problems that videogames can run into; and 3) understanding how to use various techniques to help you find your performance problems so you spend your time solving the right problem.

What Next

I'm trying to decide where to take this blog from here. I could continue focusing on software engineering as it applies to games. Or I could start taking in-depth looks at particular game algorithms and how to implement them for XNA — for example, I could show how I used the minimax algorithm with alpha-beta pruning for finding the best move in my Othello game, and how to implement it in XNA; or show how to implement the A* algorithm for pathfinding. Or I could start showcasing my Dream-Build-Play entry and how various parts are implemented. If you have any thoughts or preferences, please leave a comment.