Wednesday, April 18, 2007

Software Efficiency And Optimization (Part 3)

In the first two parts of this series, I discussed the difference between optimizing a design versus optimizing implementation, and showed how to use prototyping to benchmark key algorithms. This part will look at how to use a profiler to find performance bottlenecks.

The First Rule of Program Optimization: Don't do it. The Second Rule of Program Optimization (for experts only!): Don't do it yet.

So when do you do it? Only when you know you have a problem, and only when you know what the problem is. Profilers help you find these problems.

Argghhh! Why is my game so slow?

So you've finally gotten your game to the point that it is basically playable with most of the features you want implemented, and now you are hoping to get some friends to help you start beta testing it. But when you run it, the animation is incredibly jerky. Because you had been bragging to your friends about the game and want them to see it as soon as possible, you start going through the code in your mind, frantically thinking about what could be causing the slowdowns.

"I've heard foreach is very slow," you think, so you quickly do a search through your files and replace every one of your foreach statements with equivalent for statements, kicking yourself for not having done this in the first place. After fifteen minutes or so, and several recompiles because you made some mistake or another while translating the code — "Whoops, I declared my index variable to be 'index', but I called it 'i' when I tried to increment it." ... a couple minutes later ... "Dammit, that array only has ten elements but I told it to loop through eleven." — you finally get it running again... only to be dismayed to discover that it made no difference.

"Hmmm... I have an awful lot of matrices I'm passing around by value. Copying all of that data at every function call must take an awful lot of time." So you go through all of the code again looking for every method that takes a matrix and changing them to pass by reference. After many, many more recompiles than were needed during the change from for to foreach"Oh, come on, you mean I have to add the ref keyword both when I declare the method and when I call it?!?" — you finally get it back to the point where you can run it again. Unfortunately, now it doesn't work at all. You run it and objects start jumping around the screen for no apparent reason. Then you remember that some of your methods took advantage of the fact that a matrix was being passed by value and changed its value locally. Now that it is being passed by reference, that change is being made in the original object, which is not at all what you intended.

If you are lucky (or smart), you stop there and take a step back. If you're not lucky (or you're just foolish), you think that you'll be able to fix it and you start going into each method trying to figure out how to kludge your way around the problem on a case-by-case basis. Maybe you'll even succeed and get the game back to its runnable state that you had it in hours earlier. Maybe you're even lucky enough to have not introduced yet more bugs in the process. But in all likelihood, you haven't fixed the problem. And now you have code that does not run any better than when you started, but is harder to understand, harder to debug, and has kludges in it to get around problems that you introduced trying to fix the wrong problem.

Your time will be much better spent finding out where your problems are before you try to fix them.

Using NProf

Execution profilers are tools that help you find where your code is spending most of its time. There are two types of execution profilers. Event-based profilers use actual events, such as function calls, and trace the exact execution path as the program runs. Statistical profilers work by interrupting the program at frequent intervals to determine what section of code is currently running. Statistical profilers are less intrusive and therefore provide more accurate measure of how much time is spent in each function.

There are some very nice professional statistical profilers available for .NET development, including the ANTS Profiler and dotTrace. Both provide lots of great features, but both cost hundreds of dollars. Since one of the great aspects of XNA development is that Microsoft has made the Game Studio Express tools available for free, I'll show you how to keep using free tools by concentrating on the Open Source NProf profiler.

As with many (most?) Open Source tools, the main things missing from NProf are documentation and support. Once you get past that, though, it's a great tool. It doesn't provide the cool graphs and other bells-and-whistles of its $300+ commercial counterparts, but it lets us find out how much time is being spent in each of our functions. Ultimately, that's all we care about.

You can download the latest version of NProf (0.10 as of this writing) here. When you run it, you get the following (rather unhelpful) screen:

You might think that you would want to start by selecting File->New, which is the convention for Windows programs. Sadly, that does nothing. (Actually, it clears the current application so you can load a new one. But since you haven't loaded one at all yet, doing this now doesn't change the screen or give you any feedback.)

To select an application for profiling, press the Browse button. Then find a .NET executable for debugging. Note that when profiling XNA games, you will have to select a Windows executable — currently there is no mechanism for profiling Xbox 360 games.

Your XNA game is unlikely to take command line arguments, but on the off-chance that you have written it to support them, you can enter them in the Arguments text box.

Once you have selected a program to profile and entered any command line arguments, select Project->Start or press F5 to start the program. Your program will start running. Play through your game and then exit the program. When you exit, the NProf window will look something like this:

This is a sample run from my Space Invasion game. The two views shown in the top and bottom panels list every method that was executed when you played the game. The top view lets you expand each method to see what methods it called. The bottom view lets you expand each method to see what methods called it. In both views, the Time column shows what percentage of the program's total execution time was spent in each method. The methods are sorted by the percentage of the program's execution time that was spent inside that method.

Note that that includes the cumulative time spent inside every method that method called. So any time you run NProf on an XNA game, the first method listed will be System.Threading.ExecutionContext.Run and it will be shown as taking 100% of the time. That's because everything else in the game is ultimately called by that method, so every time NProf took an execution sample, it was in System.Threading.ExecutionContext.Run.

Looking down a bit further, you'll notice your main program (in this case, SpaceInvasion.Program.Main) and below that the XNA Framework's Run method (Microsoft.Xna.Framework.Game.Run). Continuing down even further, you'll finally find Microsoft.Xna.Framework.Game.Tick. In a fixed-step game, this will be called every TargetElapsedTime period (defaulting to 60 times per second).

If you expand that method in the Callees panel, you'll see that it calls three methods, Microsoft.Xna.Framework.Game.DrawFrame, SpaceInvasion.InvadersGame.Update, and Microsoft.Xna.Framework.GameClock.Step. The latter took only 0.01% of the total execution time, so we can safely ignore it. The SpaceInvasion.InvadersGame.Update method took only about 2% of the total execution time. And Microsoft.Xna.Framework.Game.DrawFrame took over 90% of the execution time!

Does that mean the program took over 15 milliseconds to draw each frame for a simple Space Invaders game with no more than 60 2D objects on screen at any time? Not at all. The time spent in the DrawFrame method includes the time spent waiting to synchronize with the vertical trace on the GraphicsDeviceManager.

For now, don't worry too much about how much time is spent resolving the back buffer to the GPU. (I'll talk very briefly about GPU issues at the end.) Instead, look at how much time is spent inside your program's Update and Draw methods. In this particular run of Space Invasion, these are approximately 3.4% for SpaceInvasion.InvadersGame.Update and 1.6% for SpaceInvasion.InvadersGame.Draw.

Since we know that Space Invasion is running at the default fixed-step time period of 60 frames per second, then the amount of time spent in Microsoft.Xna.Framework.Game.Tick each frame is 16.7 milliseconds. So we might be tempted to try to calculate the amount of time spent in each method by multiplying the percentage of time spent in each by 16.7.

However, this isn't accurate because, as we noted above, only 93.3% of the program's execution time was spent in the Microsoft.Xna.Framework.Game.Tick method. And, of course, I've already mentioned that even statistical profilers like NProf add some amount of overhead and error. While you could get out your calculator and start dividing the time spent in the Update and Draw methods by 0.933, there is yet another catch. The time shown is the total time for the entire program, including the time spent on the title screen when we weren't actually playing the game.

Of course, there are still ways around this if you are determined to get exact numbers for the average time spent in each method for each frame. But it's really not worth it. Don't make the mistake of thinking you can use a profiler to find out exactly how much time each method takes, or even that it's helpful to do so. Relative time is what matters.

Even looking at relative time, though, we probably don't care about how much time was spent updating and drawing the title screen. We want to know how much time was spent moving our sprites and performing collision detection while playing the game. Fortunately, Space Invasion is set up so that each screen is handled by a different class. I highly recommend you do this in your games too. I've seen some games where the Game's Update method is a huge switch statement, or, even worse, a set of if/then/else statements that determine which game screen is currently displayed. The whole point of object oriented programming is to allow the creation of specialized classes that handle specific functions. Providing a separate class to handle each screen's Update and Draw methods will make your programs easier to understand, easier to debug, and, as we're about to see, easier to profile.

Since Update is (not surprisingly) where we are spending the majority of our time, we'll drill down into it to see how it's spending that time. Expanding the SpaceInvasion.PlayScreen.UpdateGame method, we see the following:

Perhaps not surprisingly, when playing the game, the majority of the time is spent in our inefficient collision detection code. But, as we can see, the percentage of time spent in any of the Space Invasion methods is insignificant compared to the time spent in Microsoft.Xna.Framework.Game.Tick as a whole. Not exactly a shock since the game clearly runs quite smoothly, with no animation or control glitches. But it may be surprising to see just how little time is spent even in the collision detection code, which we knew in advance was the most inefficient algorithm possible. I was certainly surprised.

The important thing here is to see how to use this data to find where your performance problems are when you do have them. If the game was running slowly, knowing that over 75% of SpaceInvasion.PlayScreen.UpdateGame's time was being spent in the HandleCollisions method would tell us immediately where to focus our optimization time. (For those wondering where I got 75%, it's the amount of time spent in HandleCollisions as a percentage of the amount of time spent in UpdateGame.)

What it wouldn't tell us, however, is how best to optimize that code. This is where going back to the lessons in part 2 of this series is helpful. You might be able to optimize the code by making some simple implementation optimizations, such as passing large value objects by reference. As long as such changes are localized to small sections of code, and documented with good comments so you can remember later why you wrote the code that way, this is perfectly acceptable.

However, if you are way out of the bag and need significant code changes that may lead to buggy and hard-to-maintain code, you are better off going back to the drawing board and implementing a completely new algorithm. The only way to know for sure is to go back to prototyping and benchmarking some alternatives, in a small test program that doesn't require changing your main game code until you know what those changes need to be.

Other Potential Performance Bottlenecks

Execution profilers are great for helping find CPU bottlenecks, but there are other potential sources of performance problems. In particular, inefficient use of the GPU and/or memory can cause problems as well. If you are having performance issues but profiling shows that you are not significant amounts of time in your Update and Draw methods, then in all likelihood the GPU or memory is to blame.

GPU issues are caused either by making too many draw calls to the GPU or by sending it too much data each frame. Modern 3D graphics cards are pipelined so they are most efficient when they can perform many operations on a single source of data (i.e., a single texture). Therefore, minimizing the number of times textures are transferred to the GPU and minimizing the number of actual GPU draw calls go a long ways towards making the most efficient use of the GPU. The best tool for investigating GPU issues is PIX, part of the DirectX SDK from Microsoft.

Memory issues are mostly caused by not understanding how the CLR allocates memory and performs garbage collection. (Memory issues can also be caused by simply having too much data in memory at a given time, but most computers have at least 512MB if not a full gigabyte of memory these days, so that's less of an issue than it once was. Of course, that doesn't mean that you should feel free to load 500+MB of game data and expect to be able to keep it all in memory at one time...)

The important thing to keep in mind is that in C#, reference types, meaning all actual classes, are dynamically allocated on the heap. They will remain there until they are no longer referenced by any other object, at which point they become eligible for removal by the garbage collector. Value types, however, which include all of the basic types (int, float, bool, etc.) as well as structs, go on the stack and are removed when they go out of scope. Therefore, they never cause garbage collection issues (unless they are boxed). A key problem for new C# programmers, especially those that come from a C or C++, background, is that the new keyword in C# does not mean "allocate space for a new object on the heap", the way it does in those other languages. It just means "call this object's constructor". So, for example, the statement

Rectangle rect = new Rectangle(0, 0, 100, 200);

does not allocate an object on the heap, because Rectangle is a struct and, therefore, a value type. Variables of type Rectangle will never cause garbage collection issues, nor will any other struct types (again, unless they become boxed), despite the fact that to a C++ programmer the above syntax looks like you just dynamically allocated memory.

Another, even worse, common garbage collection confusion is that certain language constructs can create objects on the heap under the covers. This happens when value types become boxed, which happens whenever you try to use a value type where a reference type is expected. In these cases, the value type has to be "boxed" inside a reference type.

It also happens when the foreach statement creates an iterator for a reference type. This is the reason the foreach statement gets such bad press, because it can create iterators on the heap that will need to be garbage collected without it being obvious from the code that it is doing so (at least, to a programmer unaware of this issue). Using foreach with value types causes no such problems.

Memory and garbage collection issues are best discovered through the use of the CLR Profiler and the Remote Performance Monitor for the Xbox 360. The .NET Compact Framework team put together a great article on garbage collection in the Compact Framework and how to use the Remote Performance Monitor. Eli Tayrien's Cornflower Blue blog has two good articles on using the CLR Profiler and the Remote Performance Monitor tool to dispel the myth that foreach is evil.

The Bottom Line

Part 2 of this series showed how to use benchmarking to find out how long it takes to run a particular section of code. This article showed how to use a profiler to determine what percentage of time is being spent in each method during a full run of a complete program. By combining those two techniques, you can get a good handle on what algorithms are most appropriate while you are designing your game, and what sections of your code (if any) should be considered for optimization once you have written the game.

Hopefully what you have taken away from this series is 1) understanding the importance of optimizing your design before optimizing your implementation; 2) understanding the different types of performance problems that videogames can run into; and 3) understanding how to use various techniques to help you find your performance problems so you spend your time solving the right problem.

What Next

I'm trying to decide where to take this blog from here. I could continue focusing on software engineering as it applies to games. Or I could start taking in-depth looks at particular game algorithms and how to implement them for XNA — for example, I could show how I used the minimax algorithm with alpha-beta pruning for finding the best move in my Othello game, and how to implement it in XNA; or show how to implement the A* algorithm for pathfinding. Or I could start showcasing my Dream-Build-Play entry and how various parts are implemented. If you have any thoughts or preferences, please leave a comment.


Anonymous said...

Enjoying the series so far! An A* implementation would be an interesting read, I think.

ajmiles said...

I'd quite like to see your DBP entry.

Anonymous said...

A very enlightening series, thanks! I'd like to see an article on A* next as well.

Anonymous said...

id like to see A* implementation as well

Anonymous said...

very useful article, didnt know that about the new keyword in C#, hit the nail on the head about what you said about c++ programmers :P