Sunday, March 25, 2007

Software Efficiency And Optimization (Part 2)

In Part 1 of this series, I discussed the importance of understanding game design tradeoffs and introduced the concept of Big O notation for comparing the efficiency of different algorithms. However, as I said, Big O notation tells us nothing about how fast an algorithm will actually run, it just tells us how efficiently it scales when operating on greater numbers of objects.

"That's great", I hear you saying, "but how do I know when I'm doing my design whether a particular algorithm will meet my needs?"

Ah, well, there is really only one way — experience.

"Gee, thanks,", you reply. "How am I supposed to get the experience I need to design my game, if you say I shouldn't write it until after I've designed it?!?"

That's actually pretty simple. You need to prototype your approach and see how well it meets your requirements. If it doesn't meet your requirements, you need to either prototype different approaches until you find one that does, or scale back your requirements.

Prototyping and Benchmarking

A prototype is just a simplified version of software (in this case, your game), that focuses on one particular aspect of it. Prototypes are often used as proofs of concept to show that the software will be able to work as expected. Because prototypes don't have to deal with all of the details that the final software will, they can be written quickly so that, if necessary, different approaches can be evaluated.

Prototypes are frequently used to evaluate user interfaces so that customers can provide early feedback. That can be useful for game programming too, since if you can implement a working display and control scheme you may be able to find out what works and doesn't work before you get too far along in the actual implementation of the game. However, the use of prototypes that we are concerned with here is to determine whether an algorithm is fast enough for the game we want to write. To do that, we will want to benchmark it. Benchmarking is just the process of timing how long an algorithm takes to run.

Fortunately, the .NET Framework makes benchmarking very easy by providing the System.Debug.Stopwatch class.The Stopwatch class provides a Start and a Stop method. It keeps track of the total number of clock ticks that occur between calls to Start and Stop. Even better, like a real stopwatch, it keeps a running count of ticks between successive calls to Start and Stop. You can find out how much time has passed by querying its ElapsedTicks or ElapsedMilliseconds properties. A Reset method lets us reset the Stopwatch back to zero.

Lets put this to use in a simple example that merely tests how long it takes to move a sprite. For this, I'll use a simplified version of the Sprite class I wrote for my Space Invasion game. A quick overview of the class is provided below. You can download the full code for this benchmark example here.

public abstract class Sprite { public Vector2 Position { get; set; } public Color Color { get; set; } /// /// Sprite's collision rectangle in screen coordinates. /// public BoundingRectangle BoundingBox { get; } public Sprite( string imageName, BoundingRectangle boundingBox); public virtual void Initialize(); public virtual void LoadGraphicsContent( ContentManager content); public virtual void Update(GameTime time); public virtual void Draw(SpriteBatch spriteBatch); /// /// Tests for collision with another Sprite. If a /// collision occurs, it calls the Collide method for /// both Sprites. /// Returns true if images collide. /// public bool TestCollision(Sprite item); /// /// Called when the TestCollision method detects a /// collision with another Sprite. /// protected virtual void Collide( BoundingRectangle overlap, Sprite item); }

As you can see, it provides methods corresponding to all of the usual game engine methods — Initialize, LoadGraphicsContent, Update, and Draw. Additionally, it provides properties for getting and setting the position and color. For collision detection, it provides a TestCollision method that tests for collision with another Sprite and a virtual method, Collide, that gets called by TestCollision for both sprites if their BoundingBox values intersect.

Since it is abstract, it is intended to be used as a parent to other Sprite classes that will implement its behavior, so we will create our own TestSprite class. The TestSprite will generate a random starting position, directional movement vector, and speed (in pixels per second), as shown here:

public override void Initialize() { // Set starting position. Position = new Vector2( random.Next(screenWidth), random.Next(screenHeight)); // Create a random movement vector. direction.X = (float)random.NextDouble() * 2 - 1; direction.Y = (float)random.NextDouble() * 2 - 1; direction.Normalize(); // Determine random speed in pixels per second. speed = (float)random.NextDouble() * 300 + 150; }

Each frame, it will update its position based on its movement direction, speed, and the amount of time that has elapsed. It will also test to see if it has hit the edge of the screen, and deflect away from it:

public override void Update(GameTime time) { // Reset color back to white. Color = Microsoft.Xna.Framework.Graphics.Color.White; // Calculate movement vector. Vector2 move = (float)time.ElapsedGameTime.TotalSeconds * speed * direction; // Determine new position. UpdatePosition(move); } private void UpdatePosition(Vector2 move) { Position += move; if ((BoundingBox.Left < 0) || (BoundingBox.Right > screenWidth)) { direction.X = -direction.X; Position -= new Vector2(move.X, 0); } if ((BoundingBox.Top < 0) || (BoundingBox.Bottom > screenHeight)) { direction.Y = -direction.Y; Position -= new Vector2(0, move.Y); } }

Note that the screen collision test shown here is quite simple. An actual game may want to determine the actual point of intersection so it could deflect away from that point more realistically. If you did need that level of realism, you would probably want to go ahead and implement your strategy here so you could time it. However, all we are trying to prototype here is basic update time so this version is fine for our needs.

Note that the Update method does not test for collisions. We don't want individual sprites testing for collisions because to do so, our Sprite class would have to know about other game objects and we would be severely limiting our design options for collision testing. Any changes to our collision testing algorithm could, and likely would, impact our Sprite class. We want to avoid anything that limits future design changes, so we'll give our Sprite class the ability to test for collisions but require another part of our code to determine what objects should be tested.

We'll talk more about collision testing next. For now, let's see what it takes to time just moving our TestSprite around the screen. Inside our game, we'll create a TestSprite object and call its Initialize and LoadGraphicsContent methods at the appropriate places. And we'll create a SpriteBatch for our game and pass it to our TestSprite's Draw method inside our own Draw method. Now all we need is for our Update method to call our TestSprite's Update method each frame and use a Stopwatch to time it. To do that, we'll create a couple of helper methods that start and stop the Stopwatch, and have it print out the amount of time it takes for each update:

private Stopwatch updateTimer; private int updates = 0; private int framesPerSecond; private void StartTimer() { updateTimer.Start(); } private void StopTimer() { updateTimer.Stop(); updates++; // Show the results every five seconds. if (updates == 5 * framesPerSecond) { Debug.WriteLine( updates + " updates took " + updateTimer.ElapsedTicks + " ticks (" + updateTimer.ElapsedMilliseconds + " milliseconds)."); int msPerUpdate = (int)updateTimer.ElapsedMilliseconds / updates; Debug.WriteLine( "Each update took " + msPerUpdate + " milliseconds."); // Reset stopwatch. updates = 0; updateTimer.Reset(); } }

By putting calls to StartTimer and StopTimer around the calls to our sprite's Update method, we'll get a report of the average time each call takes.

300 updates took 34931 ticks (9 milliseconds). Each update took 0.03 milliseconds. 300 updates took 24445 ticks (6 milliseconds). Each update took 0.02 milliseconds. 300 updates took 23541 ticks (6 milliseconds). Each update took 0.02 milliseconds. 300 updates took 23583 ticks (6 milliseconds). Each update took 0.02 milliseconds. 300 updates took 23963 ticks (6 milliseconds). Each update took 0.02 milliseconds.

Not bad, each call took on average of 20 microseconds (on my development laptop — your results will vary). But notice that the very first set of updates took almost one and half times as long to run as the others. That's because the first time these methods are called, the JIT compiler is compiling the code and our Stopwatch is timing that as well. It's also possible, since this is a fairly small amount of code that is being called repeatedly, that some or all of it may be fitting in the cache, which will increase the speed of later calls.

These show some of the problems with benchmarking code. Another problem is that we are adding some time by using the Stopwatch itself. Thus benchmark times for prototype code can be used as a general guide, but can not be relied upon for exact values. In fact, exact values of the time it takes for functions to run are very hard to determine. Although intended only to describe quantum phenomenon, a variation of the Heisenberg Uncertainty Principle is at play here — the act of measuring something affects the thing being measured. We'll explore this more, and ways to work around it, in Part 3.

Benchmarking Collision Handling

Now let's expand our prototype to help us determine whether we can get away with a brute-force approach to collision detection. You can download the updated project for this section here.

First, let's look at the collision handling code that I already placed in TestSprite's Collide method. Remember that this gets called, for both sprites, whenever TestCollision determines that a collision between two sprites occurred. All it does is set the sprite's color to red.

protected override void Collide( BoundingRectangle overlap, Sprite item) { // Turn the sprite red to indicate collision. Color = Microsoft.Xna.Framework.Graphics.Color.Red; }

Let's give this a test by replacing our single TestSprite with an array of TestSprites. Every place we referenced TestSprite in the original code, we now have to loop through the array to handle all of our TestSprites. To make this a little easier to manage, we'll refactor our original sprite.Update() call in Update into a new UpdateSprites method that updates every sprite. And we'll add a new HandleCollisions method to our game to test for collisions. Finally, we'll change Update so that it only calls StartTimer and StopTimer around the call to HandleCollisions. The relevant sections look like this:

private TestSprite[] sprites = new TestSprite[10]; protected override void Update(GameTime gameTime) { if (Keyboard.GetState().IsKeyDown(Keys.Escape)) { this.Exit(); } UpdateSprites(gameTime); StartTimer(); HandleCollisions(); StopTimer(); base.Update(gameTime); } private void UpdateSprites(GameTime gameTime) { foreach (Sprite sprite in sprites) { sprite.Update(gameTime); } } private void HandleCollisions() { for (int i = 0; i < sprites.Length; i++) { for (int j = i + 1; j < sprites.Length; j++) { sprites[i].TestCollision(sprites[j]); } } }

Looking at that, you may wonder why I'm not using foreach for the HandleCollisions call. It's simply because with foreach we have no way of knowing what sprites we already tested. This algorithm tests every sprite against every other sprite exactly once.

So, what are the results? On my machine, with 10 sprites, I get the following:

300 updates took 48827 ticks (13 milliseconds). Each update took 0.04333333 milliseconds. 300 updates took 42466 ticks (11 milliseconds). Each update took 0.03666667 milliseconds. 300 updates took 42371 ticks (11 milliseconds). Each update took 0.03666667 milliseconds. 300 updates took 43086 ticks (12 milliseconds). Each update took 0.04 milliseconds. 300 updates took 43449 ticks (12 milliseconds). Each update took 0.04 milliseconds.

Wow. Handling collisions for ten sprites takes only twice as long as it did just to move one sprite. How could that be? It's partly due to the overhead of using the Stopwatch class and making function calls, and partly due to the fact that we are measuring very fast operations. Obviously, the closer you get to the resolution of the underlying timer, the more error you get in trying to time things.

We'll talk more about timer resolution later. Before we go on, notice also that the impact of the JIT compiler during our first set of updates is signicantly less, comparatively, than it was in our last example. This shows how effective JIT compilation is and why we don't need to worry about it impacting the performance of our game. We may take a performance hit the first time a section of code is run, but it's relatively miniscule to our overall performance.

Now let's see what happens when we increase the number of sprites to 100:

300 updates took 2079460 ticks (580 milliseconds). Each update took 1.933333 milliseconds. 300 updates took 2156954 ticks (602 milliseconds). Each update took 2.006667 milliseconds. 300 updates took 2138909 ticks (597 milliseconds). Each update took 1.99 milliseconds. 300 updates took 2150696 ticks (600 milliseconds). Each update took 2 milliseconds. 300 updates took 2169919 ticks (606 milliseconds). Each update took 2.02 milliseconds.

Whether you should be impressed or dismayed depends on how you want to use this collision handling algorithm. On the one hand, averaging 2 milliseconds per frame is still a miniscule part of our 16.7 millisecond frame time. If you are not planning on having more than a hundred sprites or so, this algorithm will suit your needs perfectly. But looking at the relative time difference per sprite gives a completely different perspective. It takes us 50 times as long to handle 10 times the number of sprites.

Hmmmm... what happens when we increase to 500? I urge you to run this code so you can see the results for yourself!

300 updates took 28266113 ticks (7896 milliseconds). Each update took 26.32 milliseconds. 300 updates took 28179606 ticks (7872 milliseconds). Each update took 26.24 milliseconds. 300 updates took 28291296 ticks (7903 milliseconds). Each update took 26.34333 milliseconds. 300 updates took 28199114 ticks (7877 milliseconds). Each update took 26.25667 milliseconds. 300 updates took 28182787 ticks (7873 milliseconds). Each update took 26.24333 milliseconds.

Another "Wow!", and this time there is no way to hide the dismay. The movement is noticeably jerky and we're clearly getting far less than our desired 60 frames per second! In fact, just the HandleCollisions call alone is taking almost twice our allotted 16.7 milliseconds per frame. Multiplying the number of objects by 5 increased our time by 13!

Of course, as I mentioned in Part 1, this is what we would expect from an O(n2) algorithm. The times aren't increasing exactly quadratically, due to overhead, but the rate of increase is clear.

Lessons

Does this mean we should never consider this algorithm? Hopefully at this point the answer is obvious. Many games can easily get away with only having on the order of a hundred or so objects active at a time, which we have clearly shown can be handled easily. The fact that the algorithm is trivial to implement and maintain makes it a no-brainer for a large number of games.

On the other hand, if you know you will need to have hundreds of objects, you'll need another solution. You have two options — optimize this algorithm or find a new one. Anyone who is experienced with code optimization will see several obvious ways to make both the algorithm and its implementation more efficient.

For starters, most games don't actually need to test every object against every other object. Taking my Space Invasion game as an example, I don't need to test Invaders for collision with other Invaders. In fact, it's almost crazy to do so. (But I do anyway!)

Another obvious optimization is that the Sprite class's BoundingBox property is adding the Sprite's current screen position to its internal BoundingRectangle every time TestCollision is called. This despite the fact that the Position changes only once or twice per frame. TestCollision, on the other hand, is called once for every other Sprite in the game. (Note that I did optimize this in my actual Space Invasion game by updating the BoundingBox in the Position property's setter. As we can see from these numbers, however, I would have gotten equally satisfactory results if I had used the less efficient approach shown here.)

In addition, the Sprite's TestCollision code is computing the actual intersection rectangle even though we're not using it here. We could easily save some time by not computing it. But we give ourselves more flexibility by going ahead and doing it. Remember that this is supposed to be a generic Sprite class that can be used for many games.

These suggestions don't even get into implementation optimizations, such as always passing our BoundingBoxes by reference instead of value; and providing direct access to member variables instead of accessing them through Properties. These are exactly the types of optimizations suggested by many efficiency proponents in the XNA forums. But these also make the code less readable, harder to debug and harder to maintain.

Since Space Invasion never has more than around 60 objects on screen at a time, the unoptimized brute force approach works just fine. And that's undoubtedly true for many other games as well. (In fact, as we'll see in Part 3, the entire Update loop for Space Invasion averages less than a millisecond per frame!) But what if your game does need more than 100 collideable objects? Shouldn't you make those optimizations so you can handle them?

The answer is... maybe. By making some of these optimizations, we can get this same brute force algorithm to handle 500 objects at a far more reasonable 6.4 milliseconds per frame. You can download the optimized project here

300 updates took 6682522 ticks (1866 milliseconds). Each update took 6.22 milliseconds. 300 updates took 7038462 ticks (1966 milliseconds). Each update took 6.553333 milliseconds. 300 updates took 7023610 ticks (1962 milliseconds). Each update took 6.54 milliseconds. 300 updates took 6718281 ticks (1876 milliseconds). Each update took 6.253334 milliseconds. 300 updates took 7136208 ticks (1993 milliseconds). Each update took 6.643333 milliseconds.

That's an impressive improvement and shows how signicantly performance can be optimized through these techniques. But the disadvantages mentioned above — less maintainable and less flexible code — shouldn't be ignored. And even if you do these sorts of implementation optimizations, keep in mind that this algorithm will still degrade exponentially as you add more objects. You may be able to move up from 100 to 500 objects, but it won't get you to 1000. At some point, you need to recognize that you need a different algorithm to efficiently handle more objects — such as one that partitions your game space, like quadtrees.

Finally, remember that 6.4 milliseconds is still 40% of your entire frame time. If you are maintaining on the order of a thousand or more objects at a time, other parts of your code are almost certainly also going to be difficult to manage at a reasonable frame rate. Is optimizing your collision detection the best use of your time? How do you know in advance which ones to optimize? Optimizing all of them as you go will surely take you longer to write, not to mention make your code more difficult to debug and maintain.

If benchmarking shows your algorithm has problems without implementation optimizations, you are probably better off with a different algorithm. Which isn't to say that these sorts of optimizations aren't still useful. But it's better to do these selectively, once you have your code implemented and have profiled it to find specific problem areas. We'll look at how to use a profiler to find these problem areas next week, in Part 3.

3 comments:

Ultrahead said...

This articles are great!

"Another problem is that we are adding some time by using the Stopwatch itself. Thus benchmark times for prototype code can be used as a general guide, but can not be relied upon for exact values. In fact, exact values of the time it takes for functions to run are very hard to determine."

Would be any different if the benchmark helper run in a separated thread?

"For starters, most games don't actually need to test every object against every other object."

Indeed, plus if they do, a scengraph could become handy to determine which objects should be rendered, test for collisions, do physics updates, and maybe updates in general.

thomas h aylesworth said...

Thanks for the feedback. I'm glad you're enjoying them!

It's possible you could come up with a multithreaded approach that slightly improved the overhead time of the Stopwatch. But you still need some method of communicating your Start and Stop timer events to that thread, and that will still add overhead. Not to mention be far more complex.

The best way to remove the overhead effect is to time the overhead itself (the time to Start and Stop the Stopwatch) and subtract it. It's pretty rare that you really need to know precisely how long something is taking, though.

As for your comment about scenegraphs, you're exactly right. That's exactly the sort of tradeoff that needs to be considered in the design phase.

Johnny GoTime said...

Thanks very much for the articles.
I'm just getting into collision detection on my first game, and you're giving me great things to think about.
Cheers!