Friday, March 23, 2007

Software Efficiency and Optimization (Part 1)

Software efficiency and optimization is a frequently recurring theme in the XNA Forums. It invariably involves questions about the C# foreach construct, garbage collection, and whether value types should be passed by reference.

While those are all valid concerns, and anyone developing XNA games needs to be aware of these issues, it's clear from many of the posts that some XNA developers were blowing them out of proportion. Many posts have claimed that anyone developing XNA games should never use foreach, as well as suggestions to always pass Matrix structures by reference. This led XNA Framework developer Shawn Hargreaves to write a wonderful post about why it is more important to write clean, maintainable code than to worry overly much about performance, especially early in the development lifecycle.

As a professional programmer who has written software for NASA, commercial communication satellites, and the defense industry, I agree with Shawn almost completely. Two favorite sayings that were drilled into my head early in my career are:

Premature optimization is the root of all evil.


The first rule of program optimization: don't do it. The second rule: don't do it yet.

Yet the fact is, game software does have to be efficient. Games are a class of realtime software, meaning that it is not enough for them to produce the correct result, they must also complete within a fixed time window. In general, game developers shoot for a minimum of displaying 30 frames per second in order to produce smooth, glitch-free animations; and most prefer 60 frames per second. That means that all of the game calculations — getting player input, implementing enemy AI, moving objects, collision detection and handling, and drawing each frame — must be completed within 16.7 milliseconds! When you consider that most modern videogames have hundreds, or even thousands, of objects that have to be updated and drawn within that time period, it's no wonder that programmers feel they have to optimize every line of code.

However, from reading the many posts with questions about "the fastest way" to implement something, it is obvious that many XNA programmers aren't familiar with the tools and methods for determining when, where, how, or even if, they should optimize their code. The point of this article is to help you answer those very questions.

Design Versus Implementation

A common response by those who question, or even outright disagree, with the idea that optimizing code early is a bad idea, is to point out that it is far easier to change software early in its lifecycle than after it has been written. That is, of course, very true. But that is why it is important to understand the difference between design optimization and implementation optimization.

While designing a game (or any software), you must take into account the size and complexity of your game, and select the correct data structures and algorithms that can support it. A simple 2D shooter or platformer with no more than a hundred or so objects interacting at any given time can probably get away with a brute force approach for handling movement and collisions. Maintaining a simple list or array of objects and iterating through it each frame will most likely work fine, and will be very simple to implement and debug.

However, a more complex game world, with perhaps thousands of active objects, will need an efficient method of partitioning the game space to minimize the number of object interaction tests each frame. Similarly, games requiring detailed enemy AI will need to rely on algorithms that can produce "intelligent" actions as quickly as possible.

There are a large number of available resources that discuss game programming algorithms, including the use of quadtrees and octrees for partitioning the game world to minimize collision detection tests; the minimax algorithm with alpha-beta pruning for efficiently finding the "best" move in two player strategy games; and the A* algorithm for efficient pathfinding.

Discussion of the specifics of these algorithms is outside the scope of this article. The important thing to take from this is:

The selection of the appropriate data structures and algorithms during the design phase has a far greater impact on the eventual performance of your game than any implementation optimization you will make.

Why? Because your algorithms determine the maximum number of operations your game will have have to perform during each frame.

To demonstrate this point, imagine that for your first game you write a simple 2D shooter that relies on a brute force approach to collision detection. Every frame, you simply test every active object against every other active object to see if they intersect. Because you decide to have only a limited number of enemies active at a time, it works well and easily runs at 60 frames per second.

With that experience under your belt, you now want to write a second, far more ambitious game. This time you decide to write a Zelda-like adventure game with a large scrolling game board and hundreds of objects moving around it simultaneously. Using your existing code as a starting point, you get well into the game's implementation before you discover that the brute force approach that worked very well in your simple game doesn't work so well in this new game. In fact, you may be measuring screen draws in seconds per frame instead of frames per second!

The reason is that comparing every object against every other object is what is known as an O(n2) algorithm. That is, the number of operations that have to be performed is related to the square of the number of objects you are operating on. If you have ten objects in your game, you only have to perform a hundred tests to see if there are any collisions. If you have a hundred objects, you have to perform ten thousand tests. Which may still be possible on a modern PC if each test can be done quickly enough. But if you have five hundred — just five times as many as the last example — you will have to perform 250,000 collision tests. Even if each test took only 67 microseconds, you would stll be using the entire 16.7 milliseconds frame time (at 60 frames per second) just for collision detection. The point is, it doesn't matter how efficiently you implement that algorithm in code, its performance will still devolve exponentially with the number of objects in your game and will therefore be the single greatest limiting factor to the size of your game.

Big O Notation

Big O notation is a mathematical method of comparing algorithmic efficiency. It is sometimes derided by programmers because it says nothing about the actual time a particular implementation of an algorithm will take. However, it can be quite useful if applied appropriately to help compare algorithms at design time.

The following table shows the relative efficiency of the most common types of algorithms, in decreasing order of efficiency. Thus, an O(log n) algorithm is considered more efficient than an O(n) algorithm, which is more efficient than an O(n2) algorithm.

Mathematical Property
The number of operations are always the same no matter how many objects are being operated on.
O(log n)
The number of operations increases proportional to the logarithm of the number of objects being operated on.
The number of operations increases proportional to the number of objects being operated on.
The number of operations increases proportional to the square of the number of objects being operated on.
The number of operations increases proportional to the factorial of the number of objects being operated on.

Remember that the point here is to determine how well the algorithms scale to handle more objects. An O(n2) algorithm where each actual operation is performed very quickly, may be faster than an O(log n) algorithm for many values of n. However, the O(n2) algorithm will always have a lower limit to the maximum number of objects it can handle. The trick is to choose an algorithm that is easy to implement while being efficient enough for the number of objects you want to handle in your game.

For discussions of videogame related algorithms, I highly recommend the Game Programming Gems series. Various websites, including Gamasutra and also have many good articles on algorithms.

It's also important to understand the relative performance of the C# generic collection classes. MSDN has a good article on how to select the appropriate Collection class for your needs, as well as an article on when to use Generic Collections. (My answer to that question is to always prefer Generic Collections to their non-generic counterparts. They are just as easy to use, are almost always more efficient, and have less chance of impacting the garbage collector.)

What Next?

That's an awful lot of background and theory, and I haven't even started to talk about optimizing implementation performance. Of course, that's the point I am trying to make — understanding design tradeoffs in order to select the design that is most appropriate for your game is the single most important task you will face in writing an efficient game.

However, it doesn't end there, and you may well end up needing to optimize certain sections of your code as you go along. Part 2 of this series will look at the importance of prototyping various aspects of your game, as well as showing you tools for profiling and benchmarking your code in order to determine which, if any, sections need to be improved.


Ultrahead said...

Nice read.

Zygote said...

Nice to see a new face around :) Keep up the good work.

Ziggyware XNA News and Tutorials

George Clingerman said...

Nicely done. Looking forward to your continued participation in the community.

Thanks for stepping up and tackling this issue, much appreciated!

X-Tatic said...

Excellent entry. I hope you end up posting these regularly :)

Michael Coles
Digini, Inc

Darkside said...

Fantastic Work, hope to see more soon.
Glad to see you in the community!