Programming tips: learn optimization strategies

Published November 5, 2006. Filed under: Programming, Python.

Recently I spent a little time talking about the tradeoffs between “concise” code and readable code in Python. Throughout that entry, I was using as an example a simple function which calculates numbers in the Fibonacci sequence; here’s one variation:

def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)

The Fibonacci sequence is a classic example from introductory programming materials, because it teaches recursion, and recursion is an important core concept for productive programming. But it can also be useful for another educational topic: optimization.

I mentioned optimization only briefly in that entry, pointing out that when there are lots of “special case” values to check for, it’s often faster to put them into a list or tuple and use in to see if you’re dealing with one of them. But this one only has two special-case values, and they can be handled with a single test: if n < 2. The optimization lessons the Fibonacci sequence can teach us lie elsewhere, and they start with observing that the fib function above is unbearably slow.

If you’re confused about that (the simple test I provided in the original entry only calculated the first ten numbers, which is something that can go fairly quickly), pop open a Python shell, type in the fib function, and then try this:

for i in xrange(50):
    print fib(i)

Then go get a cup of coffee or something, because that’s going to take a while. On my fast and shiny MacBook Pro, just getting the first forty numbers took over eight minutes. Yes, eight minutes. I killed it around then (remember you can stop a Python program by typing Control+C on most systems), so I don’t know exactly how long it takes to get all the way to the fiftieth one. I just know that it’s a really slow function.

Why’s it so slow?

The short answer is “recursion”, the same fundamental concept it does such a good job of introducing. But that’s not a complete answer; getting to that will take a bit of exploration. First, let’s see how many times fib has to recursively call itself to calculate different numbers:

Hm. 2, 4, 8 — those numbers sound familiar, right? They’re the first three powers of 2. Past fib(4) the correspondence isn’t exact anymore, but the number of recursive calls still grows at an extremely high rate. In formal terms, this algorithm for calculating Fibonacci numbers is O(2n).

How to talk about efficiency

The canonical language for discussing the efficiency of any given algorithm is something called “big O notation”; even if you don’t know what it means, you’ve probably seen it before — people will talk about something being O(n2) or O(log n) or similar.

Roughly speaking, this is a measure of how the time it takes the algorithm to do its work grows in relation to the size of its input, but big O notation actually doesn’t specify the exact rate of change in the algorithm’s speed. Instead, it’s based on the fact that, as the input grows ever larger, you can ignore certain parts of the exact formula which describes its speed. This is because those parts (or terms, to use the correct terminology) become less and less relevant as the input grows.

For example, if you had an algorithm which, for an input n, needed to carry out n2 + 4n separate steps, then you could safely ignore the 4n part — n2 will grow so much larger, and will grow so much more quickly, that the 4n quickly becomes irrelevant and only the n2 really matters. Thus, in big O notation, that algorithm would be described as O(n2).

Being able to estimate, even in a very general sense, the efficiency of an algorithm using big O notation is an extremely useful skill; the Wikipedia article linked above has a short table of common variations with examples, and lots of good books and articles on programming will give you more examples to learn from (for example, an algorithm which boils down to a simple loop, like for(i=0; i<n; i++), is often O(n), or linear).

Back to Fibonacci

So now we have a basic understanding of what it means to say fib is O(2n) — the amount of time it takes to calculate numbers further along in the sequence grows exponentially. And now we can look a little more closely at why that is. As I’ve already said, it has to do with the recursion going on, but the fact that the function recursively calls itself isn’t the complete answer. To see the deeper issue, let’s walk through what happens when we ask for fib(4):

  1. 4 isn’t less than 2, so the function can’t return immediately. So it will have to get the values for fib(3) and fib(2) and add them together.
  2. It calculates fib(3); 3 isn’t less than 2, so that’s going to be fib(1) (which can return a value immediately) plus fib(2) (which has to be calculated as fib(1) plus fib(0)).
  3. It calculates fib(2). 2 isn’t less than 2, so it has to be calculated as fib(1) plus fib(0).
  4. Finally it adds together the values it got for fib(3) and fib(2), and returns the result.

Notice the problem here? The function calculated fib(2) twice — once because it needed fib(2) directly, and once because it was a necessary part of calculating fib(3). This is the biggest performance bottleneck in the fib function: it has no way of “remembering” values it’s already calculated, so it has to re-calculate the same ones again and again (calculating fib(5), for example, would involve multiple calculations of both fib(3) and fib(2)), and all of those repeated calculations take time. As the numbers get larger, there are more and more duplicated calculations and so it takes more and more time.

This suggests that one easy way to optimize the Fibonacci calculation would be to provide a way to store values once they’re calculated, and re-use them whenever they’re needed.

Memoizing Fibonacci

Storing the results of a function for later re-use, rather than re-calculating every time the value is needed, is an optimization strategy called memoization (it’s not wrong to call this “caching”, but memoization is a specialized type of caching), and it can drastically speed up certain types of programs. Here’s an example of a memoized Fibonacci (for clarity’s sake, I’ve rewritten it as a class — keeping fib as a standalone function is possible, but a bit more involved):

class Fibonacci(object):
    def __init__(self):
        self._cache = {}
    def fib(self, n):
        if n < 2:
            return n
        if not self._cache.has_key(n):
            self._cache[n] = self.fib(n-1) + self.fib(n-2)
        return self._cache[n]

Each instance of the Fibonacci class has a dictionary associated with it, called _cache, which stores numbers it’s already calculated. Then when it needs one of those numbers, it just pulls it out of the dictionary instead of recalculating it. Previously, calculating the first 50 Fibonacci numbers took ages; let’s try it now:

fibs = Fibonacci()
for i in xrange(50):
    print fibs.fib(i)

You should see that it’s a lot faster; on my MacBook Pro, this test takes less than one second to run.

But we’ve made a tradeoff. We can calculate the Fibonacci numbers much more quickly, but we’re still being inefficient: calculating larger and larger Fibonacci numbers takes more and more memory as we store the already-calculated numbers for re-use. That’s the downside of memoization — it can be screamingly fast, but for larger inputs it can turn your code into a pretty big memory hog. Whether that’s the right trade to make will depend on the particular conditions your program needs to meet.

Trying a different algorithm

Another trick we might try is to move away from the original algorithm we were using, and try to find a faster way of calculating Fibonacci numbers that doesn’t need specialized enhancements to run efficiently. Here’s one possible alternative:

def fib(n):
    if n < 2:
        return n
    current_fib, next_fib = 0, 1
    while n > 0:
        current_fib, next_fib = next_fib, current_fib + next_fib
        n -= 1
    return current_fib

Rather than rely on recursive calls to itself, this version of fib just calculates Fibonacci numbers by running through a loop; after each trip through the loop, current_fib is the current Fibonacci number and next_fib is the next one; the double assignment moves the value of next_fib into current_fib, and also adds them together to get the “next” value to put into next_fib. A quick test shows that this is also an extremely fast algorithm; again, it takes only a fraction of a second to spit out the first fifty numbers.

Because this version is just a simple loop, it’s roughly linear — calculating fib(n) just takes n trips through the loop. That’s a big improvement over the exponential version we had originally. And, more importantly, its memory usage is much better than the memoized version of the original fib — at most, it only needs to store four values in memory (n, i, current_fib and next_fib). In other words, for memory usage this algorithm is roughly constant. I say “roughly” because the numbers being stored will grow larger and eventually require larger data types — even long integers aren’t sufficient to get through the first hundred Fibonacci numbers (although in a statically-typed language like C or Java, its memory usage would be constant, because you’d have to decide up-front what type to use for the numbers; the choice of type would also limit how far into the sequence your program could go).

Where to go from here

Optimization is a massive topic, worthy of many books’ worth of discussion. There are common techniques, like memoizing a function or switching it to a different algorithm, and learning what those techniques are and what advantages and drawbacks they have in different situations is a good way to get started. Beyond that, reading good code and putting together a list of good books can help you broaden your understanding.

For starters, Wikipedia’s article on optimization is a good overview, and following the links you’ll find there should help you get a feel for what’s out there on the topic.

And perhaps the most important thing to understand is when not to optimize. Donald Knuth, who’s one of the greatest living gurus of computer science, is famously reputed to have said that premature optimization is the root of all evil; in even a moderately complex program, it can be hard to accurately guess where the performance bottlenecks will be, so trying to optimize before you’ve even run the program is likely to make your code more complicated than it needs to be. Generally, you should just write something you know will work, and then run it a few times (using profiling tools if necessary) to figure out where it needs work, and then start thinking about the best solution to speed it up.