Python tips: don’t be too concise

Published October 28, 2006. Filed under: Programming, Python.

There’s an inherent tendency programmers have to take a piece of code and reduce it to the shortest possible form. The holy grail is, of course, cutting something down to a single line of code while still providing the same functionality; reducing a particular piece of code to a “one-liner”, especially if the code is somewhat complex, is sometimes viewed as a measure of a programmer’s intelligence or talent or both, and is often used as a sort of game (see Perl Golf, for example — the cult of the one-liner is stronger in Perl than in almost any other language).

And in a lot of cases it’s good to reduce code where possible; usually, if you’re able to shave a line or two, or cut out a few expressions, it’s because the code wasn’t written as clearly as it could have been, so you end up with something that’s functionally equivalent but expressed in a better way. So refactoring to get the actual size of the code down is often a good thing.

But not always; eventually you’ll reach a point where you’re reducing for the sake of reducing, and the end result can be harder to read and figure out. Outside of “obfuscated programming” contests, that sort of thing is frowned upon in most language communities (even Perl’s, once you get into the context of applications that have to be maintained), and is especially frowned upon in Python, where keeping code clear and readable is a major goal.

To illustrate this, let’s look at a simple example: a function which returns numbers from the Fibonacci sequence, where each number in the sequence is the sum of the previous two.

First version

At first glance, it’s relatively easy to write a recursive function which will calculate the nth number of the sequence: it’s stipulated that 0 and 1 are the first two numbers (knowing this is a requirement, because there aren’t two “previous” numbers to use for calculating those two), so if n is 0 you return 0, and if it’s 1 you return 1. Otherwise you just recursively call the function with n - 1 and n - 2, then add those together. Here’s a simple function which does that:

def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)

You can test this by doing the following:

for i in xrange(10):
    print fib(i)

Which will call the fib function to calculate the first ten Fibonacci numbers.

This function works, but it’s not as concise or elegant (elegance is important in programming, and don’t let anyone tell you otherwise) as it could be. For one thing, we’ve taken a concept which can be expressed in a single sentence, and turned it into six lines of code; surely we can do better.

Second version

One thing that stands out is the first two branches of the conditional we’re using; when n is 0 or 1 the value we return is the same as n, so let’s rewrite the function to take that into account:

def fib(n):
    if n == 0 or n == 1:
        return n
    else:
        return fib(n-1) + fib(n-2)

The function still returns the same values (you can test it to make certain), but we’ve shaved out two lines of code, and now it’s clearer what’s going on: 0 and 1 are special cases.

Third version

We could also shave off one expression by looking at the if statement, and rewriting it slightly:

def fib(n):
    if n in (0,1):
        return n
    else:
        return fib(n-1) + fib(n-2)

Again, we can test to ensure it does the same thing as the previous versions, but now we’ve taken the if statement from being two equality checks with an or between them to being a single lookup in a tuple. In this particular case there’s no clear performance gain, but if there were a lot of “special-case” values to check for, this version would probably run faster — checking whether something occurs in a list or tuple will usually be quicker than a large construct which compares values.

Another way to condense this particular case would be:

def fib(n):
    if n < 2:
        return n
    else:
        return fib(n-1) + fib(n-2)

But when you have several “special case” values to deal with, it’s not always possible or efficient to fit them into a single comparison, so putting them into a list or tuple and using in is an important technique to know.

Fourth version

Next we can probably shave another line out of the function by removing the else statement; since fib returns a value immediately when n is 0 or 1, we don’t need the else there to prevent the recursive call from happening. So we can rewrite the function like so:

def fib(n):
    if n in (0,1):
        return n
    return fib(n-1) + fib(n-2)

A quick test will reveal that this still works correctly, and now we’re down to only three lines of code — half of what we started with. It’s also still quite clear what’s going on in the function: 0 and 1 are special cases which just return n directly; otherwise the procedure follows the standard rule of summing the previous two Fibonacci numbers.

Omitting an else statement when the previous “branches” of if and elif or else statements will always make the function return usually doesn’t hurt readability (and personally, I think it helps by emphasizing that some particular bit of code is the “default” to be used when none of the special cases match), so it’s usually OK to do this. Now let’s reduce the function even further and see some things that probably aren’t OK to do.

Fifth version

We started with a function which had six lines of code in its body, and now we’re down to three. We can get it down to two pretty easily by taking advantage of Python’s one-line conditionals:

def fib(n):
    if n in (0, 1): return n
    return fib(n-1) + fib(n-2)

But now the code is starting to get less readable; it’s easier to read the if statement on its own line and the corresponding logic indented beneath it than it is to scan the single line and figure out which part is the conditional and which part is the code to be executed.

Sixth version

Since we’re already down to two lines, let’s go ahead and make it one:

def fib(n):
    return (n in (0,1) and [n] or [fib(n-1) + fib(n-2)])[0]

Ouch.

As of version 2.5, Python has a ternary operator which lets you express an “if this do that, otherwise do the other thing” statement cleanly on a single line. Prior to that, you could do it but you had to rely on the “short-circuit” behavior of the logical and and or operators, and in some cases — this being one of them — it was necessary to wrap the whole thing up to act like a list, and pull the first value out of it (for an explanation of why, see Chapter 4 of Mark Pilgrim’s “Dive Into Python”).

In Python 2.5, this could be written somewhat more cleanly:

def fib(n):
    return n if n in (0,1) else fib(n-1) + fib(n-2)

This is better, but forcing everything into one line still costs us some of the clarity of previous versions. Whether this is an “optimization” worth making is probably going to be heavily dependent on how complex the logic is — if the things you need to do in the different cases involve more than just one or two expressions, this will probably hurt your code more than it helps.

Seventh version

Just to be totally evil, let’s really get this down to one line, and lose the def statement:

fib = lambda n: (n in (0,1) and [n] or [fib(n-1) + fib(n-2)])[0]

Or, in Python 2.5:

fib = lambda n: n if n in (0,1) else fib(n-1) + fib(n-2)

There are times and places where using lambda to generate an anonymous function “on the fly” is useful (for example, when you’re providing a comparison function for Python to use in sort, or in Django when you’re supplying a function for the user_passes_test decorator). This probably isn’t one of them; sure, we’ve gone from seven lines (def plus six lines of function body) to just one, but hopefully it’s pretty obvious that some of the earlier versions of the fib function were easier to understand at a glance.

So which is best?

Ultimately, it comes down to your own preferred coding style; if you find it just as easy to read one-line conditional expressions or lambdas, then use them. But in general, it’s a whole lot easier to understand what’s going on in your code when it’s just a little bit more verbose than it needs to be; to me, personally, the fourth version above (defining a standalone function with a three-line body) is probably the best tradeoff between readability and conciseness, but I can also see arguments in favor of the versions immediately before and after it.

As long as you keep readability in mind when you’re going back and refactoring your code to make it more concise, it will probably turn out OK; Python tends to be pretty clear no matter what you do to it (as we’ve seen here, you usually have to go out of your way to make Python code hard to read), and just thinking for a moment or two before you try to reduce something further will usually get you a good result.