A fun little bug

June 26, 2009 Django, Misc, Python

About two months ago, I quietly wrote and released a little Django application which generates cross-domain policy files for Flash (if you’ve never encountered this aspect of Flash, here’s an introductory article on how it works). I’ve done a bit of hacking on it in my spare time, and I’m pretty happy with it as an example of a simple, tightly-focused application; sometime soon I hope to write a bit more about the development process behind it, because it provides a useful real-world guide to some of the abstract best practices I like to preach about.

But for now I just want to talk about an interesting little bug I recently uncovered in its test suite.

I’ve been working over the past few months to get into better habits when it comes to unit-testing applications I distribute, and so naturally I did my best to give this app a solid test suite. And to help with actual quantitative measurement I’ve been using Ned Batchelder’s coverage.py to get nice reports of how much of my application code is actually being tested. I’m proud to say that as of version 1.3 (released recently), my little app for generating cross-domain policies has 100% test coverage. The report for the current tip of the repository looks like this:

Name                                                                          Stmts   Exec  Cover   Missing
-----------------------------------------------------------------------------------------------------------
/Users/jbennett/dev/hg/personal/django-flashpolicies/flashpolicies/__init__       1      1   100%   
/Users/jbennett/dev/hg/personal/django-flashpolicies/flashpolicies/models         1      1   100%   
/Users/jbennett/dev/hg/personal/django-flashpolicies/flashpolicies/policies      54     54   100%   
/Users/jbennett/dev/hg/personal/django-flashpolicies/flashpolicies/views         14     14   100%   
-----------------------------------------------------------------------------------------------------------
TOTAL                                                                            70     70   100%

But it didn’t always look that nice.

The line that wouldn’t run

I first took coverage.py for a spin on this codebase a few days ago, and it reported that there was one line in flashpolicies.policies which wasn’t being executed during testing runs. And I had no idea why. The line in question was simple enough; it was meant to raise a TypeError if a particular method was called with an invalid argument. I had a test which was supposed to be exercising this behavior. So I was a bit stumped.

The first thing I tried, of course, was running just that single test method to ensure that it really was executing and really was passing. And, sure enough, the test runner spat back that it ran one test, successfully. Then I upgraded my copy of coverage.py — Ned had just released a new version — and tried again, just in case there was some sort of bug in the coverage calculations. No dice: it still reported that one line wasn’t executing.

So, in desperation, I opened up the test suite and changed it: instead of asserting a TypeError would be raised, I had it assert a ValueError. If the application code was working properly, this would make the test fail (since it would still be raising TypeError). I saved the change, ran the test again, and was surprised to see that the test wasn’t failing; instead it was dying with an error in the test suite itself:

Traceback (most recent call last):
  File "/Users/jbennett/dev/hg/personal/django-flashpolicies/flashpolicies/tests/policies.py", line 120, in test_bad_site_control
    self.assertRaises(ValueError, policy.site_control, 'not-valid')
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/unittest.py", line 336, in failUnlessRaises
    callableObj(*args, **kwargs)
TypeError: 'NoneType' object is not callable

After staring at this for a bit, I had an “ah-ha!” moment.

Down the rabbit hole

Flash cross-domain policies are XML files. And generating and manipulating XML can be somewhat tricky, so for my application I implemented a class which represents a policy and exposes methods for setting the various options. Once you’re happy with them, you can hit a property which generates a standard Python xml.dom.minidom.Document object you can serialize and either write to a file or serve directly (which is what the views in the application do). One of the options controls the types of policies which will be allowed for a particular domain, and is represented in the XML as an element called site-control. So I’d written a method on my Policy class called site_control() and added unit tests for it.

Except then I decided to change things up a bit; in the cross-domain policy specification, this option is referred to as a “meta-policy”, and so I decided to change the name of the method to metapolicy() to match. And here I made the first half of the key mistake, because I didn’t change the method name in the test suite; it was still looking to call a method named site_control().

But the test still passed, and so I didn’t notice the problem until coverage.py complained that a line in that method wasn’t executing during test runs.

Figuring out why the test still passed required a bit more thinking. In theory it should have begun failing immediately, since there was no longer a method named site_control(), and so any attempt to access it at all should have raised an AttributeError. But after a few moments I noticed the other half of my mistake: I needed to store the metapolicy information somewhere for use in generating the XML, and so I’d put an attribute on the Policy class named… site_control. Which initializes to a value of None when a new Policy is instantiated.

At this point I had a complete picture of what was happening:

The test was attempting to call the site_control() method with an invalid argument, and expecting a TypeError.
There was no longer a method of that name, which would normally raise AttributeError, but
There was an attribute of that name, which — at that point in the test run — had a value of None, and so
The test was getting the value of that attribute and attempting to call it.
But None isn’t callable in Python, and in Python trying to call a non-callable object raises TypeError.
So the test — by getting the wrong thing from the class — was seeing the TypeError it had been told to expect, and so was passing.

And, of course, as soon as I changed the exception type in the test to try to debug the problem, I started seeing the real problem: instead of a TypeError raised as expected from inside the application code, a TypeError was being raised in the test suite because the thing I’d told it to call wasn’t callable.

Post-mortem

As soon as I understood what was going wrong, I made the appropriate change to the test suite, ran it again and got back a much nicer report from coverage.py. And now I’ve learned my lesson about making changes to code and forgetting to make corresponding changes in the test suite.

Of course, I’m sure that as soon as I publish this post I’ll get some predictable responses:

Hard-core TDD people will say I’ve still got it backwards, and that I should make the changes to the tests first, confirm they fail, and then make the changes to the code to get the tests passing again. They may be right in this case, but I’m still a bit skeptical of some of the methodology of TDD (this article makes some good points about it, if you’re interested).
Fans of static type systems will point out that a “proper” type system would’ve caught this at compile time before I’d even had a chance to run the tests. They may also be right in this case, but frankly I can count the number of genuine type errors I’ve run into in the past few years on one hand; that’s not really enough, in my mind, to justify the overhead of most statically-typed languages (yes, I said “most”: Haskell people, you can sit back down now, and come back when your documentation doesn’t sound quite so much like Star Trek technobabble).

So hopefully we can simply take those responses as given, and avoid having a big flamewar over TDD and/or type systems, because really the only thing I wanted to talk about here was an interesting bug and the process of tracking it down. If you’ve got similar fun stories, feel free to share them in the comments.

Also, it goes without saying that I’m now an even bigger fan of coverage.py than before, since it’s doubtful I’d have ever known there was a problem without it. Go check it out.