Making mistakes

Published: March 20, 2018. Filed under: Django, Meta, Programming.

A couple weeks ago when I was writing what became pwned-passwords-django, I tweeted about a weird issue I was seeing when running the tests for part of it. As it turned out, I’d overlooked something important, and the fix ended up being a one-line change. But that kicked off a little side discussion about the importance of being open about these kinds of “trivial” mistakes; it’s easy for newer or less-confident programmers to do something like that and start feeling stupid, unqualified and all sorts of other bad things. But experienced programmers — even, in this case, someone who’s allegedly an expert on the piece of technology in question — make those kinds of mistakes, too, and it’s good to make sure people are aware of this and that messing up, even in ways that feel dumb, is a thing that happens to everyone.

Especially to me.

I mentioned in a reply in that Twitter thread that I used to keep a list of my own commits fixing “embarrassing” bugs or problems I’d created. Unfortunately I don’t have it around any more, and as I recall a lot of them were just typos that I had pushed too quickly into one of my own repos (if they were still only in my local copy of the code I’d usually just rebase them away, for sake of cleaner commit history). I did manage to dig up a few with a trawl through my commit history, though. First, to drive home the “everybody makes mistakes” point, here are a few quick hits:

These five are nice quick reminders that everybody makes these kinds of mistakes. But I also want to take some time to talk about a few others which meet the same criteria — they were simple, usually one-line problems with easy fixes — that also offer some useful lessons. I’m going to go over three of them in more detail, explaining not just what was wrong, but also how it came to be wrong and what kinds of changes to my own development practices could help avoid them in the future, since while everybody screws up sometimes on small scales, what really matters is how you respond after making one; if there’s something you could do to avoid making the same mistake twice, the appropriate response is to figure out what that is, and do it.

The non-functioning password checker

While I was building pwned-passwords-django, that initial problem I tweeted about was a strange test failure: the middleware that checks submitted passwords against Pwned Passwords was working… but only on one version of Django.

The test suite for the middleware was passing on Django 1.11, but every test for the middleware was failing on Django 2.0. That was a bit of a head-scratcher (and not helped by the fact that I was coming down with a cold at the time).

If something works on one version of Django but not another, the first suspicion should be that something changed in Django. But the middleware just checks for the presence of certain keys (or rather, keys matching a certain regex) in the POST payload, and if it finds them runs their values through Pwned Passwords. The test failure told me that the appropriate keys weren’t being found and thus the Pwned Passwords API wasn’t being hit, despite the tests issuing requests which should have caused the middleware to go into action (and which did cause it to do so on Django 1.11). I spent some time racking my brain, and scouring the Django release notes, for anything that could have changed in the HttpRequest object, or in QueryDict (the data structure Django uses to represent request payloads), between 1.11 and 2.0 to break the way I was iterating and examining the keys in POST, and came up empty.

And then the light bulb went off: what if the problem wasn’t that the middleware was failing to find the right keys in the POST data? What if the problem was that the middleware wasn’t running at all? And that turned out to be the answer. From the very earliest days of Django, middleware was configured through the setting MIDDLEWARE_CLASSES. But back in Django 1.10, the middleware system got an overhaul. The deprecation/migration plan for moving from old to new middleware systems involved changing the name of the setting to just MIDDLEWARE. Django continued supporting both options — with the assumption that if you were using MIDDLEWARE you’d made the other necessary changes to keep your middleware working — but deprecated the use of MIDDLEWARE_CLASSES. And in Django 2.0, MIDDLEWARE_CLASSES stopped being supported.

When I build Django applications for distribution and reuse, I package them with a standalone test-runner file that configures the minimal settings for the application, and invokes Django’s test framework. Then I hook it into setup.py test to provide a nice standard entry point for running the tests. Whenever I start work on a new application, I copy and paste the test runner script, tweak what needs tweaking, and then use it. Except I wrote that script back when MIDDLEWARE_CLASSES was still supported, and as of Django 2.0 it isn’t anymore. So the tests for pwned-passwords-django were setting up MIDDLEWARE_CLASSES, which worked on Django 1.11 but was ignored on 2.0, and as a result Django 2.0 didn’t see the middleware and didn’t invoke it. Mystery solved: I changed MIDDLEWARE_CLASSES to MIDDLEWARE in the test runner, and my tests passed on both 1.11 and 2.0.

There are a few potential lessons here. The big one is that I probably should do a better job of maintaining my test-runner script. Checking at each new Django release to see if it needs any changes would be a good idea. Django’s system-check framework did have a compatibility check for middleware in Django 1.11, but it only warns if you’ve used both setting names, rather than if you only used the deprecated one. Ordinarily another lesson would be to pay more attention to the system check — which does execute as part of the tests — but in this case it wasn’t catching the problem (a bit of quick research seems to indicate there’s a philosophical debate about whether and how the system check framework should handle these things).

Another lesson is to pay attention to deprecation warnings; although the system check framework didn’t catch this, Django 1.11 did emit a DeprecationWarning any time it processed a request and found you weren’t using the MIDDLEWARE setting, but by default Python suppresses DeprecationWarning. I’ll be tweaking my test setup in the near future to toggle the warning filter and make sure DeprecationWarning is visible in test output.

The non-functioning password checker, part two

Speaking of pwned-passwords-django, there’s a reason why it was already at version 1.1 when I publicly announced it, and the reason is that 1.0 had a serious bug.

The 1.0 release was a bit rushed; I wanted to get a package together, with an API I liked, as soon as possible, in order to claim the name on the Python Package Index (since there are quite a few packages doing things with Pwned Passwords). So 1.0 was an API-stable release with passing tests… but it didn’t actually work.

Testing packages that interact with online services can be a bit tricky. I use tox on my own laptop to run the tests for my applications, and also have Travis CI configured to run them automatically on every commit I push to GitHub. Which is great up until I write something that has to interact with an external service over the internet. I maintain a couple packages that have this problem, and for pwned-passwords-django I decided not to have the tests use the real Pwned Passwords service. Instead, its test suite uses Python’s mock library to simulate the Pwned Passwords API, and just asserts that the code is issuing the right kinds of requests and doing the right kinds of things with the responses.

You might be wondering at this point whether I ever did any manual testing against the live Pwned Passwords service, and the answer is: not until after I’d released the 1.0 package. And that was when I discovered that the core piece of code in the app — the function that actually makes a request to Pwned Passwords, and parses out the response — didn’t work.

The bug was kind of a silly one. The Pwned Passwords API involves passing around fragments of SHA-1 hashes, and Python’s hashlib generates hash digests with lowercase hexadecimal digits. But the Pwned Passwords API uses uppercase hexadecimal digits. So, for example, when checking a password of 'swordfish', I’d be sending a hash prefix of 4f571 and looking for a response containing a suffix of 81dcaade980555f2ce6755ca425f00658be. But Pwned Passwords would be returning a suffix of 81DCAADE980555F2CE6755CA425F00658BE.

Oops.

The fix was one line in the actual code of the app, and a tweak to the mock setup to ensure simulated responses came back with uppercase hexadecimal digits. And so pwned-passwords-django 1.1 came out one hour and 43 minutes after pwned-passwords-django 1.0.

The lesson here is always do a test against the real service. Even if you don’t make it part of your runs-all-the-time automated test suite, you need to know that your code works when interacting with the real thing.

I do go back and forth, though, on whether to have interaction with the real service in the test suite. In the wrapper I maintain for the WordPress Akismet spam-detection service, the test suite basically repeats itself: it runs one set of tests using unittest.mock to simulate the Akismet service, and another set of tests against the actual Akismet service. But Akismet provides direct API support for testing, including parameters to tell the Akismet service you’re running tests, and specific sample inputs that are guaranteed to produce certain types of responses. As far as I know, Pwned Passwords doesn’t do that, so for now I’ll probably stick to using mocks.

The accidental type error

This one happened a few years back in django-flashpolicies, and I wrote about it at the time, but it’s a good story so I’ll tell it again. For background: Flash (the browser plugin) has a same-origin rule for making requests, much like JavaScript, and you can override that by publishing an XML file that describes a policy for whether and how you’d like to allow Flash to make cross-domain requests to your site. So django-flashpolicies is a Django application that knows how to generate the correct XML, and provides a set of Django views for common types of policies as well as a Python class implementing the full policy specification and a view that can serialize it out to XML and serve it.

One of the things you can do in a Flash cross-domain policy is set a “meta-policy”, which controls whether or not a site can serve multiple policy files specifying additional or alternative sets of rules. While the official name for this is “meta-policy”, the element in the generated XML is called site-control.

In django-flashpolicies, the Policy class has a method for setting meta-policy information, and in an older version, since the element in the XML was named site-control, the method for setting it was called site_control(). Back in 2009, though, I did some refactoring of the Policy class, in order to enable a simpler method of generating the XML, and renamed the method from site_control() to metapolicy() so that the class’ public API would match the terminology of the prose version of Adobe’s policy-file specification.

And when I did that, I introduced a bug in the tests, but I didn’t realize it at first because the tests continued to pass. In fact, it took me over a month to notice the problem, and when I did it was because I took a look at a test coverage report from coverage.py, which pointed out there was a code path in the Policy class that wasn’t being exercised. That surprised me at first, because I knew there was a test for that. And then I looked at the test, and started to understand.

The test in question was supposed to verify a case where an invalid option was passed in as the meta-policy value, and in that case the metapolicy() method is supposed to raise an exception. Specifically, it raises TypeError, on the grounds that the argument passed in was of the wrong type — not a meta-policy value. This is probably the wrong exception to use, and if I get around to bumping the major version before Flash gets retired by Adobe, I’ll change it to raise a ValueError instead.

So the test was calling the method and passing an invalid value, and using assertRaises to make sure a TypeError was raised. But when I changed the method name from site_control() to metapolicy(), I’d missed changing the name in that test; it was still trying to call a method named site_control(). Now, calling a nonexistent method in Python should get an AttributeError, not a TypeError, but I’d compounded the problem by keeping an attribute named site_control to store the value that would end up in the site-control element, and it defaulted to None. So when this test tried to call policy.site_control, it was trying to call whatever value was stored in the site_control attribute. Which happened to be None, and that’s not a callable type in Python. So… a TypeError was raised, which is exactly what the test was expecting to happen (but for the wrong reason), which meant the test passed.

The coverage report, though, revealed the issue and showed that the appropriate lines in the metapolicy() method were never being called during testing. I fixed the test, and the coverage report came back clean.

There are a couple of lessons here. One is always let your tools help you when you’re refactoring. I use an editor that’s perfectly capable of finding all references to a name and changing them for me, but when I renamed the site_control() method to metapolicy() I didn’t use it, and instead made the change manually. That was hubris — thinking I didn’t need the helpful automation — and I was appropriately punished for it, with coverage.py playing the role of the Greek chorus pronouncing my doom.

Another lesson is the value of having multiple metrics available for your code. In this case, a passing test suite was actually a useless metric, because the problem was in the test suite (bugs in tests are kind of a theme here, aren’t they?). But the additional metric of the coverage report revealed the problem immediately. I’m generally not a fan of coverage for coverage’s sake, but I am a fan of coverage reports as a canary; unexpected changes to coverage are a useful sign that something’s gone wrong somewhere.

Shortly after this bug, I also changed my testing setup to fail if the coverage report comes in under 100%. Again this isn’t about coverage for coverage’s sake; it’s about having a loud signal of something going wrong as opposed to a quiet, easy-to-ignore one. Remember, it took me more than a month to notice this problem — even though the evidence was there in the build output — because the coverage report didn’t affect the build status. I use a coverage metric as part of the CI build on all of my personal apps and libraries now, but I usually only turn it on once I’ve got the code and tests at a point where I trust them, rather than just rushing to get to 100% coverage as quickly as possible.

Everybody falls down sometimes

Finally, since all of the examples above were from my own personal projects (which mostly are single-person efforts, plus the occasional helpful pull request from a stranger), I want to mention something about Django itself.

For the last few years I’ve given occasional conference talks and tutorials on Django and web application security. The tutorial versions (for those who aren’t familiar with this: tutorials at PyCon and DjangoCon are in-depth three-hour sessions on particular topics; I’ve given the security tutorial once at DjangoCon US and once at PyCon US) spend much of their time going over categories of security issues and ways Django and Python can help protect you from them, or at least give you tools to mitigate them. The rest of the material — and the main focus of the shorter talk-length versions — is a historical tour of how Django’s security policies developed, and themes in the security issues that have been disclosed in Django over the years.

I like giving that talk because Django apparently has something of a reputation for security, and it’s good to talk about how you can build a good reputation while still making mistakes. When I gave the tutorial at PyCon 2017, I crunched the numbers and found that Django had averaged one security issue or advisory every 66 days over the preceding eleven years. For the record, I think the good reputation mostly stems from our having public security policies; responding quickly to reported issues; and the amount of work that’s gone into Django itself to try to protect you out of the box from some of the most common security issues in web applications, and to fix any mistakes the Django team makes in the framework itself. We fall down, a lot, and what matters most is that we get back up and learn from it.

Mistakes are opportunities to learn and grow, and that’s the way we should talk about them. Even when I make “silly” mistakes like forgetting how filters work in a Django template, there’s something I can learn from that: have the documentation open while I’m working! Especially if I’m working on something I don’t do often, like writing templates. And these days, I do that: even though I’ve literally written books on how to use Django, I still always have at least a couple tabs open to Python or Django documentation while I’m writing code, because I’m human and I forget things sometimes.

And that really is the lesson I want to get across today. Your feeling about yourself as a programmer should not depend on never making a mistake. You’re human, as are we all, and we all mess up from time to time. Sometimes we even forget really basic things that make us feel embarrassed afterward; I certainly do that. But feeling embarrassed, or stupid, or unqualified, even though it’s natural, needs to not be the only response. Everybody makes mistakes. What marks the good people is how they react and learn afterward.