Checking if you’re pwned (with Django)

Published: June 18, 2018. Filed under: Django, Python, Security.

Back in March I announced the release of a couple security-related projects for Django, one that implements the Referrer-Policy header, and one that uses the Pwned Passwords database of Have I Been Pwned to check users’ passwords.

Today I’ve bumped the version and rolled a new release of pwned-passwords-django; if you’re reading this, version 1.2 is on the Python Package Index, and is only a pip install away. And, of course, there’s full documentation available on how to use it.

(Technically, 1.2.1 is now the version on PyPI: I messed up something in the changelog, didn’t catch it when rolling 1.2, then immediately fixed it and pushed 1.2.1 with a correct changelog and no other changes. For the rest of this post I’ll call it “1.2”.)

There aren’t any backwards-incompatible changes, so it should be a clean upgrade if you were already using 1.1, but it is still a bit of a big update, and I want to take a moment to talk about it.

Background

If you don’t feel like clicking over to a whole separate blog post to read about it, here’s a quick rundown of Pwned Passwords and how pwned-passwords-django uses it.

Pwned Passwords is a simply massive database of passwords which have been compromised via data breaches. It’s incredibly useful as a tool for preventing users from choosing or reusing bad passwords. And pwned-passwords-django is a Django application which can talk to it, via its API.

Pwned Passwords doesn’t ever have you submit a password, or even the full hash of a password, for checking. Instead, you calculate the SHA-1 hash of the password on your end, and send only the first five digits of its hex digest to Pwned Passwords. Then Pwned Passwords responds with the remaining 35 digits of any hashes it has which match your five-digit prefix, and a count of how many times each appears in the database. Then you look in the response for a 35-digit suffix matching the hash you’ve got, and if you find it you know that password is compromised.
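The exchange described above can be sketched in a few lines of Python. This is only an illustration of the protocol, not pwned-passwords-django’s own code: the function names are made up for the example, and the SUFFIX:COUNT line format is my understanding of the range API’s response, so check it against the Pwned Passwords documentation. No network call is made here; the response body is simply passed in as a string.

```python
import hashlib


def split_hash(password):
    """Return (prefix, suffix) of the uppercase SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    # Only the 5-character prefix is ever sent; the 35-character
    # suffix never leaves your side.
    return digest[:5], digest[5:]


def count_in_response(suffix, body):
    """Look for our suffix in a body made of SUFFIX:COUNT lines (assumed format)."""
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    # No matching suffix: the password is not in the returned range.
    return 0
```

You would send the prefix to the API, then hand the suffix and the response text to count_in_response(); any result above zero means the password has shown up in a breach.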

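There are three things in pwned-passwords-django that use this: a password validator, a middleware, and a pwned_password() function you can call directly for manual checks.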

All of this was present in 1.1 (and 1.0).

Time out, go sit in the corner

Version 1.1 of pwned-passwords-django generally assumed the Pwned Passwords API is reliable. And it is! But one of the first adopters was a large-ish site which did still see some issues with occasional timeouts and other errors. So 1.2 includes more robust handling for error conditions.

But: the whole point of this is that it runs whenever someone enters a password. And if the error handling when Pwned Passwords is down/timing out just skips the check, then it’s silently (well, not completely silently, since the problem gets logged) allowing a potentially-compromised password to be set. And that’s not good.

There were a few rounds of PRs from multiple contributors on this, and the eventual approach settled on is that if you’re using the password validator, pwned-passwords-django will log the problem, then fall back to Django’s bundled CommonPasswordValidator, which uses a much smaller locally-stored list of common passwords. This isn’t as comprehensive as Pwned Passwords, of course, but it’s better than nothing.
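The shape of that fallback can be sketched without Django at all. To be clear, this is not the validator’s actual implementation: RemoteCheckError, COMMON_PASSWORDS and is_rejected() are hypothetical names, and the tiny set stands in for the much larger list Django’s CommonPasswordValidator ships with.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical stand-in for a locally stored list of common passwords.
COMMON_PASSWORDS = {"password", "123456", "qwerty"}


class RemoteCheckError(Exception):
    """Hypothetical error raised when the remote lookup times out or fails."""


def is_rejected(password, remote_count):
    """Reject via the remote check, or via the local list if the remote check fails.

    remote_count is any callable returning how many times the password has
    been seen, and raising RemoteCheckError when it cannot get an answer.
    """
    try:
        return remote_count(password) > 0
    except RemoteCheckError:
        # Log the problem, then fall back instead of skipping the check.
        logger.warning("Remote password check failed; using local list.")
        return password.lower().strip() in COMMON_PASSWORDS
```

The important property is that a failed lookup never turns into an automatic pass: the password still has to get past the local list.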

If you’re using the manual check, the pwned_password() function will return None (and log the problem) rather than its usual result of a count of how many times the password has been compromised.
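One consequence for calling code is that “couldn’t check” and “not compromised” are now different answers, and a plain truthiness test would lump them together. A minimal sketch of telling them apart follows; describe_check() is a hypothetical helper, the value passed to it is whatever pwned_password() returned, and I’m assuming a password that isn’t in the database comes back as a count of zero.

```python
def describe_check(count):
    """Classify a result that is either None or a breach count."""
    if count is None:
        # The lookup failed; we know nothing about this password.
        return "unknown"
    if count > 0:
        return "compromised"
    return "ok"
```

What to do with “unknown” is up to you: rejecting the password, retrying later, or falling back to some local check are all reasonable.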

If you’re using the middleware… I’m still debating what to do. Right now, it doesn’t have a fallback mechanism or any way of signaling that it failed to talk to Pwned Passwords, other than logging the problem. I’m open to ideas, though, so if you have a good one (keeping in mind it needs to be backwards-compatible with request.pwned_passwords evaluating False unless there really is a compromised password), let me know for 1.3.

Other changes

Writing error messages for a Pwned Passwords validator is tricky. You want to communicate to the user why they can’t choose that password, but you also don’t want to mislead them. It’d be very easy, for example, to read a “this password is compromised” message as “this site is compromised”. Version 1.1’s error messages weren’t that great, but 1.2 provides a simple solution: since the validator falls back to Django’s CommonPasswordValidator anyway on connection timeouts, it just always uses the messages from Django’s CommonPasswordValidator, which get the point across in a non-scary way. Plus, Django already ships with translations for those messages, so pwned-passwords-django doesn’t need to collect translations for them.

If you don’t like the default messages, though, they are now customizable; you can pass your desired error messages in the OPTIONS to the validator when you specify it in your settings. You can also set a custom timeout for making requests to Pwned Passwords; the default is 1 second.
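In a settings file, that might look roughly like the following. Treat the specifics as assumptions to verify against the documentation: the dotted path to the validator, the error_message and help_message option names, and the PWNED_PASSWORDS_API_TIMEOUT setting name are written from memory rather than taken from this post, and the message text is just an example.

```python
# settings.py (fragment)
AUTH_PASSWORD_VALIDATORS = [
    {
        # Assumed import path for the validator.
        "NAME": "pwned_passwords_django.validators.PwnedPasswordsValidator",
        "OPTIONS": {
            # Assumed option names; the wording is only an example.
            "error_message": "That password is too widely used. Please pick another.",
            "help_message": "Your password can't be a commonly used password.",
        },
    },
]

# Assumed setting name; the timeout is in seconds.
PWNED_PASSWORDS_API_TIMEOUT = 2
```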

Finally, the low-level pwned_password() function is a bit stricter now about its input; it requires that you give it a Unicode string. Passing in a bytes object was already highly unlikely to work, and would only accidentally work sometimes if you were on Python 2.7 and Django 1.11, and might raise any of a couple confusing errors when it failed. Now, it’s guaranteed to fail, and will fail with an informative TypeError including a message telling you what went wrong.
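The kind of up-front check involved is simple. Here is a Python 3-only sketch, where str is the Unicode string type; require_text() and its message are hypothetical and are not the library’s actual code or wording.

```python
def require_text(password):
    """Fail fast, with a clear message, on anything that isn't a Unicode string."""
    if not isinstance(password, str):
        raise TypeError(
            "Password values to check must be Unicode strings, not %s."
            % type(password).__name__
        )
    return password
```

Failing before any hashing happens is what makes the error predictable: a bytes object gets one clear TypeError instead of whichever confusing exception the later steps would have produced.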

Most of the improvements in 1.2 were contributed via pull requests; Michael Cooper and Jon Dufresne did almost all the heavy lifting, and put up with my badgering questions in the PR discussions.

The other thing

Back in April I wrote a post about how I test Django applications, which went over the various tools and typical config I like to use for this. Naturally, as soon as I’d written that I decided it was all wrong.

Or, rather, that something more fundamental was wrong. Everything I wrote there is still applicable: I still prefer unittest-style tests instead of pytest style; I still use the same basic technique to run Django “standalone” during testing; I still use tox to handle testing against the full matrix of Python and Django versions; and I still use setup.cfg to configure some things, like the coverage report and flake8 linter settings.

I have stopped providing the setup.py test entry point, largely as a matter of taste. I never particularly liked the setuptools approach of adding things to setup.py, and I don’t really feel like setup.py is the obvious way, in 2018, to run tests. And it seems like things are trending that way in other projects, too, with most people standardizing on “just invoke tox”. So I’ve stopped supporting setup.py test.

I’ve also expanded the tox.ini file a bit; most of this was borrowed from studying Django’s tox.ini file, but now in addition to running the full matrix of tests against Python/Django versions, and building the documentation, tox also spell-checks the documentation, and runs isort to check that import order is correct.

But the big change is the overall layout of the repository. Previously, the packaging-related files and the tox config were top-level, along with docs/ and a pwned_passwords_django/ directory containing the code and a tests/ subdirectory. Now, the packaging files and documentation are still top-level, but tests/ has moved up to the top level as well, and pwned_passwords_django lives inside a directory named src/. The packaging configuration ensures everything still makes it into the final package (when building a source distribution; wheels don’t include tests or documentation).

This is something I’d seen a few packages do, and that some blog posts recommend, but I’d never really bought into the idea until, well, I did. You can read those posts for some good arguments, but I’ll just quote Hynek’s summary of the most important argument against the code being top-level:

“Your tests do not run against the package as it will be installed by its users. They run against whatever the situation in your project directory is.”

When your code sits top-level in its correctly-named module, it’s implicitly on the import path, and your tests will find it that way. Once you move it, you either have to hack the import path (don’t do that!) for testing, or you have to actually make your packaging work, and then your tests run against your package as it will actually be installed.
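The “implicitly on the import path” part is easy to demonstrate. The sketch below builds two throwaway project directories, one flat and one using src/, and asks Python’s path-based finder whether a package can be found from each project root. The demo_pkg name and the helper functions are made up for the example.

```python
import os
import tempfile
from importlib.machinery import PathFinder


def make_project(root, package_parent):
    """Create root/<package_parent>/demo_pkg/__init__.py."""
    pkg_dir = os.path.join(root, package_parent, "demo_pkg")
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write("")


def importable_from(directory, name="demo_pkg"):
    """True if `name` can be found when only `directory` is searched."""
    return PathFinder.find_spec(name, [directory]) is not None


with tempfile.TemporaryDirectory() as flat, tempfile.TemporaryDirectory() as nested:
    make_project(flat, "")       # flat layout: package sits in the project root
    make_project(nested, "src")  # src layout: package sits under src/

    flat_found = importable_from(flat)
    src_found = importable_from(nested)
    src_inner_found = importable_from(os.path.join(nested, "src"))
```

With the flat layout, running tests from the project root finds the package whether or not the packaging works. With src/, the root alone isn’t enough, so the package has to be installed first; with setuptools that usually means pointing package_dir at src and passing "src" to find_packages().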

A little while back, I did something I probably shouldn’t have in one of my apps, and only accidentally noticed that it completely broke the package… while not affecting my local tests, because my local tests didn’t need to install the package to run.

I started the switch to a src/ layout the next day.

I’m still fiddling a bit with some of the details of using this layout, especially as they pertain to Django-specific bits, but I suspect that the next wave of releases I do of my personal projects will all switch over to the src/ layout with top-level tests.

One final note: the way I’m doing this for pwned-passwords-django means the tests get included in source distributions, so you can (assuming you have tox and all the requisite Python versions) unpack the .tar.gz manually, and run tox to execute the tests. But running pip install pwned-passwords-django, or a setup.py install from a manually-downloaded package, will not install the tests. The only thing you really get out of installing tests is that, if the app ends up in INSTALLED_APPS, its tests run when you execute manage.py test.

I asked on Twitter and also chatted with a few folks directly to get an idea of whether there’s any value in that, and the responses ranged from neutral to negative. People mostly seem to trust that the third-party apps they use run their own tests in some sort of continuous integration that can be looked up, and either don’t care if they also execute in a manage.py test run, or actively don’t want them cluttering up the output. So pwned-passwords-django 1.2 will not install its tests. You can still manually grab the package and run tox if you want to verify it.

What’s next

As mentioned, I plan to convert all of my personal projects over to the layout described above. The next one in line is django-registration, which is gearing up for version 3.0; I’m also using the major-version bump to clean up a lot of the cruft that’s accumulated in its decade-plus of existence (django-registration dates to 2007!), and generally make it a bit easier to keep extending and working on into the future. Sometime after that I’ll deal with the outstanding pull requests for the 2.x branch, and probably push out a 2.4.2 or 2.5 depending on how big those changes end up being, and I’ll write about all of that when it happens.