Two new projects
Django and security are two of my favorite topics, and I think they go pretty well together. I’ve given a number of conference talks and tutorials on the theme of Django and security, and I’m one of the people on the receiving end of Django’s security-reporting email address. But although I spend a lot of time thinking about security, and trying to improve the state of the world through code, and occasionally ranting on various forms of social media, I don’t spend a lot of time writing about it here.
So let’s change that. Today I’m announcing the recent release of a couple new projects which have a security focus.
The first was technically released last month, and I did tweet about it at the time and add it to my public project list, but I didn’t ever do a proper public announcement.
Last month I spent some time cleaning up security-related config for this site, and happily now the only major thing left on my to-do list is verifying it’s safe to turn on HSTS and then actually doing it. But while using a couple tools to double-check my config I noticed one of them complained I wasn’t sending a Referrer-Policy header. And I did a little digging and couldn’t find an easy/obvious library already written to do that in Django, so I spent a couple hours whipping up code, tests and documentation and django-referrer-policy was born.
If you’re unfamiliar with the Referrer-Policy header, it fulfills some law of the internet or other by using the other spelling compared to the Referer header.
OK, so. You probably have heard of Referer; the idea is that you click a link and your browser makes a new HTTP request to fetch whatever URL you just clicked on, and can send the Referer header along with the request as a way of saying “here’s how my user found your URL”. In the old days we loved the Referer header and would walk uphill both ways in the snow to look at our raw web server logs in order to find out who had added us to their blogrolls.
Referer has a darker side. It’s been used for all sorts of analytics and tracking stuff, so much so that many technically-minded people demand a way to turn it off in the browser, and do turn it off. The Referrer-Policy header extends that by letting a site give hints to a browser about whether and when to send a Referer from links on that site. You can read up on it at MDN, but the gist is you can use it to, for example, ask a browser not to send Referer on cross-origin requests, or not to send it when downgrading from HTTPS to HTTP, or to only include the origin of the originating page and not the full URL, or a few other nifty things.
And if you care about privacy and not leaking information, it’s probably a good thing to have a way to at least suggest to a browser that it should keep its Referer to itself. So django-referrer-policy is a tiny app containing a single Django middleware class; you pip install django-referrer-policy, add the middleware to your MIDDLEWARE setting and add a REFERRER_POLICY setting specifying what value you want the header to have. Then all your HTTP responses will include the Referrer-Policy header, with that value.
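To make the idea concrete, here is a minimal sketch of what a middleware of this kind does. It is not django-referrer-policy's actual source: the class name, the `policy` argument (standing in for reading a REFERRER_POLICY setting) and the `fake_view` function are all hypothetical, and a plain dict stands in for a Django response so the sketch runs without Django installed.

```python
# Hypothetical sketch, not the library's real implementation.
# These are the header values defined for Referrer-Policy.
VALID_POLICIES = frozenset({
    "no-referrer",
    "no-referrer-when-downgrade",
    "origin",
    "origin-when-cross-origin",
    "same-origin",
    "strict-origin",
    "strict-origin-when-cross-origin",
    "unsafe-url",
})


class ReferrerPolicyMiddleware:
    """Set the Referrer-Policy header on every response passing through."""

    def __init__(self, get_response, policy="same-origin"):
        # In a real Django project the value would come from settings;
        # it is a constructor argument here to keep the sketch standalone.
        if policy not in VALID_POLICIES:
            raise ValueError(f"Unknown Referrer-Policy value: {policy!r}")
        self.get_response = get_response
        self.policy = policy

    def __call__(self, request):
        response = self.get_response(request)
        # Django responses support item assignment for headers, so a
        # plain dict is a close enough stand-in for this illustration.
        response["Referrer-Policy"] = self.policy
        return response


def fake_view(request):
    """Stand-in for a Django view; returns a dict of response headers."""
    return {"Content-Type": "text/html"}


middleware = ReferrerPolicyMiddleware(fake_view, policy="same-origin")
response = middleware(request=None)
print(response["Referrer-Policy"])
```

Validating the value up front means a typo in the policy fails loudly at startup instead of silently sending a header browsers will ignore.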
On this site I’ve got the header set to 'same-origin', which means no Referer for links going off-site, but asks the browser to keep sending Referer for intra-site links. Since I’m running on HTTPS, this is necessary to appease Django’s CSRF system, which does require and make use of Referer for HTTPS requests. I know there are people who dislike this, but it’s the least-bad workaround we’ve found for mitigating a couple potential edge cases involving subdomains or crafty man-in-the-middle attacks.
Fun fact: I actually temporarily broke my contact form, and my ability to submit new blog posts, at first by setting the header to 'no-referrer' and causing every HTTP POST request to suddenly fail CSRF verification.
At the moment, django-referrer-policy supports every value the W3C Referrer-Policy draft lists for the header. There is an empty-string value permitted for use in an HTML meta element or via a referrerpolicy attribute on a specific link, but its purpose is to cause fallback/default behavior and it’s not permitted as a header value.
This is the one I think is pretty cool.
If you care about security, you should know about Have I been pwned?, which is Troy Hunt’s massive searchable database of data breaches, useful for finding out if an account of yours has been leaked, hacked or otherwise disclosed where it oughtn’t have been. It lets you check by email address (to see if yours has appeared in a breach), and also sign up for notifications if a domain you control has email addresses showing up in breaches.
Related to that, and on the same site, is Pwned Passwords, which is a searchable database of passwords from data breaches. This is both A) an extremely persuasive argument to use against anyone who insists “I use the same password everywhere, but it’s safe, nobody will guess it” (they will guess it), and B) useful as an automated way to prevent people reusing known-compromised passwords, since it has an API.
And recently the API got even better.
If you’re wondering how it’s safe to automatically check a user’s password against a third-party online database, you can read the link in the above paragraph which explains how the new k-anonymity search API works. But here’s a summary.
Suppose we see a user signing up for an account, and the password they decide on is that old chestnut, ‘swordfish’. On our site, we SHA-1 hash that password and grab the hex digest (upper-cased because that’s what Pwned Passwords will be doing):
>>> import hashlib
>>> password = 'swordfish'
>>> digest = hashlib.sha1(password.encode('utf-8')).hexdigest().upper()
>>> print(digest)
4F57181DCAADE980555F2CE6755CA425F00658BE
Now, we’re going to take the first five hex digits of that digest (4F571) and put them in a request to the Pwned Passwords API. Specifically, we’ll hit the URL
The response is a bunch of lines of text, where each line is some hex digits followed by a colon and a decimal integer. These are the suffixes of hashes of compromised passwords, but only those whose hashes begin with the prefix we sent (4F571, remember). We look through there to see if the rest of our locally-calculated hash, 81DCAADE980555F2CE6755CA425F00658BE, is in there, and sure enough it is. The response contains this line:
What this means is that a password whose SHA-1 begins 4F571 and ends 81DCAADE980555F2CE6755CA425F00658BE has appeared in the Pwned Passwords dataset of breaches a whopping 74,878 times. People really like ‘swordfish’ for some reason.
So now we know ‘swordfish’ is a compromised password, but notice that we never sent either the plaintext password or its full hash to Pwned Passwords, and Pwned Passwords never sent us anything that a third-party observer of our connection could take advantage of to find out what password we were checking. Thus, we have a safe way to check submitted passwords against Pwned Passwords.
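The local half of that exchange can be sketched in a few lines of Python. This is only an illustration of the prefix/suffix technique described above, with no network call: the function names are hypothetical, and `sample_response` is a made-up stand-in for the API's reply (two invented entries plus our own suffix paired with the count quoted above).

```python
import hashlib


def split_digest(password):
    """Return (prefix, suffix) of the upper-cased SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    # Only the 5-character prefix is ever sent; the suffix stays local.
    return digest[:5], digest[5:]


def count_in_range_response(suffix, response_text):
    """Look for our suffix in a body of 'SUFFIX:COUNT' lines."""
    for line in response_text.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    # Suffix absent: no known breach contains this password.
    return 0


prefix, suffix = split_digest("swordfish")

# Made-up response body standing in for what the API would return
# for `prefix`; only the middle line corresponds to our password.
sample_response = "\n".join([
    "0" * 35 + ":3",
    suffix + ":74878",
    "F" * 35 + ":12",
])

count = count_in_range_response(suffix, sample_response)
print(prefix, count)
```

The comparison happens entirely on our side, which is the point: the remote service only ever learns the five-character prefix.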
So yesterday I released pwned-passwords-django, which provides a few utilities for taking advantage of this. I wasn’t the first to get there (the name “django-pwned-passwords” was already claimed on PyPI by a wrapper for an earlier API, for example, and a couple other wrappers for the new one exist now). But I’m confident that I am the first to publish a single application which provides all three of the things pwned-passwords-django can do.
The first thing it provides is a validator which you can hook into Django’s password-validation framework. Password validation has been around for a while in Django, with several built-in validators and a configurable way to add more, though it’s not turned on by default. If you slot in the validator provided by pwned-passwords-django, all high-level password-manipulating APIs in Django (including the UserCreationForm and password change/reset views built in to django.contrib.auth) will use Pwned Passwords to reject any attempt to set or change a password to one that’s known (to Pwned Passwords) to be compromised.
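Slotting a validator in means adding an entry to Django's AUTH_PASSWORD_VALIDATORS setting. The fragment below is a sketch of what that might look like; the dotted path for the Pwned Passwords validator is an assumption on my part and should be checked against the package's documentation, while the minimum-length entry is one of Django's built-in validators, included only to show how entries sit side by side.

```python
# settings.py (fragment)
AUTH_PASSWORD_VALIDATORS = [
    {
        # Built-in Django validator, shown for context.
        "NAME": "django.contrib.auth.password_validation.MinimumLengthValidator",
        "OPTIONS": {"min_length": 9},
    },
    {
        # Assumed dotted path; confirm it in the pwned-passwords-django docs.
        "NAME": "pwned_passwords_django.validators.PwnedPasswordsValidator",
    },
]
```

Validators run in the order listed, so cheap local checks can go first and the remote lookup last.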
Several applications are now floating around which just provide a password validator that checks Pwned Passwords, and if that was all I wanted I wouldn’t have written another one. But in one of Troy Hunt’s recent posts about it, he mentioned someone had written a Cloudflare Worker which automatically looks for passwords submitted in incoming HTTP POST requests, checks them against Pwned Passwords, and sets a header (which reaches the eventual application running behind Cloudflare) to indicate the request includes a compromised password.
I thought that was pretty darned neat, and also figured it would be easy to do as a Django middleware. So the second thing pwned-passwords-django includes is a class you can drop into your MIDDLEWARE setting which will scan every incoming POST for things which look like passwords (configurable via a regex in your Django settings), and add a new attribute to the request indicating what Pwned Passwords thought about them. You can then check request.pwned_passwords in your Django view code (or in later middlewares) to see if someone’s used a compromised password and take appropriate action.
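The scanning step can be sketched roughly as follows. This is not the package's code: the regex, the function name, the field names and the `fake_lookup` helper are all hypothetical, and the lookup is faked with a local dict so the sketch runs offline. It only illustrates the shape of the idea, which is to match POST keys against a pattern and record a breach count for each hit.

```python
import re

# Hypothetical pattern; the real one is configurable via Django settings.
PASSWORD_KEY_RE = re.compile(r"pass", re.IGNORECASE)


def find_pwned_passwords(method, post_data, lookup):
    """Map each password-like POST key to its breach count, if nonzero."""
    if method != "POST":
        return {}
    results = {}
    for key, value in post_data.items():
        if PASSWORD_KEY_RE.search(key):
            count = lookup(value)
            if count:
                results[key] = count
    return results


# Fake lookup table so the example needs no network access.
FAKE_COUNTS = {"swordfish": 74878}


def fake_lookup(password):
    """Stand-in for a real Pwned Passwords query."""
    return FAKE_COUNTS.get(password, 0)


# Invented form data for illustration only.
post = {
    "username": "alice",
    "password": "swordfish",
    "password_confirm": "correct horse battery staple",
}

pwned = find_pwned_passwords("POST", post, fake_lookup)
print(pwned)
```

A middleware built on this would attach the resulting dict to the request before calling the next layer, so views can branch on it with a plain truthiness check.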
If the request wasn’t a POST, or didn’t contain a password or didn’t contain a compromised password, request.pwned_passwords will be an empty dict (which conveniently evaluates false in a boolean context).
If the request is a POST, and does contain a password and it is compromised, then request.pwned_passwords contains one key for each item in request.POST that contained a compromised password, and the values will be the count of appearances of those passwords from Pwned Passwords. So, for example, if the incoming request contains
request.pwned_passwords will be
Finally, pwned-passwords-django also provides a pwned_password() function you can use to check any password you like directly, any time, since there are cases where you might need to do that.
If you use Django’s auth system with password-based authentication, I’d like to humbly suggest you make use of pwned-passwords-django to help protect your users (and your site/service!) from the scourge of reused/compromised passwords.
I’m still working on django-registration 3.0, and haven’t finalized the feature set yet. I was fiddling with adding serializers and API view classes for Django REST framework, but I’m not sure I want to do that, or that it’s necessarily a good idea to try. Several of my other apps have recent releases to officially mark them Django 2.0 compatible, and the rest will probably get there soon.
Also, since the first attempt at pwned-passwords-django yesterday involved not one but two embarrassing mistakes, and since it produced a minor digression on Twitter, I’ll soon be writing up a nice list of all the ways that even seasoned veteran programmers screw up, starring myself. Keep an eye out for that in the next week or so.