A stateful problem

Published: December 24, 2010. Filed under: Django, Misc, Pedantics.

So, this week we dropped some security updates, which you should definitely check out if you haven’t seen them yet.

We also released the first 1.3 beta, which is an important milestone since it means 1.3 is now feature-frozen and will get only bugfix and polishing work until the final release. Quite a few useful things made it in between alpha and beta, and a couple just barely slipped in under the wire. One feature in particular is near and dear to me, since I’ve been ranting about it for a couple release cycles now, and it makes for an interesting story.

In the beginning

Since the 0.96 release, the default distribution of Django has included the application django.contrib.localflavor, though it’s actually not one application; it’s a whole bunch of them. Each application in localflavor is named with an ISO 3166 code representing a specific country, and has code that may be useful to you if you’re writing applications which will be used in that country. So, for example, django.contrib.localflavor.br — for Brazil — offers form fields which can validate Brazilian tax-ID numbers, phone numbers, postal codes and state abbreviations. As of 1.3 beta, localflavor has applications providing this sort of utility code for over 30 countries.

One of those countries, of course, is the United States, which gets django.contrib.localflavor.us. The US localflavor application does pretty much what you’d expect; it knows how to handle US phone numbers, Social Security numbers, postal ZIP codes, and so on. And of course it provides a form field and model field — both called USStateField — which can be used to store the state portion of a US mailing address. Unfortunately, that field has been a persistent source of controversy, and at times hasn’t been anywhere near as useful as it could have been.

Originally, USStateField took a pretty liberal approach to its list of choices: with only a couple of exceptions, it would allow anything the US Postal Service recognized as a valid “state” in a mailing address. And then, two years ago, someone opened ticket 8425, pointing out that this was perhaps too liberal, because it meant several independent nations were considered “US states”.


A brief historical digression

The three countries in question (the Republic of the Marshall Islands, the Federated States of Micronesia and the Republic of Palau) are all island nations in the Pacific Ocean, which turned out to be a colossal piece of geographic bad luck. First they were colonized by Germany; then, during the First World War, Japan (which had declared war on Germany) seized them. Following that war they were administered under a League of Nations mandate until the 1940s, at which point they were caught up in the Second World War, during which (to borrow a phrase from Neal Stephenson) the United States of America and the Empire of Japan disputed, with rifles, each other’s rights to have military bases in that part of the world.

After that war, the League of Nations mandate was revoked and the United Nations established the Trust Territory of the Pacific Islands, to be administered by the US. That trusteeship came to an end in 1994, by which point the Marshall Islands, Micronesia and Palau all either had become or were well on their way to becoming independent nations (a fourth island group from the former trust, the Northern Mariana Islands, is now a commonwealth in political union with the US). Those nations then entered into an agreement with the United States, the Compact of Free Association, under which the US provides defense and various services and offers relaxed immigration rules in return for the right to maintain military bases there.

This makes for interesting history, but also creates the above quandary: although these three nations are independent, the United States Postal Service (which handles their mail, per the Compact) treats them as domestic, rather than international, mail destinations. Thus, USPS recognizes three “state” abbreviations corresponding to these three nations.

Django history

The above-mentioned ticket was resolved by removing the Marshall Islands, Micronesia and Palau from the list of “states” recognized by USStateField.

Of course, it couldn’t be that simple. Shortly after the commit which closed that ticket, another one — ticket 10308 — was opened, complaining that USStateField was missing several choices recognized by the US Postal Service. Among the missing “states” the ticket complained about were… the Marshall Islands, Micronesia and Palau. That ticket got a quick wontfix citing the earlier change, and ticket 9022 took over the remainder of the missing “states”, which comprised the postal abbreviations for the US Armed Forces overseas regions.

What, exactly, is a “state”?

But this isn’t the end of the USStateField saga. The preceding controversies raised the natural and relevant question of just what should be considered a “US state”. Obviously the fifty actual states qualify, but what about the District of Columbia? As the federal district of the United States, DC isn’t a “state” and is administered directly by the federal Congress, but does have a non-voting delegate in the House of Representatives. Puerto Rico also isn’t a state and also has a non-voting delegate. And then there’s the Northern Mariana Islands, already mentioned, plus American Samoa, Guam and the US Virgin Islands, all of which are US territories but not US states.

All of these are, however, “states” insofar as they have state postal abbreviations recognized by the US Postal Service. And, to top it all off, three additional “state” codes exist for delivery of military mail to US forces overseas.

The solution (hopefully)

For the last couple of Django release cycles, I’ve been agitating for a clean way to deal with this mess. There was some consensus on the django-developers list, during the 1.2 release cycle, on how to finally resolve the issues around USStateField, but unfortunately no-one — myself included — found time to sit down and code it up properly. So it lingered until Wednesday of this week when, about 14 hours before the 1.3 feature freeze, I opened ticket 14937 with a patch that tried to implement the consensus approach. That patch just barely got in under the deadline (it was one of the last commits before I started rolling the releases), and implemented what I hope is the solution.

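As of Django 1.3, USStateField will accept anything that’s recognized by the United States Postal Service as a “state” abbreviation and meets one of the following criteria: it is one of the 50 states or the District of Columbia, it is a non-state US territory, or it is one of the three Armed Forces postal “states”.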

This means that USStateField accepts quite a few values which do not correspond to actual US states — and at least in the cases of DC and Puerto Rico that can be a fairly contentious thing — but does have the virtue of accepting every US-administered location to which the postal service will deliver. And since the thing most people are likely to use this field for is storing a mailing address, I’m OK with the murky metaphysical status of some of the choices.

Also, a new model field, USPostalCodeField (I’m not exactly wild about the name, but USPostalAbbreviationField is worse), has been added, along with a new form widget, USPSSelect. These take the blunt-object approach and simply accept anything the US Postal Service will allow, caring not a whit for whether the submitted value is a state, district, territory, commonwealth, military base or independent nation.

The actual choice tuples themselves (which live in localflavor.us.us_states, a name which may yet cause an international incident), meanwhile, got heavily refactored. There are now choice tuples provided for:

  1. The 48 contiguous states plus the District of Columbia (some businesses will only deliver to the “lower 48”, and not to Alaska or Hawaii)
  2. All 50 states plus the District of Columbia
  3. All non-state US territories (not including the District of Columbia)
  4. The three Armed Forces postal “states”
  5. The three nations which receive US mail service under the Compact of Free Association

The choices for USStateField are simply the combination of items 2, 3 and 4 in that list. USPostalCodeField and USPSSelect add item 5.
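
The way these groups combine can be sketched in plain Python. This is an illustration only, not the contents of localflavor.us.us_states: the real choices carry a display name alongside each abbreviation, whereas the sketch uses bare USPS abbreviations, and every constant and function name in it is an assumption made for the example.

```python
# Illustrative sketch only: bare USPS abbreviations stand in for the real
# choice tuples, and all names below are assumptions, not Django's own.

# 1. The 48 contiguous states plus the District of Columbia.
CONTIGUOUS = frozenset(
    "AL AZ AR CA CO CT DE DC FL GA ID IL IN IA KS KY LA ME MD MA MI MN "
    "MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT "
    "VA WA WV WI WY".split()
)

# 2. All 50 states plus the District of Columbia.
STATES_AND_DC = CONTIGUOUS | {"AK", "HI"}

# 3. Non-state US territories (the District of Columbia is not included).
TERRITORIES = frozenset({"AS", "GU", "MP", "PR", "VI"})

# 4. The three Armed Forces postal "states".
ARMED_FORCES = frozenset({"AA", "AE", "AP"})

# 5. The three Compact of Free Association nations.
COFA = frozenset({"FM", "MH", "PW"})

# What the state field accepts: groups 2, 3 and 4.
STATE_FIELD_CODES = STATES_AND_DC | TERRITORIES | ARMED_FORCES

# What the postal-code field and widget accept: the above plus group 5.
USPS_CODES = STATE_FIELD_CODES | COFA


def accepted_by_state_field(code):
    """Return True if the abbreviation falls in groups 2, 3 or 4."""
    return code in STATE_FIELD_CODES


def accepted_by_postal_field(code):
    """Return True if the abbreviation falls in any of groups 2 through 5."""
    return code in USPS_CODES
```

Under these assumptions, "PW" (Palau) fails the state-field check but passes the postal one, which mirrors the split between USStateField and USPostalCodeField described above.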


And a bit of fun

Since this has been a rather nasty political headache to deal with, and since it was finally going to get fixed, I also tossed a tiny easter egg into the final patch, largely for my own amusement, in the form of one additional choice tuple: django.contrib.localflavor.us.us_states.OBSOLETE_STATES. It’s not used by any model or form field, and consists entirely of “state” abbreviations which were, but no longer are, valid. This includes such gems as the postal abbreviation used by the Panama Canal Zone before it reverted to Panama’s control, so should you need to send mail back in time to the pre-1979 Canal Zone, you’ll be able to craft a model and form field to do so.

And for the record, this isn’t the first time localflavor.us has gotten an amusing (to me) bit of historical trivia baked in. The form field which validates Social Security numbers has, hard-coded into it as an invalid value, the former Social Security number of Hilda Whitcher. Her number appeared on a fake promotional Social Security card included in men’s wallets produced by the E.H. Ferree company, and was subsequently widely adopted by people who purchased the wallets but failed to understand that they’d need their own Social Security numbers. Mrs. Whitcher was given a new number, and her previous one (078-05-1120) is no longer, and never again will be, valid.
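
As a rough illustration of how a hard-coded exclusion like that can sit alongside ordinary format checks, here is a standalone sketch. It is not the localflavor form field: the function name, the regular expression and the all-zeroes rule are assumptions made for the example, and only the excluded number itself comes from the story above.

```python
import re

# Hypothetical helper for illustration; this is not Django's form field.
# Assumed format: three digits, two digits, four digits, hyphen-separated.
SSN_PATTERN = re.compile(r"^([0-9]{3})-([0-9]{2})-([0-9]{4})$")

# Numbers that are well-formed but must always be rejected.
KNOWN_INVALID = {"078-05-1120"}


def looks_like_valid_ssn(value):
    """Return True if value is well-formed and not a known-invalid number."""
    match = SSN_PATTERN.match(value.strip())
    if match is None:
        return False
    area, group, serial = match.groups()
    # Assumption for this sketch: no part may consist entirely of zeroes.
    if area == "000" or group == "00" or serial == "0000":
        return False
    # The hard-coded exclusion: a syntactically fine number that is never valid.
    return match.group(0) not in KNOWN_INVALID
```

The point of the design is that the promotional number passes every structural check, so it has to be listed explicitly in order to be rejected.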