About model subclassing…

Published February 20, 2007. Filed under: Django, Programming.

In Django 0.90 and 0.91 we offered the ability to subclass models, along with a nasty hack called replaces_module which let you tell Django to use a subclass in place of the original model it inherited from. The magic-removal changes broke that ability, and we’ve been slowly working toward getting it back (well, actually Malcolm has been rolling the Sisyphean rock up the hill and the rest of us have mostly been urging him on). I’d say it’s probably tied with automated database migrations as the feature people most often ask for from Django.

I’m not saying we shouldn’t get it working (though I’m sure someone will try to interpret it that way); I think we should and, in fact, I think we have to. But I am going to go out on a limb and say that in the vast majority of cases where people claim to want it, it’s not the best solution.

The object-relational impedance mismatch

Django includes an “object-relational mapper”, or ORM. The basic concept is pretty simple: it maps between objects in code and a relational database, so that you can have, say, a BlogEntry class defined in Python and get easy methods for storing instances of that class in a database and fetching them back later. This is generally a very useful thing, because it lets you work with Python’s native object-oriented concepts as much as possible.

But it’s not a perfect thing and never will be; one of the better explanations is an infamous article which claims that ORM is the Vietnam of computer science:

It represents a quagmire which starts well, gets more complicated as time passes, and before long entraps its users in a commitment that has no clear demarcation point, no clear win conditions, and no clear exit strategy.

The basic problem is that object-oriented (“OO”) programs and relational databases are built on different conceptions of the world, and each is highly optimized for things which suit its particular conception. Sooner or later you’re going to run up against a situation where the object-oriented conception and the relational conception are so different as to be almost irreconcilable. Subclassing is, I think, a great example of that situation, because there’s really no clean way to take a hierarchy of classes which inherit from each other and map them onto a relational database (and that’s just for standard subclassing; Python lets you inherit from multiple base classes, which can be nightmarish to map to a DB). There are a number of popular patterns for working around this, but none of them really solves the problem; at best, they’re situation-specific Band-Aids slapped over a gaping wound in the system.

The right way, most of the time

I’d wager that 90% or more of the things people say they want to do with subclasses could be better accomplished by defining a related model instead and linking it back with a unique foreign key. Lots of folks don’t like this idea and will cling to the notion that subclassing is a purer solution, but let’s look at an example: Django’s built-in User class. Not coincidentally, this class is also the primary reason why people clamor for subclassing to work.

The User class has a set of fields on it which store a username, a password, an email address, the user’s real name, and a variety of information related to access and permissions. In OO terminology, it encapsulates the user’s authentication and access information. I’ve seen a lot of people say they want to subclass User not because they want to change the types of auth-related information, but because they want to add a field for the user’s website URL, or a short “bio” field, or lots of other useful information related to the user.

Did you spot the key word in that last phrase? Other useful information related to the user. That should be a dead giveaway that what we want in the database is a separate table where each row relates back to a row in the auth table. And in OO terms, the user’s website, bio and other information aren’t really part of their authentication and access controls and really should be encapsulated in their own object. So in OO terms what we want is a separate class where each instance has an attribute pointing to an instance of User.
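To make the OO half of that concrete, here’s a minimal sketch in plain Python rather than Django. The User class below is only a stand-in for an object holding auth data, and UserProfile, website and bio are hypothetical names chosen for illustration; the point is just that the extra information lives in its own object which holds a reference to a User, instead of in a subclass.

```python
from dataclasses import dataclass


@dataclass
class User:
    """Stand-in for an auth-only user object (not Django's real User)."""
    username: str
    email: str


@dataclass
class UserProfile:
    """Hypothetical profile: non-auth information that points at one User."""
    user: User
    website: str = ""
    bio: str = ""


# The auth object stays untouched; the profile simply refers to it.
alice = User(username="alice", email="alice@example.com")
profile = UserProfile(user=alice, website="https://example.com", bio="Writes Python.")
```

Nothing about User had to change to support the new fields, which is exactly the encapsulation argument: auth data in one object, everything else in another.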

As it turns out, the goals of having a separate class at the OO level and a separate table at the DB level mesh extremely well; in older versions of Django you just define a new model, put in the fields you want, and tie it back to User with a OneToOneField. Going forward, it’s better to use a ForeignKey with a unique constraint, but the idea is the same. Conveniently, Django also provides a built-in mechanism to make that much easier to work with.
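At the database level, the related-model approach boils down to a second table whose foreign key column is also unique. Here’s a rough sketch using Python’s standard sqlite3 module instead of Django’s ORM, so it runs on its own; the table and column names (auth_user, user_profile, website, bio) and the helper function are assumptions made for illustration, not the schema Django would generate for you.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

conn.executescript("""
    CREATE TABLE auth_user (
        id INTEGER PRIMARY KEY,
        username TEXT NOT NULL UNIQUE
    );
    -- UNIQUE on user_id is what allows at most one profile row per user.
    CREATE TABLE user_profile (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL UNIQUE REFERENCES auth_user(id),
        website TEXT NOT NULL DEFAULT '',
        bio TEXT NOT NULL DEFAULT ''
    );
""")

conn.execute("INSERT INTO auth_user (id, username) VALUES (1, 'alice')")
conn.execute(
    "INSERT INTO user_profile (user_id, website, bio) VALUES (?, ?, ?)",
    (1, "https://example.com", "Writes Python."),
)


def second_profile_rejected(connection):
    """Return True if the database refuses a second profile for user 1."""
    try:
        connection.execute("INSERT INTO user_profile (user_id) VALUES (1)")
    except sqlite3.IntegrityError:
        return True
    return False


# Fetching a user together with the profile is a plain join on the unique key.
row = conn.execute(
    "SELECT u.username, p.website FROM auth_user AS u "
    "JOIN user_profile AS p ON p.user_id = u.id"
).fetchone()
```

The unique constraint is what turns an ordinary many-to-one foreign key into a one-to-one link, which is why a ForeignKey with a unique constraint and a OneToOneField express the same idea.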

And this example can be generalized to cover a lot of cases; most of the time when someone says they want subclassing, they’d really be at least as happy, and often better off in terms of application design, with a related model instead. And related models map far more cleanly onto a database than inherited hierarchies.

But when you need it, you really need it

Of course, not every situation is best solved by a related model, and some simply can’t be solved in that way. Going back to the User example, there are authentication schemes which require a significantly different set of information than what the built-in User model stores; Django does its best to work around that by letting you define custom authentication backends to handle those schemes, but there are going to be systems out there which absolutely require fields the built-in User model doesn’t have, and storing additional authentication info in a related model would break encapsulation at the OO level. Subclassing User and changing the field definitions to suit is a far better solution in those particular cases.

And there will always be cases like that, so I think we have no choice but to get subclassing working again. But hopefully it’s now a little clearer that subclassing — despite being an automatic instinct for an object-oriented programmer — isn’t always the best choice at either the OO level or at the DB level, and that the mismatch between object-oriented code and relational databases means that you should always at least think about encapsulating information in multiple related models instead of pushing into the quagmire of subclassing.