In Django 0.90 and 0.91 we offered the ability to subclass models, and a nasty hack called replaces_module which would let you tell Django to use a subclass in place of the original model it was inheriting from. The magic-removal changes broke that ability, and we’ve been slowly working toward getting it back (well, actually Malcolm has been rolling the Sisyphean rock up the hill and the rest of us have mostly been urging him on). I’d say it’s probably tied with automated database migrations as the feature people most often ask for from Django.
I’m not saying we shouldn’t get it working (though I’m sure someone will try to interpret it that way); I think we should and, in fact, I think we have to. But I am going to go out on a limb and say that in the vast majority of cases where people claim to want it, it’s not the best solution.
Django includes an “object-relational mapper”, or ORM. The basic concept is pretty simple: it maps between objects in code and a relational database, so that you can have, say, a BlogEntry class defined in Python and get easy methods for storing instances of that class in a database and fetching them back later. This is generally a very useful thing, because it lets you work with Python’s native object-oriented concepts as much as possible.
But it’s not a perfect thing and never will be; one of the better explanations is an infamous article which claims that ORM is the Vietnam of computer science:
It represents a quagmire which starts well, gets more complicated as time passes, and before long entraps its users in a commitment that has no clear demarcation point, no clear win conditions, and no clear exit strategy.
The basic problem is that object-oriented (“OO”) programs and relational databases are built on different conceptions of the world and are highly optimized for things which suit their particular conceptions, and sooner or later you’re going to run up against a situation where the object-oriented conception and the relational conception are so different as to be almost irreconcilable. Subclassing is, I think, a great example of that situation, because there’s really no clean way to take a hierarchy of classes which inherit from each other and map them onto a relational database (and that’s just for standard subclassing — Python lets you inherit from multiple base classes, which can be nightmarish to map to a DB). There are a number of popular patterns for working around this, but none of them really solve the problem — at best, they’re situation-specific Band-Aids slapped over a gaping wound in the system.
I’d wager that probably 90% or more of the things people say they want to do with subclasses could be better accomplished by instead defining a related model and linking it back with a unique foreign key. Lots of folks don’t like this idea and will cling to the notion that subclassing is a more pure solution, but let’s look at an example: Django’s built-in User class. Not coincidentally, this class is also the primary reason why people clamor for subclassing to work.
The User class has a set of fields on it which store a username, a password, an email address, the user’s real name, and a variety of information related to access and permissions. In OO terminology, it encapsulates the user’s authentication and access information. I’ve seen a lot of people say they want to subclass User not because they want to change the types of auth-related information, but because they want to add a field for the user’s website URL, or a short “bio” field, or lots of other useful information related to the user.
Did you spot the key word in that last phrase? Other useful information related to the user. That should be a dead giveaway that what we want in the database is a separate table where each row relates back to a row in the auth table. And in OO terms, the user’s website, bio and other information aren’t really part of their authentication and access controls and really should be encapsulated in their own object. So in OO terms what we want is a separate class where each instance has an attribute pointing to an instance of User.
As it turns out, the goals of having a separate class at the OO level and a separate table at the DB level mesh extremely well; in older versions of Django you just define a new model, put in the fields you want, and tie it back to User with a OneToOneField. Going forward, it’s better to use a ForeignKey with a unique constraint, but the idea is the same. And as it turns out, Django provides a built-in mechanism to make that much easier to work with.
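To make the "separate table, unique key back to the user" idea concrete, here is a minimal sketch at the database level, using Python's built-in sqlite3 module rather than Django itself. The table and column names (auth_user, user_profile, website, bio) are made up for illustration and are not meant to mirror the schema Django actually generates.

```python
import sqlite3

# In-memory database, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

conn.executescript("""
CREATE TABLE auth_user (
    id       INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE
);

-- Hypothetical profile table: the UNIQUE constraint on user_id is what
-- turns an ordinary foreign key into "at most one profile per user".
CREATE TABLE user_profile (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL UNIQUE REFERENCES auth_user(id),
    website TEXT,
    bio     TEXT
);
""")

conn.execute("INSERT INTO auth_user (id, username) VALUES (1, 'alice')")
conn.execute(
    "INSERT INTO user_profile (user_id, website, bio) VALUES (?, ?, ?)",
    (1, "http://example.com/", "Writes about Django."),
)

# A second profile for the same user violates the unique constraint.
try:
    conn.execute(
        "INSERT INTO user_profile (user_id, website) VALUES (?, ?)",
        (1, "http://example.org/"),
    )
    second_profile_allowed = True
except sqlite3.IntegrityError:
    second_profile_allowed = False

# The auth data and the "other information" stay in separate tables
# and are brought together with a join only when both are needed.
row = conn.execute(
    "SELECT u.username, p.website "
    "FROM auth_user u JOIN user_profile p ON p.user_id = u.id"
).fetchone()
```

The auth table is never altered here; everything extra lives in its own table, which is roughly what a related model with a unique foreign key amounts to underneath.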
And this example can be generalized to cover a lot of cases; most of the time when someone says they want subclassing, they’d really be at least as happy, and often better off in terms of application design, with a related model instead. And related models map far more cleanly onto a database than inherited hierarchies.
Of course, not every situation is best solved by a related model, and some simply can’t be solved in that way. Going back to the User example, there are authentication schemes which require a significantly different set of information than what the built-in User model stores; Django does its best to work around that by letting you define custom authentication backends to handle those schemes, but there are going to be systems out there which absolutely require fields the built-in User model doesn’t have, and storing additional authentication info in a related model would break encapsulation at the OO level. Subclassing User and changing the field definitions to suit is a far better solution in those particular cases.
And there will always be cases like that, so I think we have no choice but to get subclassing working again. But hopefully it’s now a little clearer that subclassing — despite being an automatic instinct for an object-oriented programmer — isn’t always the best choice at either the OO level or at the DB level, and that the mismatch between object-oriented code and relational databases means that you should always at least think about encapsulating information in multiple related models instead of pushing into the quagmire of subclassing.
I totally agree with you. Most of the time, when I hear people talking about translating OO inheritance to relational data models, they are describing a concept that better fits a table with a related subtable.
I must confess that I too have longed for data model inheritance. I don’t want the tables produced to implement inheritance (because, I agree, it just doesn’t make sense in relational data terms). What I do want is to be able to define models that inherit fields, and then construct tables as if those fields were defined in that model directly.
E.g. [Publishable: publish_date, archive_date] or [Addressable: street, postal, city, country] … A “customer” table has an address. We snapshot that address into an “order” table so that if the customer changes their address after we send the order, we still know where that order went.
I suppose we could use a generic foreign key and place all addresses into one table. However, there is no back-reference to the appropriate “addressable” entity that created it.
If there is some obvious solution I’m overlooking, I’d be interested in reading your thoughts about it.
Mike, from what I recall when the new inheritance stuff was planned out, there should eventually be an “abstract model” sort of class that you’ll be able to inherit from, which won’t create tables of its own but will contribute fields to its subclasses.
James, the abstract parent model will not solve the User problem. There was a proposal to add the ‘replaces_module’ hack for the child class (in its Meta) to say that it’s not a relational subclass, but I think that was abandoned.
Doug: my previous comment was just responding to Mike’s particular question, which really would be solved neatly (if I’m understanding it correctly) by abstract classes. Other stuff will need real parent classes.
Great post. I think some of these thoughts should work their way into the main documentation, since it’s usually one of the first questions that people ask. Also I think that the abstract model idea is a great solution to provide some of the convenience of OO while still making sense on a db level.
Basically, the only reason I can see that we even have to deal with this is that Django “needs” to run on a lot of hosting platforms, and there aren’t a lot of good object databases, least of all free, open source, fast ones. So in the thirst for portability and speed we look to PostgreSQL and MySQL and end up getting bitten in the arse. That said, Django is a lot more popular because of the compromise, and I appreciate it. It can be a little mucky but I like how easy it is to work with from non-Django code.
I was reading through Mike’s post when I realized that, at least to me, Python’s OO model suits database relations quite well. For instance, when we look up an attribute y on object x, Python hides a lot of what it does: first looking at the instance’s member dictionary, then the class’s, and then going up the tree doing the same thing.
This fits because the programmer isn’t aware that it looks deeper in the hierarchy for information, so it is possible to “flatten” a class to fit a database table, which would be what this abstract model would be doing for us; flattening out the class hierarchy to the “visible” attributes of that class.
So in short, what I’m saying is that models subclassed from other models should be represented in the database as a single model.
There are still some questions that I’m pondering though, like what would happen if class Y that is derived from class X wants to act upon data from class X when itself has that same attribute name (since we flattened it out, class X’s fields that also occur in class Y will be removed.)
The same question applies to the abstract models that James spoke of.
Noah: If Django hadn’t supported MySQL, I wouldn’t have used it. I agree with you.
Nice article! I have been looking around for a solution to subtypes/supertypes with mysql/pgsql, and one modeling method I use now is mentioned here: http://groups.google.com/group/comp.databases.oracle.server/msg/a23ffb19bfde2f20?hl=en
If only people realised that there isn’t such a thing as ORM, because we aren’t using RDBMSs but SQL DBMSs… then perhaps we would realise SQL is the problem, and the relational model the solution.
Please get this information more prominently linked from the main Django documentation. This obvious revelation about relating what otherwise seems so OOP-y is precisely what I needed to read/hear. Thanks!
Leandro: I’m inherently suspicious of your statement; it’s too reminiscent of an ad hoc immunization of a favored theory by the “well, it’s never really been tried” method. Sir Karl Popper famously went after such tactics when they were used in other areas, e.g., “you can’t say communism is bad because nobody ever really implemented Marx’s theory”, and I’m inclined to be extremely suspicious of “no-one ever really implemented Codd’s relational theory” because of the similarity.