Contributing to classiness (in Django)

Published: March 4, 2019. Filed under: Django, Python.

A couple weeks ago I ran a poll on Twitter asking people whether they’d ever used, or considered using, the contribute_to_class() method to write something that attaches to or hooks into a Django ORM model class, and if so what their thoughts were. There was also a “don’t know what that is” option, which won by a large margin, and I promised I’d provide an explanation.

Unfortunately, that was around the time I suffered a kitchen accident which left me without full use of my left hand for a bit. Full healing will take a little while longer, but it’s at the point where I can mostly type normally again, so it’s time to provide the explanation.

First, though, I need to provide some explanation for the explanation. Stick with me, and hopefully it’ll make sense at the end.

Keeping it classy

If you’ve used Django, or even just done some introductory Python tutorials, you’ve probably written or seen some Python classes. Here’s a simple one:

import math

class Circle:
    def __init__(self, center, radius):
        self.center = center
        self.radius = radius

    def area(self):
        return math.pi * (self.radius ** 2)

    def circumference(self):
        return 2 * math.pi * self.radius

This has most of the things we’re used to from classes: it has some methods, it has some attributes, it has a constructor you can call to create a new instance and pass in arguments that will affect the new instance. But tutorials usually don’t go much deeper than that, and don’t get into how Python actually handles classes. So let’s work through it.

In the Circle class above, the class Circle: line begins a block that contains nine more lines of code (followed, presumably, either by un-indenting or by the end of the file). Everything inside that block is executed, and the result is a dict representing the local namespace of that block. The keys are the names of the things defined inside the block, and the values are… their values. For example, the block of code for the Circle class will produce a dict containing three keys for the things we defined: '__init__', 'area', and 'circumference' (Python also adds a couple of bookkeeping entries of its own, such as '__module__').
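If you want to peek at that namespace yourself, here's a small sketch. It uses vars() to look at the finished class, which is close to (though not exactly) the dict the class body produced, and filters down to plain functions so the bookkeeping entries are left out. The variable names namespace and defined_names are just mine for illustration.

```python
import inspect
import math


class Circle:
    def __init__(self, center, radius):
        self.center = center
        self.radius = radius

    def area(self):
        return math.pi * (self.radius ** 2)

    def circumference(self):
        return 2 * math.pi * self.radius


# vars() exposes the class's namespace as a read-only mapping.
namespace = vars(Circle)

# Keep only the plain functions written in the class body; this skips
# the bookkeeping entries Python adds on its own (like __module__).
defined_names = [
    name for name, value in namespace.items()
    if inspect.isfunction(value)
]
print(defined_names)
```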

Python then gathers up three things — the name given in the initial class statement, the parent classes (if any) included in the initial class statement, and the dict representing the namespace of the class body — and passes them, in that order, to type(). The return value from type() is bound to the name of the class. It’s important to note that type() is not a function. It’s a class, and specifically it’s the class of classes: just as you call Circle() to create a new instance of the Circle class, you call type() to create a new instance of the class… class.
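A quick way to convince yourself of the "class of classes" idea is to ask Python directly. This is only a sanity check, using an empty stand-in Circle class rather than the full one above:

```python
class Circle:
    pass


c = Circle()

# The class of an instance is the class you called to create it.
print(type(c) is Circle)

# The class of a class is type.
print(type(Circle) is type)

# And type is itself a class, so it is an instance of itself.
print(isinstance(type, type))
```

All three checks print True.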

So the process for defining the Circle class consists of:

  1. Execute the class body, and gather up everything from the resulting class namespace into a dict. In this case it’ll have the three keys mentioned above, corresponding to the names of the three defined methods.
  2. Call type('Circle', (), namespace_dict).
  3. Bind the result to the name Circle.

If you want, you can do this manually, building up a dict of the things to put in a class, and calling type():

>>> import math
>>> def __init__(self, center, radius):
...     self.center = center
...     self.radius = radius
...
>>> def area(self):
...     return math.pi * (self.radius ** 2)
...
>>> def circumference(self):
...     return 2 * math.pi * self.radius
...
>>> Circle = type('Circle', (), {'__init__': __init__, 'area': area, 'circumference': circumference})
>>> Circle
<class '__main__.Circle'>
>>> c = Circle(center=(0, 0), radius=1)
>>> c.area()
3.141592653589793
>>> c.circumference()
6.283185307179586

So now we understand how Python handles a class definition. But there’s one more thing we need to know about before we get back to Django.

That’s so meta

One advanced — and misunderstood, and often misused — feature of Python is something called a “metaclass”. The idea is pretty easy to explain: it’s a way to hook into the process described above, and modify the way class objects get constructed.

Earlier in this post, I described __init__() as the constructor of a class, and often that’s how new Python programmers are introduced to it. Although the distinction almost never matters, this isn’t quite correct: creating an instance of a class in Python is really a two-step process, where first the class’ __new__() method will be called, and then its __init__().

In other words, if we take the Circle class above and do c = Circle(center=(0, 0), radius=1), what Python actually does is roughly this:

c = Circle.__new__(Circle, center=(0, 0), radius=1)
c.__init__(center=(0, 0), radius=1)

If that first line looks a little weird: __new__() is actually a static method, though Python special-cases it so you don’t have to use the staticmethod decorator when defining it. It can’t be an instance method, since it’s the thing that creates instances. And since there’s no implicit first argument passed in automatically (like the implicit self of an instance method), the class gets passed explicitly as the first argument.

People like to get pedantic and argue that __new__() is the real “constructor” while __init__() is the “initializer”, but it’s rarely a useful argument to make; when you create a new instance of a class, Python calls both methods before returning the instance to you.
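Here's a minimal sketch that makes the two steps visible. The calls list is just something I've added to record the order in which the methods run; it isn't part of any real API.

```python
calls = []  # records which special methods ran, and in what order


class Circle:
    def __new__(cls, center, radius):
        calls.append("__new__")
        # object.__new__() only wants the class; it creates the bare instance.
        return super().__new__(cls)

    def __init__(self, center, radius):
        calls.append("__init__")
        self.center = center
        self.radius = radius


c = Circle(center=(0, 0), radius=1)
print(calls)
```

Both methods receive the same arguments, and __init__() only gets a turn once __new__() has handed back an instance.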

But: since defining a new class ultimately involves creating a new instance of type, this means it will involve calling type.__new__(). So Python’s mechanism for customizing creation of classes consists of subclassing type and overriding __new__(), and then telling Python to use your subclass of type. A class which does this is a metaclass.

Here’s a very small example:

>>> class SimpleMetaclass(type):
...     def __new__(cls, name, bases, attrs):
...         attrs['special_attribute'] = 'Special!'
...         return super().__new__(cls, name, bases, attrs)
...
>>> class SpecialClass(metaclass=SimpleMetaclass):
...     pass
...
>>> SpecialClass.special_attribute
'Special!'

The set of arguments to __new__() is the class (that’s always the first argument to __new__(), as we saw above), plus the specific arguments used to construct an instance of type(), since that’s what we’re subclassing.

And even though SpecialClass never defined an attribute named special_attribute, it still has that attribute, because we told Python to use SimpleMetaclass (instead of type) when building the class SpecialClass, and SimpleMetaclass inserts the special_attribute attribute into any class object it builds.

This can be a very useful feature for setting up automatic and seemingly “magical” attributes, methods, and behavior on a class; you can write a metaclass which makes the modifications you want, and either use it directly, or provide a base class that uses it, and inherit from that base class (child classes will inherit their parents’ metaclass).
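As a small sketch of the base-class approach, reusing the SimpleMetaclass from above (SpecialBase and Child are names I've made up for the example):

```python
class SimpleMetaclass(type):
    def __new__(cls, name, bases, attrs):
        attrs['special_attribute'] = 'Special!'
        return super().__new__(cls, name, bases, attrs)


class SpecialBase(metaclass=SimpleMetaclass):
    """Base class whose only job is to carry the metaclass."""


class Child(SpecialBase):
    """Never mentions a metaclass, but gets one via its parent."""


# The child class was built by the parent's metaclass...
print(type(Child) is SimpleMetaclass)

# ...so the attribute was inserted into Child's own namespace too.
print('special_attribute' in vars(Child))
```

Code that subclasses SpecialBase never has to know a metaclass is involved, which is convenient, and is also exactly why this can be confusing later.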

And now it’s time for a brief aside. There are two places in this post where I’m obligated to provide a warning, and this is one of them. Metaclasses seem like a cool thing when you first learn about them, and you can probably come up with all sorts of cases where they’d be useful. But the cases where they’re actually useful are more rare; usually, the things you’d write a metaclass for are things you could have accomplished just as easily (and much more clearly) by putting the behavior into your class in the first place. Most of the downsides of metaclasses fall into two categories:

  1. They can make code harder to understand and reason about, since it might not be immediately apparent, deep inside a class hierarchy, that one of the parent classes had a metaclass, and as a result it may seem as if things are just appearing or changing inside a class for no reason.
  2. When a class has multiple parents that use different metaclasses, Python raises a TypeError and tells you: “the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases”. Since that’s not very helpful, you then go plug that error message into Google to find out what you’re supposed to do. The answer is you write another metaclass that subclasses all the metaclasses of the parent classes, resolving any conflicting or contradictory things they want to do, and use that metaclass on your child class. This may be complicated or impossible, depending on what the other metaclasses are doing.
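To make the second problem concrete, here's a sketch with two do-nothing metaclasses (all of the names are invented for the example). Because these metaclasses don't actually do anything, combining them is trivial; with real ones, the combined metaclass is where the hard work would go.

```python
class MetaA(type):
    pass


class MetaB(type):
    pass


class A(metaclass=MetaA):
    pass


class B(metaclass=MetaB):
    pass


# Inheriting from both parents fails: neither metaclass is a subclass
# of the other, so Python can't pick one to build the new class.
try:
    class Broken(A, B):
        pass
except TypeError as exc:
    conflict_message = str(exc)
    print(conflict_message)


# The fix: a metaclass that subclasses both parents' metaclasses.
class MetaAB(MetaA, MetaB):
    pass


class Works(A, B, metaclass=MetaAB):
    pass


print(type(Works) is MetaAB)
```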

And now for some Django

If you’ve written model classes using the Django ORM, you may have noticed some “magical” behavior — some things, like the Meta declaration, move around after you define them, and sometimes things show up in the model class that you never defined. For example, if you add a non-nullable DateField named pub_date to a model, that model will automatically sprout methods named get_next_by_pub_date() and get_previous_by_pub_date().

If you’ve made it this far into this post, you now know (or can figure out) how that happens: django.db.models.Model, which is the base class for all Django models, uses a metaclass (django.db.models.base.ModelBase). The model metaclass handles a bunch of things.

If you want, you can go read the whole model metaclass (that link will go to Django 2.1’s implementation) on GitHub to see all the stuff it does. But the thing that matters for this post is how the attributes of the model class get processed.

When ModelBase is going through the dict of things that will make up the model class it’s building, it checks each attribute that will end up in the model, to see if the value being assigned to the attribute has a method named contribute_to_class(). If so, it will call the contribute_to_class() method, passing in the model class that’s being defined and the name of the attribute being assigned. This allows a lot of work to be shifted out of ModelBase and into the various types of things that end up as attributes on model classes.
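To show the shape of that dispatch without needing Django installed, here's a toy imitation in plain Python. To be clear, this is not Django's code: ToyModelBase, ToyField, Article, and the describe_ method are all hypothetical names of mine, and the real ModelBase does considerably more than this.

```python
class ToyModelBase(type):
    """Toy stand-in for the attribute dispatch described above."""

    def __new__(mcs, name, bases, attrs):
        # Split the class body into objects that know how to attach
        # themselves and everything else.
        contributable = {}
        plain = {}
        for attr_name, value in attrs.items():
            if hasattr(value, "contribute_to_class"):
                contributable[attr_name] = value
            else:
                plain[attr_name] = value

        new_class = super().__new__(mcs, name, bases, plain)

        # Hand each such object the class being built and its attribute name.
        for attr_name, value in contributable.items():
            value.contribute_to_class(new_class, attr_name)
        return new_class


class ToyField:
    """Hypothetical field-like object that hooks itself into a class."""

    def contribute_to_class(self, cls, name):
        self.name = name
        self.model = cls
        setattr(cls, name, self)

        # Sprout an extra method on the class, loosely in the spirit of
        # the automatically added next/previous methods.
        def describe(instance):
            return f"{cls.__name__}.{name}"

        setattr(cls, f"describe_{name}", describe)


class Article(metaclass=ToyModelBase):
    pub_date = ToyField()
    title = "plain attribute"


print(Article.pub_date.name)
print(Article().describe_pub_date())
```

The metaclass stays small because the field does its own setup, which is the division of labor the paragraph above describes.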

Quite a few of Django’s built-in model fields define the contribute_to_class() method, and use it to work their “magic”. For example, DateField’s contribute_to_class() is what sets up those next/previous lookup methods. Relationship fields use contribute_to_class() to set up their infrastructure, including inserting the “reverse” end of the relationship into the other model class. Managers use contribute_to_class() to find out what model they’re attached to, and some other internals of the ORM use it for bookkeeping and to ensure the model configuration is correctly handled.

And now it is time for the second warning in this post: contribute_to_class() is an undocumented, private, internal API. Django does not provide backwards-compatibility guarantees for it, and if you make use of it you accept the risk that it might change or break at any time.

But sometimes third-party code needs a way to hook into the ORM and affect the way a model class gets set up. Some types of complex custom model fields will use this, for example, though that’s not the only use case. And when you need to do that, you need to do it; contribute_to_class() is there, and will do it for you.

My (potential) use case

I put up that Twitter poll because of something I’d been doing at work. Without going into too much detail: I was working on a proof of concept for replacing several instances of ad-hoc state-machine-ish code with actual state machines (I am very much in favor of people using more state machines, especially in combination with the Django ORM, but that’s a story for another day).

This involved connecting a state machine library — the one I was experimenting with was Automat — to some Django models. I wanted something I could set as an attribute on a model class, specifying a state machine class to use and the model fields it should use for storing and reading back state, and I wanted to ensure that:

  1. The fields in question would no longer be editable via the Django admin or other things using the Django forms library, and
  2. Whenever an instance of the model got created, an instance of the state machine would also be created and plugged into the correct attribute, and correctly initialized from the model’s fields (or put into its default starting state, for new model instances).

After spending a little time thinking about this, I eventually decided to write a wrapper class for the state machine, and have the wrapper be the thing that’s set as an attribute on the model class. The wrapper could then use the contribute_to_class() hook to ensure the required behavior is set up.
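For a rough idea of what such a wrapper can look like, here's a sketch in plain Python. It is emphatically not the code from work: it uses neither Django nor Automat, and every name in it (ToyModelBase, TrafficLight, StateMachineWrapper, Signal, non_editable_fields) is hypothetical. The toy metaclass stands in for ModelBase, an ordinary class attribute stands in for a model field, and "not editable" is reduced to recording the field name on the class.

```python
class ToyModelBase(type):
    """Toy stand-in for the contribute_to_class() dispatch."""

    def __new__(mcs, name, bases, attrs):
        hooks = {k: v for k, v in attrs.items()
                 if hasattr(v, "contribute_to_class")}
        plain = {k: v for k, v in attrs.items() if k not in hooks}
        new_class = super().__new__(mcs, name, bases, plain)
        for attr_name, value in hooks.items():
            value.contribute_to_class(new_class, attr_name)
        return new_class


class TrafficLight:
    """Hypothetical state machine that cycles through three states."""
    ORDER = ["red", "green", "amber"]

    def __init__(self, state="red"):
        self.state = state

    def advance(self):
        position = self.ORDER.index(self.state)
        self.state = self.ORDER[(position + 1) % len(self.ORDER)]
        return self.state


class StateMachineWrapper:
    """The thing that gets set as an attribute on the model-like class."""

    def __init__(self, machine_class, state_field):
        self.machine_class = machine_class
        self.state_field = state_field

    def contribute_to_class(self, cls, name):
        self.name = name
        # Record which field should no longer be edited directly.
        existing = getattr(cls, "non_editable_fields", ())
        cls.non_editable_fields = existing + (self.state_field,)
        # Install the wrapper as a descriptor on the class.
        setattr(cls, name, self)

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        # Build the machine from the stored state on first access, then
        # cache it on the instance so later lookups return the same one.
        machine = self.machine_class(getattr(instance, self.state_field))
        instance.__dict__[self.name] = machine
        return machine


class Signal(metaclass=ToyModelBase):
    colour = "red"  # stands in for the model field storing the state
    machine = StateMachineWrapper(TrafficLight, state_field="colour")


s = Signal()
print(s.machine.state)
print(s.machine.advance())
print(Signal.non_editable_fields)
```

Writing the new state back to the field, and doing any of this against real model fields, is left out entirely; the point is only that the wrapper learns which class and attribute name it was attached to, and sets everything else up from there.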

But it was the first time, as far as I can remember, that I’d ever written something that used contribute_to_class() and wasn’t either part of Django or a custom model field. So I got curious about whether there were other people out there using this technique. GitHub finds 363,000 results for contribute_to_class in public repositories, but a lot of those seem to be repositories that bundle complete copies of Django. And that was when I turned to Twitter.

I’m still not sure this was the right approach for what I was trying to write, or even that I was writing the right thing. And none of the people who selected “Yes, and I wish I hadn’t” or “No, I decided against it” on the Twitter poll provided any further explanation in the replies, so I also don’t have much context on the negative experiences people had. I expect this is something I’ll spend some more time thinking about.

One final note

If you were at DjangoCon US late last year and attended my “Mastering the Django ORM” tutorial, you hopefully didn’t have to answer “Don’t know what that is” on the Twitter poll, because one section of that tutorial was a tour of the internals of the ORM, including the model metaclass and the contribute_to_class() hook. Other sections covered public API, best practices, and a lot more material on the ORM.

Unfortunately, DjangoCon didn’t record tutorials in 2018, so there’s no video of “Mastering the Django ORM”. PyCon US does record tutorials, but PyCon has rejected this tutorial two years in a row now (and at the moment I’m not in a happy place with respect to the PyCon tutorials committee, though for more reasons than just “my tutorial got rejected”).

So in my occasional spare moments, I’ve been thinking about ways to make that material more broadly available. The likeliest option, at this point, is a book (and no, I’m not looking for a publisher right now; if I do this, I’ll write the book first and then figure out how to get it published), but that’s a pretty serious commitment, and I don’t think I’d do it unless I knew there was a lot of demand for it. So if you think an in-depth book on the Django ORM would be useful to you, especially if you think it’d be useful enough that you’d buy a copy, let me know.