Making magic

Published December 3, 2007. Filed under: Django, Programming, Python.

In yesterday’s article I spent a fair amount of time talking about the word “magic”, specifically in the context of Clarke’s Third Law, which states that

Any sufficiently advanced technology is indistinguishable from magic.

A big part of what I was getting at was that a lot of things which seem to be explicable only by appealing to “magic” are really just cases of technology — sometimes extremely simple technology — being used in a complex way. Or, to borrow an excellent turn of phrase from Terry Pratchett, “ninety percent of most magic merely consists of knowing one extra fact.” In the case of the “magic” which used to be (up until the 0.95 release, the first after the “magic removal” effort) in Django, the apparent “magic” was that you’d define a model class and it would mysteriously disappear, to re-surface as an entire module worth of code somewhere else.

To return to an example from yesterday, if you defined an Entry model inside an application named blog, that class would “magically” end up inside a module called django.models.blog.entries, and that entries module would “magically” sprout methods for working with the Entry class and a set of module constants and exceptions to go along with it. Now, that was a bit much to expect people to put up with — a class which should be at blog.models.Entry suddenly showing up inside a module that doesn’t seem to exist anywhere on the filesystem is simply a recipe for confusion — and so it was removed in favor of the more intuitive system we have now, where modules and classes mostly stay where you left them.

But the “magic” here simply consists of knowing one extra fact: that Python lets you dynamically construct modules at runtime and shove them pretty much anywhere you like. This isn’t a feature of Python which gets used all that often — and, in fact, shouldn’t be used all that often because of the potential for confusion — but understanding the extra fact which makes the “magic” work is useful knowledge, and provides a nice demonstration of the fact that Python is generally a much more dynamic language than people give it credit for.

So today we’re going to make some magic.

The goal

The technique for dynamically constructing modules and making import work on them as expected is actually pretty simple; like so many other things which are occasionally explained by “magic”, the things Django used to do to set up model modules were merely complex applications of simple principles, and were complex largely because of the sheer number of database API methods, model-specific classes and other machinery which needed to be built up.

So let’s work on a simpler example which illustrates the underlying principle. Our goal is to be able to do the following in a Python interpreter:

>>> import hello
>>> hello.say_hello()
Hello

At first glance this is fairly straightforward: we import a module named hello, call a function in it called say_hello() and it prints the word “Hello”. Nothing special about that, right? All you’d need to do is create a file called hello.py, stick the function in it and put the file somewhere on your Python import path.

Except we’re going to make this work without creating any files, and especially without ever creating a file named hello.py; the hello module is going to be created dynamically, the say_hello() function is going to be put into it dynamically, the resulting module is going to be made importable dynamically and the whole thing is only going to exist in memory during a single interpreter session. And when we’re done you’ll have a pretty good idea of how Django used to set up the “magic” model modules, even if that was a slightly more complex use of the same principles.

Understanding import

The first thing we need to understand to do this is how Python’s import mechanism works; in other words, the exact steps Python goes through when it encounters an import statement. If you want all the gory details, the official Python documentation on importing isn’t too bad an explanation of this process, and Fredrik Lundh has an excellent write-up of all the nooks and crannies of Python’s importing mechanism. I highly recommend giving both of those a thorough read at some point, but for now let’s walk through the key points together.

When you have a statement like import hello in a Python program, Python goes through two steps to actually import it:

  1. Locate and, if necessary, initialize the module.
  2. Bind the resulting module object to a name in your current scope.

So when Python sees import hello, it wants to locate and possibly initialize a module named hello, then assign the resulting module object to the name hello in your program’s current scope. If the import statement occurs at the top of a file, hello will become a module-global name, for example.
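
As a rough illustration of those two steps, here is a small sketch in Python 3 syntax. The function name load_locally and the choice of the standard library’s colorsys module are arbitrary picks for the example. It uses importlib.import_module to perform only the first step, and shows that an import statement inside a function binds the name in that function’s scope rather than globally.

```python
import importlib
import sys


def load_locally():
    # The import statement binds the name "colorsys" in this function's
    # local scope only; the module object itself is cached in sys.modules.
    import colorsys
    return colorsys


# Step 1 only: locate/initialize the module and hand back the object.
# Nothing is bound to a name until we assign the result ourselves (step 2).
colorsys_module = importlib.import_module("colorsys")

same_object = load_locally() is colorsys_module
bound_globally = "colorsys" in globals()
```

Both routes hand back the same module object, while the global namespace never gains a colorsys name.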

The first step — locating and initializing the module — can happen in either of a couple of ways:

  1. If the module hasn’t already been initialized, that needs to happen. For most Python modules, that simply consists of executing the code in the module so that, for example, any classes or functions it contains get defined.
  2. If the module has already been initialized, there will be a module object already in memory for it, and Python can simply grab that object.

Python figures out whether the module has already been initialized by looking at a dictionary named modules which lives inside the built-in module “sys” of the Python standard library; sys.modules has keys corresponding to the import paths of modules which have already been loaded and initialized, and the values are the resulting module objects.

So the actual mechanism is pretty simple: when we say import hello in a Python program, Python goes and looks for the key “hello” in the sys.modules dictionary. If that key exists, Python gets the already-initialized module object out of sys.modules['hello']; if it doesn’t, Python goes out to your file system and starts looking through the directories on your Python import path for a module or package named hello, which will, if found, be initialized and given an entry in sys.modules. If it isn’t found, Python will raise an ImportError.
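
To make that lookup order concrete, here is a deliberately simplified sketch in Python 3 syntax. The function simplified_import is a made-up name, and it leans on importlib.import_module for the real work, so treat it as an illustration of the order of events rather than a picture of Python’s internals.

```python
import importlib
import sys


def simplified_import(name):
    """Roughly mimic the lookup order described above (illustrative only)."""
    if name in sys.modules:
        # Already initialized: reuse the cached module object.
        return sys.modules[name], "cache"
    # Not cached: let the real machinery search the import path,
    # initialize the module and record it in sys.modules.
    return importlib.import_module(name), "loaded"


first_module, first_source = simplified_import("fractions")
second_module, second_source = simplified_import("fractions")

try:
    simplified_import("no_such_module_for_this_demo")
    missing_raises = False
except ImportError:
    missing_raises = True
```

Whatever happened on the first call, the second one is always served from sys.modules and returns the very same object.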

One important thing to note here is that if you have a module which can conceivably be imported in multiple different ways — say, because both your project directory and application are directly on your Python path, so that both from myproject.blog.models import Entry and from blog.models import Entry will work — you can end up with a single module getting initialized more than once, and having more than one entry in sys.modules (one for each different way you’ve imported it). Significant sections of Django’s model-loading code exist to work around this and ensure that a given model class only gets initialized once.
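
You can reproduce the double-import problem with a throwaway project. The sketch below (Python 3 syntax) writes a tiny myproject/blog/models.py into a temporary directory, which is an assumption made purely so the example is self-contained, then imports the same file under both dotted names.

```python
import importlib
import os
import shutil
import sys
import tempfile

# Build a throwaway project on disk: <tmp>/myproject/blog/models.py
tmp_root = tempfile.mkdtemp()
project_dir = os.path.join(tmp_root, "myproject")
blog_dir = os.path.join(project_dir, "blog")
os.makedirs(blog_dir)
for package_dir in (project_dir, blog_dir):
    open(os.path.join(package_dir, "__init__.py"), "w").close()
with open(os.path.join(blog_dir, "models.py"), "w") as handle:
    handle.write("class Entry(object):\n    pass\n")

# Put both the parent directory and the project directory on the path,
# so the same file is reachable under two different dotted names.
sys.path[:0] = [tmp_root, project_dir]
importlib.invalidate_caches()

long_name = importlib.import_module("myproject.blog.models")
short_name = importlib.import_module("blog.models")

same_file = os.path.samefile(long_name.__file__, short_name.__file__)
same_module = long_name is short_name
same_class = long_name.Entry is short_name.Entry

# Tidy up the import path and the temporary files.
for added_path in (tmp_root, project_dir):
    sys.path.remove(added_path)
shutil.rmtree(tmp_root)
```

One file on disk ends up as two module objects with two distinct Entry classes, which is exactly the situation that trips up isinstance() checks and class registries.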

Also, note that for module names which contain dots (e.g., import foo.bar), the mechanism is slightly different: each level of the dotted path is a module in its own right, with its own entry in sys.modules under its full dotted name. So for the statement import foo.bar.baz, the entries involved are “foo”, “foo.bar” and “foo.bar.baz”; when they have to be loaded, parents are loaded before children, and each loaded submodule is also set as an attribute on its parent (so baz becomes an attribute of the foo.bar module object). The statement itself binds only the top-level name, foo, in your scope.
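
One way to see how dotted names interact with sys.modules is to import a nested module from the standard library and poke at the result. This sketch (Python 3 syntax) uses xml.dom.minidom simply because it is three levels deep; any nested package would do.

```python
import sys

import xml.dom.minidom

# Each level of the dotted path has its own sys.modules entry...
dotted_entries = [
    name in sys.modules for name in ("xml", "xml.dom", "xml.dom.minidom")
]

# ...and each child is also an attribute of its parent module object.
child_is_attribute = (
    sys.modules["xml.dom"].minidom is sys.modules["xml.dom.minidom"]
)

# The import statement itself binds only the top-level name, "xml".
bound_top_level = xml is sys.modules["xml"]
```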

By now you might be wondering whether, since sys.modules is a dictionary, you can just go stick things into it. The answer is that you can: you’re free to do anything to sys.modules that’s legal to do to a Python dictionary, though it’s almost always a bad idea to go messing around with it. But this points the way to how we’re going to make our eventual import statement work: once we’ve constructed the hello module, we can simply stick it into sys.modules and Python will happily let us import it without ever bothering to check if an actual module of that name exists on the file system.

Understanding modules

If you’ve ever tried to access, say, a nonexistent Django setting, you’ve probably seen an error like this:

AttributeError: 'module' object has no attribute 'some_random_name'

And so far I’ve been using the phrase “module object” to refer to Python modules. Both of these give us a clue about how we can build a module on the fly: modules, like everything else in Python, are simply objects, and you can instantiate new module objects just as you can instantiate objects from classes you’ve defined in your applications, assuming you know where to look.

The place to look is another module from Python’s standard library: types, which contains the type objects for many of Python’s built-in types. If you know your way around the types module, you can dynamically build nearly any sort of standard Python object on the fly, even some objects that you can’t normally construct otherwise. In this case the one we’re interested in is types.ModuleType, which we can use to create a brand-new module object at runtime; it works the same as instantiating any other object, and requires at least one argument: the name of the module object to create. You can also optionally pass a second argument which will become the new module’s docstring, so that Python’s built-in help() function will be able to show documentation for it (and other automated documentation parsers will be able to extract its documentation), but we’ll leave that off for this example.
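
For a quick look at both arguments, here is a small sketch in Python 3 syntax. The module name greetings_demo and its docstring are invented for the example.

```python
import sys
import types

documented = types.ModuleType("greetings_demo", "Functions for greeting people.")

module_name = documented.__name__
module_doc = documented.__doc__

# Ordinary imported modules are instances of the very same type.
same_type = type(sys) is types.ModuleType

# Creating a module object does not register it anywhere by itself.
registered = "greetings_demo" in sys.modules
```

The name and docstring land in the usual \_\_name\_\_ and \_\_doc\_\_ attributes, and nothing touches sys.modules until you put the module there yourself.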

So let’s go ahead and start building our hello module. Pop open a Python interpreter and type in the following:

>>> import types
>>> hello_mod = types.ModuleType('hello')

We now have a module object bound to the variable hello_mod; you can check that it really is a module and has the correct name — “hello” — in the interpreter:

>>> hello_mod
<module 'hello' (built-in)>

At this point, the new module is simply a blank slate; we can stick anything into it that we like. So let’s define the say_hello() function we’re going to use:

>>> def say_hello():
...     print("Hello")
... 

And then add it to our hello module:

>>> hello_mod.say_hello = say_hello

You can call this function and verify that it works:

>>> hello_mod.say_hello()
Hello
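
Assigning attributes one at a time gets tedious once a module needs more than a couple of names. One alternative, which is not what the walkthrough above does, is to execute a string of source code with the module’s namespace as its globals. This is only a sketch, and the function here returns the greeting instead of printing it so the result is easy to check.

```python
import types

hello_mod = types.ModuleType("hello")

source = '''
GREETING = "Hello"

def say_hello():
    return GREETING
'''

# Run the source with the module's namespace as its globals, which is
# roughly what initializing a file-based module amounts to.
exec(source, hello_mod.__dict__)

public_names = sorted(
    name for name in vars(hello_mod) if not name.startswith("_")
)
result = hello_mod.say_hello()
```

Because say_hello() was defined inside the module’s own namespace, it can see GREETING as a module-level global, just as it would in a real hello.py.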

Putting it all together

Of course, we still need to make our new module importable via the name “hello”, but armed with an understanding of sys.modules this is easy:

>>> import sys
>>> sys.modules.setdefault('hello', hello_mod)
<module 'hello' (built-in)>

We’re using setdefault() here instead of assigning directly to sys.modules['hello'], because if there’s already a loaded module named hello we shouldn’t overwrite it; the setdefault() method of a Python dictionary takes a key and a value, and then either inserts the value into the dictionary with the given key, if that key wasn’t already in use in the dictionary, or else does nothing to the dictionary. In either case it returns the value which ends up in the dictionary, which provides an easy way to figure out if you added a new value or not. In this case, the return value of setdefault() was the module we just created, so we know it was added to sys.modules successfully.
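
If you wanted to wrap that check up for reuse, a small helper might look like the following sketch (Python 3 syntax). The name register_module and the demo module name are made up here; the only real API involved is dict.setdefault().

```python
import sys
import types


def register_module(name, module):
    """Insert module into sys.modules unless the name is already taken.

    Returns True if our module now owns the name, False if another
    module already did (in which case nothing is changed).
    """
    return sys.modules.setdefault(name, module) is module


fresh = types.ModuleType("hello_setdefault_demo")
added = register_module("hello_setdefault_demo", fresh)

# "sys" is always loaded, so this must not replace the real module.
impostor = types.ModuleType("sys")
replaced = register_module("sys", impostor)
```

Comparing the return value with “is” is what tells the two cases apart: you get your own object back only when the key was free.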

And now we can use the “magic”:

>>> import hello
>>> hello.say_hello()
Hello

This works because our dynamically-created module object is now in sys.modules under the name “hello”; since it’s in there, Python will simply return it any time we say import hello and never bother checking the file system to see if a “real” module of that name exists on the Python path. We can even use the alternate from syntax to import just the say_hello() function:

>>> from hello import say_hello
>>> say_hello()
Hello

Once again, Python is simply giving us back the contents of the module object we created in memory; the fact that it exists in sys.modules again bypasses any need to check the file system. And as soon as you exit the Python interpreter, the hello module will simply disappear; since it only ever existed in memory inside this single Python process, it will go away as soon as that Python process exits.
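
You don’t have to wait for the interpreter to exit, either. As a sketch of the whole round trip in one place (Python 3 syntax), the context manager below registers an in-memory module and removes it again afterwards. The names temporary_module and hello_in_memory are invented for the example, and the function returns its greeting instead of printing it.

```python
import contextlib
import sys
import types


@contextlib.contextmanager
def temporary_module(name, **attributes):
    """Make an in-memory module importable for the duration of a with block."""
    if name in sys.modules:
        raise ValueError("a module named %r is already loaded" % name)
    module = types.ModuleType(name)
    vars(module).update(attributes)
    sys.modules[name] = module
    try:
        yield module
    finally:
        # Remove only our own entry, leaving everything else untouched.
        sys.modules.pop(name, None)


def say_hello():
    return "Hello"


with temporary_module("hello_in_memory", say_hello=say_hello):
    import hello_in_memory
    greeting = hello_in_memory.say_hello()

still_registered = "hello_in_memory" in sys.modules
```

Inside the with block the import statement behaves as if a file existed; once the block ends the sys.modules entry is gone, although any name already bound to the module object keeps working.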

And now you know

At this point you can probably work out how Django, back in the 0.90 and 0.91 days, used to create the “magic” model modules which were importable from django.models; there was a lot more work going on to build up the things which eventually lived inside that module, but ultimately it boiled down to the same two things we just did: creating a new module object with types.ModuleType, and making it importable by inserting it into sys.modules.
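
To give a flavor of the dotted-name case, here is a sketch in Python 3 syntax. It is emphatically not Django’s actual code: the package name magicdemo, the helper build_module_chain and the get_list function are all invented for illustration. It just shows the same two steps applied to every level of a dotted module path.

```python
import sys
import types


def build_module_chain(dotted_name):
    """Create (or reuse) a module for every level of dotted_name."""
    parent = None
    parts = dotted_name.split(".")
    for depth in range(1, len(parts) + 1):
        full_name = ".".join(parts[:depth])
        module = sys.modules.setdefault(full_name, types.ModuleType(full_name))
        if parent is not None:
            # Make the child reachable as an attribute of its parent.
            setattr(parent, parts[depth - 1], module)
        parent = module
    return parent


def get_list():
    return ["first entry", "second entry"]


entries = build_module_chain("magicdemo.models.blog.entries")
entries.get_list = get_list

# Both import styles now work without any files on disk.
import magicdemo.models.blog.entries
from magicdemo.models.blog import entries as imported_entries

listing = magicdemo.models.blog.entries.get_list()
```

Registering every parent matters: a plain import of a dotted name binds the top-level package, and reaching the leaf from there relies on each child being an attribute of its parent.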

I can’t stress enough that this is something you probably shouldn’t ever do in real-world code, because — as the example of Django’s old-style model system shows — it’s confusing and counterintuitive to mysteriously create modules where people aren’t expecting them. And messing with sys.modules, unless you really know what you’re doing, can also be dangerous; if you’re not careful you might accidentally delete or overwrite the entry for a module you were relying on, and then you’ll be in a real pickle.
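
Here is a harmless way to watch that kind of breakage happen (Python 3 syntax), using a throwaway in-memory module, fragile_demo, which is a made-up name, so that nothing real gets damaged.

```python
import sys
import types

# Work on a throwaway module so nothing real gets damaged.
sys.modules["fragile_demo"] = types.ModuleType("fragile_demo")
import fragile_demo

# Deleting the entry does not affect names that are already bound...
del sys.modules["fragile_demo"]
still_usable = fragile_demo.__name__ == "fragile_demo"

# ...but the next import has to start from scratch, and since no file
# backs this module, it now fails.
try:
    import fragile_demo
    reimport_failed = False
except ImportError:
    reimport_failed = True
```

With a real, file-backed module the second import would succeed instead, but it would run the module’s code again and produce a brand-new module object, leaving old and new copies living side by side.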

But knowing how the process works — even if you never actually use it — helps to turn this from “magic” into a fairly straightforward application of a Python feature, and provides a useful glimpse into some of Python’s fundamental workings: knowing how Python’s import mechanism works, for example, is incredibly important and handy for a broad range of Python programming tasks, and shows off a major part of Python’s sometimes-downplayed dynamic nature.