Initial data and install-time code

Published: November 21, 2007. Filed under: Django.

A fairly frequently-asked question is something along the lines of “how do I provide some data which gets installed along with my application?”, or some variation on that, often including questions about how to ensure a particular bit of code is run when the application is installed via syncdb. Django provides several different ways of approaching this, depending on the exact situation and exactly what you need to do, and while they’re mostly documented, the topic still seems to generate a lot of questions. So today let’s run down the different options and see where each one is appropriate.

Providing raw SQL to insert data

This is the oldest (it’s been part of Django since the beginning) and probably the simplest (both in terms of what you need to do, and what it supports) method of providing some initial data to be installed alongside your application: your application can simply provide a SQL file, containing appropriate INSERT statements, and Django will execute that SQL after it’s created the database tables for the application.

To do this, add a directory called “sql” to your application, and for each model which needs to provide initial data, add a file “modelname.sql”, where “modelname” is the name of your model (in lower-case; in other words, something you’d pass to get_model() or which could end up in the “model_name” attribute of the model’s “_meta”). So, for example, I have a blog application which contains an Entry model; I could populate some entries automatically by adding a directory named “sql”, containing a file named “entry.sql”, to the application.

In addition to simple INSERT statements you can put other SQL in here, but be warned that the order in which multiple initial SQL files for an application will be processed is not reliable (so don’t rely on one file being executed before another), and that not all features of SQL syntax are supported here; in order to deal with the limitations of the different databases it supports, Django does extremely simple tokenization to break up the file into individual SQL statements (delimited with semicolons) rather than simply piping the full file into a database client.
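To make the limitation concrete, here is a minimal sketch of statement-at-a-time loading. It is not Django's actual code: the splitter is deliberately cruder than anything Django ships, the blog_entry table and its columns are made up for illustration, and an in-memory SQLite database stands in for whatever database you use.

```python
import sqlite3


def split_statements(sql_text):
    """Naively split SQL text on semicolons, dropping empty pieces."""
    pieces = [piece.strip() for piece in sql_text.split(";")]
    return [piece for piece in pieces if piece]


def load_initial_sql(connection, sql_text):
    """Execute each statement separately and return how many were run."""
    statements = split_statements(sql_text)
    for statement in statements:
        connection.execute(statement)
    connection.commit()
    return len(statements)


# Hypothetical contents of an entry.sql file; table and columns are assumptions.
ENTRY_SQL = """
INSERT INTO blog_entry (title, body) VALUES ('First post', 'Hello, world');
INSERT INTO blog_entry (title, body) VALUES ('Second post', 'More to come');
"""

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE blog_entry (id INTEGER PRIMARY KEY, title TEXT, body TEXT)"
)
executed = load_initial_sql(connection, ENTRY_SQL)
row_count = connection.execute("SELECT COUNT(*) FROM blog_entry").fetchone()[0]

# A semicolon inside a string literal is enough to confuse a splitter this
# simple: the single statement below is cut into two invalid fragments.
broken = split_statements(
    "INSERT INTO blog_entry (title, body) VALUES ('a; b', 'c');"
)
```

The last few lines show why it pays to keep initial SQL files plain: anything that puts a semicolon somewhere other than the end of a statement can trip up a loader that works this way.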

You can see whether an application has supplied some custom install-time SQL by using the “sqlcustom” manage.py command.

Using fixtures

Fixtures are the newest method for providing data to be automatically loaded, and are used heavily by Django’s testing framework to provide data for unit tests to work with. Rather than providing a SQL file, with a fixture you provide a file in a format supported by Django’s serialization system, and that file will be read and translated into model objects, which will then be saved into your database.

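For automatic installation of initial data, create a fixture file (the easiest way is to use the “dumpdata” manage.py command) and make sure the file name is “initial_data”; in other words, the file is named “initial_data” plus an extension identifying the serialization format, for example “initial_data.json” or “initial_data.xml”.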

Update: see Russell’s comment for some notes on choosing a serialization format.

Place this file into a directory called “fixtures” inside your application, and it will be automatically detected during syncdb and the data will be installed once your application’s database tables are created.

You can also manually load a fixture using the “loaddata” manage.py command, if you have more fixtures you want to use or if you didn’t supply the fixture prior to running syncdb.
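If you’d rather see what such a file looks like before running dumpdata, the following sketch builds a small JSON fixture by hand using only the standard library. The “blog.entry” model label and the title and body fields are assumptions made for illustration, and a temporary directory stands in for the application’s own directory; the record layout (“model”, “pk”, “fields”) is the one Django’s JSON serializer uses.

```python
import json
import os
import tempfile


def build_fixture(entries):
    """Turn (pk, title, body) tuples into records in the JSON fixture layout."""
    return [
        {"model": "blog.entry", "pk": pk, "fields": {"title": title, "body": body}}
        for pk, title, body in entries
    ]


def write_initial_data(app_dir, records):
    """Write records to <app_dir>/fixtures/initial_data.json and return the path."""
    fixtures_dir = os.path.join(app_dir, "fixtures")
    os.makedirs(fixtures_dir, exist_ok=True)
    path = os.path.join(fixtures_dir, "initial_data.json")
    with open(path, "w") as handle:
        json.dump(records, handle, indent=2)
    return path


# A temporary directory stands in for the (hypothetical) blog application.
app_dir = tempfile.mkdtemp()
records = build_fixture([(1, "First post", "Hello, world")])
fixture_path = write_initial_data(app_dir, records)

# Read the file back to confirm it round-trips as plain JSON.
with open(fixture_path) as handle:
    loaded = json.load(handle)
```

In practice dumpdata is still the safer route, since it serializes whatever fields your model really has instead of relying on a hand-written guess.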

Using the post_syncdb signal

The third, and most generally flexible, method of taking some action when your application is being installed is to use the post_syncdb signal sent by Django’s internal dispatcher. If you’re not familiar with it, the dispatcher is a method by which various parts of Django — and your own applications — can notify each other of particular events, by sending out “signals”. One signal built in to Django is post_syncdb, which is sent after each application’s tables are created by manage.py syncdb.

To take advantage of this, create a file in your application called “management.py”, and add the following:

from django.dispatch import dispatcher
from django.db.models.signals import post_syncdb
from myapp import models as myapp

Replace the last import statement with one which imports your own application’s models; for example, if you were writing a blog application it might look like this:

from blog import models as blog_app

Then define a Python function which does whatever install-time work you’d like; once it’s registered with the dispatcher (we’ll see how to do that in just a moment), it will be called immediately after syncdb creates your application’s database tables, so it’s free to do anything it likes that relies on those tables existing, including changing them, adding additional features, or inserting data using Django’s model API.

Finally, register your function with the dispatcher, and set it to listen for the post_syncdb signal from your application, by calling dispatcher.connect(); there are three arguments you’ll want to supply:

  1. Your function. This should be the actual function, not just a string containing its name.
  2. The keyword argument signal, which should be the post_syncdb signal you imported.
  3. The keyword argument sender, which should be your application’s models module (imported as explained above).

So to continue with the blog example, you might define a setup_blog() function, and register it like so:

dispatcher.connect(setup_blog, signal=post_syncdb, sender=blog_app)

During syncdb, Django will look for the management.py file in your application and import it if it exists, which will cause the dispatcher.connect() line to execute and register your function; then when the tables for your application have been created, the post_syncdb signal will be sent for your application, and the dispatcher will make sure your custom function gets called.
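The flow described above can be sketched without Django at all. The ToyDispatcher class below is a stand-in written for this example, not Django’s dispatcher, and the post_syncdb, blog_app and other_app objects are plain placeholders; the point is only to show how registering with a sender limits when the function is called.

```python
class ToyDispatcher(object):
    """A toy signal dispatcher for illustration; not Django's implementation."""

    def __init__(self):
        self._receivers = []  # (receiver, signal, sender) tuples

    def connect(self, receiver, signal, sender=None):
        """Register receiver for signal, optionally limited to one sender."""
        self._receivers.append((receiver, signal, sender))

    def send(self, signal, sender, **named):
        """Call every matching receiver and collect the return values."""
        responses = []
        for receiver, wanted_signal, wanted_sender in self._receivers:
            if wanted_signal is not signal:
                continue
            if wanted_sender is not None and wanted_sender is not sender:
                continue
            responses.append(receiver(sender=sender, **named))
        return responses


# Placeholders for the signal and for two applications' models modules.
post_syncdb = object()
blog_app = object()
other_app = object()

installed = []


def setup_blog(sender, **kwargs):
    """Install-time work for the blog application would go here."""
    installed.append("blog")
    return "blog set up"


dispatcher = ToyDispatcher()
dispatcher.connect(setup_blog, signal=post_syncdb, sender=blog_app)

# Installing some other application does not trigger setup_blog ...
ignored = dispatcher.send(post_syncdb, sender=other_app)
# ... but installing the blog application does.
handled = dispatcher.send(post_syncdb, sender=blog_app)
```

Passing the function object itself to connect(), instead of a string containing its name, is what lets the dispatcher call it directly later on.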

Django’s authentication application uses this to prompt you to create a superuser during syncdb; its management.py file defines the function which prompts and creates the superuser, then connects that function to the post_syncdb signal and listens for its own installation. The sites application also uses this to create a default Site object when it’s installed.

If you’d like to have your application do something whenever any application is installed, not just your app, you can omit the sender argument, and your function will be called every time syncdb finishes installing an application. If you do this, you’ll want your function to also accept a couple of optional arguments; the dispatcher will ensure they’re passed properly.
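As a rough sketch of such a function, the example below accepts its details as optional keyword arguments and copes when they are missing. The argument names app and created_models are assumptions about what the signal sends, not something spelled out here, and the Entry and Link classes and the fake “blog.models” module are placeholders so the function can be exercised without Django.

```python
import types


def log_install(app=None, created_models=None, **kwargs):
    """Describe one installation event; works with or without the details."""
    app_name = getattr(app, "__name__", "unknown application")
    model_names = sorted(model.__name__ for model in (created_models or []))
    return "%s: created %s" % (app_name, ", ".join(model_names) or "no new models")


# Placeholder model classes and models module, standing in for a real app.
class Entry(object):
    pass


class Link(object):
    pass


fake_models_module = types.ModuleType("blog.models")

# Simulate what a dispatcher might pass, including an extra keyword argument
# the function does not care about (absorbed by **kwargs).
with_details = log_install(
    app=fake_models_module, created_models=[Entry, Link], verbosity=1
)
without_details = log_install()
```

Accepting **kwargs as well keeps the function from breaking if the signal passes along arguments it has no use for.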

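A couple of Django’s bundled applications use this trick: the content types application creates ContentType objects for the models of each newly installed application, and the authentication application creates the default permissions for them.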

Use as appropriate

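Generally, each of these techniques works best in a different type of situation, so be sure to choose the one that’s appropriate to what you’re specifically trying to accomplish: raw SQL is the simplest option when plain INSERT statements are all you need, fixtures suit data you’d rather express as serialized model objects than as SQL, and the post_syncdb signal is the most flexible choice when you need to run arbitrary code at install time.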