New users of Django, even people who have lots of experience writing database-driven applications, often run into a seemingly simple problem: how do you set up a model with “optional” fields that don’t always have to be filled in? Django’s validation system assumes by default that all fields are required, so obviously you have to tell it which fields it’s OK to leave blank.
But therein lies the problem: there are two different ways you can “leave it blank”.
For the most part, assuming that fields should be required is a good idea; it’s easier to go back and mark something as optional later on than it is to recover from a critical required field that was being left blank until you noticed something was wrong. So Django merrily goes along making all your model’s fields required until you say otherwise, and things hum along smoothly.
But then you decide that it’d be OK for the “mother’s maiden name” field on your site’s user profile model to be left blank, so you go hunting in the Django documentation and find the blank argument. You add it to your model like so:
mothers_maiden_name = model.CharField(maxlength=50, blank=True)
And you tell your users that they don’t have to fill that in anymore.
And then you start getting the complaints. “I left it blank like you said I could, but I get an error!”
The problem is that the blank argument on your model only applies to Django’s own input validation mechanisms, not to your database. Databases are wonderfully flexible things, and can take in all sorts of data types: dates, times, strings, numbers and so on. They also have a very special data type called NULL, which is literally nothing. It’s the database equivalent of Python’s None keyword — NULL represents non-existent data.
If you leave a field blank in the admin, and you’ve specified blank=True for it, NULL is the value Django will insert into the database for that field. But that might not be allowed; when you create a database table, you have the option of specifying whether a particular column can accept NULL values or not; if you don’t want it to, you just add the declaration NOT NULL for that column, and the database will throw an error every time you try to insert nothing there.
And by default, Django creates all the columns in your database with NOT NULL declarations. Once again, this is a sensible thing to do; you might try to create an object directly in Python code, or execute raw SQL yourself, and if you do that you’ll bypass all of Django’s input validation. In these cases, the NOT NULL saves you from accidentally leaving a required field blank, and the same logic as above — better to change a field from required to optional than have to recover from a field that never should have been optional — applies.
To tell Django that a column in the database can truly be left blank, as opposed to just leaving a field blank in the admin, you use the null argument, like so:
mothers_maiden_name = models.CharField(maxlength=50, null=True)
And, of course, if you used that line above, you’d still get errors because even though the database will accept a NULL, the admin won’t let you leave the field blank. So to truly “leave it blank”, you have to use both arguments:
mothers_maiden_name = models.CharField(maxlength=50, null=True, blank=True)
This way, the admin will let you leave the field blank, and the database will let you insert a NULL value for that field.
If you’ve already created your database, and didn’t use null=True, you’ll need to execute a tiny bit of SQL to get rid of the NOT NULL Django set up for you; the syntax varies from database to database, but for Postgres it would look like this:
ALTER TABLE user_profile ALTER COLUMN mothers_maiden_name DROP NOT NULL;
For MySQL it’s more complicated, because MySQL doesn’t support the DROP NOT NULL command; instead you have to use the MODIFY command and re-define the column.
The distinction between blank and null is somewhat subtle and leads to a lot of confusion; people who are new to web development in general tend to think blank should work more the way it sounds, and people who’ve done web development before usually think null by itself should be enough to let a blank value go all the way through. Unfortunately, it’s not as easy as it seems: sometimes you really need to be able to leave something blank in the admin, but have a NOT NULL for it in the database, or allow NULL in the database but require it to be entered in the admin. To see why, let’s look at a simple example.
On this weblog, I use the Python port of John Gruber’s Markdown to convert plain text into HTML for output. Now, Django includes a built-in template filter (found in django.contrib.markup) which will let you use Markdown, but that wasn’t what I wanted. Doing it that way requires me to run Markdown over an entry every time it’s displayed; since the text of the entry generally isn’t changing from one page view to another, that seems awfully wasteful.
So I actually use two fields in my database for the body of an entry. One holds the Markdown-formatted plain text, and the other holds the HTML produced by Markdown (if this also seems wasteful, keep in mind that on shared hosting disk space is often far more abundant than processor time — you’re going to waste one of them, so waste the one you can afford to waste). And Django makes it easy to do this; I just have a custom save() method on my Entry model that runs Markdown over the plain-text field and inserts the resulting HTML into the HTML field, then saves both to the database.
This means the field which will hold the HTML starts out blank in the admin, since it’s going to get automatically filled in before saving (in fact, I don’t even show that field in the admin). But in order to display the entry there has to be HTML in that field at output time, which means it can’t go NULL into the database. So the model ends up looking like this:
body = models.TextField() body_html = models.TextField(blank=True)
If blank and null were condensed into one argument that controlled both Django’s input validation and whether the database accepted NULL, it wouldn’t be possible to do this sort of thing reliably. And calculating the value of one field automatically from another is an extremely common thing to have to do, so even though it’s a bit confusing at first it’s for the best that the two options stay separate.
Comments for this entry are closed. If you'd like to share your thoughts on this entry with me, please contact me directly.
Agreed! I don’t like the blank=True,null=True stuff. My proposed patch is to make blank=True set null=True
João, read the last bit again; it illustrates a case where it’s necessary for
nullandblankto be two separate arguments which do two different things.Ohh, sorry, I should have read it more clearly.
Thanks for this writeup on a potentially confusing subject. Once you understand the distinction it makes perfect sense and is pretty useful, as your example shows.
Wouldn’t Python’s
Nonebe closer in this case?Oops, wow it seems you’ve already changed that since my last reload.
One more thing: The model docs advise against using
null=Truein string-based fields, as the empty string is the preferred way of storing “nothing” and an ambiguity might result (if nothing is entered, either an empty string ornullwill be saved to the db). How would this work, i.e. what determines which of the two will actually be saved in this case?I think the point is that for out-of-the-box admin stuff, it is not advisable to use null=True for string fields. But as James pointed out, there are cases where you may need to do this and he showed how he overrides the save() method to accomplish it.
It may be common, but it isn’t normal. : )
You’ve come up a great example of when denormalizing a database (introducing a calculated field) has incredible advantages over keeping everything in perfect normal form. Purists will argue, “what if you edit an entry directly at the database level? Your data will become corrupt and invalid!” But they should just give it up. The advantages in this situation (decreased processing load on a shared server) greatly outweigh the risks.
Most of the database “purists” I know wouldn’t have a problem with this; they know perfectly well that the various normal forms are general best practices, but that there are situations where other concerns can take priority.
In my case, yes, I’m violating strict normalization by having the HTML column be dependent on the contents of the Markdown column. However, I do have the advantage of knowing that the only access is through my weblog application; I don’t have to worry about what happens if someone drops down and starts executing raw SQL, because I’m the only one who could do that and I’m not going to.
I’ve often thought it would be nice to also have an “optional” argument which defaults to False, but when set to True, implies blank=True, null=True.
You could still use blank=True and/or null=True when you need fine control, but optional=True should be a lot more understandable.
(Or alternately, required=False, with True being default)
Note that the Oracle database do not distinguish the empty string ” from NULL.
If both fields are meaningfull and it’s confusing when it come to use them separately, let’s aggregate them into one. It’d be better to name it “empty”. It would have values like None, AS_NULL, AS_BLANK, AS_NULL_OR_BLANK, and it would be needed to describe their meaning more carefully.
I would imagine the common case would be that the blank and null options are equal. Would it be less error prone to make this the default?:
null and blank default to the same value. setting one changes the other as well. you have to be explicit to get different values.
I guess its a little magical to automatically change blank=True when you specify null=True. But that way the most work and understanding is deferred for the rare case.