Understanding async Python for the web
Recently Django 4.1 was released, and the thing most people seem interested in is the expanded async support. Meanwhile, for the last couple years the Python web ecosystem as a whole has been seeing new frameworks pop up which are fully async, or support going fully async, from the start.
But this raises a lot of questions, like: just what is “async” Python? Why do people care about it so much? And is it really that useful for building web apps? What are all these new frameworks and other tools about?
So let’s dive in. If you already have a good understanding of how async Python works, or how async implementations in another language work, a lot of the next few sections may be remedial for you, so you should feel free to scroll past to the actual summaries of what’s going on with async in the Python web world, though there are a couple Python-specific bits that might still be useful to know.
Consider a function
Here’s a Python function that solves a well-known and well-hated problem:
```python
def fizzbuzz(n):
    result = ""
    if not n % 3:
        result += "Fizz"
    if not n % 5:
        result += "Buzz"
    return result or n
```
Now suppose we call it: `fizzbuzz(12)`. How does Python execute this function?
One way you can think about it is to imagine a sort of arrow pointing to whatever line of code is being executed right now; it starts out somewhere else, then something calls `fizzbuzz(12)` and the arrow jumps to the first line of the `fizzbuzz` function. Then it goes line-by-line, carrying out the instructions, until it reaches the end and the `return` statement, at which point the arrow jumps back to the line which called the function.
We can also imagine more complex functions: say, a function with a loop in it might see the arrow jumping back up to the start of the loop over and over until it finally exits, while a function which calls other functions would cause the arrow to jump into those other functions and back again.
This also isn’t too far off from how Python actually executes a program. And hopefully it matches up with how you already understood Python, and programming in general.
A trickier example
Now let’s consider a different function. This one calculates the Fibonacci sequence:
```python
def fibonacci():
    current, next = 0, 1
    while True:
        yield current
        current, next = next, current + next
```
You may have seen functions like this before, and perhaps you already have an idea of how they work and how they’re different from most other functions in Python. But let’s go over it explicitly.
Calling this function does a weird thing. It doesn’t actually return any Fibonacci numbers. Instead it does this:
```python
>>> f = fibonacci()
>>> f
<generator object fibonacci at 0x10e737df0>
```
`fibonacci()` returns this “generator object”. To get Fibonacci numbers out of it, you have to do something like iterate it:
```python
for n in fibonacci():
    print(n)
```
But maybe don’t actually do that, because you may have noticed there’s no `return` statement in this function, so it will actually run forever. Yet somehow it’s also causing values to be output. What’s up with that?
Let’s try a slightly different approach:
```python
>>> f = fibonacci()
>>> print(next(f))
0
>>> print(next(f))
1
>>> print(next(f))
1
>>> print(next(f))
2
```
This function is weird. And it’s weird in a very specific and important way.
Most Python functions, like the `fizzbuzz` example above, just run until they encounter a `return` statement (or the end of the function body, which Python treats as an implicit `return None` if no other `return` statement was executed), and then are done. In fact, this is how most functions in most programming languages work. But some functions, like `fibonacci()` above, can do a special trick where they can suspend their execution, perhaps emitting a value when they do so, and then resume later from where they left off.
The general term for this special kind of function is coroutine, and a Python generator is a kind of coroutine.
Specifically, any time Python sees a `yield` statement in a function body, it treats that as an instruction to suspend execution of that function at that point, emit whatever value was specified with the `yield` (if any was — you can just do a bare `yield`), and then wait until something specifically tells it to resume execution. Then it will run until it encounters another `yield`, or a `return` statement (or the implicit `return None` from reaching the end of the function body). This is what Python does automatically for you when you iterate over a generator. It also gracefully handles the situation where the generator eventually stops for good.
To be super-precise, the `fibonacci` function above is a generator function which returns a generator iterator, and `next()` is Python’s built-in function for manually advancing any iterator, not just generators. For iterators which eventually halt, an exception called `StopIteration` is raised to tell you to, well, stop iterating.
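To make the `StopIteration` mechanics concrete, here’s a small sketch with a finite generator (the `countdown` name is just illustrative, not from anything above):

```python
# A finite generator: unlike fibonacci(), this one eventually halts,
# at which point next() raises StopIteration.
def countdown(n):
    while n > 0:
        yield n
        n -= 1

c = countdown(2)
print(next(c))  # 2
print(next(c))  # 1
try:
    next(c)
except StopIteration:
    print("done iterating")
```

A `for` loop does exactly this for you: it calls `next()` repeatedly and treats `StopIteration` as the signal to exit the loop.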
More complex coroutines
While it may seem like the only use for all this is to do weird iteration tricks, there’s a whole world of neat stuff you can do once you have the concept of coroutines, even ones as limited as Python’s generators initially were.
And generators grew quite a bit of additional functionality after they were originally added. The full protocol is:
- The special method `__next__()` — which is what gets invoked by the Python built-in `next()` — is called to start a generator, or resume a suspended one.
- A generator object has a method called `send()` which allows sending a value into a suspended generator. Inside the generator it’s legal to use `yield` on the right-hand side of an assignment (like: `some_variable = yield some_other_variable`) to not only emit a value but also capture a value coming in from `send()`.
- A generator object has a method called `throw()` which allows you to send an exception and have it be raised from the generator.
- A generator object has a method called `close()` which lets you shut down a generator, rather than merely suspend it.
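Here’s a small sketch of that full protocol in action, using a hypothetical `echo` generator that captures values sent into it:

```python
# echo() is a made-up generator-based coroutine: it emits whatever
# value was last sent into it.
def echo():
    received = None
    while True:
        # Emit the last value received, and capture the next send()
        received = yield received

gen = echo()
print(next(gen))       # start it; runs to the first yield, emits None
print(gen.send("hi"))  # resumes with "hi", which comes right back out
print(gen.send(42))    # prints 42
gen.close()            # shut it down for good

# throw() raises an exception at the point where the generator is suspended:
gen2 = echo()
next(gen2)
try:
    gen2.throw(ValueError("boom"))
except ValueError as exc:
    print(exc)  # prints boom
```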
To see why this is useful, imagine a function that needs to load the contents of a file. That involves sending off a request to the operating system and ultimately to your computer’s file system software asking to go find where that file is located on disk and read all the data into memory, and finally signal back to your program that the data’s there and ready for you to work with.
Normally your program would just hang for a moment while all that was happening. During that time your program is blocked, unable to do other work, and so we call this blocking I/O (“I/O” being an abbreviation for “Input/Output”).
But suppose we could write a function that sends off the request to have that file read into memory and then immediately does a `yield` and suspends execution. Your program could maybe do other things for a bit, occasionally trying to resume that suspended function, or maybe even `send()` data into it if available. And if it’s not ready to start running again, the function could immediately `yield` again. But eventually it would resume and carry on with its work.
Wouldn’t that be neat?
This is the idea behind non-blocking I/O. You have a bunch of functions that do all the things you’re used to — reading/writing files, reading/writing network sockets, and so on — but instead of holding up your whole program while they wait for those operations to complete, they can suspend their execution and resume later, letting your program get on with other useful work in the meantime.
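As a sketch of the idea (with made-up names, not real asyncio APIs), here’s a toy round-robin scheduler where each “task” is a plain generator that yields whenever it would otherwise block:

```python
from collections import deque

log = []

# A pretend task: each yield means "I'm waiting on I/O, run something else".
def task(name, steps):
    for i in range(steps):
        log.append(f"{name}: step {i}")
        yield

# A toy scheduler: resume each task in turn until they all finish.
def run_all(tasks):
    queue = deque(tasks)
    while queue:
        t = queue.popleft()
        try:
            next(t)          # resume the task until its next yield
            queue.append(t)  # it suspended again; put it back in line
        except StopIteration:
            pass             # this task finished for good

run_all([task("a", 2), task("b", 3)])
print(log)  # the two tasks' steps interleave
```

Nothing here truly does I/O, but the shape is the important part: many suspended functions sharing one thread, with a loop poking each one along.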
And this is a form of asynchronous or “async” programming. One way to see why is that, once you have everything able to do non-blocking I/O, it’s no longer the case that functions execute exactly in the order they were called — some might pause and wait for data from outside sources, while others keep right on going.
Everything and the kitchen async
To make this fully work, we need a couple of building blocks. One of them — coroutines, which are functions that can suspend and resume their execution — we already had in Python via generators and the `yield` keyword, and if you were really adventurous you could build useful but complex coroutines with generators and their associated methods.
Unfortunately, this was pretty tedious, and there was also no obvious way, short of fully inspecting its code, to tell if any given function was a coroutine.
In Python 3.4 the `asyncio` module was added, with its API marked as “provisional” (meaning it didn’t come with a backwards-compatibility guarantee and was expected to evolve), and provided utilities for non-blocking/asynchronous I/O operations, as well as for marking functions explicitly as coroutines via the `@asyncio.coroutine` decorator.
That was still pretty cumbersome, though, because it was still built around the generator protocol, and so over the course of the next few Python releases, some new syntax and other features were added to support what are generally called “native coroutines”.
The basic ideas here are:
- Generators still exist, and still are coroutines in the sense that they can suspend and resume.
- But a new type of coroutine, called a “native coroutine”, and other asynchronous blocks of code are now indicated explicitly by the `async` keyword. For example, instead of writing `def some_function`, you now can write `async def some_coroutine` and Python — and other tooling built around it — will know that you’re defining a coroutine. There’s also an `async for` construct in looping, an `async with` construct, and so on.
- Points where execution might suspend or resume inside a coroutine are marked with the `await` keyword. Any time you want to call a coroutine and not proceed until it’s done its thing (say, because it’s reading some data from elsewhere and you need the data before you do anything else), you say so: instead of `data = some_blocking_read_function()`, you can write `data = await some_nonblocking_read_coroutine()`.
- An `await` expression expects to be handed an “awaitable” object, which is an object with methods `send()`, `throw()`, and `close()`. This interface should look familiar, because it’s almost identical to the generator protocol.
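Putting that syntax together, a minimal sketch might look like this (the function names are made up, and `asyncio.sleep()` stands in for a real non-blocking read):

```python
import asyncio

# A native coroutine, marked with async def. The names here are
# illustrative; asyncio.sleep(0) plays the role of a non-blocking
# I/O operation that suspends and later resumes.
async def fetch_data():
    await asyncio.sleep(0)
    return {"status": "ok"}

async def main():
    # Suspend here until fetch_data() has finished its "read".
    data = await fetch_data()
    print(data)  # prints {'status': 'ok'}

asyncio.run(main())
```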
The other main piece we need is something that can run coroutines. In the original days of `yield`-based coroutines, you’d just call a function, and if it had a `yield` in it you’d get back a generator object that you had to manually work with. The modern Python `async def` coroutines similarly don’t return values immediately, and require some manual bookkeeping and calling of coroutine-protocol methods to make it all work.
So what you basically always want to do is hand that tedious work off to some code designed to do it. In the case of async Python, that’s a piece of code known as an event loop. The
asyncio module added back in Python 3.4 provides an implementation of an event loop, and both high- and low-level APIs for working with it. The basic idea is that you hand the event loop a bunch of coroutines, and it does the work of actually running them and keeping track of what they’re all doing and poking and prodding them along as needed. This saves you from having to do that all yourself. Even if you don’t see it — usually because you’re using a framework that sets it up automatically for you — every time you write an async Python program or application, it has an event loop going on somewhere.
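For example, here’s a minimal sketch of handing the event loop several coroutines at once via `asyncio.gather()` (the `worker` name and delays are illustrative):

```python
import asyncio

# While one worker is suspended in sleep, the loop runs the others.
async def worker(name, delay):
    await asyncio.sleep(delay)
    return name

async def main():
    results = await asyncio.gather(
        worker("slow", 0.02),
        worker("fast", 0.0),
        worker("medium", 0.01),
    )
    print(results)  # prints ['slow', 'fast', 'medium']

asyncio.run(main())
```

Note that `gather()` returns results in argument order, not finish order; the loop juggles the suspended workers, but you get back a predictable list.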
But this means that now there are, effectively, two separate worlds of Python code:
- The standard synchronous world that’s existed since the beginning (functions defined with plain `def`, etc.), which just runs “as-is”.
- The new asynchronous world with `async def` and the other `async` variants of standard syntax and constructs, which has to run in an event loop.
And they don’t always mix well. The general rule is that you can call synchronous code from asynchronous code (for example: you can call a plain `def` function inside of an `async def` one), but not the other way around (calling an `async def` inside a plain `def` doesn’t go so well, and using the `await` keyword inside a plain `def` function is actually a syntax error). But even if you can call one from the other, it’s often not a good idea to do so.
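A quick sketch of those rules, with made-up function names:

```python
import asyncio

def plain_helper():
    return "sync result"

async def coro():
    return plain_helper()  # calling sync code from async: fine

# Calling an async def from plain code just hands you an un-run
# coroutine object rather than a result...
c = coro()
print(type(c).__name__)  # prints coroutine
c.close()  # close it to avoid the "never awaited" warning

# ...so you need an event loop to bridge the two worlds:
print(asyncio.run(coro()))  # prints sync result
```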
This is sometimes referred to as the function color problem, from the essay “What Color is Your Function?”, which compared async code in various languages to having two different “colors” of functions and rules about which “color” can call or be called by another.
What async Python is good for
Some types of programs naturally benefit from being written in async Python. For example, imagine an old-school web server that has a directory tree of HTML files it serves up in response to HTTP requests. Having the ability to pause/resume execution when waiting for a file to be read from disk, or while waiting for incoming connections or data on a network socket, is extremely useful there; traditionally, web server implementations had to spin up one worker thread or process (depending on the concurrency model in use) for each request, but async Python allows a large number of paused request handlers to coexist all in a single event loop. This in turn leads to more efficient use of resources, since fewer processes/threads have to be launched to handle requests.
Async is also useful for implementing protocols like WebSockets, where sometimes you’re sending data, sometimes you’re waiting for data from the other side of the connection, and sometimes you’re waiting for an event on your side that will trigger sending some data down the socket. The ability to suspend/resume is ideal for this.
In general, any use case where code often is waiting on I/O operations, or otherwise sitting idle and waiting for something to happen and “wake it up” can benefit from async.
What async Python isn’t
However, async isn’t a magical performance boost. Just going through an existing Python program and changing every `def` to `async def` is unlikely to make the program much faster. The benefits of switching to async really only show up in the kinds of cases I mentioned above.
So in a web application, if you’re doing something like WebSockets, sure, there’s a pretty clear advantage from implementing with coroutines. Same with other situations where there are long-lived and/or two-way connections involved. But what about a traditional HTTP application that talks to a database and then sends back some JSON or some HTML in its response? Well, now the benefits aren’t so clear.
For example, in most backend web applications, the database is the main limiting factor: handling an HTTP request typically does one or more queries, and the application can’t return a response until those queries are done and have provided the relevant data. And switching all the HTTP handler functions to async doesn’t really change that. Suppose, for sake of a simple round number, that your database can handle 1,000 requests’ worth of queries per second; switching to async and accepting 10,000 requests as suspended coroutines isn’t going to magically make your database be able to handle the extra load, so you’re still going to average about 1,000 requests handled per second. The other 9,000 are going to be idly `await`ing their turn at the database. You might even see your performance go down depending on how you measure it; you can easily be faster to initially accept an incoming request, but slower to send a response (since more accepted requests are piled up waiting for the DB).
Also, async Python still has the Global Interpreter Lock (“GIL”), meaning that no matter how many threads you spin up, only one at a time can be executing Python bytecode or using the Python interpreter’s C API. The main thing async gets you, with respect to the GIL, is actually that you don’t need to turn to threads as early/often as in traditional synchronous programming models, since async lets you have a bunch of coroutines in a single thread’s event loop.
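When you do still need a thread (say, for blocking library code you can’t rewrite), `asyncio.to_thread()`, available since Python 3.9, runs it in a worker thread without blocking the event loop. A minimal sketch, with made-up function names:

```python
import asyncio
import time

# A blocking call we can't rewrite as async; time.sleep() stands in
# for blocking library code here.
def blocking_work():
    time.sleep(0.01)
    return "done"

async def main():
    # Runs blocking_work in a worker thread; the event loop stays
    # free to service other coroutines in the meantime.
    result = await asyncio.to_thread(blocking_work)
    print(result)  # prints done

asyncio.run(main())
```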
What’s going on with Django
Django has been gradually adding async support for quite some time. The most basic work was ASGI, the asynchronous server gateway interface for Python, which is now used by many more things than Django but has been led mainly by a couple of Django folks, primarily Andrew Godwin. ASGI provides the basic interface between the web server and an async Python application, much as the older WSGI protocol provides the interface for synchronous Python.
You can read the async overview in the Django documentation for full details, but a general timeline is:
- Django 3.0 added support for running Django as an ASGI application
- Django 3.1 added support for async function-based views, async middleware, and async tests
- Django 4.0 added support for async cache backends
- Django 4.1, just released, added support for async class-based views, and async wrappers for the ORM (though the actual underlying code remains synchronous for now)
Django is also aware of the context in which it’s running, and can make some kinds of async and synchronous code play nicely together, at the cost of some performance (as noted in the documentation). There are also still async-unsafe parts of Django which always run in synchronous style, and will raise an exception if you try to force them to run in an async context.
There’s also the Channels project, which is an add-on for Django that provides support for web protocols other than HTTP, including ones that benefit from async implementations.
What’s going on around the Python web world
Flask is the other traditionally hugely popular Python web framework (Flask generally being the choice of people who prefer a microframework approach, while Django is full-stack). And it also has async support, though as far as I’m aware Flask cannot run as an ASGI application, and does async via setting up an event loop inside a traditional WSGI app. There’s also the associated Quart project, which is an ASGI framework that’s API-compatible with Flask — you can change imports to be from `quart`, change your handlers to `async def` and `await`, and it’ll work.
SQLAlchemy, which is generally what people use for DB/ORM when using any framework other than Django (people who are using Django tend to use Django’s ORM), has async support. There are also database tools like the Databases project and the SQLModel ORM which are growing in popularity as accompaniments to async Python web applications, plus a growing suite of async DB driver modules, and other web-oriented or web-adjacent async libraries (like HTTPX as a mostly drop-in replacement for requests, but with async support). And, of course, there are ASGI server implementations, like Daphne, Hypercorn, and Uvicorn.
And there’s a new generation of async-first, or at least async-strongly-preferred, frameworks. The general core is Starlette as the base ASGI toolkit (similar to Werkzeug‘s role underlying several WSGI frameworks), and often Pydantic filling the role that Marshmallow or, in the Django world, DRF serializers, have traditionally played in describing the shape/types of input and output data structures and their validation rules.
Currently FastAPI is the web framework that seems to have the most interest/noise around it, though there are others growing in popularity. So far, it doesn’t feel to me like anything’s reached or come close to the sort of Flask/Django scale of adoption and default-choice status among the async frameworks; it’s still very early days for many of them and they’re generally evolving pretty rapidly.
What I think, so far
I’ve been a bit skeptical of the massive rush to async-ify everything in the Python web world. A lot of it seems to misunderstand what async is good at/good for and is being done just because it’s the new thing everyone’s rushing to do, or because people have unrealistic expectations about what it’ll do for performance. So although I’ve been paying attention to what’s going on I’ve been somewhat slow to adopt async Python or any of the new frameworks/libraries.
It’s also still not always perfectly easy or obvious how to adopt and get the most out of async Python, even once you know it’s the right choice for your use case; Lynn Root’s series of posts and talks on `asyncio` is highly recommended reading for this.
For thoughts on web applications specifically, I’ve generally pointed people to this old post by Mike Bayer, lead developer of SQLAlchemy, which predates the release of Python 3.5 (the initial version with `async` and `await`; `asyncio` was already in Python 3.4) by about half a year, but which I still broadly agree with. It’s a long and worthwhile read, but his short summary at the beginning is worth listening to:
I still think that asynchronous programming is just one potential approach to have on the shelf, and is by no means the one we should be using all the time or even most of the time, unless we are writing HTTP or chat servers or other applications that specifically need to concurrently maintain large numbers of arbitrarily slow or idle TCP connections (where by “arbitrarily” we mean, we don’t care if individual connections are slow, fast, or idle, throughput can be maintained regardless).
At work I’m currently juggling two projects: one is a pretty hefty service that really wants the support a full-stack framework can lend. So it’s being implemented with Django (and Django REST framework), using traditional synchronous Python. The other is more of a lightweight read-only wrapper around a set of DB tables, and traditionally Flask has been the right choice for that (it probably still is!). But I decided to use that project to actually get some experience with the new generation of Python frameworks and, after some experiments and tests, ended up on Starlite (note: lite, not lette), with async SQLAlchemy. Of the current crop of async-oriented frameworks, I think it’s my clear favorite.
Overall, the async web stuff currently feels to me like the Python web world did in the mid-to-late 2000s: a sort of wild tangle of new things constantly popping up, sharing and borrowing ideas from each other, and rapidly changing in response to painful encounters with actual use, while the big established traditional frameworks were still there, still very solid choices, and figuring out how they’d evolve in response to what everyone else was doing.
I mentioned that there’s no clear Flask- or Django-type winner so far, but it also feels like most things are competing to be the async Flask, not the async Django — there are lots of async microframeworks, but I haven’t seen anyone really trying to do the full-stack thing yet, or even the kinds of conventions that popped up in the microframework world for how to wire together the standard stack of Flask, SQLAlchemy, Jinja, etc. Which is both incredibly freeing, because it opens up room to experiment and do whatever you want, and also a bit tedious, at least to me, because it means I have to think about that stuff for the first time in many years.
I expect, and hope, that within a couple years things will have settled down a bit and the async Python web space will be a bit more clear; probably by then one of the async microframeworks will have achieved enough momentum to just become the default for that niche, and maybe there’ll be a full-stack one competing for Django’s spot. Or maybe Django’s expanding async support will mean that “the async Django” just ends up being Django.
Whether async will become the way to do Python web development, I’m less sure of. On my project with Starlite, I’ve been benchmarking it every step of the way (and also was testing performance back when I was evaluating other options like FastAPI — the decision in favor of Starlite wasn’t purely based on performance, but performance was a factor). I was able to compare the same application logic, performing the same queries and returning the same data structures, in both sync and async implementations, and… async was a bit faster, but the overall increase in performance was a single-digit number of percentage points, because the performance bottleneck was still at the database. Optimizing the queries and then adding caching to avoid hitting the database on every request produced a multi-hundred-percentage-point increase. Sync or async, some things just don’t change.