CPU-intensive Python Web Backends with asyncio and multiprocessing, Part II
In the first post of this series, I looked at how to achieve parallel execution in Python using multiprocessing and discussed why this is unsuitable for WSGI-based web frameworks: WSGI only allows the web server to create new processes, not the framework. At the end, I mentioned several alternative Python HTTP servers which use asynchronous I/O with an event-loop-based scheduler to handle concurrency.
In this post, we will look at how asynchronous I/O works in general, and specifically how it works in Python.
Event loops and asyncio
Event-loop-based concurrency generally works by handing a lot of small tasks to a scheduler, where they take turns using the CPU and waiting in the queue. A task keeps executing until it hits I/O, at which point it gives control back to the scheduler, along with a file descriptor and some other information for resuming the task. When the scheduler detects that the I/O operation is complete, the task resumes where it left off and runs until it completes or hits another I/O operation. This means your thread can keep doing work while tasks are waiting on I/O, and each task uses only a tiny amount of memory when it’s not executing (relative to a process or an OS thread). It’s a very efficient way to use a single core.
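To make the turn-taking concrete, here is a toy sketch. It is not how asyncio is implemented: it assumes tasks are plain generators and treats every bare `yield` as a stand-in for "I hit I/O". The names `worker` and `run_all` are made up for illustration.

```python
from collections import deque


def worker(name, steps, log):
    """A task that does a little work, then pauses at each pretend I/O point."""
    for step in range(steps):
        log.append(f"{name}:{step}")
        yield  # stand-in for waiting on I/O: hand control back to the scheduler


def run_all(tasks):
    """Round-robin scheduler: resume each task in turn until all have finished."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)      # run the task up to its next yield
        except StopIteration:
            continue        # the task finished, so drop it
        queue.append(task)  # not finished yet: back of the queue


log = []
run_all([worker("a", 2, log), worker("b", 3, log)])
print(log)  # ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```

A real event loop differs in one important way: it only resumes a task once the operating system reports that the I/O the task was waiting for is ready, rather than blindly cycling through the queue.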
Callbacks, monads and async/await
Traditionally, the way a scheduler resumes a task is with a callback function which would take the result of the I/O operation as input. This works well, but dealing with a lot of deeply nested callbacks is not easy for programmers, and is often referred to as “callback hell”.
Consider the following Python program:
The question is printed, we wait for user input, and we print a greeting.
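A minimal sketch of such a program follows. Passing the reader and writer in as parameters is my own assumption, made so the function can be exercised without a terminal; the name `greet` is also hypothetical.

```python
def greet(read=input, write=print):
    """Ask for a name, block until it arrives, then print a greeting."""
    write("What's your name?")
    name = read()              # execution stops here until the input is available
    write(f"Hello, {name}!")
```

Calling `greet()` with no arguments uses the real `input` and `print`, so the whole program simply blocks on the second line until the user presses Enter.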
However, if we need to return control to the scheduler for each I/O operation with callbacks, our program will look like this:
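A sketch of that style, assuming two hypothetical helpers, `print_async` and `input_async`, that take a callback to run once the I/O is done. There is no real scheduler here: the helpers call the callback immediately, which is enough to show the shape of the code.

```python
# Hypothetical callback-based I/O helpers. A real event loop would register the
# callback and invoke it when the I/O completes; these call it straight away.
def print_async(text, callback, sink=print):
    sink(text)
    callback()


def input_async(callback, source=input):
    callback(source())


def greet(sink=print, source=input):
    def after_question():
        def after_input(name):
            def after_greeting():
                pass  # nothing left to do
            print_async(f"Hello, {name}!", after_greeting, sink)
        input_async(after_input, source)
    print_async("What's your name?", after_question, sink)
```

Three sequential steps already need three levels of nesting, and every further I/O operation would add another.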
Haskell is a purely functional, lazy programming language. Each program can be thought of as a mathematical expression rather than a list of steps. Sub-expressions aren’t evaluated until their value is needed (i.e. lazy evaluation). However, the order of evaluation in programs is often important, especially when I/O is involved! How does Haskell make sure I/O happens at the right time?
The answer is monads!
One way to think about monads is as a wrapper around a value. You can only get to the value by giving a callback function to the monad where the unwrapped value is bound to an argument and so available inside the body of that function.
In Python, it might look like this:
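As a rough sketch, assume a hypothetical `IO` class whose wrapped value can only be reached through a `bind` method. The names `IO`, `bind`, `run`, `put_line`, `get_line` and `greeting` are all invented for this illustration and only loosely mirror what Haskell does.

```python
class IO:
    """Wraps a deferred action; its result is only reachable through bind."""

    def __init__(self, action):
        self._action = action  # zero-argument callable that performs the I/O

    def bind(self, callback):
        """Return a new IO: run this action, then the IO that callback(result) returns."""
        return IO(lambda: callback(self._action())._action())

    def run(self):
        return self._action()


def put_line(text, sink=print):
    return IO(lambda: sink(text))


def get_line(source=input):
    return IO(source)


def greeting(sink=print, source=input):
    # Nothing is executed here; this only describes the order of the steps.
    return put_line("What's your name?", sink).bind(
        lambda _: get_line(source).bind(
            lambda name: put_line(f"Hello, {name}!", sink)
        )
    )
```

Calling `greeting()` performs no I/O at all. The steps only execute, in order, when `run()` is called on the result, which is what lets the wrapper control when I/O happens.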
We see that this monadic code looks similar to the callback version we had earlier. However, because this pattern is so prevalent in Haskell, they created a special syntax called do notation. This allows the monadic callback code at the end of the previous example to be written like this (in a world where Python had do notation):
This is just a simple syntax transformation in the parser, but it’s much prettier than callback code.
F#, a functional language for .NET, doesn’t use lazy evaluation or need monads for I/O. Still, the F# developers were inspired by Haskell to use monads to implement asynchronous I/O in a way that looks more like sequential code than callbacks, and they created async blocks to perform a similar syntax transformation to do notation in Haskell.
Seeing this from the other side of Microsoft, the C# developers were inspired and added async and await keywords to the language that signal to the compiler to convert functions written sequentially into the same kind of monadic callback code we’ve seen.
There is a progression here: from old-style callback code as in C, through monadic wrappers and do notation, to async and await keywords which signal a syntax transformation to the compiler.
The Python community also wanted a way to write asynchronous code without explicit callbacks. However, Python got rid of the callbacks altogether. Python already had resumable functions in the form of generators, which allow efficient transfer of control flow between the calling context and the function itself at specified breakpoints using the yield keyword. While it looks different and it doesn’t use a callback, yield provides similar functionality to the monadic callback.
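A small sketch of that resumable behaviour: the generator pauses at `yield`, and the caller resumes it with `send`, handing in a value much as a callback would receive the result of an I/O operation. The function name `greeting` and the strings are illustrative only.

```python
def greeting():
    """A resumable function: each yield hands a value to the caller and waits."""
    name = yield "need input"     # pause here; resumes with whatever the caller sends
    yield f"Hello, {name}!"


gen = greeting()
request = next(gen)               # run up to the first yield
reply = gen.send("Ada")           # resume, binding "Ada" to name
print(request, "->", reply)       # need input -> Hello, Ada!
```

The caller plays the role of the scheduler here: it decides when the function continues and what value it continues with.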
Python’s coroutines were originally implemented in terms of these generators (check out this video to see how). Today, we also use the async and await keywords to create coroutines in Python, but they are implemented in the same way as generator functions. The async keyword simply shows that the function definition which follows returns an awaitable coroutine, rather than a normal value. The await keyword is the breakpoint where control flow will be returned to the scheduler to keep the event loop chugging away.
This monadic approach to asynchronous I/O has some drawbacks. Only async functions can use await, so you get into this situation where older I/O libraries which don’t use async are not usable (from a practical standpoint) in new async code. You end up in the “What color is your function?” situation. It all gets very annoying when you didn’t await something you should have or you try to await something that doesn’t work that way. Still, it is a useful way to retrofit asynchronous I/O into a language which wasn’t designed for it. Newer languages like Go have a simpler way to handle this because the scheduler was baked into the runtime and all the I/O functions from the beginning.
Still, whether or not async and await are the ideal way to solve this problem, they are better than writing callbacks, and they are what we have at our disposal in Python and a number of other languages.
So we have these pseudo-monadic async and await keywords for defining coroutines in Python, but what can we do with them? By themselves, not very much!
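A sketch of what "not very much" looks like, assuming a `get_greeting` coroutine of my own invention (its body and its `name` parameter are illustrative). The `calls` list is only there to show whether the body ever ran.

```python
import asyncio
import inspect

calls = []


async def get_greeting(name):
    calls.append(name)            # records that the body actually ran
    await asyncio.sleep(0)
    return f"Hello, {name}!"


coro = get_greeting("Ada")        # creates a coroutine object; the body has not started
print(inspect.iscoroutine(coro), calls)   # True []
coro.close()                      # tidy up so Python does not warn it was never awaited
```

Calling an async function only builds the coroutine object. Nothing inside it executes until something drives it.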
Hm. Not great. We defined a get_greeting coroutine, but nothing happened because we never used await. Let’s try:
It didn’t work the first time because I didn’t await the coroutine, and in the second attempt, it wouldn’t let me await the coroutine?
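The second failure is a compile-time error, so it can be shown without a REPL. This small sketch uses the built-in `compile` on a string containing a bare top-level `await`, the way a plain script would be compiled. The `get_greeting` name inside the string is just a placeholder and is never looked up.

```python
source = "await get_greeting()"   # a bare await, outside any async function

try:
    compile(source, "<example>", "exec")
except SyntaxError as error:
    message = error.msg           # the compiler rejects it before anything runs
else:
    message = None

print(message)
```

The code is rejected before it ever runs, because `await` is only valid inside a function defined with `async def`.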
Well, that’s because all of this is supposed to be executed in the context of an event loop. There are several available for Python. The one distributed with the standard library is in the asyncio module. There are a few ways to get the loop running and put things into it. The one you should start with these days is asyncio.run, which creates the loop and runs a coroutine in it, returning the output.
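A minimal sketch of `asyncio.run` in use, again assuming an illustrative `get_greeting` coroutine whose body and parameter are my own:

```python
import asyncio


async def get_greeting(name):
    await asyncio.sleep(0)        # a trivial await, standing in for real I/O
    return f"Hello, {name}!"


# Start an event loop, run the coroutine to completion, and return its value.
result = asyncio.run(get_greeting("Ada"))
print(result)                     # Hello, Ada!
```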
Nice. Let’s see what we can do with this!
Hm. This was slow and didn’t seem to do anything concurrently. Just writing async functions doesn’t make your code automatically concurrent. Yes, it is technically yielding control flow to the scheduler, but it goes right back to where it left off because there is nothing else scheduled.
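The effect can be reproduced with a sketch like the one below. It assumes a hypothetical `fetch` coroutine that uses `asyncio.sleep` as a stand-in for a slow I/O call, and awaits three of them one after another.

```python
import asyncio
import time


async def fetch(label, delay):
    await asyncio.sleep(delay)    # stand-in for a slow I/O call
    return label


async def main():
    start = time.perf_counter()
    # Each await finishes completely before the next coroutine is even created.
    results = [await fetch(label, 0.2) for label in "abc"]
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(main())
print(results, f"{elapsed:.2f}s")
```

The total time is roughly the sum of the three delays, because at every `await` there is only one thing in the scheduler to go back to.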
To bring it all together, we create tasks with asyncio.create_task, which takes a coroutine as input. When you create the task, it is scheduled, and it will run concurrently with the other jobs in the scheduler. You can await the task later to block in the running coroutine until the result is ready. (Note that it only blocks in the current coroutine. The other tasks can still run.) We update our two functions…
… and we get something like this:
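A sketch of the task-based approach, assuming the same kind of hypothetical `fetch` coroutine with `asyncio.sleep` standing in for slow I/O. All three tasks are created up front and only awaited afterwards.

```python
import asyncio
import time


async def fetch(label, delay):
    await asyncio.sleep(delay)    # stand-in for a slow I/O call
    return label


async def main():
    start = time.perf_counter()
    # Creating a task schedules it immediately, so all three sleeps overlap.
    tasks = [asyncio.create_task(fetch(label, 0.2)) for label in "abc"]
    results = [await task for task in tasks]
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(main())
print(results, f"{elapsed:.2f}s")
```

The elapsed time is now close to a single delay rather than the sum of all three, while the results still come back in the order the tasks were created.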
All the jobs complete at more or less the same time and the program is fast! If the order of the output doesn’t matter, we can tweak our main function a little more:
asyncio.as_completed yields results in the order the tasks finish, so we don’t hold things up waiting on the first task if one after it is already done, though it doesn’t make a difference in our trivial example.
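To see a case where it does make a difference, here is a sketch with deliberately unequal delays. The `fetch` coroutine and its labels are hypothetical, and the first task created is the slowest one.

```python
import asyncio


async def fetch(label, delay):
    await asyncio.sleep(delay)    # stand-in for a slow I/O call
    return label


async def main():
    tasks = [
        asyncio.create_task(fetch("slow", 0.3)),
        asyncio.create_task(fetch("fast", 0.1)),
        asyncio.create_task(fetch("medium", 0.2)),
    ]
    finished = []
    for next_done in asyncio.as_completed(tasks):
        finished.append(await next_done)   # results arrive in completion order
    return finished


finished = asyncio.run(main())
print(finished)                   # ['fast', 'medium', 'slow']
```

Awaiting the tasks in creation order would have made the loop wait on "slow" before handling anything else.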
The weakness of event loops
I’ve said a lot so far about how great an event loop is as a mechanism for concurrency, because of the speed and high density of tasks it allows. However, there is one major drawback. Asynchronous I/O is a form of cooperative multitasking. This means it is always explicit where control flow is given up to the scheduler, and one task never interrupts another to use the CPU. This is in contrast to OS threads, which can interrupt each other while executing.
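A sketch of what that means in practice. It assumes a hypothetical `ticker` task that wants to run every 10 ms and a `hog` coroutine that never awaits; `time.sleep` stands in for CPU-bound work, since both keep the thread busy without yielding.

```python
import asyncio
import time


async def ticker(stamps):
    for _ in range(3):
        stamps.append(time.perf_counter())
        await asyncio.sleep(0.01)  # wants to run again after 10 ms


async def hog():
    time.sleep(0.3)                # blocking call with no await: the loop cannot switch tasks


async def main():
    stamps = []
    tick = asyncio.create_task(ticker(stamps))
    await asyncio.sleep(0)         # give the ticker a chance to record its first stamp
    await hog()                    # nothing else runs until this returns
    await tick
    return stamps


stamps = asyncio.run(main())
gap = stamps[1] - stamps[0]
print(f"gap between first two ticks: {gap:.2f}s")
```

The ticker asked to be woken after 10 ms, but its second tick only happens after the hog has finished, because nothing can take the CPU away from a task that does not give it up.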
The third and final post of this series will explore how to combine multiprocessing and asyncio to finally achieve the goal of having a responsive web server together with CPU-intensive parallel execution.
Written by Aaron Christiansson. Last Modified on 2021-10-27.