CPU-intensive Python Web Backends with asyncio and multiprocessing, Part II
In the first post of this series, I looked at how to achieve parallel execution in Python using multiprocessing, and discussed why this approach is unsuitable for WSGI-based web frameworks: under WSGI, only the web server is allowed to create new processes, not the framework. At the end, I mentioned several alternative Python HTTP servers which use asynchronous I/O with an event-loop-based scheduler to handle parallelism.
In this post, we will look at how asynchronous I/O works in general, and specifically how it works in Python.
Event loops and asyncio
Event-loop-based concurrency generally works by putting a lot of small tasks into a scheduler; the tasks take turns using the CPU and waiting in the queue. A task keeps executing until it hits I/O, at which point it gives control back to the scheduler, along with a file descriptor and some other information needed to resume the task. When the scheduler detects that the I/O operation is complete, the task resumes executing where it left off until it completes or hits another I/O operation. This means your thread can keep doing work while tasks wait on I/O, and each task uses only a tiny amount of memory when it’s not executing (relative to a process or an OS thread). It’s a very efficient way to use a single core.
Callbacks, monads and async/await
Traditionally, the way a scheduler resumes a task is with a callback function that takes the result of the I/O operation as input. This works well, but dealing with a lot of deeply nested callbacks is not easy for programmers, and the result is often referred to as “callback hell”.
Consider the following Python program:
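A minimal sketch along these lines:

```python
print("What is your name?")   # print the question
name = input()                # wait for user input
print(f"Hello, {name}!")      # print a greeting
```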
We print the question, wait for user input, and print a greeting.
However, if we need to return control to the scheduler for each I/O operation with callbacks, our program will look something like the sketch below.
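Here, print_async and input_async stand in for hypothetical scheduler-aware versions of print and input; each takes a callback that the scheduler invokes once the I/O operation completes:

```python
def main():
    # Ask the question, then continue in `after_print`.
    print_async("What is your name?", after_print)

def after_print():
    # Wait for input, then continue in `after_input` with the result.
    input_async(after_input)

def after_input(name):
    # Print the greeting; there is nothing left to do afterwards.
    print_async(f"Hello, {name}!", lambda: None)

main()
```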
This code is ridiculous in Python, but it is not uncommon for JavaScript code that handles web requests to have a similar structure, because you don’t want your page to freeze while content loads in the background. It’s also how every I/O function works in Haskell. Why?
Haskell is a purely functional, lazy programming language. Each program can be thought of as a mathematical expression rather than a list of steps, and sub-expressions are not evaluated until their values are needed (i.e. lazy evaluation). However, the order of evaluation in programs is often important, especially when I/O is involved! How does Haskell make sure I/O happens at the right time?
The answer is monads!
One way to think about monads is as a wrapper around a value. You can only get to the value by giving the monad a callback function; the unwrapped value is bound to the callback’s argument and so is available inside its body.
In Python, it might look like this:
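Here is a toy sketch of the idea (a real I/O monad does more, but the shape is the same):

```python
class IO:
    """A toy I/O monad: wraps an action (a zero-argument function)."""
    def __init__(self, action):
        self.action = action

    def bind(self, callback):
        # Run this action, hand its result to `callback`, and run the
        # next IO value that `callback` returns.
        return IO(lambda: callback(self.action()).action())

def put_line(text):
    return IO(lambda: print(text))

def get_line():
    return IO(input)

program = put_line("What is your name?").bind(
    lambda _: get_line().bind(
        lambda name: put_line(f"Hello, {name}!")))

program.action()  # nothing runs until we execute the wrapped action
```

Each bind is exactly the callback pattern from before, just packaged up inside the wrapper.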
We see that this monadic code looks similar to the callback version we had earlier. However, because this pattern is so prevalent in Haskell, they created a special syntax called do notation. This do notation allows the monadic callback code at the end of the previous example to be written like this (in a world where Python had do notation):
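```python
# Hypothetical syntax; this is not real Python!
program = do:
    put_line("What is your name?")
    name <- get_line()
    put_line(f"Hello, {name}!")
```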
This is just a simple syntax transformation in the parser, but it’s much prettier than callback code.
F#, a functional language for .NET, doesn’t use lazy evaluation or need monads for I/O. Still, the F# developers were inspired by Haskell to use monads to implement asynchronous I/O in a way that looks more like sequential code than callbacks, and they created async blocks to perform a similar syntax transformation to do notation in Haskell. Seeing this from the other side of Microsoft, the C# developers were inspired in turn and added async and await keywords to the language that signal to the compiler to convert functions written sequentially into the same kind of monadic callback code we’ve seen above.
There is a progression from old-style callback code like in C and JavaScript, to monadic callbacks in Haskell and F#, and finally to async and await keywords which signal syntax transformations in C#. This approach was adopted in TypeScript and JavaScript as well.
The Python community also wanted a way to write asynchronous code without explicit callbacks. However, Python got rid of the callbacks altogether. Python already had resumable functions in the form of generators. Generators allow efficient transfer of control flow between the calling context and the function itself at specified breakpoints using the yield keyword. While it looks different and doesn’t use a callback, yield provides functionality similar to the monadic bind function.
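For example, here is a quick sketch of control bouncing back and forth between a generator and its caller:

```python
def greeter():
    # Control returns to the caller at each `yield`; the caller can
    # resume the generator and pass a value back in with `send`.
    name = yield "What is your name?"
    yield f"Hello, {name}!"

gen = greeter()
print(next(gen))          # runs until the first yield: "What is your name?"
print(gen.send("World"))  # resumes with a value: "Hello, World!"
```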
Python’s coroutines were originally implemented in terms of these generators (check out this video to see how). Today, we use async and await keywords to create coroutines in Python, but they are implemented in the same way as generator functions. The async keyword simply marks the function definition that follows as returning an awaitable coroutine rather than a normal value. The await keyword is the breakpoint where control flow is returned to the scheduler to keep the event loop chugging away.
This monadic approach to asynchronous I/O has some drawbacks. Only coroutines can await, so you get into this situation where older I/O libraries which don’t use async are not usable (from a practical standpoint) in new async code. You end up in the “What color is your function?” situation. It all gets very annoying when you didn’t await something you should have, or you try to await something that doesn’t work that way. Still, it is a useful way to retrofit asynchronous I/O into a language which wasn’t designed for it. Newer languages like Go have a simpler way to handle this because the scheduler was baked into the runtime and all the I/O functions from the beginning.
Still, whether or not async and await are the ideal way to solve this problem, they are better than writing callbacks, and they are what we have at our disposal in Python and a number of other languages.
Using asyncio
So we have these pseudo-monadic async and await keywords for defining coroutines in Python, but what can we do with them? By themselves, not very much!
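In the REPL, defining a coroutine and calling it looks something like this:

```python
>>> async def get_greeting():
...     return "Hello, world!"
...
>>> get_greeting()
<coroutine object get_greeting at 0x...>
```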
Hm. Not great. We defined a get_greeting coroutine, but nothing happened because we never awaited it. Let’s try:
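```python
>>> await get_greeting()
  File "<stdin>", line 1
SyntaxError: 'await' outside function
```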
It didn’t work the first time because I didn’t await the coroutine, and in the second attempt, it wouldn’t let me await the coroutine?
Well, that’s because all of this is supposed to be executed in the context of an event loop. There are several available for Python; the one distributed with the standard library is in the asyncio module. There are a few ways to get the loop running and put things into it. The one you should start with these days is asyncio.run, which creates the loop and runs a coroutine in it, returning the output.
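```python
>>> import asyncio
>>> asyncio.run(get_greeting())
'Hello, world!'
```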
Nice. Let’s see what we can do with this!
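For a rough sketch, let’s give get_greeting a pretend slow I/O operation using asyncio.sleep and call it a few times from a main coroutine (the names and timings here are illustrative):

```python
import asyncio
import time

async def get_greeting(name):
    await asyncio.sleep(1)  # stand-in for a slow I/O operation
    return f"Hello, {name}!"

async def main():
    start = time.perf_counter()
    # Awaiting each coroutine directly runs them one at a time.
    for name in ["Alice", "Bob", "Carol"]:
        print(await get_greeting(name))
    print(f"took {time.perf_counter() - start:.1f}s")

asyncio.run(main())
```

which prints:

```
Hello, Alice!
Hello, Bob!
Hello, Carol!
took 3.0s
```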
Hm. This was slow and didn’t seem to do anything concurrently. Just writing async functions doesn’t make your code automatically concurrent. Yes, each await technically yields control flow to the scheduler, but control comes right back to where it left off because there is nothing else scheduled.
To bring it all together, we create tasks with asyncio.create_task, which takes a coroutine as input. When you create the task, it is scheduled, and it will run concurrently with the other jobs in the scheduler. You can await the task later to block in the running coroutine until the result is ready. (Note that it only blocks in the current coroutine. The other tasks can still run.) We update our main function accordingly…
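```python
async def main():
    start = time.perf_counter()
    # Creating the tasks schedules all of them immediately.
    tasks = [asyncio.create_task(get_greeting(name))
             for name in ["Alice", "Bob", "Carol"]]
    for task in tasks:
        # Awaiting a task blocks only this coroutine; the other
        # tasks keep running in the event loop.
        print(await task)
    print(f"took {time.perf_counter() - start:.1f}s")

asyncio.run(main())
```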
… and we get something like this:
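```
Hello, Alice!
Hello, Bob!
Hello, Carol!
took 1.0s
```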
All the jobs complete at more or less the same time and the program is fast! If the order of the output doesn’t matter, we can tweak our main function a little more:
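```python
async def main():
    start = time.perf_counter()
    tasks = [asyncio.create_task(get_greeting(name))
             for name in ["Alice", "Bob", "Carol"]]
    # as_completed yields results in the order the tasks finish,
    # not the order they were created.
    for completed in asyncio.as_completed(tasks):
        print(await completed)
    print(f"took {time.perf_counter() - start:.1f}s")

asyncio.run(main())
```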
Using asyncio.as_completed means we won’t hold up the output by waiting on the first task to finish when one after it is already done, though it doesn’t make a difference in our trivial example, where every task takes the same amount of time.
The weakness of event loops
I’ve said a lot so far about how great an event loop is as a mechanism for concurrency because of the speed and high density of tasks it allows. However, there is one major drawback. Asynchronous I/O is a form of cooperative multitasking: it is always explicit where control flow is given up to the scheduler, and one task never interrupts another to use the CPU. This is in contrast to OS threads, which can interrupt each other while executing.
The consequence of this in a language like Python or JavaScript, where execution is always on a single core, is that if you get stuck at a point where control is never returned to the scheduler, everything else has to wait. This is especially a problem if you’re doing CPU-intensive work, which is exactly the case with my Revrit API!
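Here’s a small sketch of the problem (the function names are mine): a heartbeat task wants to tick every 100 ms, but a CPU-bound coroutine that never awaits starves it until the number crunching is finished.

```python
import asyncio

async def heartbeat():
    # Should tick every 100 ms while other work happens.
    for _ in range(5):
        await asyncio.sleep(0.1)
        print("tick")

async def crunch():
    # CPU-bound loop: it never awaits, so it never gives
    # control back to the event loop.
    total = sum(i * i for i in range(20_000_000))
    print("crunch done:", total)

async def main():
    task = asyncio.create_task(heartbeat())
    await asyncio.sleep(0)  # let the heartbeat get scheduled
    await crunch()          # no ticks can print until this finishes
    await task

asyncio.run(main())
```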
The third and final post of this series will explore how to combine multiprocessing and asyncio to finally achieve the goal of having a responsive web server together with CPU-intensive parallel execution.
Last Modified on 2021-10-27.