Python, popular though it is, has a few well-known weaknesses. One of the best known among serious users of the language is the lack of multicore support for in-process threads. This is because CPython, Python’s standard implementation, has a global interpreter lock (often referred to as the GIL). The GIL allows only one thread at a time to execute Python bytecode in each instance of the interpreter, which effectively confines a process’s Python code to a single core at any moment. A global lock like this is a common approach to avoiding race conditions in the implementation of language interpreters.

This series of posts will discuss the approach I used for parallel execution in the implementation of the Revrit API. The Revrit API is a tool for reconstructing original Hebrew script from transcribed metadata, but the approach is quite CPU-intensive! The difficulty is coming up with an implementation that can use multiple cores effectively while still maintaining a responsive web server.

This first post introduces the multiprocessing module, especially its Pool instances, covers the challenges of using it with common Python WSGI web frameworks, and looks at a way forward with asynchronous HTTP servers.

The next post will cover asynchronous programming in Python with async and await, and the final post will show how to combine these things.

Parallel execution with process pools

Python does have a threading module, which offers a form of concurrency. However, Python threads can be deceptive for users coming from a language like Java. Because only one thread at a time can execute Python code in each instance of the interpreter, threads do not run Python code in parallel. Threads in Python are mostly useful for a small number of I/O-bound workers, since a thread blocked on I/O releases the GIL while it waits. CPU-intensive tasks will not benefit from this approach.
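A minimal sketch of where threads do help, assuming the work is I/O-bound. Here `time.sleep` stands in for a blocking network call (`fake_download` is a hypothetical name), and like real blocking I/O it releases the GIL, so the waits overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(n):
    # Hypothetical stand-in for blocking I/O: time.sleep releases the GIL
    # while it waits, just as a blocking socket read would.
    time.sleep(0.2)
    return n


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as executor:
    results = list(executor.map(fake_download, range(5)))
elapsed = time.perf_counter() - start

# The five 0.2-second waits overlap, so this takes roughly 0.2 s, not 1 s.
print(results, f"{elapsed:.2f}s")
```

Replacing `fake_download` with a pure-Python CPU-bound loop would show no such speedup, because only one thread can hold the GIL at a time.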

If one really wants to run pure Python code on multiple cores, the only way to do it is with OS processes. OS processes have some pros and cons because they (normally) do not share memory.

On the plus side, it is impossible to create a data race without shared memory. It’s a much safer, saner way to program, and it is the model Erlang adopted for its own lightweight processes.
On the minus side, communication and data transfer have to be organized some other way by the programmer, normally through pipes or sockets. This requires defining binary or text serializations for any objects you wish to pass between processes.
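To make the "no shared memory" point concrete, here is a small sketch using the standard library’s `multiprocessing.Process`. The child increments a module-level list, but the parent’s copy is untouched. The names `counter` and `increment` are illustrative.

```python
import multiprocessing

counter = [0]  # illustrative module-level state


def increment():
    # Runs in the child process, which has its own copy of `counter`.
    counter[0] += 1
    print("child sees", counter[0])


if __name__ == "__main__":
    child = multiprocessing.Process(target=increment)
    child.start()
    child.join()
    # The child's change never reaches the parent: no memory is shared.
    print("parent sees", counter[0])
```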

Luckily, Python mostly mitigates the downside of having to implement serialization and transports with the multiprocessing module in the standard library. This module defines types and functions for transporting serialized Python objects between processes (and even across the internet, if you need). It defines types for processes, queues, pipes, locks, shared memory and other goodies, but my preferred way to work with processes is as a worker pool. Using multiprocessing.Pool, you create a pool of worker processes. You can send work to the pool, which will distribute it to any available worker, and you can come back to the result later.

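import multiprocessing

# The __main__ guard is required where workers are
# started with "spawn" (the default on Windows and macOS).
if __name__ == "__main__":
    # With no arguments, Pool will create as
    # many workers as your system has cores.
    pool = multiprocessing.Pool()

    result = pool.apply_async(func=str.upper, args=["spam"])
    print(result.get())

    # SPAM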

In this example, we use Pool.apply_async to create an AsyncResult object, which represents work being done in a worker process. Calling its .get method blocks the main process until the result is available and then returns its value.
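A short sketch of the rest of the AsyncResult interface (`wait`, `ready`, `successful`, and `get` with a timeout). It uses the built-ins `pow` and `divmod` as stand-in jobs so nothing has to be looked up in the main module. The two-worker pool size and five-second timeouts are arbitrary choices.

```python
import multiprocessing


def main():
    with multiprocessing.Pool(processes=2) as pool:
        # apply_async returns immediately; the call runs in a worker process.
        result = pool.apply_async(pow, (2, 10))
        result.wait(timeout=5)  # block until done (or 5 s pass) without fetching
        print("ready:", result.ready(), "successful:", result.successful())
        value = result.get(timeout=5)  # raises multiprocessing.TimeoutError if late

        # An exception raised in the worker is re-raised by get() in the parent.
        failing = pool.apply_async(divmod, (1, 0))
        try:
            failing.get(timeout=5)
        except ZeroDivisionError as exc:
            print("worker raised:", exc)
        return value, failing.successful()


if __name__ == "__main__":
    print(main())
```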

This abstraction isn’t perfect. Not every type of Python object can be serialized automatically. Functions and classes, for example, are not serialized by value: they are pickled by name and looked up again in the worker process. This is normally fine with the fork start method (long the default on Linux), since the worker’s memory is a copy of the parent’s at the moment the pool was created. However, it can be a problem if you are trying to use functions or classes defined after the pool was created. In practice, the main issue is that you can’t use closures (or lambdas) to create jobs in the pool.
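multiprocessing pickles the function and arguments of every job, so `pickle` itself can show which callables a pool will accept. Here is a minimal sketch, with `add`, `make_adder`, and `can_pickle` as hypothetical helpers. Module-level functions pickle by name, but closures and lambdas do not. `functools.partial` over a module-level function is one common workaround for the closure case.

```python
import functools
import pickle


def add(x, y):
    return x + y


def make_adder(x):
    # A closure: `inner` only exists inside a call to make_adder.
    def inner(y):
        return x + y
    return inner


def can_pickle(obj):
    """Return True if obj can be pickled, as every Pool job must be."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # The exact exception type varies between Python versions.
        return False
    return True


print(can_pickle(add))                        # module-level: pickled by name
print(can_pickle(make_adder(1)))              # closure: cannot be looked up
print(can_pickle(lambda y: 1 + y))            # lambda: has no importable name
print(can_pickle(functools.partial(add, 1)))  # partial of add: picklable
```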

Likewise, while most pure Python objects can be serialized automatically (with some small exceptions), instances of types defined in C as part of native extension modules must explicitly define their serialized data representation, which not all extension modules do.
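Extension types opt into pickling at the C level, and Python classes can use the same hooks from Python. In this sketch, `threading.Lock` plays the role of a native type with no serialized representation. The hypothetical `PicklableCounter` defines one explicitly with `__getstate__` and `__setstate__`.

```python
import pickle
import threading


class Counter:
    def __init__(self):
        self.count = 0
        self.lock = threading.Lock()  # a C-implemented type pickle cannot serialize


class PicklableCounter(Counter):
    def __getstate__(self):
        # Serialize only the plain data and leave the lock out.
        state = self.__dict__.copy()
        del state["lock"]
        return state

    def __setstate__(self, state):
        # Restore the data, then give the copy a fresh lock of its own.
        self.__dict__.update(state)
        self.lock = threading.Lock()


def picklable(obj):
    try:
        pickle.dumps(obj)
    except (TypeError, pickle.PicklingError):
        return False
    return True


original = PicklableCounter()
original.count = 5
restored = pickle.loads(pickle.dumps(original))
print(picklable(Counter()), picklable(original), restored.count)
```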

Of course, serializing and transporting objects also has a cost. It will not be as efficient as sharing memory between threads in a language with better multicore support. Despite these limitations, the multiprocessing module greatly reduces the burden of traditional multiprocess programming with its helpful, albeit imperfect, abstractions for interprocess communication.

We’ve seen how to use the pool to create jobs one at a time, but one of the best features of the process pool is the way it can abstract over iteration.

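import multiprocessing
import os
import random
import time

def report(n):
    # Simulate a job that takes up to half a second.
    time.sleep(random.random() / 2)
    return f"job {n} in process {os.getpid()}"

if __name__ == "__main__":
    pool = multiprocessing.Pool()
    # imap yields results lazily, in the same order as the inputs.
    for string in pool.imap(report, range(1, 11)):
        print(string)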

job 1 in process 216701
job 2 in process 216702
job 3 in process 216703
job 4 in process 216704
job 5 in process 216705
job 6 in process 216706
job 7 in process 216707
job 8 in process 216708
job 9 in process 216702
job 10 in process 216708

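The .map and .imap methods distribute the work across the pool’s workers: .map returns a list of all the results, while .imap returns an iterator that yields them lazily, in input order. If the order doesn’t matter, we can use .imap_unordered instead, which yields each result as soon as it is ready, for a little more efficiency: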

# Continuing with the pool and report() from above;
# results now arrive in completion order.
for string in pool.imap_unordered(report, range(1, 11)):
    print(string)

job 3 in process 216946
job 8 in process 216951
job 7 in process 216950
job 4 in process 216947
job 6 in process 216949
job 2 in process 216945
job 9 in process 216946
job 5 in process 216948
job 1 in process 216944
job 10 in process 216951

These are my favorite abstractions for parallel execution in Python.
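Because every task is pickled and sent to a worker, the serialization cost mentioned earlier adds up when the jobs are small. Both .map and .imap accept a `chunksize` argument that batches several items into one message. This sketch uses the built-ins `str.upper` and `len` as stand-in jobs, and the value 100 is arbitrary.

```python
import multiprocessing


def main():
    words = ["spam", "eggs", "ham"] * 1000
    with multiprocessing.Pool() as pool:
        # Send 100 items per message instead of one, cutting round trips.
        upper = pool.map(str.upper, words, chunksize=100)
        # imap defaults to chunksize=1; larger chunks trade latency for throughput.
        lengths = list(pool.imap(len, words, chunksize=100))
    return upper, lengths


if __name__ == "__main__":
    upper, lengths = main()
    print(upper[:3], sum(lengths))
```

.map picks a chunk size heuristically when none is given, while .imap and .imap_unordered default to 1, so passing a larger value matters most for those two.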

Multiprocessing on a Python web backend?

This strategy works well for the CPU-intensive transliteration conversion process on my local machine, but how will it work on a web server?

It depends, but if you’re using one of the most popular Python web frameworks, like Django, Flask, or Pyramid, you run into some issues. These frameworks are all built on WSGI, a common protocol that Python backends use to communicate with web servers. In practice, WSGI leaves process management to the HTTP server rather than the framework, and each request ties up a server worker until the application returns, so the framework has no good place to start and own a pool of worker processes.
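For reference, a WSGI application (as specified in PEP 3333) is a plain synchronous callable. This sketch uses a hypothetical `app` that upper-cases the request path and drives it by hand, the way a server would. The worker that calls it is occupied until it returns, which is why long CPU-bound work inside a request ties up that worker.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    # Hypothetical application: responds with the request path in upper case.
    body = environ["PATH_INFO"].upper().encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


# Build a minimal request environment, as a WSGI server would.
environ = {"PATH_INFO": "/spam"}
setup_testing_defaults(environ)  # fills in the other required keys

captured = {}


def start_response(status, headers):
    captured["status"] = status
    captured["headers"] = headers
    return lambda data: None  # the legacy write() callable, unused here


# The call is synchronous: nothing else runs on this worker until it returns.
response = b"".join(app(environ, start_response))
print(captured["status"], response)
```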

One possible solution would be to have the web server communicate with another network service over sockets or another communication system like ZeroMQ, but at this point we’d just be re-implementing what the multiprocessing library gives us for free in terms of communication and serialization.
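As an illustration of what multiprocessing already provides here, its `multiprocessing.connection` module sends pickled Python objects over sockets, with an optional authentication key. This is a minimal sketch of a one-job "service" in a background thread. The key and the `serve_one` helper are hypothetical, and this is not how Revrit is built.

```python
import threading
from multiprocessing.connection import Client, Listener

AUTHKEY = b"hypothetical-shared-secret"  # illustrative value only


def serve_one(listener):
    # Accept a single connection, handle one job, and send back the result.
    with listener.accept() as conn:
        job = conn.recv()  # the client's dict arrives already unpickled
        conn.send({"result": job["text"].upper()})


# Port 0 asks the OS for any free port; listener.address reports the real one.
listener = Listener(("localhost", 0), authkey=AUTHKEY)
server = threading.Thread(target=serve_one, args=(listener,))
server.start()

with Client(listener.address, authkey=AUTHKEY) as conn:
    conn.send({"text": "spam"})
    reply = conn.recv()

server.join()
listener.close()
print(reply)
```

Because `recv()` unpickles whatever arrives, this is only safe between trusted peers. The authkey handshake is what keeps unauthenticated clients out.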

If you want to save yourself the bother, what you need is a Python web server that does not rely on WSGI. There are a few well-known options: Twisted, Tornado, and AIOHTTP. (There may be more options now. The technical decision was made some time ago.)

What all of these web servers have in common is an event-loop-based architecture that performs I/O operations asynchronously. This makes it possible to get maximum throughput from a single thread, similar to how nginx or Node.js work.
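The idea behind an event loop can be sketched with Python’s own asyncio. A hundred simulated requests each wait 0.2 seconds on hypothetical I/O (`asyncio.sleep` as a stand-in). They all finish in roughly 0.2 seconds on one thread, because the loop switches to another task whenever one is waiting.

```python
import asyncio
import threading
import time


async def handle(request_id):
    # Stand-in for waiting on the network: while this coroutine is suspended,
    # the event loop runs the others on the same thread.
    await asyncio.sleep(0.2)
    return request_id, threading.get_ident()


async def main():
    start = time.perf_counter()
    results = await asyncio.gather(*(handle(i) for i in range(100)))
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(main())
threads = {thread_id for _, thread_id in results}
print(f"{len(results)} requests on {len(threads)} thread in {elapsed:.2f}s")
```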

Event-loop-based web servers were created because large companies needed to serve tens of thousands of simultaneous connections, rather than the hundreds that the process- and thread-based architectures of older systems could accommodate.

Our use case does not require this kind of density, but it does require something that will allow us to perform CPU-intensive work while keeping the web server responsive. In the end, we went with Tornado because it provides a relatively simple routing system, great performance (it has often been benchmarked among the fastest Python HTTP servers), and a lot of extra industry-tested features.
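Here is one minimal way to express "CPU-intensive work while staying responsive" with only the standard library. This is not necessarily the design used in Revrit or in the later posts. It swaps in `concurrent.futures.ProcessPoolExecutor` for `multiprocessing.Pool` because asyncio’s `run_in_executor` expects an Executor. The `heartbeat` coroutine is a hypothetical stand-in for other requests being served, and the built-in `sum` over a large range stands in for real CPU-bound work.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor


async def heartbeat(stop, ticks):
    # Stands in for the server handling other requests on the event loop.
    while not stop.is_set():
        ticks.append(1)
        await asyncio.sleep(0.01)


async def main(executor):
    loop = asyncio.get_running_loop()
    stop = asyncio.Event()
    ticks = []
    beat = asyncio.create_task(heartbeat(stop, ticks))
    # The CPU-bound job runs in a worker process while the loop stays free.
    total = await loop.run_in_executor(executor, sum, range(10_000_000))
    stop.set()
    await beat
    return total, len(ticks)


if __name__ == "__main__":
    with ProcessPoolExecutor() as executor:
        total, tick_count = asyncio.run(main(executor))
    print(total, tick_count)
```

Creating the executor outside the coroutine keeps its blocking shutdown off the event loop.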

Continue reading this series for an overview of how asynchronous programming works in Python. The third and final post combines asyncio with multiprocessing to create a web service that can keep taking requests even when the CPU is at full capacity.

CPU-intensive Python Web Backends with asyncio and multiprocessing, Part I
