Design

A brief look at Gunicorn's architecture.

Server model

Gunicorn uses a pre-fork worker model: a master process manages worker processes, while the workers handle requests and responses. The master never touches individual client sockets.

Master

The master process listens for signals (TTIN, TTOU, CHLD, etc.) and adjusts the worker pool accordingly. TTIN/TTOU change the number of workers; CHLD indicates a worker exited and must be restarted.

Sync workers

The default sync worker handles one request at a time. Errors affect only the current request. Because connections close after each response, persistent connections are not supported even if you set Keep-Alive headers manually.

Async workers

Async workers are powered by greenlets through Eventlet or Gevent. Most apps work without modification, though full compatibility may require patches (for example installing psycogreen when using Psycopg). Some apps that depend on the original blocking behaviour may not be compatible.

Gthread workers

gthread is a threaded worker. The main loop accepts connections and places them in a thread pool. Keep-alive connections return to the pool to await further events; idle connections close after the keepalive timeout.

Tornado workers

A Tornado worker class exists for Tornado-based applications. While it can serve WSGI apps, this configuration is not recommended.

AsyncIO workers

Use third-party workers to pair Gunicorn with asyncio frameworks (see the aiohttp deployment guide or the Flask aiohttp example).

Choosing a worker type

Synchronous workers assume your app is CPU/network bound and avoids indefinite operations. Any outbound HTTP calls or other blocking behaviour benefit from an async worker. Because synchronous workers are vulnerable to slow clients, Gunicorn requires a buffering proxy in front of the default configuration. Tools like Hey can simulate slow responses to test this scenario.

Examples that need async workers:

Long blocking calls (outbound web services)
Direct internet traffic (no buffering proxy)
Streaming request/response bodies
Long polling
WebSockets / Comet

How many workers?

Do not scale workers to match client count. Gunicorn usually needs only 4–12 workers to handle heavy traffic. Start with (2 * num_cores) + 1 and adjust under load using TTIN/TTOU.

Too many workers waste resources and can reduce throughput.

How many threads?

Since Gunicorn 19 you can set --threads (with the gthread worker) to process requests concurrently. Threads can extend request time beyond the worker timeout while still notifying the master. The optimal mix of threads and worker processes depends on the runtime (for example CPython vs. Jython). Threads share memory, lowering footprint, and still allow reloads because application code is loaded in worker processes.

3.2 KiB Raw Blame History Unescape Escape