docs: Redesign architecture page with visual components

Add tabbed worker types, comparison table, decision guide admonitions,
and scaling callouts. Rename master to arbiter throughout.
Benoit Chesneau 2026-01-23 00:00:17 +01:00
parent 571bc121d1
commit c959daeb82


@@ -3,81 +3,201 @@
A brief look at Gunicorn's architecture.
## Server model
## Server Model
Gunicorn uses a pre-fork worker model: a master process manages worker
processes, while the workers handle requests and responses. The master never
Gunicorn uses a **pre-fork worker model**: an arbiter process manages worker
processes, while the workers handle requests and responses. The arbiter never
touches individual client sockets.
### Master
<div class="pillars" markdown>
The master process listens for signals (TTIN, TTOU, CHLD, etc.) and adjusts the
worker pool accordingly. `TTIN`/`TTOU` change the number of workers; `CHLD`
indicates a worker exited and must be restarted.
<div class="pillar" markdown>
<div class="pillar__icon">⚖️</div>
### Sync workers
### Arbiter
The default `sync` worker handles one request at a time. Errors affect only the
current request. Because connections close after each response, persistent
connections are not supported even if you set `Keep-Alive` headers manually.
Orchestrates the worker pool. Listens for signals (`TTIN`, `TTOU`, `CHLD`,
`HUP`) to adjust workers, restart them on failure, or reload configuration.
</div>
### Async workers
<div class="pillar" markdown>
<div class="pillar__icon">⚙️</div>
Async workers are powered by [greenlets](https://github.com/python-greenlet/greenlet)
through [Eventlet](http://eventlet.net/) or [Gevent](http://www.gevent.org/).
Most apps work without modification, though full compatibility may require
patches (for example installing [`psycogreen`](https://github.com/psycopg/psycogreen/)
when using [Psycopg](http://initd.org/psycopg/)). Some apps that depend on the
original blocking behaviour may not be compatible.
### Worker Pool
### Gthread workers
Each worker handles requests independently. Worker types determine
concurrency model: sync, threaded, or async via greenlets/asyncio.
</div>
`gthread` is a threaded worker. The main loop accepts connections and places
them in a thread pool. Keep-alive connections return to the pool to await
further events; idle connections close after the keepalive timeout.
<div class="pillar" markdown>
<div class="pillar__icon">📡</div>
### Tornado workers
### Signal Communication
A Tornado worker class exists for Tornado-based applications. While it can
serve WSGI apps, this configuration is not recommended.
`TTIN`/`TTOU` adjust worker count. `CHLD` triggers restart of crashed
workers. `HUP` reloads configuration. See [Signals](signals.md).
</div>
</div>
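As a rough illustration of the signal interface described above, the sketch below plans and sends `TTIN`/`TTOU` signals from Python. The helper names `plan_signals` and `resize_pool` are hypothetical, and the arbiter PID is assumed to come from wherever your deployment records it (for example a pid file).

```python
import os
import signal


def plan_signals(current: int, target: int) -> list[signal.Signals]:
    """Return the signals needed to move the pool from `current` to `target` workers.

    Hypothetical helper: TTIN adds one worker per signal, TTOU removes one.
    """
    if current < 1 or target < 1:
        raise ValueError("worker counts must be at least 1")
    delta = target - current
    step = signal.SIGTTIN if delta > 0 else signal.SIGTTOU
    return [step] * abs(delta)


def resize_pool(arbiter_pid: int, current: int, target: int) -> None:
    """Send the planned signals to the arbiter process (Unix only)."""
    for sig in plan_signals(current, target):
        os.kill(arbiter_pid, sig)
```

Separating the planning step from `os.kill` keeps the logic easy to check without a running arbiter.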
## Worker Types
Choose a worker type based on your application's needs.
=== "Sync"
The **default** worker. Handles one request at a time per worker.
- Simple and predictable
- Errors affect only the current request
- No keep-alive support (connections close after response)
- Requires a buffering proxy (nginx, HAProxy) for production
```bash
gunicorn myapp:app
```
=== "Gthread"
Threaded worker with a **thread pool** per worker process.
- Supports keep-alive connections
- Good balance of concurrency and simplicity
- Threads share memory (lower footprint than workers)
- Idle connections close after keepalive timeout
```bash
gunicorn myapp:app -k gthread --threads 4
```
=== "Gevent"
**Greenlet-based** async worker using [Gevent](http://www.gevent.org/).
- Handles thousands of concurrent connections
- Supports keep-alive, WebSockets, long-polling
- May require patches for some libraries (e.g., `psycogreen` for Psycopg)
- Not compatible with code that relies on blocking behavior
```bash
gunicorn myapp:app -k gevent --worker-connections 1000
```
=== "Eventlet"
**Greenlet-based** async worker using [Eventlet](http://eventlet.net/).
- Similar capabilities to Gevent
- Handles high concurrency for I/O-bound apps
- Some libraries may need compatibility patches
```bash
gunicorn myapp:app -k eventlet --worker-connections 1000
```
=== "ASGI"
Native **asyncio** support for modern async frameworks.
- For FastAPI, Starlette, Quart, and other ASGI apps
- Full async/await support
- See the [ASGI Guide](asgi.md) for details
```bash
gunicorn myapp:app -k uvicorn.workers.UvicornWorker
```
## Comparison
| Worker | Concurrency Model | Keep-Alive | Best For |
|--------|-------------------|------------|----------|
| `sync` | 1 request/worker | ❌ | CPU-bound apps behind a proxy |
| `gthread` | Thread pool | ✅ | Mixed workloads, moderate concurrency |
| `gevent` | Greenlets | ✅ | I/O-bound, WebSockets, streaming |
| `eventlet` | Greenlets | ✅ | I/O-bound, long-polling |
| ASGI workers | AsyncIO | ✅ | Modern async frameworks (FastAPI, etc.) |
!!! tip "Quick Decision Guide"
- **Simple app behind nginx?** → `sync` (default)
- **Need keep-alive or moderate concurrency?** → `gthread`
- **WebSockets, streaming, long-polling?** → `gevent` or `eventlet`
- **FastAPI, Starlette, or async framework?** → ASGI worker
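
The decision guide can be read as a small lookup. The function below is only a sketch: `choose_worker` and its flags are hypothetical names, and the order in which the questions are checked is an assumption, not something Gunicorn enforces.

```python
def choose_worker(
    *,
    asgi_app: bool = False,
    long_lived_connections: bool = False,
    keep_alive: bool = False,
) -> str:
    """Map the decision guide's questions to a value for `-k`.

    Hypothetical helper; the checks run from most to least specific.
    """
    if asgi_app:
        # FastAPI, Starlette, or another async framework
        return "uvicorn.workers.UvicornWorker"
    if long_lived_connections:
        # WebSockets, streaming, long-polling (eventlet is the alternative)
        return "gevent"
    if keep_alive:
        # Keep-alive or moderate concurrency
        return "gthread"
    # Simple app behind a buffering proxy
    return "sync"
```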
## When to Use Async Workers
Synchronous workers assume your app is CPU- or network-bound and never blocks
indefinitely. Use async workers when you have:
- Long blocking calls (external APIs, slow databases)
- Direct internet traffic without a buffering proxy
- Streaming request/response bodies
- Long polling or Comet patterns
- WebSockets
!!! info "Testing Slow Clients"
Tools like [Hey](https://github.com/rakyll/hey) can simulate slow responses
to test how your configuration handles them.
## Scaling
### How Many Workers?
!!! warning "Don't Over-Scale"
Workers ≠ clients. Gunicorn typically needs only **4-12 workers** to handle
heavy traffic. Too many workers waste resources and can reduce throughput.
Start with this formula and adjust under load:
```
workers = (2 × CPU cores) + 1
```
Use `TTIN`/`TTOU` signals to adjust the worker count at runtime.
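
One way to apply the formula is in a Python configuration file. The snippet below is a minimal sketch, assuming a file named `gunicorn.conf.py`; `default_workers` is a hypothetical helper, while `workers` is the Gunicorn setting.

```python
import multiprocessing


def default_workers(cores: int) -> int:
    """Starting point from the formula above: (2 x CPU cores) + 1."""
    return 2 * cores + 1


# Gunicorn reads `workers` from the config file; tune it under real load.
workers = default_workers(multiprocessing.cpu_count())
```

Treat the result as a starting value, not a target: the warning above still applies on machines with many cores.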
### How Many Threads?
With the `gthread` worker, you can combine workers and threads:
```bash
gunicorn myapp:app -k gthread --workers 4 --threads 2
```
!!! info "Threads vs Workers"
- **Threads** share memory → lower footprint
- **Workers** isolate failures → better fault tolerance
- Combine both for the best of both worlds
Threads can extend request time beyond the worker timeout while still
notifying the arbiter. The optimal mix depends on your runtime (CPython vs
PyPy) and workload.
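
The same worker/thread mix can be expressed in a config file. This is a sketch under the assumption that the file is `gunicorn.conf.py`; `max_in_flight` is an illustrative name that Gunicorn does not read, included only to show the rough concurrency ceiling.

```python
# Equivalent of: gunicorn myapp:app -k gthread --workers 4 --threads 2
worker_class = "gthread"
workers = 4
threads = 2

# Illustrative only (not a Gunicorn setting): each worker process runs
# `threads` handler threads, so roughly workers * threads requests can be
# processed at the same time.
max_in_flight = workers * threads
```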
## Configuration Examples
```bash
# Sync (default) - simple apps behind nginx
gunicorn myapp:app
# Gthread - keep-alive and thread concurrency
gunicorn myapp:app -k gthread --workers 4 --threads 4
# Gevent - high concurrency for I/O-bound apps
gunicorn myapp:app -k gevent --workers 4 --worker-connections 1000
# Eventlet - alternative async worker
gunicorn myapp:app -k eventlet --workers 4 --worker-connections 1000
# ASGI - FastAPI/Starlette with Uvicorn worker
gunicorn myapp:app -k uvicorn.workers.UvicornWorker --workers 4
```
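
Any of these command lines can also live in a config file. The sketch below mirrors the Gevent example, assuming a file named `gunicorn.conf.py` that is passed with `-c`; the values are copied from the command above and are not tuning advice.

```python
# Equivalent of:
#   gunicorn myapp:app -k gevent --workers 4 --worker-connections 1000
worker_class = "gevent"
workers = 4
worker_connections = 1000  # max simultaneous clients per worker
```

It would then be started with `gunicorn -c gunicorn.conf.py myapp:app`.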
<span id="asyncio-workers"></span>
### AsyncIO workers
Use third-party workers to pair Gunicorn with asyncio frameworks (see the
[aiohttp deployment guide](https://docs.aiohttp.org/en/stable/deployment.html#nginx-gunicorn)
or the [Flask aiohttp example](https://github.com/benoitc/gunicorn/blob/master/examples/frameworks/flaskapp_aiohttp_wsgi.py)).
!!! note "Third-Party AsyncIO Workers"
## Choosing a worker type
Synchronous workers assume your app is CPU/network bound and avoids indefinite
operations. Any outbound HTTP calls or other blocking behaviour benefit from an
async worker. Because synchronous workers are vulnerable to slow clients,
Gunicorn requires a buffering proxy in front of the default configuration. Tools
like [Hey](https://github.com/rakyll/hey) can simulate slow responses to test
this scenario.
Examples that need async workers:
- Long blocking calls (outbound web services)
- Direct internet traffic (no buffering proxy)
- Streaming request/response bodies
- Long polling
- WebSockets / Comet
## How many workers?
Do **not** scale workers to match client count. Gunicorn usually needs only 4-12
workers to handle heavy traffic. Start with `(2 * num_cores) + 1` and adjust
under load using `TTIN`/`TTOU`.
Too many workers waste resources and can reduce throughput.
## How many threads?
Since Gunicorn 19 you can set `--threads` (with the `gthread` worker) to process
requests concurrently. Threads can extend request time beyond the worker
timeout while still notifying the master. The optimal mix of threads and worker
processes depends on the runtime (for example CPython vs. Jython). Threads share
memory, lowering footprint, and still allow reloads because application code is
loaded in worker processes.
For asyncio frameworks, you can also use third-party workers. See the
[aiohttp deployment guide](https://docs.aiohttp.org/en/stable/deployment.html#nginx-gunicorn)
for examples.