276 Commits

Author SHA1 Message Date
Benoit Chesneau
709a6ad159
feat(dirty): add stash - global shared state between workers (#3503)
* feat(dirty): add stash - global shared state between workers

Add a simple key-value store (stash) that allows dirty workers to share
state through the arbiter. Tables are stored directly in arbiter memory
for fast access and simplicity.

Features:
- Auto-create tables on first access
- Dict-like interface via stash.table()
- Pattern matching for keys (glob patterns)
- Module-level API: stash.put(), stash.get(), stash.delete(), etc.

Usage:
    from gunicorn.dirty import stash

    stash.put("sessions", "user:1", {"name": "Alice"})
    user = stash.get("sessions", "user:1")

    # Or dict-like
    sessions = stash.table("sessions")
    sessions["user:1"] = {"name": "Alice"}

New files:
- gunicorn/dirty/stash.py - Client API and StashTable class
- Protocol additions for MSG_TYPE_STASH and STASH_OP_* codes

Note: Tables are ephemeral - lost if arbiter restarts.

* test(dirty): add tests for stash protocol and encoding

Test coverage for:
- Stash message creation and encoding
- Protocol constants (MSG_TYPE_STASH, STASH_OP_*)
- Error classes (StashError, StashTableNotFoundError, StashKeyNotFoundError)
- StashTable dict-like interface
- Edge cases: unicode, complex values, special patterns

* example(dirty): add stash usage example and integration tests

- Add SessionApp to dirty_app.py demonstrating stash usage
- Add /session/* endpoints to wsgi_app.py
- Add test_stash_integration.py with Docker tests
- Update docker-compose.yml with stash-test service
- Fix: Set GUNICORN_DIRTY_SOCKET in dirty arbiter for worker access

* docs(dirty): add stash documentation
2026-02-12 21:45:49 +01:00
Benoit Chesneau
68ce658f5d fix(dirty): convert dict int keys to strings in TLV encoder
JSON serializes all dict keys as strings, so for compatibility the TLV
encoder should do the same. This fixes an error when tasks return dicts
with integer keys (e.g., aggregation results grouped by numeric ID).
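
The JSON behavior this fix mirrors can be checked with the standard library alone. In the sketch below, `stringify_keys` is a hypothetical helper written for illustration, not the encoder's actual code.

```python
import json


def stringify_keys(value):
    """Recursively convert dict keys to str, as JSON does for integer keys."""
    if isinstance(value, dict):
        return {str(k): stringify_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [stringify_keys(v) for v in value]
    return value


# Aggregation result grouped by a numeric ID.
grouped = {1: {"count": 3}, 2: {"count": 5}}

# A JSON round trip turns the integer keys into strings.
round_tripped = json.loads(json.dumps(grouped))
```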
2026-02-11 23:39:53 +01:00
Benoit Chesneau
477b7479cc feat(dirty): update client for binary protocol
Update client and streaming tests to work with the binary protocol:
- Update MockStreamWriter/MockStreamReader to use BinaryProtocol
- Replace string request IDs with integers
- Update test assertions to decode binary protocol messages
- Use HEADER_SIZE and decode_header/decode_message instead of old API
2026-02-11 23:12:44 +01:00
Benoit Chesneau
98b1b649c2 feat(dirty): update arbiter for binary protocol
Update arbiter tests to work with the binary protocol:
- Update MockStreamWriter to decode binary messages
- Import binary protocol constants from module level
2026-02-11 23:03:40 +01:00
Benoit Chesneau
6d2139bb6c feat(dirty): update worker for binary protocol
Update worker tests to work with the binary protocol:
- Use integer request IDs instead of strings
- Update MockStreamWriter to decode binary messages
- Import binary protocol constants from module level
2026-02-11 23:01:21 +01:00
Benoit Chesneau
1665857c0e feat(dirty): implement binary protocol
Replace JSON-based protocol with binary format using 16-byte header:
- Magic bytes (GD), version, message type, payload length, request ID
- TLV-encoded payloads for efficient binary data transfer
- No base64 encoding needed for binary data
- Backwards compatible API (DirtyProtocol alias, dict-based interface)

Header format inspired by OpenBSD msgctl/msgsnd.
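
A minimal sketch of how a 16-byte header with these fields could be packed. The field widths, byte order, and default version are assumptions chosen so the sizes add up to 16 bytes; the commit message does not state them, and `pack_header`/`unpack_header` are hypothetical names.

```python
import struct

# Assumed layout (network byte order):
#   2-byte magic, 1-byte version, 1-byte message type,
#   4-byte payload length, 8-byte request ID  -> 16 bytes total
HEADER = struct.Struct("!2sBBIQ")
MAGIC = b"GD"


def pack_header(msg_type, payload_len, request_id, version=1):
    return HEADER.pack(MAGIC, version, msg_type, payload_len, request_id)


def unpack_header(data):
    magic, version, msg_type, payload_len, request_id = HEADER.unpack(
        data[:HEADER.size]
    )
    if magic != MAGIC:
        raise ValueError("bad magic bytes")
    return version, msg_type, payload_len, request_id
```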
2026-02-11 22:58:43 +01:00
Benoit Chesneau
0e0dc669c8 feat(dirty): add TLV binary encoder/decoder
Implement TLV (Type-Length-Value) serialization layer for the binary
dirty worker protocol. This enables efficient binary data transfer
without base64 encoding overhead.

Supported types:
- None, bool, int64, float64
- bytes (raw binary, no encoding needed)
- string (UTF-8)
- list, dict (nested structures)

Inspired by OpenBSD msgctl/msgsnd message format.
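
To illustrate the idea, here is a small TLV encoder/decoder covering the listed types. The tag values and the 1-byte tag plus 4-byte length framing are assumptions made for this sketch and need not match the real wire format. Dict keys are encoded as strings here.

```python
import struct

# Hypothetical one-byte type tags; the real encoder's values may differ.
T_NONE, T_BOOL, T_INT, T_FLOAT, T_BYTES, T_STR, T_LIST, T_DICT = range(8)


def _tlv(tag, payload):
    # Type (1 byte) + Length (4 bytes, big-endian) + Value
    return struct.pack("!BI", tag, len(payload)) + payload


def encode(value):
    if value is None:
        return _tlv(T_NONE, b"")
    if isinstance(value, bool):  # checked before int: bool subclasses int
        return _tlv(T_BOOL, b"\x01" if value else b"\x00")
    if isinstance(value, int):
        return _tlv(T_INT, struct.pack("!q", value))
    if isinstance(value, float):
        return _tlv(T_FLOAT, struct.pack("!d", value))
    if isinstance(value, bytes):
        return _tlv(T_BYTES, value)  # raw bytes, no base64 step
    if isinstance(value, str):
        return _tlv(T_STR, value.encode("utf-8"))
    if isinstance(value, list):
        return _tlv(T_LIST, b"".join(encode(v) for v in value))
    if isinstance(value, dict):
        items = b"".join(encode(str(k)) + encode(v) for k, v in value.items())
        return _tlv(T_DICT, items)
    raise TypeError(f"unsupported type: {type(value).__name__}")


def decode(data, offset=0):
    """Decode one value starting at offset; return (value, next_offset)."""
    tag, length = struct.unpack_from("!BI", data, offset)
    start = offset + 5
    end = start + length
    body = data[start:end]
    if tag == T_NONE:
        return None, end
    if tag == T_BOOL:
        return body == b"\x01", end
    if tag == T_INT:
        return struct.unpack("!q", body)[0], end
    if tag == T_FLOAT:
        return struct.unpack("!d", body)[0], end
    if tag == T_BYTES:
        return bytes(body), end
    if tag == T_STR:
        return body.decode("utf-8"), end
    if tag not in (T_LIST, T_DICT):
        raise ValueError(f"unknown tag: {tag}")
    # Containers hold a sequence of nested TLV items.
    items, pos = [], start
    while pos < end:
        item, pos = decode(data, pos)
        items.append(item)
    if tag == T_LIST:
        return items, end
    return dict(zip(items[0::2], items[1::2])), end
```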
2026-02-11 22:55:03 +01:00
Benoit Chesneau
9508df658d test: increase CI timeout for signal tests on PyPy
2026-02-06 09:00:29 +01:00
Benoit Chesneau
95b7ffeeaa chore: prepare release 25.0.2
- Bump version to 25.0.2
- Update copyright year to 2026 in LICENSE and NOTICE
- Add license headers to all Python source files
- Add changelog entry for 25.0.2
2026-02-06 08:21:18 +01:00
Benoit Chesneau
e780508f56 fix: resolve ASGI concurrent request failures through nginx proxy
- Fix nginx config to use keepalive with upstream (was sending
  Connection: close which caused premature connection closure)
- Add _safe_write() to handle socket errors (EPIPE, ECONNRESET,
  ENOTCONN) gracefully when client disconnects
- Fix ASGI scope server/client to always be 2-tuples for IPv6
  compatibility (IPv6 sockets return 4-tuples)
- Add write_eof() before close() to ensure buffered data is flushed
- Bind to [::] for dual-stack IPv4/IPv6 support in test containers
2026-02-06 01:57:28 +01:00
Benoit Chesneau
866e88cfd6
Merge pull request #3485 from benoitc/fix/asgi-graceful-disconnect
fix: graceful disconnect handling for ASGI worker
2026-02-03 09:30:19 +01:00
Benoit Chesneau
3bf718ea52 fix: graceful disconnect handling for ASGI worker
Closes #3484

When a client disconnects during an ASGI request, the worker now:
1. Sends http.disconnect message to the app's receive queue
2. Allows a configurable grace period for cleanup (default: 3 seconds)
3. Only cancels the task after the grace period expires

This follows the ASGI HTTP Connection Scope spec which defines
http.disconnect as the message apps should receive when clients
disconnect: https://asgi.readthedocs.io/en/latest/specs/www.html#disconnect-receive-event

The grace period prevents CancelledError from propagating to async
database operations, allowing SQLAlchemy and other async DB libraries
to properly reset their connection pools.

New config option: --asgi-disconnect-grace-period (default: 3 seconds)
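
The three steps above can be sketched with plain asyncio. This illustrates the pattern only and is not the worker's code: `handle_disconnect`, the queue, and the demo app are hypothetical stand-ins.

```python
import asyncio


async def handle_disconnect(app_task, receive_queue, grace_period=3.0):
    """Notify the app of a disconnect; cancel only after the grace period."""
    # 1. Deliver the ASGI disconnect event to the app's receive queue.
    await receive_queue.put({"type": "http.disconnect"})
    # 2. Give the app time to finish its own cleanup.
    done, pending = await asyncio.wait({app_task}, timeout=grace_period)
    # 3. Cancel only if the app is still running after the grace period.
    if pending:
        app_task.cancel()
        await asyncio.gather(app_task, return_exceptions=True)
        return "cancelled"
    return "finished"


async def demo():
    queue = asyncio.Queue()

    async def app():
        message = await queue.get()  # stand-in for the ASGI receive() callable
        return message["type"]

    task = asyncio.create_task(app())
    outcome = await handle_disconnect(task, queue, grace_period=1.0)
    return outcome, task.result()


result = asyncio.run(demo())
```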
2026-02-03 02:46:07 +01:00
Benoit Chesneau
b19c648a67 fix: lazy import dirty module for gevent compatibility
Closes #3482

The dirty module (which uses asyncio and concurrent.futures) was being
imported at gunicorn startup via gunicorn.arbiter. This caused
concurrent.futures to be imported before user code could call
gevent.monkey.patch_all(), breaking gevent's monkey-patching.

Changes:
- gunicorn/arbiter.py: Import DirtyArbiter and set_dirty_socket_path
  lazily inside spawn_dirty_arbiter() instead of at module level
- gunicorn/dirty/worker.py: Import ThreadPoolExecutor lazily inside
  run() method instead of at module level
- Add tests/workers/test_gevent_import_order.py with 5 tests verifying:
  - concurrent.futures is NOT imported when gunicorn.arbiter loads
  - gevent patching works correctly with gunicorn
  - Reproduces the exact scenario from the bug report gist

This ensures gevent's monkey.patch_all() can run before concurrent.futures
is imported, allowing proper patching of threading primitives.
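
The lazy-import pattern described above looks roughly like this. `run_blocking` is a hypothetical function used only to show where the import moves.

```python
def run_blocking(func, *args, max_workers=1):
    # Imported here rather than at module level, so that merely importing
    # this module does not pull in concurrent.futures before user code has
    # had a chance to call gevent.monkey.patch_all().
    from concurrent.futures import ThreadPoolExecutor

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return pool.submit(func, *args).result()
```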
2026-02-03 01:15:39 +01:00
Benoit Chesneau
0885005b08 fix(tests): correct assertions in ASGI compliance tests
- Fix path expectation in test_scope_path_preserved (router strips /http prefix)
- Fix lifespan state check to use scope_state instead of module_state
- Add tolerance for partial failures in proxy concurrent test
- Add retry logic with proper assertions in HTTPS proxy FastAPI test
2026-02-02 14:04:26 +01:00
Benoit Chesneau
1d0df29796 feat(dirty): add class attribute workers support and e2e tests
- Add get_app_workers_attribute() to read workers class attribute
- Update _parse_app_specs() to check class attribute when no config override
- Add Docker-based e2e tests for per-app worker allocation
- Add test apps: HeavyModelApp (workers=2), LightweightApp
- Add unit tests for get_app_workers_attribute function
- Add integration tests for class attribute detection
2026-02-01 03:04:35 +01:00
Benoit Chesneau
8559854b4f feat(dirty): add per-app worker allocation for memory optimization
Allow dirty apps to specify how many workers should load them, enabling
significant memory savings for heavy applications like ML models.

- Add `workers` class attribute to DirtyApp (None = all workers)
- Add `parse_dirty_app_spec()` to parse "module:Class:N" format
- Add `DirtyNoWorkersAvailableError` for app-specific error handling
- Update DirtyArbiter with per-app worker tracking and routing
- Maintain backward compatibility when no dirty_apps configured

Example: 8 workers x 10GB model = 80GB RAM needed
With workers=2: 2 x 10GB = 20GB RAM (75% savings)

Configuration formats:
- Class attribute: `workers = 2` on DirtyApp subclass
- Config format: `module:class:N` (e.g., `myapp.ml:HugeModel:2`)
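
A sketch of how a spec string in the `module:Class:N` form could be split, together with the memory arithmetic from the example. `parse_app_spec` and `memory_saving` are hypothetical helpers for illustration and are not the `parse_dirty_app_spec()` added by this commit.

```python
def parse_app_spec(spec):
    """Split 'module:Class' or 'module:Class:N' into (import_path, workers).

    workers is None when no count is given, meaning "all workers".
    """
    parts = spec.split(":")
    if len(parts) == 2:
        return spec, None
    if len(parts) == 3:
        workers = int(parts[2])
        if workers < 1:
            raise ValueError(f"worker count must be >= 1: {spec!r}")
        return f"{parts[0]}:{parts[1]}", workers
    raise ValueError(f"invalid dirty app spec: {spec!r}")


def memory_saving(total_workers, app_workers):
    """Fraction of per-app memory saved by loading the app on fewer workers."""
    return 1 - app_workers / total_workers
```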
2026-02-01 02:40:09 +01:00
Benoit Chesneau
315e7bde80 fix(http2): ALPN negotiation for gevent/eventlet workers
- Add explicit do_handshake() in base_async.py before ALPN check
  when do_handshake_on_connect is False
- Mark eventlet worker as deprecated (removal in 26.0)
- Add HTTP/2 gevent example with Docker and tests
- Update documentation to reflect eventlet deprecation
- Remove eventlet websocket example (gevent version exists)

The ALPN fix ensures HTTP/2 works correctly with gevent and eventlet
workers when do_handshake_on_connect config is False (the default).
Without explicit handshake, selected_alpn_protocol() returns None.
2026-01-28 13:42:48 +01:00
Benoit Chesneau
4e3245a0df fix(http2): achieve 100% h2spec compliance (146/146 tests)
- Send GOAWAY with correct error codes for protocol violations
- Handle StreamClosedError and FlowControlError gracefully
- Return False instead of raising for missing/closed streams
- Handle flow control window overflow per RFC 7540
- Fix reader race condition and add h2 exception handling
- Wait for WINDOW_UPDATE when flow control window is zero/negative
- Use h2 exception's error_code for INITIAL_WINDOW_SIZE violations
2026-01-27 15:42:42 +01:00
Benoit Chesneau
0f298e4838 feat(http2): add response trailer support
2026-01-27 12:33:12 +01:00
Benoit Chesneau
655716a181 feat(http2): add stream priority support (RFC 7540 Section 5.3)
2026-01-27 11:44:33 +01:00
Benoit Chesneau
251d8ebe51 fix(http2): validate frame size per RFC 7540 (16384-16777215)
2026-01-27 10:51:29 +01:00
Benoit Chesneau
df5d7ad6d2 fix: resolve ruff lint warnings in HTTP/2 code
- Remove unused imports in test files
- Rename loop variable to avoid shadowing sock import
- Remove unused ssock variable in conftest
2026-01-27 10:03:54 +01:00
Benoit Chesneau
17b3786186 Update test to expect :authority override per RFC 9113
2026-01-27 09:59:35 +01:00
Benoit Chesneau
75b46bf6cf Add HTTP 103 Early Hints support (RFC 8297)
Implement HTTP 103 Early Hints as modern replacement for HTTP/2 Server Push.
This allows servers to send resource hints before the final response,
enabling browsers to preload assets in parallel.

WSGI support:
- Add wsgi.early_hints callback to environ dict
- Apps can call environ['wsgi.early_hints'](headers) to send 103 responses
- Silently ignored for HTTP/1.0 clients (don't support 1xx responses)
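
A minimal WSGI app using the callback might look like the following. The header format passed to the callback (a list of `(name, value)` tuples) is an assumption, and the `Link` value is illustrative.

```python
def app(environ, start_response):
    # Look the callback up defensively so the app also runs on servers
    # that do not provide it.
    early_hints = environ.get("wsgi.early_hints")
    if early_hints is not None:
        # Assumed header format: a list of (name, value) tuples.
        early_hints([("Link", "</static/app.css>; rel=preload; as=style")])
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]
```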

ASGI support:
- Handle http.response.informational message type
- Apps can await send({"type": "http.response.informational", "status": 103, ...})

HTTP/2 support:
- Add send_informational() method to HTTP2ServerConnection
- Add async send_informational() method to AsyncHTTP2Connection
- Wire up early hints in gthread worker for HTTP/2 requests

Includes unit tests and Docker integration tests for all protocols.
2026-01-27 09:57:32 +01:00
Benoit Chesneau
780e2cf055 Add HTTP/2 tests
Unit tests for HTTP/2 implementation:
- test_http2_stream.py: Stream state management tests
- test_http2_request.py: Request interface tests
- test_http2_connection.py: Connection handling tests
- test_http2_async_connection.py: Async connection tests
- test_http2_config.py: Configuration tests
- test_http2_alpn.py: ALPN negotiation tests
- test_http2_errors.py: Error handling tests
- test_http2_integration.py: Integration tests

Docker integration tests:
- Full HTTP/2 testing environment with nginx proxy
- Direct connection tests and proxy tests
- Concurrent stream tests
- Protocol behavior tests
- Error handling tests
- Header handling tests
- Performance tests
2026-01-27 09:57:32 +01:00
Benoit Chesneau
1fe9e5816e
Merge pull request #3460 from benoitc/feature/dirty-arbiters
feat: add dirty arbiters for long-running blocking operations
2026-01-27 09:45:05 +01:00
Benoit Chesneau
29830ccc2f Increase CI timeout for signal integration tests
PyPy signal handling can be slower in CI environments, so increase
the timeout from 30 to 60 seconds to avoid flaky test failures.
2026-01-25 15:08:05 +01:00
Benoit Chesneau
6575d86251 Add docker integration tests for simple ASGI (HTTP protocol)
Tests the ASGI worker with direct HTTP requests without uWSGI protocol.
Includes tests for GET, POST, query strings, path handling, keepalive,
large bodies, and custom headers.
2026-01-25 15:03:12 +01:00
Benoit Chesneau
8663740907
Add uWSGI protocol support to ASGI worker (#3467)
- Implements AsyncUWSGIRequest class extending sync UWSGIRequest to reuse parsing logic with async I/O
- ASGI protocol handler selects between HTTP and uWSGI based on --protocol config option
- Allows gunicorn's ASGI worker to receive requests from nginx using uwsgi_pass directive
- Includes unit tests and Docker integration tests
2026-01-25 14:45:07 +01:00
Benoit Chesneau
35559a87bd test: add conftest.py to fix tests module import path
2026-01-25 10:38:07 +01:00
Benoit Chesneau
5e3c07d11d test(dirty): add Docker-based parent death integration tests
Add comprehensive Docker integration tests verifying dirty arbiter
lifecycle under realistic conditions:
- Parent death detection via ppid monitoring
- Orphan cleanup on restart
- Dirty arbiter respawning after crash
- Graceful shutdown with SIGTERM

Also fix race condition in manage_workers() by checking self.alive
before spawning new workers during shutdown.
2026-01-25 10:23:25 +01:00
Benoit Chesneau
79f85af55e fix(dirty): detect parent death and self-terminate
Add ppid monitoring to dirty arbiter's worker monitor loop. If the
main arbiter dies unexpectedly (SIGKILL, crash, OOM), the dirty
arbiter detects the parent change and shuts itself down gracefully.

This complements the existing orphan cleanup on startup.
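
The ppid check can be sketched as below. `ParentWatcher` is a hypothetical class, and the injectable `getppid` argument exists only to make the sketch testable.

```python
import os


class ParentWatcher:
    """Remember the parent PID at startup and report when it changes."""

    def __init__(self, getppid=os.getppid):
        self._getppid = getppid
        self._initial_ppid = getppid()

    def parent_alive(self):
        # When the parent dies, the process is re-parented (for example to
        # init or a subreaper), so getppid() starts returning another PID.
        return self._getppid() == self._initial_ppid
```

A monitor loop would call `parent_alive()` periodically and begin a graceful shutdown once it returns `False`.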
2026-01-25 10:23:25 +01:00
Benoit Chesneau
b67ff0b31d test: fix warnings and flaky tests in dirty arbiter tests
- Close coroutines in mocked asyncio.run to prevent "never awaited" warning
- Fix flaky integration tests with proper async cleanup and try/finally
- Add uvloop to testing dependencies so uvloop test runs
- Add pytest warning filter for eventlet/asyncio incompatibility
2026-01-25 10:23:25 +01:00
Benoit Chesneau
e21d23bfa6 fix(dirty): add orphan cleanup via well-known PID file
When the main arbiter crashes and restarts, orphaned dirty arbiters
may continue running. This adds detection and cleanup:

- Add well-known PID file location based on proc_name
- Dirty arbiter writes PID on startup, removes on exit
- Main arbiter checks for orphans on fresh start (not USR2)
- Uses self.proc_name for USR2 compatibility (myapp vs myapp.2)

During a USR2 upgrade, the old and new dirty arbiters coexist with
separate PID files, which prevents the old arbiter from removing the
new arbiter's file.
2026-01-25 10:23:25 +01:00
Benoit Chesneau
f6418d4eb0 feat(dirty): add streaming support and async client benchmarks
Add support for streaming responses when dirty app actions return
generators (sync or async). This enables real-time delivery of
incremental results for use cases like LLM token generation.

Features:
- Streaming protocol with chunk/end/error message types
- Worker support for sync and async generators
- Arbiter forwarding of streaming messages
- Deadline-based timeout handling
- Async client streaming API

Protocol:
- Chunk messages (type: "chunk") contain partial data
- End messages (type: "end") signal stream completion
- Error messages can occur mid-stream
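
As an illustration of the message sequence, the sketch below wraps a generator's output in chunk, end, and error messages. Only the `type` values come from the description above; the `id`, `data`, and `error` field names are assumptions.

```python
def stream_messages(request_id, generator):
    """Wrap a generator's items in chunk messages, then signal the end."""
    try:
        for item in generator:
            yield {"type": "chunk", "id": request_id, "data": item}
    except Exception as exc:
        # Errors can surface mid-stream, after some chunks were already sent.
        yield {"type": "error", "id": request_id, "error": str(exc)}
        return
    yield {"type": "end", "id": request_id}
```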

New files:
- benchmarks/dirty_streaming.py: Streaming benchmark suite
- tests/dirty/test_*_streaming*.py: Streaming test coverage
- docs/content/dirty.md: Streaming documentation with examples
2026-01-25 10:23:25 +01:00
Benoit Chesneau
62a29bd0e1 test(dirty): add multi-app routing tests
Add tests to verify that when multiple dirty apps are configured,
messages are correctly routed to the appropriate app based on app_path.

New files:
- tests/support_dirty_apps.py: CounterApp and EchoApp test apps
- tests/dirty/test_multi_app_routing.py: 13 routing tests covering
  app loading, routing, state separation, error handling, and
  concurrent requests
2026-01-25 10:23:25 +01:00
Benoit Chesneau
ce2e06ceba refactor(dirty): replace per-worker locks with queues
Replace lock-based request serialization with queue-based approach:
- Each worker now has a dedicated asyncio.Queue and consumer task
- route_request() submits (request, future) to queue and awaits future
- Consumer task processes requests sequentially per worker
- No lock contention - pure async queue operations

Benefits:
- Clearer separation of concerns
- Better visibility into request backlog (queue.qsize())
- Eliminates lock contention under high concurrency

Changes:
- worker_locks dict replaced with worker_queues and worker_consumers
- Added _start_worker_consumer() to create queue and consumer per worker
- Added _execute_on_worker() for actual worker communication
- Updated _cleanup_worker() to cancel consumer tasks
- Updated stop() to cancel all consumers before shutdown

Benchmark results (4 workers, isolated):
- throughput_10ms: 333 req/s, 0 failures
- overload_10ms (200 clients): 334 req/s, 0 failures
- All tests pass with perfect round-robin distribution
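
The queue-and-consumer pattern can be sketched as follows. `WorkerQueue` is a hypothetical, simplified class: it shows one worker's queue, its consumer task, and the future handed back to the caller, without any of the arbiter's routing.

```python
import asyncio


class WorkerQueue:
    """One queue and one consumer task per worker; requests run in order."""

    def __init__(self, execute):
        self._execute = execute  # coroutine function doing the real work
        self._queue = asyncio.Queue()
        self._consumer = asyncio.create_task(self._consume())

    async def route_request(self, request):
        future = asyncio.get_running_loop().create_future()
        await self._queue.put((request, future))
        return await future

    async def _consume(self):
        while True:
            request, future = await self._queue.get()
            try:
                result = await self._execute(request)
            except Exception as exc:
                if not future.done():
                    future.set_exception(exc)
            else:
                if not future.done():
                    future.set_result(result)

    def backlog(self):
        return self._queue.qsize()

    async def stop(self):
        self._consumer.cancel()
        await asyncio.gather(self._consumer, return_exceptions=True)


async def demo():
    order = []

    async def execute(request):
        await asyncio.sleep(0.01)
        order.append(request)
        return request * 2

    worker = WorkerQueue(execute)
    results = await asyncio.gather(*(worker.route_request(n) for n in range(3)))
    await worker.stop()
    return results, order


results, order = asyncio.run(demo())
```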
2026-01-25 10:23:25 +01:00
Benoit Chesneau
06aba09251 feat(dirty): add thread pool with execution timeout control
- Use dirty_threads config for thread pool size (default: 1)
- Enforce dirty_timeout at worker level via asyncio.wait_for
- Heartbeat runs independently, not blocked by task execution
- Document thread safety and state persistence in docstrings
2026-01-25 10:21:18 +01:00
Benoit Chesneau
21f769ce16 fix: resolve lint issues in dirty arbiter modules
2026-01-25 10:21:18 +01:00
Benoit Chesneau
77222b8017 feat: add dirty arbiters for long-running blocking operations
Introduce Dirty Arbiters - a separate process pool for executing
long-running, blocking operations (AI model loading, heavy computation)
without blocking HTTP workers. Inspired by Erlang's dirty schedulers.

Key features:
- Completely separate from HTTP workers - can be killed/restarted independently
- Stateful - loaded resources persist in dirty worker memory
- Message-passing IPC via Unix sockets with JSON serialization
- Explicit execute() API from HTTP workers
- Asyncio-based for clean concurrent handling

Architecture:
- DirtyArbiter: manages the dirty worker pool, routes requests
- DirtyWorker: executes functions, maintains state, handles requests
- DirtyClient: sync/async API for HTTP workers to call dirty apps
- DirtyProtocol: length-prefixed JSON messages over Unix sockets
- DirtyApp: base class for dirty applications

Configuration options:
- dirty_apps: list of import paths for dirty applications
- dirty_workers: number of dirty workers (default: 0)
- dirty_timeout: task timeout in seconds (default: 300)
- dirty_graceful_timeout: shutdown timeout (default: 30)
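
A configuration file using these options might look like this. The import path is a hypothetical placeholder, and its `module:Class` form is an assumption.

```python
# gunicorn.conf.py (sketch)
dirty_apps = ["myapp.ml:HugeModel"]  # hypothetical import path
dirty_workers = 2                    # default is 0
dirty_timeout = 300                  # task timeout in seconds (the default)
dirty_graceful_timeout = 30          # shutdown timeout (the default)
```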

Lifecycle hooks:
- on_dirty_starting(arbiter)
- dirty_post_fork(arbiter, worker)
- dirty_worker_init(worker)
- dirty_worker_exit(arbiter, worker)

Includes comprehensive test suite with 164 tests covering:
- Protocol encoding/decoding
- Worker and arbiter lifecycle
- Client sync/async APIs
- Signal handling
- Error handling and timeouts
- Integration tests
2026-01-25 10:21:18 +01:00
Benoit Chesneau
e9a3f30a0f
fix: keep forwarded_allow_ips as strings for backward compatibility (#3459)
The CIDR network support added in 24.1.0 changed forwarded_allow_ips
and proxy_allow_ips from string lists to ipaddress.ip_network objects.
This broke external tools like uvicorn that expect strings.

This fix validates IP/CIDR format during config parsing but keeps the
string representation. Network objects are cached in Config methods
(forwarded_allow_networks() and proxy_allow_networks()) for efficient
IP checking without repeated conversions.

Also uses strict mode for ip_network validation to detect mistakes like
192.168.1.1/24 where host bits are set (should be 192.168.1.0/24).
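
The strict-mode behavior comes from the standard `ipaddress` module and can be shown directly. `validate_network` and `is_allowed` are hypothetical helpers; unlike the cached Config methods described above, this sketch rebuilds the network objects on every call.

```python
import ipaddress


def validate_network(value):
    """Validate an IP or CIDR string and return the original string."""
    # strict=True raises ValueError when host bits are set (192.168.1.1/24).
    ipaddress.ip_network(value, strict=True)
    return value


def is_allowed(client_ip, allowed):
    """Check a client address against a list of IP/CIDR strings."""
    networks = [ipaddress.ip_network(item) for item in allowed]
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in networks)
```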

Fixes #3458
2026-01-23 23:51:25 +01:00
Benoit Chesneau
f3190f84cc
feat: add PROXY protocol v2 support with version selection (#3451)
Extend --proxy-protocol to accept version values (off, v1, v2, auto) instead
of being boolean-only. This allows explicit control over which PROXY protocol
versions are accepted.

Changes:
- Add InvalidProxyHeader exception for v2 binary header errors
- Add validate_proxy_protocol() validator with backwards compatibility
- Update ProxyProtocol setting with nargs="?" and const="auto"
- Add PROXY v2 constants (PP_V2_SIGNATURE, PPCommand, PPFamily, PPProtocol)
- Add _parse_proxy_protocol_v1() and _parse_proxy_protocol_v2() methods
- Update both sync (message.py) and async (asgi/message.py) parsers
- Add hex escape handling in treq.py for v2 binary test data
- Add test cases for v2 TCPv4 and TCPv6

Backwards compatible: --proxy-protocol alone (or True) maps to "auto".
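
For orientation, here is a sketch of parsing a v2 header for the TCP-over-IPv4 case, following the layout in the PROXY protocol specification. `parse_proxy_v2_tcp4` is a hypothetical function, not the `_parse_proxy_protocol_v2()` method added here, and it deliberately handles only one address family.

```python
import socket
import struct

# 12-byte signature defined by the PROXY protocol specification.
PP_V2_SIGNATURE = b"\r\n\r\n\x00\r\nQUIT\n"


def parse_proxy_v2_tcp4(data):
    """Parse a PROXY v2 header for TCP over IPv4; return (src, dst) pairs."""
    if len(data) < 16 or data[:12] != PP_V2_SIGNATURE:
        raise ValueError("missing PROXY v2 signature")
    ver_cmd, fam_proto, length = struct.unpack("!BBH", data[12:16])
    if ver_cmd >> 4 != 2:
        raise ValueError("unsupported PROXY protocol version")
    if ver_cmd & 0x0F != 0x1:
        raise ValueError("only the PROXY command is handled in this sketch")
    if fam_proto != 0x11:  # high nibble 1 = AF_INET, low nibble 1 = STREAM
        raise ValueError("only TCP over IPv4 is handled in this sketch")
    if length < 12 or len(data) < 16 + length:
        raise ValueError("truncated address block")
    src, dst, src_port, dst_port = struct.unpack("!4s4sHH", data[16:28])
    return (socket.inet_ntoa(src), src_port), (socket.inet_ntoa(dst), dst_port)
```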

Closes #2912
2026-01-23 18:40:44 +01:00
Benoit Chesneau
f95ac41b8f fix: use smaller buffer in finish_body for faster timeout
Reduce buffer size from 8192 to 1024 bytes when discarding unread
body data, allowing timeouts to trigger more quickly on slow or
stalled connections.
2026-01-23 14:46:40 +01:00
Benoit Chesneau
66963367f3 fix: set socket to blocking mode on keepalive connections
On keepalive connections, finish_request() sets the socket to non-blocking
for selector registration. When the connection is reused, handle() calls
conn.init() which returns early (already initialized) without restoring
blocking mode. This caused SSLWantReadError when WSGI apps read the
request body on SSL connections.

Fix by explicitly setting blocking mode at the start of handle().

Fixes #3448
2026-01-23 14:40:40 +01:00
Benoit Chesneau
e52ac46e29 feat: support CIDR networks in forwarded_allow_ips and proxy_allow_ips
Use Python's ipaddress module to support IP networks in allow lists.
Individual IP addresses are converted to /32 (IPv4) or /128 (IPv6)
networks. CIDR notation (e.g., 192.168.0.0/16) is now supported.

Fixes #1485
Closes #2390
2026-01-23 11:39:05 +01:00
Benoit Chesneau
bbc9bba95e fix: log SIGTERM as info level, not warning
SIGTERM is expected during graceful shutdown and reload operations.
Logging it as warning level causes unnecessary noise in error logs.
SIGKILL remains at error level (suggests OOM), other signals at warning.

Closes #3094
2026-01-23 11:39:05 +01:00
Benoit Chesneau
56abeaf105 fix: unreader.unread() now prepends data to buffer
The unread method was incorrectly appending data to the end of the
buffer instead of prepending it to the beginning. This caused issues
when reading partial data and then unreading it.
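
The difference between appending and prepending is easy to see in a small buffer class. `Unreader` below is a simplified stand-in written for this note, not gunicorn's class.

```python
import io


class Unreader:
    """Minimal buffer illustrating prepend-on-unread."""

    def __init__(self, source):
        self._source = source
        self._buf = b""

    def read(self, size):
        if not self._buf:
            self._buf = self._source.read(8192)
        data, self._buf = self._buf[:size], self._buf[size:]
        return data

    def unread(self, data):
        # Prepend, so the returned bytes are the next ones read.
        # Appending (self._buf + data) would put them after the bytes
        # that are still waiting in the buffer.
        self._buf = data + self._buf
```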

Closes #2915
Closes #2346
2026-01-23 11:39:05 +01:00
Benoit Chesneau
0e175a2d34 fix: resolve lint issues and remove obsolete Sphinx references
- Fix lint issues in test_gthread.py:
  - Remove unused imports (queue, partial, http)
  - Move fcntl import to top level
  - Remove unused variable assignment
  - Replace unnecessary lambdas with method references
  - Add blank lines before nested function definitions (E306)

- Update .github/workflows/lint.yml:
  - Replace Sphinx docs check with MkDocs settings generator
  - docs/source directory no longer exists after MkDocs migration

- Update tox.ini:
  - Remove docs/source/*.rst lint (directory doesn't exist)
  - Add tests/test_gthread.py to lint targets
2026-01-23 09:56:32 +01:00
Benoit Chesneau
47b9a18619 fix: handle SSLWantReadError in finish_body() (#3448)
The finish_body() function can raise ssl.SSLWantReadError when
discarding unread request body data on SSL connections. This causes
TLS requests to fail intermittently with "Invalid request" errors.

Handle SSLWantReadError by treating it as "no more data to read".
This is safe because finish_body() only discards leftover data before
keepalive - if SSL says "need to wait for more data", there's nothing
left to discard.
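
The handling described above amounts to a drain loop that treats `ssl.SSLWantReadError` as the end of the data. `discard_body` is a hypothetical function, and the `read` callable stands in for the request body reader.

```python
import ssl


def discard_body(read, chunk_size=8192):
    """Drain leftover body data; stop quietly if SSL has nothing ready."""
    discarded = 0
    try:
        while True:
            data = read(chunk_size)
            if not data:
                break
            discarded += len(data)
    except ssl.SSLWantReadError:
        # No decrypted data is available right now, so there is nothing
        # left to discard before the connection is kept alive.
        pass
    return discarded
```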

Fixes #3448
2026-01-23 09:38:41 +01:00
Benoit Chesneau
f9df39f600 gevent: Require gevent 24.10.1+ to address CVE-2024-3219
2026-01-23 00:59:51 +01:00