_closed now means the client transport has gone away. Body-wait timeouts
flip a separate _body_wait_expired flag. Both still surface as
http.disconnect to the app, but downstream code can now distinguish 'the
socket is dead' from 'the body never finished framing in time' without
guessing which path set the flag.
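A minimal sketch of the two-flag split described above. The class and
method names are hypothetical stand-ins, not the real receiver; only the
_closed and _body_wait_expired flag names come from this change.

```python
import enum


class DisconnectCause(enum.Enum):
    TRANSPORT_CLOSED = "transport_closed"
    BODY_WAIT_EXPIRED = "body_wait_expired"


class ReceiverState:
    """Hypothetical stand-in holding the two flags."""

    def __init__(self):
        self._closed = False             # client transport has gone away
        self._body_wait_expired = False  # body did not finish framing in time

    def next_message(self):
        # Both conditions look identical to the ASGI app.
        if self._closed or self._body_wait_expired:
            return {"type": "http.disconnect"}
        return None

    def disconnect_cause(self):
        # Downstream code can tell the two paths apart without guessing.
        if self._closed:
            return DisconnectCause.TRANSPORT_CLOSED
        if self._body_wait_expired:
            return DisconnectCause.BODY_WAIT_EXPIRED
        return None
```

The app-facing message stays the same in both cases; only internal
callers consult disconnect_cause().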
A framework bug (say, returning bytes from a HEAD or 204 handler) is
now logged at WARNING level the first time it happens for a request,
so the misbehavior is visible without spamming on multi-chunk streams.
RFC 9110 forbids a body for HEAD requests and for 1xx/204/304 status
codes. PR #3614 stopped gunicorn from auto-applying chunked encoding
in those cases, but if the application explicitly emitted a
Content-Length or Transfer-Encoding header (and possibly body bytes),
gunicorn still passed them through. Now strip both headers, force
plain framing, and discard any body the app emits.
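The header-stripping rule above could look roughly like this. Both
function names are hypothetical and the real code path is not shown
here; this is only a sketch of the described behavior.

```python
def body_forbidden(method, status):
    """True for the cases named above: HEAD, 1xx, 204, and 304."""
    return method == "HEAD" or 100 <= status < 200 or status in (204, 304)


def strip_framing_headers(method, status, headers):
    """Return headers without Content-Length / Transfer-Encoding when the
    response must not carry a body; otherwise return them unchanged."""
    if not body_forbidden(method, status):
        return list(headers)
    banned = {"content-length", "transfer-encoding"}
    return [(name, value) for name, value in headers
            if name.lower() not in banned]
```

Any body bytes the app emits in these cases would be dropped by the
caller rather than written to the socket.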
The previous assertion ran immediately after a 2s sleep and raced
the arbiter's socket re-creation on slow runners (observed flake on
FreeBSD 14.2 / Python 3.13). Replace with the wait_for_socket helper
already used elsewhere in the file.
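For readers without the test file open, a polling helper of this kind
usually looks like the sketch below. The signature and the use of a TCP
connect are assumptions; the actual wait_for_socket helper may differ.

```python
import socket
import time


def wait_for_socket(host, port, timeout=10.0, interval=0.1):
    """Poll until something accepts connections on (host, port).

    Returns True once a connect succeeds, False if the timeout elapses.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet; retry until the deadline.
            time.sleep(interval)
    return False
```

Polling against a deadline removes the dependency on how long the
arbiter takes to re-create the socket, which a fixed sleep cannot do.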
- ASGI keepalive gate now keys on receiver._complete only. _closed is
overloaded across transport disconnect and receive timeout; treating
either as 'message complete' would re-enable the smuggling vector
the previous PR was meant to close.
- Parser.finish_body's 64 KiB byte cap now applies only when an explicit
deadline is given. Default invocations (notably __next__, used by
base_async / sync workers) regain the prior unbounded drain so a
partial drain does not silently desync the next request.
- WSGI fast parser now applies the same per-header policy as the Python
parser (Expect, secure_scheme_headers, forwarded_allow_ips trust gate,
forwarder_headers / header_map). Shared helpers extracted on Message.
- ASGI keepalive no longer resets the parser when the previous request
body was not fully framed; the connection closes instead, preventing
request smuggling on pipelined connections.
- On timeout, BodyReceiver._wait_for_data now flips _closed and yields
http.disconnect rather than synthesizing more_body=False. The timeout
honors cfg.timeout.
- ASGI chunked encoding now skips HEAD, 204, and 304 (matches
Response.is_chunked in the WSGI path) via a small helper.
- _setup_callback_parser passes proxy_protocol to PythonProtocol; auto
falls back to the Python parser when proxy_protocol != off (the C
parser does not implement PROXY framing). _effective_peername swaps
the transport peer with the PROXY-supplied client address.
- Parser.finish_body accepts a deadline and a 64 KiB byte cap; gthread
passes a deadline and abandons keepalive on incomplete drain so a
stalled client cannot tie up a worker thread.
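The finish_body items above (cap only with an explicit deadline,
unbounded drain otherwise) can be sketched as follows. This is a
standalone illustration over a file-like body, not Parser.finish_body
itself; the chunk size and return convention are assumptions.

```python
import io
import time

DRAIN_CAP = 64 * 1024  # byte cap, applied only when a deadline is given


def finish_body(body, deadline=None, chunk_size=8192):
    """Drain the unread body. Return True if it was fully consumed."""
    drained = 0
    while True:
        if deadline is not None and time.monotonic() >= deadline:
            return False  # out of time: caller should abandon keepalive
        data = body.read(chunk_size)
        if not data:
            return True  # body fully drained, next request is in sync
        drained += len(data)
        if deadline is not None and drained > DRAIN_CAP:
            return False  # cap exceeded: caller should abandon keepalive
```

A caller in the gthread style would pass time.monotonic() + N and close
the connection on False; default callers pass no deadline and always
drain to the end.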
gunicorn_h1c 0.6.5 ships the Content-Length list-form rejection
(h1c #8). The last python_only marker can now come off
rfc9112_smuggle_cl_list_form_01.
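For context, rejecting the list form amounts to accepting only a single
run of ASCII digits. A minimal sketch in Python (the function name is
hypothetical; this is not the h1c implementation):

```python
def parse_content_length(value):
    """Parse a Content-Length value, rejecting list forms such as "5, 5"."""
    value = value.strip(" \t")
    # isascii() guards against non-ASCII digits that str.isdigit() accepts.
    if not value or not value.isascii() or not value.isdigit():
        raise ValueError("invalid Content-Length")
    return int(value)
```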
gunicorn_h1c 0.6.4 ships the RFC 9110/9112 hardening added in h1c #4,
#6, and #7: control chars in header values, request-target form/method
pairing, and forbidden trailer field-names. All the corresponding
fixtures now pass against the C parser, so their python_only markers
are removed.
The CL list form fixture stays marked: the C parser does not yet
reject Content-Length: "5, 5".
RFC 9110 defines field-vchar = VCHAR / obs-text; only SP and HTAB are
permitted beyond that. Previous validation only caught NUL/CR/LF,
leaving BEL, DEL, FF, and other C0/C1 controls accepted, which is a
log/response injection risk. These are now rejected across the WSGI and
ASGI Python parsers.
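A byte-level sketch of the grammar check. The names are hypothetical,
and this version covers only C0 controls and DEL; how the parsers treat
the 0x80-0x9F range (obs-text as bytes, C1 once decoded) is not shown
here.

```python
import re

# Allowed bytes: HTAB, SP, VCHAR (0x21-0x7E), obs-text (0x80-0xFF).
FIELD_VALUE_RE = re.compile(rb"[\t\x20-\x7e\x80-\xff]*")


def is_valid_field_value(value: bytes) -> bool:
    """True when every byte is permitted in a header field value."""
    return FIELD_VALUE_RE.fullmatch(value) is not None
```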
Host, Content-Length, Transfer-Encoding, Trailer, Authorization, and TE
are not allowed in trailer sections; accepting them enables smuggling
and routing confusion. Both WSGI and ASGI Python parsers now raise
InvalidHeaderName when any of these appears in a trailer.
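The trailer check can be pictured as a simple deny-list lookup. The
exception class below is a local stand-in and check_trailers is a
hypothetical name; only the six field names come from this change.

```python
FORBIDDEN_TRAILERS = frozenset({
    "host", "content-length", "transfer-encoding",
    "trailer", "authorization", "te",
})


class InvalidHeaderName(Exception):
    """Local stand-in for the parser's exception of the same name."""


def check_trailers(trailers):
    """Raise InvalidHeaderName if a forbidden field appears in trailers."""
    for name, _value in trailers:
        # Field names are case-insensitive, so compare lowercased.
        if name.lower() in FORBIDDEN_TRAILERS:
            raise InvalidHeaderName(name)
```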
Detect authority-form as a request-target that is neither origin-form
(starts with "/"), absolute-form (contains "://"), nor asterisk; reject
it for any method other than CONNECT. This applies to both the WSGI and
ASGI Python parsers.
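The detection rule above, written out as a sketch. Function names and
the exception class are stand-ins; the classification order mirrors the
description, not necessarily the parser source.

```python
class InvalidRequestLine(Exception):
    """Local stand-in for the parser's exception of the same name."""


def target_form(target):
    """Classify a request-target by the rules described above."""
    if target == "*":
        return "asterisk"
    if target.startswith("/"):
        return "origin"
    if "://" in target:
        return "absolute"
    return "authority"  # neither origin-form, absolute-form, nor asterisk


def check_authority_form(method, target):
    """Reject authority-form for any method other than CONNECT."""
    if target_form(target) == "authority" and method != "CONNECT":
        raise InvalidRequestLine("%s %s" % (method, target))
```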
The Python WSGI and ASGI parsers both accepted `GET *` and similar; RFC
9112 restricts asterisk-form to OPTIONS. Both now raise InvalidRequestLine.
The fast (C) parser in gunicorn_h1c does not yet enforce this, so the
fixture is marked python_only via a new sidecar flag honored by the WSGI
and ASGI invalid-request harnesses.
Six treq fixtures covering gaps: absolute-form, asterisk-form (OPTIONS *),
authority-form (CONNECT), TE codings stacking (gzip/identity before chunked),
and the CL + TE:chunked smuggling vector.
Phase 1 of a staged corpus expansion; fixtures only, no parser changes.
Avoids TCP RST truncating the response tail when unread request data
(body, pipelined bytes, trailers) sits in the kernel recv buffer at
close time. Half-closes write, linger-reads (bounded 2s / 64 KB),
then closes.
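The close sequence can be sketched with the standard socket module. The
function name and the 8192-byte read size are assumptions; the 2s and
64 KB bounds follow the description above.

```python
import socket
import time


def lingering_close(sock, max_seconds=2.0, max_bytes=64 * 1024):
    """Half-close, drain unread input within bounds, then close.

    Returns the number of bytes discarded.
    """
    drained = 0
    try:
        sock.shutdown(socket.SHUT_WR)  # peer sees EOF after our response
        deadline = time.monotonic() + max_seconds
        while drained < max_bytes:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            sock.settimeout(remaining)
            data = sock.recv(min(8192, max_bytes - drained))
            if not data:
                break  # peer closed its side; nothing left unread
            drained += len(data)
    except OSError:
        pass  # timeout or reset: give up draining and close anyway
    finally:
        sock.close()
    return drained
```

Draining before close() matters because closing a socket with unread
data in the receive buffer makes the kernel send RST instead of FIN.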
Per @pajod review: the invalid header value may carry sensitive
content, and raising it through the exception could leak it
across security boundaries (browsers/proxies handling response
splitting errors). Pass just the name instead.
The early_hints callback constructs 103 Early Hints responses without
any header validation, while process_headers validates against TOKEN_RE
and HEADER_VALUE_RE for normal responses. This inconsistency means a
WSGI app passing unsanitized data to wsgi.early_hints could enable
HTTP response splitting via CRLF injection.
Apply the same TOKEN_RE/HEADER_VALUE_RE checks from process_headers to
the early_hints callback for defense-in-depth consistency.
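A sketch of what validated 103 construction might look like. The two
patterns are local approximations of TOKEN_RE and HEADER_VALUE_RE, and
the function and exception classes are stand-ins, not gunicorn's own.
Per the review note above, the exceptions carry only the header name.

```python
import re

# Approximations for illustration; the real patterns live in gunicorn.
TOKEN_RE = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")
HEADER_VALUE_RE = re.compile(r"[ \t\x21-\x7e\x80-\xff]*")


class InvalidHeaderName(Exception):
    """Stand-in exception for a bad header name."""


class InvalidHeader(Exception):
    """Stand-in exception for a bad header value."""


def build_early_hints(headers):
    """Build a 103 Early Hints response, validating each header first."""
    lines = ["HTTP/1.1 103 Early Hints"]
    for name, value in headers:
        if not TOKEN_RE.fullmatch(name):
            raise InvalidHeaderName(name)
        if not HEADER_VALUE_RE.fullmatch(value):
            # Pass only the name so the rejected value is not leaked.
            raise InvalidHeader(name)
        lines.append("%s: %s" % (name, value))
    return ("\r\n".join(lines) + "\r\n\r\n").encode("latin-1")
```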
Closes #3585