New parser rule: refuse HTTP requests where a header field value
contains characters that
a) should never appear there in the first place,
b) might have lead to incorrect treatment in a proxy in front, and
c) might lead to unintended behaviour in applications.
From RFC 9110 section 5.5:
"Field values containing CR, LF, or NUL characters are invalid and
dangerous, due to the varying ways that implementations might parse
and interpret those characters; a recipient of CR, LF, or NUL within
a field value MUST either reject the message or replace each of those
characters with SP before further processing or forwarding of that
message."
since 3.3: EnvironmentError, IOError, socket.error and select.error are merged into IOError.
They may now return a more specific subclass - which this commit does not utilize yet.
Precise number does not matter that much, so lets not stop potentially working tests.
The point was to cut off well before 6 hours, so any small number will do.
chunk extensions are silently ignored before and after this change;
its just the whitespace handling for the case without extensions that matters
applying same strip(WS)->rstrip(BWS) replacement as already done in related cases
half-way fix: could probably reject all BWS cases, rejecting only misplaced ones
Note: This is unrelated to a reverse proxy potentially talking HTTP/3 to clients.
This is about the HTTP protocol version spoken to Gunicorn, which is HTTP/1.0 or HTTP/1.1.
Little legitimate need for processing HTTP 1 requests with ambiguous version numbers.
Broadly refuse.
Co-authored-by: Ben Kallus <benjamin.p.kallus.gr@dartmouth.edu>
Do the validation on the original, not the result from unicode case folding.
Background:
latin-1 0xDF is traditionally uppercased 0x53+0x53 which puts it back in ASCII
In common configuration unlikely a big security problem in itself
you are just fooling the remote about https.
However, it is offers an oracle for otherwise invisible proxy request headers,
so it might help exploiting other vulnerabilities.
If we promise wsgi.input_terminated, we better get it right - or not at all.
* chunked encoding on HTTP <= 1.1
* chunked not last transfer coding
* multiple chinked codings
* any unknown codings (yes, this too! because we do not detect unusual syntax that is still chunked)
* empty coding (plausibly harmless, but not see in real life anyway - refused, for the moment)
Ambiguous mappings open a bottomless pit of "what is user input and what is proxy input" confusion.
Default to what everyone else has been doing for years now, silently drop.
see also https://nginx.org/r/underscores_in_headers