We've had really terrible tail latencies with gevent and gunicorn under load.
Inspecting our services with strace we see the following:
```
23:11:01.651529 accept4(5, {sa_family=AF_UNIX}, [110->2], SOCK_CLOEXEC) = 223 <0.000015>
..{18 successful calls to accept4}...
23:11:01.652590 accept4(5, {sa_family=AF_UNIX}, [110->2], SOCK_CLOEXEC) = 249 <0.000010>
23:11:01.652647 accept4(5, 0x7ffcd46c09d0, [110], SOCK_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable) <0.000012>
23:11:01.657622 getsockname(5, {sa_family=AF_UNIX, sun_path="/run/gunicorn/gunicorn.sock"}, [110->30]) = 0 <0.000009>
23:11:01.657682 recvfrom(223, "XXX"..., 8192, 0, NULL, NULL) = 511 <0.000011>
..{16 calls to recvfrom}...
23:11:01.740726 recvfrom(243, "XXX"..., 8192, 0, NULL, NULL) = 511 <0.000012>
23:11:01.746074 getsockname(5, {sa_family=AF_UNIX, sun_path="/run/gunicorn/gunicorn.sock"}, [110->30]) = 0 <0.000013>
23:11:01.746153 recvfrom(246, "XXX"..., 8192, 0, NULL, NULL) = 511 <0.000014>
23:11:01.751540 getsockname(5, {sa_family=AF_UNIX, sun_path="/run/gunicorn/gunicorn.sock"}, [110->30]) = 0 <0.000010>
23:11:01.751599 recvfrom(249, "XXX"..., 8192, 0, NULL, NULL) = 511 <0.000013>
```
Notice the flurry of 20 `accept4` calls followed by 20 calls to `recvfrom`. Each `recvfrom` happens about 5ms after the previous one,
so the last `recvfrom` is called ~100ms after the call to `accept4` for that fd.
gevent suggests setting `max_accept` to a lower value when there are multiple worker processes on the same listening socket: 785b7b5546/src/gevent/baseserver.py (L89-L102)
gevent sets `max_accept` to `1` when `wsgi.multiprocess` is True: 9d27d269ed/src/gevent/pywsgi.py (L1470-L1472)
gunicorn does in fact set `wsgi.multiprocess` to True when the number of workers is > 1: e4e20f273e/gunicorn/http/wsgi.py (L73)
and this gets passed to `gevent.pywsgi.WSGIServer`: e4e20f273e/gunicorn/workers/ggevent.py (L67-L75)
However, when `worker-class` is `gevent`, we create a `gevent.server.StreamServer` directly, so `max_accept` is never lowered: e4e20f273e/gunicorn/workers/ggevent.py (L77-L78)
Fixing this dropped the p50 response time on an especially problematic benchmark from 250ms to 115ms.
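A toy model makes the arithmetic behind the strace concrete. This is not gevent's scheduler. It uses a hypothetical `first_read_delays` helper and makes two assumptions. First, all connections are already waiting in the shared backlog. Second, each request keeps its worker busy for a fixed `service_ms` before the next connection is read, which stands in for the ~5ms gap between `recvfrom` calls. In the model, the earliest-free worker takes up to `max_accept` connections at once and then serves them one after another.

```python
import heapq


def first_read_delays(n_conns, n_workers, max_accept, service_ms):
    """Toy model: delay (ms) before each queued connection is first read.

    Assumes every connection is already in the shared listen backlog at t=0
    and that each request keeps its worker busy for `service_ms`.
    """
    pending = n_conns
    # Min-heap of (time the worker becomes free, worker id).
    free_at = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(free_at)
    delays = []
    while pending:
        t, worker = heapq.heappop(free_at)
        # The earliest-free worker greedily accepts up to max_accept connections...
        batch = min(max_accept, pending)
        pending -= batch
        # ...and then reads them strictly one after another.
        delays.extend(t + i * service_ms for i in range(batch))
        heapq.heappush(free_at, (t + batch * service_ms, worker))
    return delays


# 20 queued connections, 4 workers, ~5ms of work per request.
greedy = first_read_delays(20, 4, max_accept=100, service_ms=5.0)
one_at_a_time = first_read_delays(20, 4, max_accept=1, service_ms=5.0)
print(max(greedy), max(one_at_a_time))  # 95.0 20.0
```

With a large `max_accept`, one worker drains the whole backlog while the others sit idle. This matches the pattern in the strace. With `max_accept=1`, the connections spread across the workers and the worst-case wait shrinks accordingly.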
Fixes this exception that is raised when the inotify-based reloader is used in
combination with the `--chdir` option:
inotify.calls.InotifyError: Call failed (should not be -1): (-1) ERRNO=(0)
This replaces the very old sitemap generator, which was over 2,000 lines of code and
only compatible with Python 2.
According to the stored lastmod, the generator hadn't been used since 2010.
The minimal replacement script scans the static site for HTML files and
uses git to deduce the last modification date of each page.
The sitemap xmlns version was updated to the latest one, 0.9, from
sitemaps.org.
The index page was given a higher priority since the other pages
are just redirects to the index with anchors.
The output file is pretty printed to help with diffs.
Static assets (css, images...) aren't listed in the sitemap anymore.
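Here is a minimal sketch of such a script. It assumes Python 3.9+ (for `ET.indent`) and a checkout with `git` on `PATH`. The function names `git_lastmod` and `build_sitemap` are hypothetical. So are the priority values (1.0 for the index, 0.5 elsewhere), which the description does not specify.

```python
import subprocess
import xml.etree.ElementTree as ET
from pathlib import Path

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def git_lastmod(path):
    """Date (YYYY-MM-DD) of the last commit touching `path`, or None if untracked."""
    out = subprocess.run(
        ["git", "log", "-1", "--format=%cd", "--date=short", "--", str(path)],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return out or None


def build_sitemap(site_root, base_url, lastmod=git_lastmod):
    """Return a pretty-printed sitemap listing every .html file under site_root."""
    site_root = Path(site_root)
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    # Only HTML pages are listed; static assets (css, images...) are skipped.
    for page in sorted(site_root.rglob("*.html")):
        rel = page.relative_to(site_root).as_posix()
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = f"{base_url.rstrip('/')}/{rel}"
        date = lastmod(page)
        if date:
            ET.SubElement(url, "lastmod").text = date
        # The other pages redirect to anchors in the index, so it ranks higher.
        priority = "1.0" if rel == "index.html" else "0.5"
        ET.SubElement(url, "priority").text = priority
    ET.indent(urlset)  # pretty-print so diffs stay readable (Python 3.9+)
    return ET.tostring(urlset, encoding="unicode")
```

The `lastmod` parameter is injectable, so the page scan can be exercised without a git repository.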
The WSGI spec requires the SERVER_SOFTWARE property to contain the server name and version. This change fixes that and separates the version header from the SERVER_SOFTWARE property. We expose the SERVER variable so custom installations can change it in one place when needed.
While we still want to know which server is running to ease operations, the version was giving away too much information about the installation, so let's remove it.
Fixes #2223.
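Here is a minimal sketch of that split. `SERVER_SOFTWARE` carries the name and version for the WSGI environ, while the `Server` response header carries only the name. The version placeholder and the two helpers (`base_environ`, `server_header`) are hypothetical, not gunicorn's actual API.

```python
SERVER = "gunicorn"   # the one place a custom installation would rebrand the server
VERSION = "0.0.0"     # hypothetical placeholder, not a real release number

# CGI-style name/version pair for the WSGI environ, e.g. "gunicorn/0.0.0".
SERVER_SOFTWARE = f"{SERVER}/{VERSION}"


def base_environ():
    """Hypothetical helper: the server-identifying part of a WSGI environ."""
    return {"SERVER_SOFTWARE": SERVER_SOFTWARE}


def server_header():
    """Hypothetical helper: the Server response header, name only (no version)."""
    return ("Server", SERVER)
```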
Unfortunately, eventlet doesn't implement GreenSocket.sendfile, so we have to implement it ourselves.
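One way to do this is a read-and-send loop, similar in spirit to the stdlib's fallback path inside `socket.sendfile`. This is a simplified sketch, not gunicorn's patch. `sendfile_fallback` is a hypothetical name, the sketch assumes a blocking socket, and it relies on `sendall` instead of handling partial `send` results itself.

```python
import socket


def sendfile_fallback(sock, fileobj, offset=0, count=None, blocksize=8192):
    """Copy `fileobj` to `sock` using plain read() and sendall() calls.

    Returns the number of bytes sent. Like socket.sendfile, it seeks to
    `offset` only when offset is non-zero and leaves the file positioned
    at offset + bytes sent.
    """
    if offset:
        fileobj.seek(offset)
    total = 0
    while count is None or total < count:
        size = blocksize if count is None else min(blocksize, count - total)
        data = fileobj.read(size)
        if not data:
            break  # EOF
        sock.sendall(data)
        total += len(data)
    fileobj.seek(offset + total)
    return total
```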
Add gevent and eventlet to tox.ini and add tests to make sure we can at least import the workers. Tests that the workers actually function would be nice...
Update the gevent and eventlet setup extras to require the versions that are enforced in their worker modules.
Otherwise adding a watcher for a file located in the working directory generates an empty dirname, resulting in the following error:
inotify.calls.InotifyError: Call failed (should not be -1): (-1) ERRNO=(0)
This is caused by calling inotify with an empty path.
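The root cause is easy to reproduce: `os.path.dirname` of a bare filename is the empty string. Making the path absolute first avoids that. This is a sketch under that assumption, using a hypothetical `dirs_to_watch` helper, and is not necessarily gunicorn's exact change.

```python
import os


def dirs_to_watch(filenames):
    """Directories an inotify-based reloader would add watches for.

    os.path.dirname("app.py") is "", and inotify rejects an empty path,
    so each filename is made absolute before taking its directory.
    """
    return {os.path.dirname(os.path.abspath(name)) for name in filenames}
```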
Using `socket.AF_UNIX` as the family in `socket.fromfd` should be enough to be
cross-platform, since its address structure is larger than those of the other families.
This should allow the code to work cross-platform.
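A sketch of the idea follows; `listener_family` is a hypothetical helper. The inherited fd is wrapped as `AF_UNIX`, whose address buffer is large enough for an IPv4 or IPv6 address too. `getsockname()` then returns whatever the socket really is, and the real family is inferred from the shape of the result: a `str` or `bytes` path for Unix sockets, a 2-tuple for IPv4, and a 4-tuple for IPv6.

```python
import socket


def listener_family(fd):
    """Guess the address family of an inherited listening socket fd."""
    # AF_UNIX is only a placeholder family; its sockaddr buffer is the
    # largest, so getsockname() can still return an IPv4/IPv6 address.
    probe = socket.fromfd(fd, socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        name = probe.getsockname()
    finally:
        probe.close()  # fromfd dup()ed the fd, so the original stays open
    if isinstance(name, tuple):
        return socket.AF_INET6 if len(name) == 4 else socket.AF_INET
    return socket.AF_UNIX
```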