Closes#3482
The dirty module (which uses asyncio and concurrent.futures) was being
imported at gunicorn startup via gunicorn.arbiter. This caused
concurrent.futures to be imported before user code could call
gevent.monkey.patch_all(), breaking gevent's monkey-patching.
Changes:
- gunicorn/arbiter.py: Import DirtyArbiter and set_dirty_socket_path
lazily inside spawn_dirty_arbiter() instead of at module level
- gunicorn/dirty/worker.py: Import ThreadPoolExecutor lazily inside
run() method instead of at module level
- Add tests/workers/test_gevent_import_order.py with 5 tests verifying:
- concurrent.futures is NOT imported when gunicorn.arbiter loads
- gevent patching works correctly with gunicorn
- Reproduces the exact scenario from the bug report gist
This ensures gevent's monkey.patch_all() can run before concurrent.futures
is imported, allowing proper patching of threading primitives.
When the main arbiter crashes and restarts, orphaned dirty arbiters
may continue running. This adds detection and cleanup:
- Add well-known PID file location based on proc_name
- Dirty arbiter writes PID on startup, removes on exit
- Main arbiter checks for orphans on fresh start (not USR2)
- Uses self.proc_name for USR2 compatibility (myapp vs myapp.2)
During USR2 upgrade, old and new dirty arbiters coexist with
separate PID files, preventing the old from removing the new's file.
SIGTERM is expected during graceful shutdown and reload operations.
Logging it as warning level causes unnecessary noise in error logs.
SIGKILL remains at error level (suggests OOM), other signals at warning.
Closes#3094
* tests: Add tests for current signal handling behavior
Add tests for arbiter signal handling:
- TestSignalHandlerRegistration (4 tests): Verify signal handler
registration, pipe creation, SIGCHLD separate handler, and
expected signals list
- TestSignalQueue (4 tests): Test signal queueing, max queue size,
wakeup writes to pipe, and sleep returns on pipe data
- TestReapWorkers (6 tests): Test worker reaping for normal exit,
error exit codes, WORKER_BOOT_ERROR, APP_LOAD_ERROR, signal
termination, and SIGKILL OOM hint
These tests establish baseline coverage before refactoring the
signal handling code for safety and reliability improvements.
* tests: Add tests for SIGHUP reload and worker lifecycle
Add tests for reload and worker management:
- TestSighupReload (3 tests): Verify reload spawns configured number
of workers, calls manage_workers, and logs hang up message
- TestWorkerLifecycle (4 tests): Test spawn_worker adds to WORKERS
dict, kill_worker sends correct signal, murder_workers sends
SIGABRT first then SIGKILL on subsequent timeout
* arbiter: Fix waitpid status parsing using POSIX macros
Use os.WIFEXITED/WEXITSTATUS and os.WIFSIGNALED/WTERMSIG instead
of manual bit shifting for waitpid status interpretation. This
correctly distinguishes between normal exits and signal termination.
The previous code used 'status >> 8' which only worked for normal
exits, and used raw status values for signal detection which was
incorrect.
Fixes part of #3435 and #3056 (signal name display issues)
* arbiter: Change SIGTERM log level to warning
Log signal termination at warning level for expected signals
(SIGTERM, SIGQUIT) since these typically occur during normal
graceful shutdown. SIGKILL remains at error level with the
OOM hint since it indicates abnormal termination.
Fixes#3311, #3050 (SIGTERM logged as error)
* arbiter: Remove logging from SIGCHLD signal handler
Move reap_workers() call from signal handler context to main loop.
The signal handler (now signal_chld) only queues the signal and
wakes up the main loop. The actual reap_workers() is called from
handle_chld() in the main loop where logging is safe.
This fixes potential deadlocks caused by logging from signal
handler context when holding the logging lock.
Fixes#3198, #3004 (logging in signal handlers unsafe, deadlock)
* arbiter: Replace PIPE+select with queue.SimpleQueue
Use queue.SimpleQueue for signal handling instead of PIPE+select.
SimpleQueue is reentrant-safe and can be used from signal handlers.
Changes:
- Remove PIPE-based wakeup mechanism
- Add SIG_QUEUE as SimpleQueue instance
- Add WAKEUP_REQUEST sentinel for non-signal wakeups
- Replace sleep() with wait_for_signals() using queue.get()
- Simplify signal handler to just put_nowait()
- Update main loop to iterate over wait_for_signals()
- Add reap_workers() call in stop() to properly clean up workers
since SIGCHLD is no longer processed during shutdown
This simplifies the code and removes the dependency on select().
Also adds integration tests for signal handling that verify:
- Basic request/response
- Graceful shutdown with SIGTERM/SIGINT
- SIGHUP reload
- Multiple concurrent requests
* arbiter: Wait for old workers on SIGHUP reload
After spawning new workers during reload, wait for old workers to
terminate before returning from reload(). This prevents the issue
where old workers could receive double SIGTERM - once from
manage_workers() and again from the arbiter loop.
The reload now tracks worker_age before spawning, then waits up to
graceful_timeout for workers older than that age to exit.
Fixes#3312, #3274 (SIGHUP can send double SIGTERM)
* arbiter: Log SIGCHLD at debug level
SIGCHLD is received frequently (whenever a worker exits) and doesn't
need to be logged at info level. Log it at debug level to reduce
noise in the logs while still making it available for debugging.
* tests: Fix lint warnings in test_arbiter.py
If you have two (or more) instances of gunicorn that use `reuse-port`
and bind to single unix socket all work until one of gunicorn will
stopped. Because the first stopped removes unix socket file and other
instances can't longer process requests.
Track the use of systemd socket activation and gunicorn socket inheritance
in the arbiter. Unify the logic of creating gunicorn sockets from each of
these sources to always use the socket name to determine the type rather
than checking the configured addresses. The configured addresses are only
used when there is no inheritance from systemd or a parent arbiter.
Fix#1298
This is similar to worker_exit() in that it is called just after a
worker has terminated, but it's called in the Gunicorn *master* process,
not the *child* process.
This changes improve the binary upgrade behaviour using USR2:
- only one binary upgrade can happen at a time: the old arbiter needs to be
killed to promote the new arbiter.
- if a new arbiter is already spawned, until one is killed USR2 has no action
- if a new arbiter has been spawned, the unix socket won't be unlinked
- until the old arbiter have been killed the newly created pidfile has the name
<pidfile>.2 and the name Master.2 .
Note: there is no dialog between both arbiters to handle this features.
Instead they will supervise each others until one is killed. So isolation is
still guaranted.
fix#1267
This change add proper file locking to gunicorn. By default "gunicorn.lock" is created in the temporary directory when a unix socket is bound. In case someone want to fix the lock file path or use multiple gunicorn instance the "--lock-file" setting can be used to set the path of this file.
fix#1259
Close all the listeners when the arbiter shuts down. By doing so,
workers can close the socket at the beginning of a graceful shut
down thereby informing the operating system that the socket can
be cleaned up. With this change, graceful exits with such workers
will refuse new connections while draining, allowing load balancers
to respond more quickly and avoiding leaving connections dangling
in the listen backlog, unaccepted.
Ref #922