gunicorn/tests/test_signal_integration.py
Benoit Chesneau b650332c70
Arbiter signal handling improvements (#3441)
* tests: Add tests for current signal handling behavior

Add tests for arbiter signal handling:
- TestSignalHandlerRegistration (4 tests): Verify signal handler
  registration, pipe creation, SIGCHLD separate handler, and
  expected signals list
- TestSignalQueue (4 tests): Test signal queueing, max queue size,
  wakeup writes to pipe, and sleep returns on pipe data
- TestReapWorkers (6 tests): Test worker reaping for normal exit,
  error exit codes, WORKER_BOOT_ERROR, APP_LOAD_ERROR, signal
  termination, and SIGKILL OOM hint

These tests establish baseline coverage before refactoring the
signal handling code for safety and reliability improvements.

* tests: Add tests for SIGHUP reload and worker lifecycle

Add tests for reload and worker management:
- TestSighupReload (3 tests): Verify reload spawns configured number
  of workers, calls manage_workers, and logs hang up message
- TestWorkerLifecycle (4 tests): Test spawn_worker adds to WORKERS
  dict, kill_worker sends correct signal, murder_workers sends
  SIGABRT first then SIGKILL on subsequent timeout

* arbiter: Fix waitpid status parsing using POSIX macros

Use os.WIFEXITED/WEXITSTATUS and os.WIFSIGNALED/WTERMSIG instead
of manual bit shifting for waitpid status interpretation. This
correctly distinguishes between normal exits and signal termination.

The previous code used 'status >> 8' which only worked for normal
exits, and used raw status values for signal detection which was
incorrect.

Fixes part of #3435 and #3056 (signal name display issues)

* arbiter: Change SIGTERM log level to warning

Log signal termination at warning level for expected signals
(SIGTERM, SIGQUIT) since these typically occur during normal
graceful shutdown. SIGKILL remains at error level with the
OOM hint since it indicates abnormal termination.

Fixes #3311, #3050 (SIGTERM logged as error)

* arbiter: Remove logging from SIGCHLD signal handler

Move reap_workers() call from signal handler context to main loop.
The signal handler (now signal_chld) only queues the signal and
wakes up the main loop. The actual reap_workers() is called from
handle_chld() in the main loop where logging is safe.

This fixes potential deadlocks caused by logging from signal
handler context when holding the logging lock.

Fixes #3198, #3004 (logging in signal handlers unsafe, deadlock)

* arbiter: Replace PIPE+select with queue.SimpleQueue

Use queue.SimpleQueue for signal handling instead of PIPE+select.
SimpleQueue is reentrant-safe and can be used from signal handlers.

Changes:
- Remove PIPE-based wakeup mechanism
- Add SIG_QUEUE as SimpleQueue instance
- Add WAKEUP_REQUEST sentinel for non-signal wakeups
- Replace sleep() with wait_for_signals() using queue.get()
- Simplify signal handler to just put_nowait()
- Update main loop to iterate over wait_for_signals()
- Add reap_workers() call in stop() to properly clean up workers
  since SIGCHLD is no longer processed during shutdown

This simplifies the code and removes the dependency on select().

Also adds integration tests for signal handling that verify:
- Basic request/response
- Graceful shutdown with SIGTERM/SIGINT
- SIGHUP reload
- Multiple concurrent requests

* arbiter: Wait for old workers on SIGHUP reload

After spawning new workers during reload, wait for old workers to
terminate before returning from reload(). This prevents the issue
where old workers could receive double SIGTERM - once from
manage_workers() and again from the arbiter loop.

The reload now tracks worker_age before spawning, then waits up to
graceful_timeout for workers older than that age to exit.

Fixes #3312, #3274 (SIGHUP can send double SIGTERM)

* arbiter: Log SIGCHLD at debug level

SIGCHLD is received frequently (whenever a worker exits) and doesn't
need to be logged at info level. Log it at debug level to reduce
noise in the logs while still making it available for debugging.

* tests: Fix lint warnings in test_arbiter.py
2026-01-22 11:56:23 +01:00

204 lines
5.7 KiB
Python

#
# This file is part of gunicorn released under the MIT license.
# See the NOTICE for more information.
"""
Integration tests for arbiter signal handling.
These tests start a real gunicorn process and verify signal handling
works correctly with actual requests and signals.
"""
import os
import signal
import socket
import subprocess
import sys
import time
import pytest
# Simple WSGI app inline
SIMPLE_APP = '''
def application(environ, start_response):
"""Basic hello world response."""
status = '200 OK'
body = b'Hello, World!'
headers = [
('Content-Type', 'text/plain'),
('Content-Length', str(len(body))),
]
start_response(status, headers)
return [body]
'''
def find_free_port():
"""Find a free port to bind to."""
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
s.bind(('127.0.0.1', 0))
return s.getsockname()[1]
def wait_for_server(host, port, timeout=10):
"""Wait until server is accepting connections."""
start = time.monotonic()
while time.monotonic() - start < timeout:
try:
with socket.create_connection((host, port), timeout=1):
return True
except (ConnectionRefusedError, socket.timeout, OSError):
time.sleep(0.1)
return False
def make_request(host, port, path='/'):
"""Make a simple HTTP request and return the response body."""
with socket.create_connection((host, port), timeout=5) as sock:
request = f'GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n'
sock.sendall(request.encode())
response = b''
while True:
chunk = sock.recv(4096)
if not chunk:
break
response += chunk
return response
@pytest.fixture
def app_module(tmp_path):
"""Create a temporary app module."""
app_file = tmp_path / "app.py"
app_file.write_text(SIMPLE_APP)
return str(app_file.parent), "app:application"
@pytest.fixture
def gunicorn_server(app_module):
"""Start and stop a gunicorn server."""
app_dir, app_name = app_module
port = find_free_port()
# Start gunicorn
cmd = [
sys.executable, '-m', 'gunicorn',
'--bind', f'127.0.0.1:{port}',
'--workers', '2',
'--worker-class', 'sync',
'--access-logfile', '-',
'--error-logfile', '-',
'--log-level', 'info',
app_name
]
proc = subprocess.Popen(
cmd,
cwd=app_dir,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
env={**os.environ, 'PYTHONPATH': app_dir}
)
# Wait for server to start
if not wait_for_server('127.0.0.1', port):
proc.terminate()
proc.wait()
stdout, stderr = proc.communicate()
pytest.fail(f"Gunicorn failed to start:\nstdout: {stdout.decode()}\nstderr: {stderr.decode()}")
yield proc, port
# Cleanup
if proc.poll() is None:
proc.terminate()
try:
proc.wait(timeout=5)
except subprocess.TimeoutExpired:
proc.kill()
proc.wait()
class TestSignalHandlingIntegration:
"""Integration tests for signal handling."""
def test_basic_request(self, gunicorn_server):
"""Verify the server responds to basic requests."""
proc, port = gunicorn_server
response = make_request('127.0.0.1', port)
assert b'Hello, World!' in response
def test_graceful_shutdown_sigterm(self, gunicorn_server):
"""Verify SIGTERM causes graceful shutdown."""
proc, port = gunicorn_server
# Verify server is working
response = make_request('127.0.0.1', port)
assert b'Hello, World!' in response
# Send SIGTERM
proc.send_signal(signal.SIGTERM)
# Wait for process to exit
try:
exit_code = proc.wait(timeout=10)
assert exit_code == 0, f"Expected exit code 0, got {exit_code}"
except subprocess.TimeoutExpired:
proc.kill()
pytest.fail("Gunicorn did not exit within timeout after SIGTERM")
def test_graceful_shutdown_sigint(self, gunicorn_server):
"""Verify SIGINT causes graceful shutdown."""
proc, port = gunicorn_server
# Verify server is working
response = make_request('127.0.0.1', port)
assert b'Hello, World!' in response
# Send SIGINT
proc.send_signal(signal.SIGINT)
# Wait for process to exit
try:
exit_code = proc.wait(timeout=10)
assert exit_code == 0, f"Expected exit code 0, got {exit_code}"
except subprocess.TimeoutExpired:
proc.kill()
pytest.fail("Gunicorn did not exit within timeout after SIGINT")
def test_sighup_reload(self, gunicorn_server):
"""Verify SIGHUP triggers reload."""
proc, port = gunicorn_server
# Verify server is working
response = make_request('127.0.0.1', port)
assert b'Hello, World!' in response
# Send SIGHUP
proc.send_signal(signal.SIGHUP)
# Wait a moment for reload
time.sleep(2)
# Verify server still works after reload
assert proc.poll() is None, "Server died after SIGHUP"
response = make_request('127.0.0.1', port)
assert b'Hello, World!' in response
def test_multiple_requests_under_load(self, gunicorn_server):
"""Verify server handles multiple concurrent requests."""
proc, port = gunicorn_server
# Make several requests in sequence
for _ in range(10):
response = make_request('127.0.0.1', port)
assert b'Hello, World!' in response
# Verify server is still running
assert proc.poll() is None
if __name__ == '__main__':
pytest.main([__file__, '-v'])