mirror of
https://github.com/frappe/gunicorn.git
synced 2026-07-01 10:11:30 +08:00
Merge pull request #15 from tanmoysrt/companion_manager
feat: Companion Process Manager
This commit is contained in:
commit
f189de0ec3
@ -1,5 +1,5 @@
|
||||
Fork Information
|
||||
--------
|
||||
----------------
|
||||
|
||||
This is a fork of gunicorn with the following changes:
|
||||
|
||||
|
||||
758
docs/design/companion-process-manager.md
Normal file
758
docs/design/companion-process-manager.md
Normal file
@ -0,0 +1,758 @@
|
||||
|
||||
Status: proposal / draft
|
||||
Author: Tanmoy Sarkar
|
||||
Scope: `gunicorn/arbiter.py`, `gunicorn/config.py`, `gunicorn/companion/`
|
||||
|
||||
## 1. Problem
|
||||
|
||||
A Frappe deployment is not only HTTP workers.
|
||||
|
||||
Alongside Gunicorn, we usually run persistent non-HTTP processes:
|
||||
|
||||
- RQ worker pools
|
||||
- scheduler
|
||||
- socket.io / websocket server
|
||||
- custom background daemons
|
||||
|
||||
Today these are usually managed separately through supervisor/systemd.
|
||||
|
||||
That causes:
|
||||
|
||||
- repeated app memory usage
|
||||
- separate lifecycle for web and side processes
|
||||
- reload drift between HTTP workers and background processes
|
||||
- inconsistent shutdown behavior
|
||||
- harder production process control
|
||||
|
||||
With `preload_app=True`, Gunicorn workers already share preloaded app memory using copy-on-write. The goal is to give non-HTTP processes the same lifecycle and memory-sharing benefit without making them HTTP workers.
|
||||
|
||||
## 2. Goal
|
||||
|
||||
Gunicorn manages one extra child process: the **Companion Manager**.
|
||||
|
||||
The Companion Manager manages all configured companion processes.
|
||||
|
||||
```text
|
||||
gunicorn master
|
||||
├── HTTP worker
|
||||
├── HTTP worker
|
||||
└── companion manager
|
||||
├── rq-default
|
||||
├── rq-long
|
||||
├── scheduler
|
||||
└── socketio
|
||||
```
|
||||
|
||||
Core rule:
|
||||
|
||||
```text
|
||||
Gunicorn Arbiter manages one Companion Manager.
|
||||
Companion Manager manages companion processes.
|
||||
Each companion process manages its own internals.
|
||||
```
|
||||
|
||||
## 3. Architecture
|
||||
|
||||
```text
|
||||
gunicorn master
|
||||
preload_app=True
|
||||
│
|
||||
┌─────────────────────┼─────────────────────┐
|
||||
│ │ │
|
||||
▼ ▼ ▼
|
||||
HTTP worker HTTP worker companion manager
|
||||
serves HTTP serves HTTP manages companions
|
||||
│
|
||||
┌──────────────────────┼──────────────────────┐
|
||||
│ │ │
|
||||
▼ ▼ ▼
|
||||
rq-default scheduler socketio
|
||||
```
|
||||
|
||||
Memory sharing still works:
|
||||
|
||||
```text
|
||||
gunicorn master preloads app
|
||||
└── forks companion manager
|
||||
└── forks rq / scheduler / socketio
|
||||
```
|
||||
|
||||
The manager is forked from the preloaded master. Companion processes are forked from the manager, so they can inherit preloaded application memory.
|
||||
|
||||
## 4. Responsibility Boundary
|
||||
|
||||
### Gunicorn Arbiter
|
||||
|
||||
The Arbiter should:
|
||||
|
||||
- start the Companion Manager
|
||||
- restart it if it crashes
|
||||
- stop it during Gunicorn shutdown
|
||||
- ask it to `reread` config when needed
|
||||
- avoid per-companion process logic
|
||||
|
||||
### Companion Manager
|
||||
|
||||
The manager should:
|
||||
|
||||
- load and validate companion config
|
||||
- spawn/reap companions
|
||||
- stop/start/restart companions
|
||||
- restart unexpected exits after a fixed delay
|
||||
- track state and expose `status`
|
||||
- expose a Unix control socket
|
||||
- redirect stdout/stderr
|
||||
- apply env and cwd
|
||||
- log lifecycle events
|
||||
|
||||
### Companion Process
|
||||
|
||||
A companion runs the actual service, such as RQ, scheduler, socket.io, or a custom daemon.
|
||||
|
||||
The companion process owns its own internals:
|
||||
|
||||
- signal handling
|
||||
- job draining
|
||||
- child workers
|
||||
- sockets
|
||||
- event loops
|
||||
|
||||
## 5. Companion Is Not an HTTP Worker
|
||||
|
||||
A companion must not:
|
||||
|
||||
- serve Gunicorn HTTP traffic
|
||||
- use Gunicorn listener sockets
|
||||
- use Gunicorn worker heartbeat files
|
||||
- trigger HTTP worker boot-error halt behavior
|
||||
- call HTTP worker lifecycle hooks
|
||||
|
||||
If a companion exits with `WORKER_BOOT_ERROR` or `APP_LOAD_ERROR`, the web tier must not halt. The manager treats it as a normal companion exit.
|
||||
|
||||
## 6. Configuration
|
||||
|
||||
Use dict-based config.
|
||||
|
||||
```python
|
||||
preload_app = True
|
||||
|
||||
companion_config_file = "/home/frappe/frappe-bench/companion.conf.py"
|
||||
companion_control_socket = "/run/gunicorn/companion.sock"
|
||||
|
||||
companion_workers = [
|
||||
{
|
||||
"name": "rq-default",
|
||||
"target": "frappe_companions:start_rq_default",
|
||||
"cwd": "/home/frappe/frappe-bench",
|
||||
"env": {"QUEUE": "default"},
|
||||
"stop_signal": "SIGTERM",
|
||||
"stop_timeout": 300,
|
||||
"reload_timeout": 60,
|
||||
"stdout": "/var/log/frappe/rq-default.log",
|
||||
"stderr": "/var/log/frappe/rq-default.error.log",
|
||||
},
|
||||
{
|
||||
"name": "socketio",
|
||||
"target": "frappe_companions:start_socketio",
|
||||
"cwd": "/home/frappe/frappe-bench",
|
||||
"stop_signal": "SIGTERM",
|
||||
"stop_timeout": 60,
|
||||
"reload_timeout": 30,
|
||||
"stdout": "/var/log/frappe/socketio.log",
|
||||
"stderr": "/var/log/frappe/socketio.error.log",
|
||||
},
|
||||
]
|
||||
```
|
||||
|
||||
Global defaults:
|
||||
|
||||
```python
|
||||
companion_stop_signal = "SIGTERM"
|
||||
companion_stop_timeout = 60
|
||||
companion_reload_timeout = 60
|
||||
|
||||
companion_stdout = None
|
||||
companion_stderr = None
|
||||
companion_cwd = None
|
||||
companion_env = {}
|
||||
|
||||
companion_startsecs = 1
|
||||
companion_restart_delay = 5
|
||||
|
||||
# seconds; used when manager timeout is computed dynamically
|
||||
companion_manager_shutdown_buffer = 10
|
||||
companion_manager_stop_timeout = None
|
||||
|
||||
companion_control_socket_mode = 0o600
|
||||
```
|
||||
|
||||
If manager timeouts are unset, compute them dynamically:
|
||||
|
||||
```text
|
||||
manager_stop_timeout = max(companion.stop_timeout) + companion_manager_shutdown_buffer
|
||||
manager_reload_timeout = max(companion.reload_timeout) + companion_manager_shutdown_buffer
|
||||
```
|
||||
|
||||
## 7. Config Fields
|
||||
|
||||
Required:
|
||||
|
||||
| Field | Meaning |
|
||||
| -------- | --------------------------------------- |
|
||||
| `name` | Unique process name |
|
||||
| `target` | Zero-argument callable or import string |
|
||||
|
||||
Optional:
|
||||
|
||||
| Field | Meaning |
|
||||
| ---------------- | -------------------------------------------------------------------------- |
|
||||
| `cwd` | Working directory before target |
|
||||
| `env` | Extra environment variables |
|
||||
| `stop_signal` | Signal used on stop |
|
||||
| `stop_timeout` | Max wait during shutdown |
|
||||
| `reload_timeout` | Max wait during restart/reread |
|
||||
| `stdout` | Stdout log file or inherit |
|
||||
| `stderr` | Stderr log file, `stdout`, or inherit |
|
||||
| `startsecs` | Seconds process must survive before `RUNNING`; makes `STARTING` meaningful |
|
||||
|
||||
Validation must reject unknown keys, duplicate names, invalid signals/timeouts, invalid stdout/stderr values, and targets that are not zero-argument callables/import strings.
|
||||
|
||||
Not supported: groups, disable/fatal state, max restart count, exponential backoff, process groups, per-companion user switching, HTTP/TCP health checks, process-specific RQ/socket.io behavior.
|
||||
|
||||
## 8. Public States
|
||||
|
||||
Status should mimic `supervisorctl status`.
|
||||
|
||||
```text
|
||||
STOPPED
|
||||
STARTING
|
||||
RUNNING
|
||||
BACKOFF
|
||||
STOPPING
|
||||
```
|
||||
|
||||
| State | Meaning |
|
||||
| ---------- | ---------------------------------------------------------------------------- |
|
||||
| `STOPPED` | Manually stopped or not started |
|
||||
| `STARTING` | Forked, but has not survived `startsecs` |
|
||||
| `RUNNING` | Alive and survived `startsecs` |
|
||||
| `BACKOFF` | Exited unexpectedly; will restart after `companion_restart_delay` |
|
||||
| `STOPPING` | Stop is in progress, from first signal through optional `SIGKILL` until exit |
|
||||
|
||||
No public `EXITED`, `UNKNOWN`, or `FATAL`.
|
||||
|
||||
Exit metadata is tracked separately:
|
||||
|
||||
```text
|
||||
last_exit_code
|
||||
last_exit_signal
|
||||
last_exited_at
|
||||
exit_count
|
||||
```
|
||||
|
||||
## 9. State Transitions
|
||||
|
||||
```text
|
||||
STOPPED
|
||||
└─ start
|
||||
→ STARTING
|
||||
|
||||
STARTING
|
||||
├─ survives startsecs
|
||||
│ → RUNNING
|
||||
├─ exits unexpectedly
|
||||
│ → BACKOFF
|
||||
└─ stop / restart / removed-by-reread
|
||||
→ STOPPING
|
||||
|
||||
RUNNING
|
||||
├─ exits unexpectedly
|
||||
│ → BACKOFF
|
||||
└─ stop / restart / removed-by-reread
|
||||
→ STOPPING
|
||||
|
||||
BACKOFF
|
||||
├─ retry timer expires
|
||||
│ → STARTING
|
||||
└─ stop
|
||||
→ STOPPED
|
||||
|
||||
STOPPING
|
||||
├─ process exits
|
||||
│ → STOPPED
|
||||
└─ timeout exceeded
|
||||
→ SIGKILL
|
||||
→ STOPPED
|
||||
```
|
||||
|
||||
When `waitpid` reaps a child, the manager records exit metadata and immediately moves to the next public state.
|
||||
|
||||
Early exit during `STARTING` and unexpected exit after `RUNNING` both use the same fixed restart delay.
|
||||
|
||||
## 10. Restart Behavior
|
||||
|
||||
Configured companions are expected to stay running.
|
||||
|
||||
Unexpected exit:
|
||||
|
||||
```text
|
||||
record exit metadata
|
||||
state = BACKOFF
|
||||
next_retry_at = now + companion_restart_delay
|
||||
restart after companion_restart_delay
|
||||
```
|
||||
|
||||
Default:
|
||||
|
||||
```python
|
||||
companion_restart_delay = 5
|
||||
```
|
||||
|
||||
There is no exponential backoff, max restart count, disable state, or fatal state.
|
||||
|
||||
A configured process restarts forever unless:
|
||||
|
||||
- manually stopped
|
||||
- removed from config by `reread`
|
||||
- Gunicorn is stopping/reloading
|
||||
|
||||
## 11. Control Socket
|
||||
|
||||
The manager exposes a Unix domain socket:
|
||||
|
||||
```python
|
||||
companion_control_socket = "/run/gunicorn/companion.sock"
|
||||
```
|
||||
|
||||
Default permissions:
|
||||
|
||||
```python
|
||||
companion_control_socket_mode = 0o600
|
||||
```
|
||||
|
||||
Gunicorn runs as a non-root user, so the socket is owned by that user and no
|
||||
group ownership switching is supported.
|
||||
|
||||
Protocol: newline-delimited JSON.
|
||||
|
||||
Commands:
|
||||
|
||||
```text
|
||||
status
|
||||
reread
|
||||
start <name>
|
||||
stop <name>
|
||||
restart <name>
|
||||
```
|
||||
|
||||
The manager creates the socket before entering the main loop. During full manager replacement, clients should retry on `ENOENT`, `ECONNREFUSED`, or timeout.
|
||||
|
||||
## 12. Command Semantics
|
||||
|
||||
### `status`
|
||||
|
||||
Request:
|
||||
|
||||
```json
|
||||
{"cmd": "status"}
|
||||
```
|
||||
|
||||
Human output should mimic `supervisorctl status`:
|
||||
|
||||
```text
|
||||
rq-default RUNNING pid 1234, uptime 2 days, 03:12:44
|
||||
rq-long BACKOFF exited with status 1, retrying in 3s
|
||||
scheduler STOPPED stopped manually
|
||||
```
|
||||
|
||||
JSON response:
|
||||
|
||||
```json
|
||||
{
|
||||
"ok": true,
|
||||
"companions": [
|
||||
{
|
||||
"name": "rq-default",
|
||||
"state": "RUNNING",
|
||||
"pid": 1234,
|
||||
"description": "pid 1234, uptime 2 days, 03:12:44"
|
||||
},
|
||||
{
|
||||
"name": "rq-long",
|
||||
"state": "BACKOFF",
|
||||
"pid": null,
|
||||
"description": "exited with status 1, retrying in 3s",
|
||||
"next_retry_at": 1730000000,
|
||||
"restart_delay": 5,
|
||||
"last_exit_code": 1
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### `start <name>`
|
||||
|
||||
```json
|
||||
{"cmd": "start", "name": "rq-default"}
|
||||
```
|
||||
|
||||
Uses latest validated config.
|
||||
|
||||
```text
|
||||
STOPPED -> clear manual_stop, start now
|
||||
BACKOFF -> cancel pending retry, clear manual_stop, start now
|
||||
RUNNING -> success: already running
|
||||
STARTING -> success: already starting
|
||||
STOPPING -> error: process is stopping; poll status and retry
|
||||
```
|
||||
|
||||
### `stop <name>`
|
||||
|
||||
```json
|
||||
{"cmd": "stop", "name": "rq-default"}
|
||||
```
|
||||
|
||||
```text
|
||||
RUNNING -> send stop_signal, wait stop_timeout, SIGKILL if needed, STOPPED
|
||||
STARTING -> send stop_signal, wait stop_timeout, SIGKILL if needed, STOPPED
|
||||
BACKOFF -> cancel pending retry, STOPPED
|
||||
STOPPED -> success: already stopped
|
||||
STOPPING -> success: already stopping
|
||||
```
|
||||
|
||||
`stop` sets `manual_stop = True`.
|
||||
|
||||
If stopping while `STARTING`, `stop_timeout` governs the stop window, not `startsecs`.
|
||||
|
||||
### `restart <name>`
|
||||
|
||||
```json
|
||||
{"cmd": "restart", "name": "rq-default"}
|
||||
```
|
||||
|
||||
```text
|
||||
RUNNING -> clear manual_stop, stop using reload_timeout, start
|
||||
STARTING -> enter STOPPING, stop current child using reload_timeout, start
|
||||
BACKOFF -> cancel pending retry, clear manual_stop, start immediately
|
||||
STOPPED -> clear manual_stop, start immediately
|
||||
STOPPING -> error: process is stopping; poll status and retry
|
||||
```
|
||||
|
||||
`restart` does not reread config.
|
||||
|
||||
### `reread`
|
||||
|
||||
```json
|
||||
{"cmd": "reread"}
|
||||
```
|
||||
|
||||
Transactional config reload:
|
||||
|
||||
```text
|
||||
new process -> add and start
|
||||
removed process -> stop and remove
|
||||
changed process -> update config; restart unless manual_stop=True
|
||||
unchanged process -> keep current state
|
||||
```
|
||||
|
||||
If a manually stopped process changes config:
|
||||
|
||||
```text
|
||||
update stored config
|
||||
keep STOPPED
|
||||
next start uses latest config
|
||||
```
|
||||
|
||||
Success:
|
||||
|
||||
```json
|
||||
{
|
||||
"ok": true,
|
||||
"added": ["new-worker"],
|
||||
"removed": ["old-worker"],
|
||||
"restarted": ["rq-default"],
|
||||
"unchanged": ["socketio"]
|
||||
}
|
||||
```
|
||||
|
||||
`unchanged` means no process action was taken. It may include manually stopped companions whose config changed; the new config is accepted and stored, and the next `start <name>` uses it.
|
||||
|
||||
Failure:
|
||||
|
||||
```json
|
||||
{
|
||||
"ok": false,
|
||||
"error": "invalid config: duplicate companion name rq-default",
|
||||
"kept_old_config": true
|
||||
}
|
||||
```
|
||||
|
||||
`kept_old_config=true` means no running process was changed and previous validated config remains active.
|
||||
|
||||
## 13. Reread Diff
|
||||
|
||||
Use one stable config hash per companion.
|
||||
|
||||
```text
|
||||
new name -> add/start
|
||||
missing name -> stop/remove
|
||||
hash changed -> update config; restart unless manual_stop=True
|
||||
hash unchanged -> no process action
|
||||
```
|
||||
|
||||
This intentionally restarts even if only `stop_timeout`, `stdout`, or `env` changes. Simpler and easier to test.
|
||||
|
||||
`reread` flow:
|
||||
|
||||
1. Read config file.
|
||||
2. Extract companion settings.
|
||||
3. Validate full config.
|
||||
4. Compute one config hash per companion.
|
||||
5. Diff old/new config.
|
||||
6. Apply only if validation succeeds.
|
||||
|
||||
Prefer a dedicated config file:
|
||||
|
||||
```python
|
||||
companion_config_file = "/home/frappe/frappe-bench/companion.conf.py"
|
||||
```
|
||||
|
||||
If unset, the manager may fall back to Gunicorn config file, but must read only companion settings.
|
||||
|
||||
## 14. stdout/stderr, env, cwd
|
||||
|
||||
### stdout/stderr
|
||||
|
||||
```python
|
||||
"stdout": "/var/log/frappe/rq-default.log",
|
||||
"stderr": "/var/log/frappe/rq-default.error.log",
|
||||
```
|
||||
|
||||
Allowed:
|
||||
|
||||
```python
|
||||
None
|
||||
"inherit"
|
||||
"stdout" # only for stderr
|
||||
"/path/to/file.log"
|
||||
```
|
||||
|
||||
The companion child opens stdout/stderr after fork and before `target()`.
|
||||
|
||||
Files are opened in append mode.
|
||||
|
||||
Log rotation is external:
|
||||
|
||||
- `copytruncate` works without restart
|
||||
- `create`/rename rotation needs companion restart
|
||||
- live fd reopen for already-running companions is out of scope
|
||||
|
||||
### env/cwd
|
||||
|
||||
Before `target()`:
|
||||
|
||||
```python
|
||||
os.chdir(cwd)
|
||||
os.environ.update(env)
|
||||
```
|
||||
|
||||
Changing stdout/stderr/env/cwd changes the config hash and causes restart unless manually stopped.
|
||||
|
||||
## 15. File Descriptors
|
||||
|
||||
Manager child must close Gunicorn-only fds:
|
||||
|
||||
- master signal pipe
|
||||
- HTTP listener sockets
|
||||
- worker heartbeat tmp files
|
||||
|
||||
Companion children must close manager-only fds before running target.
|
||||
|
||||
Companions must not keep Gunicorn HTTP listener sockets open.
|
||||
|
||||
## 16. Parent Death / Orphan Cleanup
|
||||
|
||||
Manager exits if Gunicorn master dies.
|
||||
|
||||
Linux:
|
||||
|
||||
```python
|
||||
prctl(PR_SET_PDEATHSIG, SIGTERM)
|
||||
```
|
||||
|
||||
Non-Linux fallback:
|
||||
|
||||
```text
|
||||
manager records parent pid
|
||||
manager checks os.getppid() every 5 seconds
|
||||
if os.getppid() returns 1, manager exits
|
||||
```
|
||||
|
||||
Companion children should also use parent-death signal where available. Without Linux `prctl`, cleanup after manager death is best-effort because target code takes over.
|
||||
|
||||
## 17. Internal State
|
||||
|
||||
Maintain enough state for `status`:
|
||||
|
||||
- name
|
||||
- state
|
||||
- pid
|
||||
- uptime
|
||||
- restart count
|
||||
- exit count
|
||||
- last exit code/signal
|
||||
- last started/exited time
|
||||
- next retry time
|
||||
- stop timeout kills
|
||||
- manual stop flag
|
||||
- stdout/stderr path
|
||||
|
||||
No Prometheus exporter inside the manager.
|
||||
|
||||
## 18. Implementation Layout
|
||||
|
||||
```text
|
||||
gunicorn/companion/
|
||||
__init__.py
|
||||
config.py
|
||||
process.py
|
||||
manager.py
|
||||
control.py
|
||||
```
|
||||
|
||||
`config.py`:
|
||||
|
||||
- load config
|
||||
- validate config
|
||||
- normalize defaults
|
||||
- compute config hash
|
||||
|
||||
`process.py`:
|
||||
|
||||
- `CompanionConfig`
|
||||
- `CompanionProcess`
|
||||
- state model
|
||||
|
||||
`manager.py`:
|
||||
|
||||
- run loop
|
||||
- spawn/reap
|
||||
- start/stop/restart
|
||||
- fixed restart delay
|
||||
- state transitions
|
||||
- stdout/stderr/env/cwd setup
|
||||
|
||||
`control.py`:
|
||||
|
||||
- Unix socket server
|
||||
- JSON command parser
|
||||
- JSON response writer
|
||||
|
||||
## 19. Arbiter Changes
|
||||
|
||||
Keep Arbiter changes small:
|
||||
|
||||
- manager state
|
||||
- spawn manager
|
||||
- reap manager
|
||||
- stop manager
|
||||
- reload/reread manager
|
||||
- helper to call control socket if needed
|
||||
|
||||
No per-companion logic in Arbiter.
|
||||
|
||||
## 20. Implementation Tasks
|
||||
|
||||
- [x] Add companion config settings in `gunicorn/config.py`.
|
||||
- [x] Add config validation for `companion_workers`.
|
||||
- [x] Add `CompanionConfig` and config hash generation.
|
||||
- [x] Add public process states.
|
||||
- [x] Add `CompanionProcess` runtime state.
|
||||
- [x] Add status description helpers.
|
||||
- [x] Add `CompanionManager` skeleton.
|
||||
- [x] Spawn one companion process from the manager.
|
||||
- [x] Apply `cwd` and `env` before target.
|
||||
- [x] Redirect `stdout` and `stderr`.
|
||||
- [x] Reap exited companion processes.
|
||||
- [x] Implement `STARTING -> RUNNING` using `startsecs`.
|
||||
- [x] Implement `BACKOFF` with fixed `companion_restart_delay`.
|
||||
- [x] Implement `start_process`.
|
||||
- [x] Implement `stop_process`.
|
||||
- [x] Implement `restart_process`.
|
||||
- [x] Preserve and clear `manual_stop` correctly.
|
||||
- [x] Add Unix control socket.
|
||||
- [x] Implement JSON command protocol.
|
||||
- [x] Implement `status`.
|
||||
- [x] Implement `start`.
|
||||
- [x] Implement `stop`.
|
||||
- [x] Implement `restart`.
|
||||
- [x] Implement transactional `reread`.
|
||||
- [x] Add manager spawn/reap logic in Arbiter.
|
||||
- [x] Add manager shutdown handling in Arbiter.
|
||||
- [x] Wire Gunicorn reload to restart the manager only when companion config changed.
|
||||
- [x] Close Gunicorn-only fds in manager child.
|
||||
- [x] Close manager-only fds in companion child.
|
||||
- [x] Add parent-death cleanup.
|
||||
- [x] Add lifecycle logs.
|
||||
- [x] Add tests for config validation.
|
||||
- [x] Add tests for state transitions.
|
||||
- [x] Add tests for control commands.
|
||||
- [x] Add tests for transactional reread.
|
||||
- [x] Add tests that HTTP worker behavior is unchanged.
|
||||
|
||||
## 21. Test Plan
|
||||
|
||||
Test:
|
||||
|
||||
- config validation
|
||||
- config hash diff
|
||||
- transactional reread
|
||||
- `reread` success/failure response
|
||||
- manual stop + reread behavior
|
||||
- `start`, `stop`, `restart` on all public states
|
||||
- control socket commands and permissions
|
||||
- control socket unavailable retry behavior
|
||||
- supervisord-like status output
|
||||
- state transitions
|
||||
- manager lifecycle from Arbiter
|
||||
- companion spawn/reap
|
||||
- fixed 5s restart delay
|
||||
- `startsecs` behavior
|
||||
- stdout/stderr redirection
|
||||
- env and cwd
|
||||
- fd cleanup
|
||||
- parent-death cleanup
|
||||
- HTTP worker behavior unchanged
|
||||
|
||||
## 22. Out of Scope
|
||||
|
||||
Not supported:
|
||||
|
||||
- groups
|
||||
- dependency ordering
|
||||
- process group killing
|
||||
- disable/fatal state
|
||||
- max restart count
|
||||
- exponential backoff
|
||||
- CLI config for companion specs
|
||||
- RQ/socket.io/scheduler-specific behavior
|
||||
- per-companion user switching
|
||||
- HTTP/TCP/custom health checks
|
||||
- live log fd reopen for already-running companions
|
||||
|
||||
## 23. Summary
|
||||
|
||||
Use a Companion Manager, not direct companion management inside Arbiter.
|
||||
|
||||
This gives:
|
||||
|
||||
- shared memory through `preload_app=True`
|
||||
- small Arbiter changes
|
||||
- supervisord-like process management and status
|
||||
- controlled `start`, `stop`, `restart`, `reread`, `status`
|
||||
- transactional config reread
|
||||
- fixed restart delay
|
||||
- simple process-running health
|
||||
- per-companion env/cwd/stdout/stderr
|
||||
- simple public state machine
|
||||
- safer shutdown/reload behavior
|
||||
206
docs/source/companion.rst
Normal file
206
docs/source/companion.rst
Normal file
@ -0,0 +1,206 @@
|
||||
.. _companion:
|
||||
|
||||
===================
|
||||
Companion Processes
|
||||
===================
|
||||
|
||||
Most real deployments run more than HTTP workers. Alongside the web server you
|
||||
often have background processes: task queues (RQ, Celery), a scheduler, a
|
||||
websocket / socket.io server, or custom daemons. Normally these are started and
|
||||
supervised separately with systemd or supervisor.
|
||||
|
||||
The **companion process manager** lets Gunicorn run those processes for you, as
|
||||
children of the same master. They get the same lifecycle as your web workers
|
||||
and, when you use :ref:`preload-app`, they share the preloaded application
|
||||
memory through copy-on-write.
|
||||
|
||||
Why use it
|
||||
==========
|
||||
|
||||
- **One thing to run.** Web workers and background processes start, stop, and
|
||||
reload together under a single Gunicorn command.
|
||||
- **Less memory.** With ``--preload`` the application is loaded once in the
|
||||
master; companions fork from it and share that memory instead of each loading
|
||||
their own copy.
|
||||
- **No drift.** There is one place that owns the lifecycle, so background
|
||||
processes don't get out of step with the web workers.
|
||||
|
||||
If you only run HTTP workers, you don't need this feature and can ignore it.
|
||||
|
||||
How it works
|
||||
============
|
||||
|
||||
Gunicorn forks **one** extra child after preload: the *companion manager*. The
|
||||
manager forks and supervises each companion you configured. The arbiter only
|
||||
watches the single manager; the manager handles everything below it.
|
||||
|
||||
.. code-block:: text
|
||||
|
||||
gunicorn master (preloaded app)
|
||||
├── HTTP worker
|
||||
├── HTTP worker
|
||||
└── companion manager
|
||||
├── rq-default
|
||||
├── scheduler
|
||||
└── socketio
|
||||
|
||||
Each companion is just a Python callable you point Gunicorn at. The manager
|
||||
forks a fresh process, runs the callable, and keeps it alive: if it crashes, the
|
||||
manager restarts it after a short delay.
|
||||
|
||||
Quick start
|
||||
===========
|
||||
|
||||
A companion is configured in your normal Gunicorn config file (the one you pass
|
||||
with ``-c``). Each entry needs a ``name`` and a ``target``. The target is a
|
||||
``"module:callable"`` string; the callable takes no arguments and runs the
|
||||
process (it is expected to block, like a worker's main loop).
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
# gunicorn.conf.py
|
||||
preload_app = True # required to share memory with companions
|
||||
|
||||
companion_workers = [
|
||||
{"name": "scheduler", "target": "myapp.tasks:run_scheduler"},
|
||||
{"name": "rq-default", "target": "myapp.tasks:run_rq", "env": {"QUEUE": "default"}},
|
||||
]
|
||||
|
||||
Run Gunicorn as usual::
|
||||
|
||||
gunicorn -c gunicorn.conf.py --preload myapp:application
|
||||
|
||||
You'll see the companion manager and each companion start in the logs.
|
||||
|
||||
Per-companion options
|
||||
---------------------
|
||||
|
||||
Each entry in ``companion_workers`` may set these keys in addition to ``name``
|
||||
and ``target``:
|
||||
|
||||
.. list-table::
|
||||
:header-rows: 1
|
||||
:widths: 20 80
|
||||
|
||||
* - Key
|
||||
- Meaning
|
||||
* - ``cwd``
|
||||
- Directory to change into before running the target.
|
||||
* - ``env``
|
||||
- Extra environment variables, merged onto the inherited env.
|
||||
* - ``stop_signal``
|
||||
- Signal sent to ask the companion to stop (default ``SIGTERM``).
|
||||
* - ``stop_timeout``
|
||||
- Seconds to wait after the stop signal before ``SIGKILL``.
|
||||
* - ``reload_timeout``
|
||||
- Seconds to wait for the old process to exit on restart.
|
||||
* - ``startsecs``
|
||||
- Seconds a companion must stay up to count as started.
|
||||
* - ``stdout``
|
||||
- File path for stdout, or ``"inherit"`` (the default).
|
||||
* - ``stderr``
|
||||
- File path, ``"stdout"`` to merge with stdout, or ``"inherit"``.
|
||||
|
||||
Any key you leave out falls back to the matching global setting
|
||||
(``companion_stop_signal``, ``companion_stop_timeout``, and so on), so you can
|
||||
set a default once and override it per companion.
|
||||
|
||||
Keeping companions in a separate file
|
||||
--------------------------------------
|
||||
|
||||
If you want to change companion specs without touching your web config, put the
|
||||
``companion_*`` settings in their own Python file and point Gunicorn at it::
|
||||
|
||||
companion_config_file = "/etc/gunicorn/companions.py"
|
||||
|
||||
The manager reads its companion settings from that file instead of the main
|
||||
config.
|
||||
|
||||
States
|
||||
======
|
||||
|
||||
A companion is always in one of these states (the same vocabulary as
|
||||
``supervisorctl``):
|
||||
|
||||
.. list-table::
|
||||
:header-rows: 1
|
||||
:widths: 20 80
|
||||
|
||||
* - State
|
||||
- Meaning
|
||||
* - ``STARTING``
|
||||
- Just forked, not yet past ``startsecs``.
|
||||
* - ``RUNNING``
|
||||
- Up and healthy.
|
||||
* - ``BACKOFF``
|
||||
- Crashed; waiting ``restart_delay`` seconds before retrying.
|
||||
* - ``STOPPING``
|
||||
- Was asked to stop; draining before exit.
|
||||
* - ``STOPPED``
|
||||
- Stopped on purpose and will not auto-restart.
|
||||
|
||||
A companion that exits on its own goes to ``BACKOFF`` and is restarted. One you
|
||||
stop by hand stays ``STOPPED`` until you start it again.
|
||||
|
||||
Controlling companions at runtime
|
||||
==================================
|
||||
|
||||
Set a control socket and you can inspect and steer companions while Gunicorn
|
||||
runs::
|
||||
|
||||
companion_control_socket = "/run/gunicorn/companion.sock"
|
||||
|
||||
A small CLI, ``gunicorn-companion``, talks to it::
|
||||
|
||||
gunicorn-companion -s /run/gunicorn/companion.sock status
|
||||
gunicorn-companion -s /run/gunicorn/companion.sock restart scheduler
|
||||
gunicorn-companion -s /run/gunicorn/companion.sock stop rq-default
|
||||
gunicorn-companion -s /run/gunicorn/companion.sock start rq-default
|
||||
|
||||
You can also set ``GUNICORN_COMPANION_SOCKET`` instead of passing ``-s`` every
|
||||
time. The protocol is plain newline-delimited JSON, so ``socat`` works too::
|
||||
|
||||
echo '{"cmd": "status"}' | socat - UNIX-CONNECT:/run/gunicorn/companion.sock
|
||||
|
||||
Commands:
|
||||
|
||||
- ``status`` — show every companion's state.
|
||||
- ``start <name>`` / ``stop <name>`` / ``restart <name>`` — act on one.
|
||||
- ``reread`` — re-read the config file and apply only what changed: new
|
||||
companions start, removed ones stop, changed ones restart, untouched ones are
|
||||
left alone. It is transactional — if the new config is invalid, nothing
|
||||
changes and the old one keeps running.
|
||||
|
||||
The socket is created mode ``0o600`` (owner only). Change it with
|
||||
``companion_control_socket_mode`` if you need group access.
|
||||
|
||||
Reload and shutdown
|
||||
===================
|
||||
|
||||
**Reload (SIGHUP).** A reload recycles your HTTP workers and re-reads config.
|
||||
The companion manager is restarted **only if the companion config actually
|
||||
changed** — an ordinary web reload leaves your companions running untouched, so
|
||||
it stays fast. Note that, just like HTTP workers under ``--preload``, companions
|
||||
pick up new *application code* only on a full restart, not on ``SIGHUP``. For
|
||||
fine-grained changes without a full reload, use the ``reread`` command.
|
||||
|
||||
**Shutdown (SIGTERM).** Gunicorn asks the manager to stop, which sends each
|
||||
companion its ``stop_signal`` and waits up to ``stop_timeout`` before forcing it
|
||||
down with ``SIGKILL``. Gunicorn gives the manager enough time to drain all its
|
||||
companions before it gives up; tune that with
|
||||
``companion_manager_stop_timeout`` (or it is derived from the slowest companion
|
||||
plus ``companion_manager_shutdown_buffer``).
|
||||
|
||||
Limitations
|
||||
===========
|
||||
|
||||
- **Hot upgrade (USR2) is not supported with companions.** During a ``USR2``
|
||||
upgrade the old and new masters run side by side, so each runs its own
|
||||
companion manager and every companion runs twice — bad for singletons like a
|
||||
scheduler. Restart the master instead of using ``USR2`` when companions are
|
||||
configured, or keep singletons out of the companion set. A ``SIGHUP`` reload
|
||||
is fine.
|
||||
- **Linux is the primary target.** Orphan cleanup uses ``prctl`` on Linux, with
|
||||
a portable parent-watch fallback elsewhere.
|
||||
|
||||
See the :ref:`settings` page for every ``companion_*`` option.
|
||||
@ -23,6 +23,7 @@ Features
|
||||
* Simple Python configuration
|
||||
* Multiple worker configurations
|
||||
* Various server hooks for extensibility
|
||||
* Supervise non-HTTP :ref:`companion processes <companion>` in the same master
|
||||
* Compatible with Python 3.x >= 3.7
|
||||
|
||||
|
||||
@ -37,6 +38,7 @@ Contents
|
||||
configure
|
||||
settings
|
||||
instrumentation
|
||||
companion
|
||||
deploy
|
||||
signals
|
||||
custom
|
||||
|
||||
@ -20,6 +20,192 @@ for reference on setting at the command line.
|
||||
|
||||
.. versionadded:: 19.7
|
||||
|
||||
Companion Processes
|
||||
-------------------
|
||||
|
||||
.. _companion-config-file:
|
||||
|
||||
``companion_config_file``
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
**Command line:** ``--companion-config``
|
||||
|
||||
**Default:** ``None``
|
||||
|
||||
Dedicated Python file the manager reads for companion specs.
|
||||
|
||||
Lets companions reload independently of the main Gunicorn config.
|
||||
Example: ``companion_config_file = "/home/frappe/bench/companion.conf.py"``.
|
||||
If unset, specs are read from the main Gunicorn config.
|
||||
|
||||
.. _companion-workers:
|
||||
|
||||
``companion_workers``
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
**Default:** ``[]``
|
||||
|
||||
List of companion process specs, each a dict.
|
||||
|
||||
A spec requires ``name`` and ``target``; other keys override defaults.
|
||||
Example: ``[{"name": "scheduler", "target": "app:run_scheduler"}]``.
|
||||
|
||||
.. _companion-control-socket:
|
||||
|
||||
``companion_control_socket``
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
**Command line:** ``--companion-control-socket``
|
||||
|
||||
**Default:** ``None``
|
||||
|
||||
Unix socket the manager listens on for control commands.
|
||||
|
||||
Clients send ``status``/``start``/``stop``/``restart``/``reread`` as JSON.
|
||||
Example: ``companion_control_socket = "/run/gunicorn/companion.sock"``.
|
||||
|
||||
.. _companion-control-socket-mode:
|
||||
|
||||
``companion_control_socket_mode``
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
**Default:** ``384``
|
||||
|
||||
Octal file mode set on the control socket after it is created.
|
||||
|
||||
Default ``0o600`` lets only the owner connect.
|
||||
Example: ``companion_control_socket_mode = 0o660``.
|
||||
|
||||
.. _companion-stop-signal:
|
||||
|
||||
``companion_stop_signal``
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
**Default:** ``'SIGTERM'``
|
||||
|
||||
Signal sent first when stopping a companion.
|
||||
|
||||
The companion should catch it to drain work before exiting.
|
||||
Example: ``companion_stop_signal = "SIGINT"``.
|
||||
|
||||
.. _companion-stop-timeout:
|
||||
|
||||
``companion_stop_timeout``
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
**Default:** ``60``
|
||||
|
||||
Seconds to wait after the stop signal before sending SIGKILL.
|
||||
|
||||
Give slow-draining workers a larger value.
|
||||
Example: ``companion_stop_timeout = 300``.
|
||||
|
||||
.. _companion-reload-timeout:
|
||||
|
||||
``companion_reload_timeout``
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
**Default:** ``60``
|
||||
|
||||
Seconds to wait for the old child to exit during restart/reread.
|
||||
|
||||
Used instead of ``stop_timeout`` when replacing a companion.
|
||||
Example: ``companion_reload_timeout = 30``.
|
||||
|
||||
.. _companion-stdout:
|
||||
|
||||
``companion_stdout``
|
||||
~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
**Default:** ``None``
|
||||
|
||||
Where to send companion stdout: a file path or ``"inherit"``.
|
||||
|
||||
Files are opened in append mode; ``None`` inherits the manager's stdout.
|
||||
Example: ``companion_stdout = "/var/log/frappe/rq.log"``.
|
||||
|
||||
.. _companion-stderr:
|
||||
|
||||
``companion_stderr``
|
||||
~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
**Default:** ``None``
|
||||
|
||||
Where to send companion stderr: a path, ``"stdout"``, or ``"inherit"``.
|
||||
|
||||
Use ``"stdout"`` to merge stderr into the stdout target.
|
||||
Example: ``companion_stderr = "stdout"``.
|
||||
|
||||
.. _companion-cwd:
|
||||
|
||||
``companion_cwd``
|
||||
~~~~~~~~~~~~~~~~~
|
||||
|
||||
**Default:** ``None``
|
||||
|
||||
Directory the child changes into before running the target.
|
||||
|
||||
Example: ``companion_cwd = "/home/frappe/frappe-bench"``.
|
||||
|
||||
.. _companion-env:
|
||||
|
||||
``companion_env``
|
||||
~~~~~~~~~~~~~~~~~
|
||||
|
||||
**Default:** ``{}``
|
||||
|
||||
Extra environment variables merged into the child environment.
|
||||
|
||||
Example: ``companion_env = {"QUEUE": "default"}``.
|
||||
|
||||
.. _companion-startsecs:
|
||||
|
||||
``companion_startsecs``
|
||||
~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
**Default:** ``1``
|
||||
|
||||
Seconds a freshly forked companion must survive to reach RUNNING.
|
||||
|
||||
Exiting earlier counts as a failed start and triggers BACKOFF.
|
||||
Example: ``companion_startsecs = 5``.
|
||||
|
||||
.. _companion-restart-delay:
|
||||
|
||||
``companion_restart_delay``
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
**Default:** ``5``
|
||||
|
||||
Fixed seconds to wait before restarting a crashed companion.
|
||||
|
||||
No exponential backoff; the same delay is used every time.
|
||||
Example: ``companion_restart_delay = 10``.
|
||||
|
||||
.. _companion-manager-shutdown-buffer:
|
||||
|
||||
``companion_manager_shutdown_buffer``
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
**Default:** ``10``
|
||||
|
||||
Extra seconds added to the slowest companion timeout for the manager.
|
||||
|
||||
Gives the manager headroom to stop all children before Gunicorn kills it.
|
||||
Example: ``companion_manager_shutdown_buffer = 15``.
|
||||
|
||||
.. _companion-manager-stop-timeout:
|
||||
|
||||
``companion_manager_stop_timeout``
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
**Default:** ``None``
|
||||
|
||||
Seconds Gunicorn waits for the manager to stop during shutdown.
|
||||
|
||||
By default it uses the slowest companion ``stop_timeout`` plus a buffer.
|
||||
Example: ``companion_manager_stop_timeout = 120``.
|
||||
|
||||
Config File
|
||||
-----------
|
||||
|
||||
@ -1148,7 +1334,7 @@ change the worker process user.
|
||||
Switch worker process to run as this group.
|
||||
|
||||
A valid group id (as an integer) or the name of a user that can be
|
||||
retrieved with a call to ``pwd.getgrnam(value)`` or ``None`` to not
|
||||
retrieved with a call to ``grp.getgrnam(value)`` or ``None`` to not
|
||||
change the worker processes group.
|
||||
|
||||
.. _umask:
|
||||
@ -1703,6 +1889,56 @@ The maximum number of simultaneous clients.
|
||||
|
||||
This setting only affects the ``gthread``, ``eventlet`` and ``gevent`` worker types.
|
||||
|
||||
.. _enable-adaptive-queueing:
|
||||
|
||||
``enable_adaptive_queueing``
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
**Command line:** ``--enable-adaptive-queueing``
|
||||
|
||||
**Default:** ``False``
|
||||
|
||||
Enable adaptive multi-queue routing in the ``gthread`` worker.
|
||||
|
||||
Can also be enabled by setting the ``GUNICORN_ENABLE_ADAPTIVE_QUEUEING``
|
||||
environment variable to ``true``.
|
||||
|
||||
When enabled, the worker splits its :ref:`threads` roughly evenly into
|
||||
two lanes — a *fast* lane and a *slow* lane — and routes each request
|
||||
to one of them by predicting, from previously observed timings of the
|
||||
same route (method + path), whether it will exceed
|
||||
:ref:`slow-request-threshold`. Slow-predicted requests go to the slow
|
||||
lane so they can never starve the fast lane, even under a flood of
|
||||
slow requests.
|
||||
|
||||
Requires :ref:`threads` to be at least 2.
|
||||
|
||||
This setting only affects the ``gthread`` worker type.
|
||||
|
||||
.. versionadded:: 23.1.0
|
||||
|
||||
.. _slow-request-threshold:
|
||||
|
||||
``slow_request_threshold``
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
**Command line:** ``--slow-request-threshold FLOAT``
|
||||
|
||||
**Default:** ``5.0``
|
||||
|
||||
Processing time (in seconds) above which a request route is treated as
|
||||
"slow" by the ``gthread`` worker when :ref:`enable-adaptive-queueing`
|
||||
is enabled.
|
||||
|
||||
A route is learned as slow once it has been observed exceeding this
|
||||
threshold (either on completion or while still running); its timing
|
||||
decays back below the threshold if it becomes fast again.
|
||||
|
||||
Only used by the ``gthread`` worker when :ref:`enable-adaptive-queueing`
|
||||
is enabled.
|
||||
|
||||
.. versionadded:: 23.1.0
|
||||
|
||||
.. _max-requests:
|
||||
|
||||
``max_requests``
|
||||
|
||||
@ -14,6 +14,9 @@ import socket
|
||||
from gunicorn.errors import HaltServer, AppImportError
|
||||
from gunicorn.pidfile import Pidfile
|
||||
from gunicorn import sock, systemd, util
|
||||
from gunicorn.companion.config import build_companion_configs
|
||||
from gunicorn.companion.control import ControlServer
|
||||
from gunicorn.companion.manager import CompanionManager
|
||||
|
||||
from gunicorn import __version__, SERVER_SOFTWARE
|
||||
|
||||
@ -33,6 +36,10 @@ class Arbiter:
|
||||
# A flag indicating if an application failed to be loaded
|
||||
APP_LOAD_ERROR = 4
|
||||
|
||||
# Cap on the crash-backoff delay before respawning a companion manager
|
||||
# that keeps exiting unexpectedly, so a crash loop cannot busy-spin.
|
||||
COMPANION_MANAGER_MAX_RESPAWN_DELAY = 30
|
||||
|
||||
START_CTX = {}
|
||||
|
||||
LISTENERS = []
|
||||
@ -63,6 +70,18 @@ class Arbiter:
|
||||
self.reexec_pid = 0
|
||||
self.master_pid = 0
|
||||
self.master_name = "Master"
|
||||
self.companion_manager_pid = 0
|
||||
# True while a manager exit is expected (deliberate stop or reload), so
|
||||
# the reaper logs it as info instead of an unexpected-crash error.
|
||||
self._companion_manager_stopping = False
|
||||
# Crash backoff: earliest monotonic time the main loop may respawn a
|
||||
# manager that exited unexpectedly, plus the consecutive-crash count
|
||||
# that sizes the delay.
|
||||
self._companion_manager_respawn_at = 0
|
||||
self._companion_manager_failures = 0
|
||||
# Configs of the currently running companion manager, cached at spawn so
|
||||
# shutdown can size its wait without re-reading the config file.
|
||||
self._companion_configs = []
|
||||
|
||||
cwd = util.getcwd()
|
||||
|
||||
@ -201,6 +220,7 @@ class Arbiter:
|
||||
|
||||
try:
|
||||
self.manage_workers()
|
||||
self.manage_companion_manager()
|
||||
|
||||
while True:
|
||||
self.maybe_promote_master()
|
||||
@ -210,6 +230,7 @@ class Arbiter:
|
||||
self.sleep()
|
||||
self.murder_workers()
|
||||
self.manage_workers()
|
||||
self.manage_companion_manager()
|
||||
continue
|
||||
|
||||
if sig not in self.SIG_NAMES:
|
||||
@ -389,14 +410,18 @@ class Arbiter:
|
||||
sig = signal.SIGTERM
|
||||
if not graceful:
|
||||
sig = signal.SIGQUIT
|
||||
limit = time.time() + self.cfg.graceful_timeout
|
||||
# instruct the workers to exit
|
||||
# The manager may need longer than a worker to drain its companions.
|
||||
limit = time.time() + max(
|
||||
self.cfg.graceful_timeout, self.companion_manager_stop_timeout())
|
||||
# instruct the workers and the companion manager to exit
|
||||
self.kill_workers(sig)
|
||||
self.stop_companion_manager(sig)
|
||||
# wait until the graceful timeout
|
||||
while self.WORKERS and time.time() < limit:
|
||||
while (self.WORKERS or self.companion_manager_pid) and time.time() < limit:
|
||||
time.sleep(0.1)
|
||||
|
||||
self.kill_workers(signal.SIGKILL)
|
||||
self.stop_companion_manager(signal.SIGKILL)
|
||||
|
||||
def reexec(self):
|
||||
"""\
|
||||
@ -487,6 +512,9 @@ class Arbiter:
|
||||
# manage workers
|
||||
self.manage_workers()
|
||||
|
||||
# reload companions with the new configuration
|
||||
self.reload_companion_manager()
|
||||
|
||||
def murder_workers(self):
|
||||
"""\
|
||||
Kill unused/idle workers
|
||||
@ -519,6 +547,27 @@ class Arbiter:
|
||||
break
|
||||
if self.reexec_pid == wpid:
|
||||
self.reexec_pid = 0
|
||||
elif self.companion_manager_pid == wpid:
|
||||
# The manager itself exited; clear its pid so the main
|
||||
# loop respawns it. It owns its companions' lifecycles.
|
||||
self.companion_manager_pid = 0
|
||||
if self._companion_manager_stopping:
|
||||
# Expected exit from a deliberate stop or reload.
|
||||
self._companion_manager_stopping = False
|
||||
self.log.info(
|
||||
"Companion manager (pid:%s) exited", wpid)
|
||||
else:
|
||||
# Unexpected crash: back off before respawning so a
|
||||
# manager that cannot boot does not busy-spin.
|
||||
self._companion_manager_failures += 1
|
||||
delay = min(
|
||||
2 ** (self._companion_manager_failures - 1),
|
||||
self.COMPANION_MANAGER_MAX_RESPAWN_DELAY)
|
||||
self._companion_manager_respawn_at = (
|
||||
time.monotonic() + delay)
|
||||
self.log.error(
|
||||
"Companion manager (pid:%s) exited unexpectedly; "
|
||||
"respawning in %ss", wpid, delay)
|
||||
else:
|
||||
# A worker was terminated. If the termination reason was
|
||||
# that it could not boot, we'll shut it down to avoid
|
||||
@ -645,6 +694,162 @@ class Arbiter:
|
||||
self.spawn_worker()
|
||||
time.sleep(0.1 * random.random())
|
||||
|
||||
def manage_companion_manager(self):
|
||||
"""Keep the companion manager alive, spawning it if it is not running.
|
||||
|
||||
Does nothing unless companions are configured. The manager is a single
|
||||
child of the arbiter; per-companion supervision lives entirely inside
|
||||
it, so the arbiter only ensures the one manager process exists.
|
||||
"""
|
||||
if not (self.companion_manager_pid == 0 and self.cfg.companion_workers):
|
||||
return
|
||||
if time.monotonic() < self._companion_manager_respawn_at:
|
||||
# Still inside the crash-backoff window; wait before respawning.
|
||||
return
|
||||
self.spawn_companion_manager()
|
||||
|
||||
def spawn_companion_manager(self):
|
||||
"""Fork the companion manager process.
|
||||
|
||||
The configs are built in the parent before the fork; the parent then
|
||||
records the manager pid and returns, while the child runs the manager's
|
||||
supervision loop and exits when the loop returns. The manager forks the
|
||||
individual companions itself.
|
||||
"""
|
||||
configs = build_companion_configs(self.cfg)
|
||||
if not configs:
|
||||
return
|
||||
manager = CompanionManager(configs, self.log)
|
||||
manager.config_loader = lambda: build_companion_configs(self.cfg)
|
||||
if self.cfg.companion_control_socket:
|
||||
manager.control = ControlServer(
|
||||
manager.handle_command,
|
||||
self.cfg.companion_control_socket,
|
||||
mode=self.cfg.companion_control_socket_mode or 0o600,
|
||||
log=self.log)
|
||||
|
||||
pid = os.fork()
|
||||
if pid != 0:
|
||||
self.companion_manager_pid = pid
|
||||
self._companion_configs = configs
|
||||
self.log.info("Companion manager started (pid:%s)", pid)
|
||||
return
|
||||
|
||||
# Process Child
|
||||
try:
|
||||
self._close_gunicorn_fds()
|
||||
util._setproctitle("companion manager [%s]" % self.proc_name)
|
||||
manager.run()
|
||||
sys.exit(0)
|
||||
except SystemExit:
|
||||
raise
|
||||
except Exception:
|
||||
self.log.exception("Exception in companion manager process")
|
||||
sys.exit(-1)
|
||||
|
||||
def _close_gunicorn_fds(self):
|
||||
"""Close fds the manager inherited from the arbiter but never uses.
|
||||
|
||||
The companion manager serves no HTTP traffic and does not run the
|
||||
arbiter's signal loop, so it drops the listening sockets, the arbiter's
|
||||
wakeup pipe, and the worker heartbeat files. Closing them keeps the
|
||||
manager (and the companions it forks) from pinning the arbiter's fds.
|
||||
"""
|
||||
for listener in self.LISTENERS:
|
||||
listener.close()
|
||||
for pipe_fd in self.PIPE:
|
||||
try:
|
||||
os.close(pipe_fd)
|
||||
except OSError:
|
||||
pass
|
||||
for worker in self.WORKERS.values():
|
||||
worker.tmp.close()
|
||||
|
||||
def reload_companion_manager(self):
|
||||
"""Reconcile the companion manager with the reloaded configuration.
|
||||
|
||||
A web reload (SIGHUP) recycles HTTP workers and re-reads config, but
|
||||
with ``--preload`` it does not reload application code -- the WSGI
|
||||
callable is loaded once and cached -- so the running companions are
|
||||
already current. Restarting them on every reload would bounce them for
|
||||
nothing, so the manager is only reloaded when the companion config
|
||||
actually changed.
|
||||
|
||||
When it did change, a running manager is asked to stop (it drains its
|
||||
companions first); the SIGCHLD reaper clears its pid so the main loop
|
||||
respawns it from the fresh cfg, and a manager started where none ran
|
||||
comes up right away. An unchanged config leaves the manager and its
|
||||
companions untouched. Per-companion transactional reread stays
|
||||
available separately through the control socket.
|
||||
"""
|
||||
try:
|
||||
new_configs = build_companion_configs(self.cfg)
|
||||
except Exception:
|
||||
self.log.exception(
|
||||
"Could not read companion config on reload; "
|
||||
"leaving companion manager unchanged")
|
||||
return
|
||||
if not self._companion_configs_changed(new_configs):
|
||||
return
|
||||
if self.companion_manager_pid != 0:
|
||||
self.log.info("Companion config changed, reloading companion manager")
|
||||
self.stop_companion_manager(signal.SIGTERM)
|
||||
self.manage_companion_manager()
|
||||
|
||||
def _companion_configs_changed(self, new_configs):
|
||||
"""True when the companion config differs from the running manager's.
|
||||
|
||||
Compares the sorted ``config_hash`` of every companion, so a changed
|
||||
field, an added name, or a removed name all count, while a pure web
|
||||
reload with the same companion specs does not.
|
||||
"""
|
||||
old_hashes = sorted(c.config_hash for c in self._companion_configs)
|
||||
new_hashes = sorted(c.config_hash for c in new_configs)
|
||||
return old_hashes != new_hashes
|
||||
|
||||
def stop_companion_manager(self, sig):
|
||||
"""Signal the companion manager to exit, if it is running.
|
||||
|
||||
A graceful SIGTERM lets the manager stop its own companions before it
|
||||
exits; SIGKILL forces it down. The reaper clears the pid once it dies,
|
||||
so a manager that is already gone is a no-op here.
|
||||
"""
|
||||
if self.companion_manager_pid == 0:
|
||||
return
|
||||
# This exit is on purpose: mark it expected and clear any crash backoff
|
||||
# so the reaper logs info and a reload can respawn without delay.
|
||||
self._companion_manager_stopping = True
|
||||
self._companion_manager_failures = 0
|
||||
self._companion_manager_respawn_at = 0
|
||||
try:
|
||||
os.kill(self.companion_manager_pid, sig)
|
||||
except OSError as e:
|
||||
if e.errno == errno.ESRCH:
|
||||
self.companion_manager_pid = 0
|
||||
return
|
||||
raise
|
||||
|
||||
def companion_manager_stop_timeout(self):
|
||||
"""Seconds to wait for the companion manager during shutdown: the
|
||||
explicit setting, else the slowest companion stop_timeout plus the
|
||||
shutdown buffer, else 0 when no companions are configured.
|
||||
"""
|
||||
if self.cfg.companion_manager_stop_timeout is not None:
|
||||
return self.cfg.companion_manager_stop_timeout
|
||||
# Prefer the configs cached at spawn over re-reading the config file
|
||||
# mid-shutdown, where a since-changed or removed file could raise.
|
||||
configs = self._companion_configs
|
||||
if not configs:
|
||||
try:
|
||||
configs = build_companion_configs(self.cfg)
|
||||
except Exception:
|
||||
self.log.exception("could not read companion config for shutdown")
|
||||
return 0
|
||||
if not configs:
|
||||
return 0
|
||||
slowest = max(config.stop_timeout for config in configs)
|
||||
return slowest + self.cfg.companion_manager_shutdown_buffer
|
||||
|
||||
def kill_workers(self, sig):
|
||||
"""\
|
||||
Kill all workers with the signal `sig`
|
||||
|
||||
155
gunicorn/companion/README.md
Normal file
155
gunicorn/companion/README.md
Normal file
@ -0,0 +1,155 @@
|
||||
# Companion processes
|
||||
|
||||
Gunicorn runs HTTP workers. Many apps also need non-HTTP side processes next to
|
||||
them: RQ workers, a scheduler, socket.io, a custom daemon. This package lets
|
||||
Gunicorn supervise those too, so they share the preloaded application memory
|
||||
(copy-on-write) and one process tree instead of running under a separate
|
||||
supervisor.
|
||||
|
||||
## Architecture
|
||||
|
||||
```
|
||||
clients (HTTP)
|
||||
│
|
||||
▼
|
||||
┌──────────────────────────────────────────────────────────┐
|
||||
│ arbiter (master) preloaded app — shared (COW) │
|
||||
└──┬──────────────────┬────────────────────────┬───────────┘
|
||||
│ fork │ fork │ fork (after preload)
|
||||
▼ ▼ ▼
|
||||
┌───────────┐ ┌───────────┐ ┌────────────────────┐
|
||||
│HTTP Worker│ ... │HTTP worker│ │ companion manager │◀─── control
|
||||
└───────────┘ └───────────┘ └─────────┬──────────┘ socket (JSON)
|
||||
│ fork + supervise ▲
|
||||
┌──────────────┼──────────────┐ │
|
||||
▼ ▼ ▼ gunicorn-companion
|
||||
┌─────────┐ ┌─────────┐ ┌─────────┐ / socat
|
||||
│companion│ │companion│ │companion│
|
||||
│ rq │ │scheduler│ │socketio │
|
||||
└─────────┘ └─────────┘ └─────────┘
|
||||
```
|
||||
|
||||
The arbiter forks one **companion manager** after `preload_app`. The manager
|
||||
forks and supervises each configured companion, owns the control socket, and
|
||||
exits when the arbiter does. It is the only companion-aware part of the arbiter;
|
||||
all per-process logic lives in the manager. Companions inherit the preloaded
|
||||
application memory copy-on-write, the same way HTTP workers do.
|
||||
|
||||
## States
|
||||
|
||||
```
|
||||
STOPPED ──start──▶ STARTING ──(survives startsecs)──▶ RUNNING
|
||||
▲ │
|
||||
│ stop / crash
|
||||
│ ▼
|
||||
└────────────── STOPPED / STOPPING ◀── BACKOFF (unexpected exit)
|
||||
```
|
||||
|
||||
- An unexpected exit goes to `BACKOFF` and restarts after a fixed
|
||||
`companion_restart_delay` (no exponential backoff, no retry cap).
|
||||
- A manual `stop` exits to `STOPPED` and stays there.
|
||||
- `stop` sends `companion_stop_signal`, then `SIGKILL` after
|
||||
`companion_stop_timeout`.
|
||||
|
||||
## Configuration
|
||||
|
||||
Companions live in the normal Gunicorn config — a Python file you pass with `-c`. To reload companions independently of the web config, put them in a dedicated file and point `companion_config_file` (or `--companion-config`) at it; the manager then reads its companion settings from there instead.
|
||||
|
||||
Save a `gunicorn.conf.py`:
|
||||
|
||||
```python
|
||||
preload_app = True # required to share memory
|
||||
companion_control_socket = "/run/gunicorn/companion.sock"
|
||||
companion_workers = [
|
||||
{
|
||||
"name": "ticker",
|
||||
"target": "myapp.jobs:run", # callable or "module:attr"
|
||||
"stdout": "/var/log/myapp/ticker.log",
|
||||
"stderr": "stdout", # path, "stdout", or "inherit"
|
||||
},
|
||||
]
|
||||
```
|
||||
|
||||
Start Gunicorn pointing at it; the manager and companions come up with the HTTP
|
||||
workers:
|
||||
|
||||
```sh
|
||||
gunicorn -c gunicorn.conf.py myapp.wsgi:application
|
||||
```
|
||||
|
||||
`companion_workers` is a list of dicts. `name` and `target` are required; every
|
||||
other field falls back to the matching global `companion_*` setting, so a dict
|
||||
only names what differs from the defaults:
|
||||
|
||||
| Setting | Per-companion key | Meaning |
|
||||
|-------------------------------|-------------------|-------------------------------------------|
|
||||
| `companion_control_socket` | — | Unix socket the manager listens on |
|
||||
| `companion_cwd` | `cwd` | working directory before the target runs |
|
||||
| `companion_env` | `env` | extra environment variables (merged) |
|
||||
| `companion_stop_signal` | `stop_signal` | signal sent first on stop (`SIGTERM`) |
|
||||
| `companion_stop_timeout` | `stop_timeout` | seconds before `SIGKILL` |
|
||||
| `companion_startsecs` | `startsecs` | seconds alive to reach `RUNNING` |
|
||||
| `companion_restart_delay` | — | seconds before restarting a crash |
|
||||
| `companion_stdout` | `stdout` | stdout file, or `"inherit"` |
|
||||
| `companion_stderr` | `stderr` | stderr file, `"stdout"`, or `"inherit"` |
|
||||
|
||||
`target` is either an import string `"module:attr"` or a zero-argument callable.
|
||||
The child applies `cwd`/`env`, redirects `stdout`/`stderr`, then calls the
|
||||
target. Log rotation stays external.
|
||||
|
||||
## Control
|
||||
|
||||
The manager listens on the Unix socket at `companion_control_socket` (0o600,
|
||||
owned by the user Gunicorn runs as). The protocol is one JSON object per line.
|
||||
|
||||
Use the CLI:
|
||||
|
||||
```sh
|
||||
export GUNICORN_COMPANION_SOCKET=/run/gunicorn/companion.sock
|
||||
gunicorn-companion status
|
||||
gunicorn-companion stop ticker
|
||||
gunicorn-companion restart ticker
|
||||
gunicorn-companion reread # re-read config; restart only changed companions
|
||||
```
|
||||
|
||||
Or talk to the socket directly:
|
||||
|
||||
```sh
|
||||
echo '{"cmd": "status"}' | socat - UNIX-CONNECT:/run/gunicorn/companion.sock
|
||||
```
|
||||
|
||||
Commands: `status`, `start <name>`, `stop <name>`, `restart <name>`, `reread`.
|
||||
|
||||
`reread` is transactional: the new config is validated first, and on any error
|
||||
nothing changes and the old config keeps running.
|
||||
|
||||
A `SIGHUP` to Gunicorn recycles the HTTP workers, then reloads the companion
|
||||
manager **only if the companion config changed**. With `--preload`, a reload
|
||||
does not re-import application code (the WSGI callable is loaded once and
|
||||
cached), so unchanged companions are already current and are left running --
|
||||
the common web reload never bounces them. When the companion specs do change,
|
||||
the manager is restarted with the new config. Note that, like the HTTP
|
||||
workers, companions pick up new application code only on a full restart, not on
|
||||
`SIGHUP`. Use `reread` for finer, per-companion changes without touching the
|
||||
others.
|
||||
|
||||
## Files
|
||||
|
||||
| File | Responsibility |
|
||||
|--------------|-----------------------------------------------------------|
|
||||
| `config.py` | `CompanionConfig`, config hash, build configs from cfg |
|
||||
| `process.py` | `CompanionProcess` runtime state, public states |
|
||||
| `manager.py` | fork/reap, state transitions, restart delay, run loop |
|
||||
| `control.py` | Unix socket server and JSON framing |
|
||||
| `ctl.py` | `gunicorn-companion` command-line client |
|
||||
|
||||
## Limitations
|
||||
|
||||
Hot reexec (`SIGUSR2` zero-downtime upgrade) is not supported with
|
||||
companions. During the overlap the old and new masters each run their own
|
||||
companion manager, so every companion runs twice. For singletons such as a
|
||||
scheduler this means duplicate work (e.g. double-fired jobs). Stop and
|
||||
restart the master instead of reexec'ing when companions are configured, or
|
||||
keep singleton companions out of the companion set. A graceful reload
|
||||
(`SIGHUP`) is fine: it restarts the single manager in place.
|
||||
|
||||
10
gunicorn/companion/__init__.py
Normal file
10
gunicorn/companion/__init__.py
Normal file
@ -0,0 +1,10 @@
|
||||
#
|
||||
# This file is part of gunicorn released under the MIT license.
|
||||
# See the NOTICE for more information.
|
||||
|
||||
"""Companion process manager.
|
||||
|
||||
Gunicorn manages one extra child, the Companion Manager, which manages all
|
||||
configured non-HTTP companion processes (RQ workers, scheduler, socket.io,
|
||||
custom daemons). See ``docs/design/companion-process-manager.md``.
|
||||
"""
|
||||
156
gunicorn/companion/config.py
Normal file
156
gunicorn/companion/config.py
Normal file
@ -0,0 +1,156 @@
|
||||
#
|
||||
# This file is part of gunicorn released under the MIT license.
|
||||
# See the NOTICE for more information.
|
||||
|
||||
import hashlib
|
||||
import json
|
||||
import signal
|
||||
|
||||
|
||||
# Maps each optional companion field to the global setting build_companion_configs
|
||||
# reads when a spec omits it. ``name`` and ``target`` are required per spec and
|
||||
# have no global default. ``restart_delay`` is global-only, so it is absent here.
|
||||
FIELD_DEFAULTS = {
|
||||
"cwd": "companion_cwd",
|
||||
"env": "companion_env",
|
||||
"stop_signal": "companion_stop_signal",
|
||||
"stop_timeout": "companion_stop_timeout",
|
||||
"reload_timeout": "companion_reload_timeout",
|
||||
"stdout": "companion_stdout",
|
||||
"stderr": "companion_stderr",
|
||||
"startsecs": "companion_startsecs",
|
||||
}
|
||||
|
||||
|
||||
class CompanionConfig:
|
||||
"""Validated, normalized config for a single companion.
|
||||
|
||||
Built from one entry of ``companion_workers`` with global defaults already
|
||||
applied. ``config_hash`` is a stable digest of every field; the manager
|
||||
restarts a companion whenever its hash changes on reread.
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
name,
|
||||
target,
|
||||
cwd=None,
|
||||
env=None,
|
||||
stop_signal="SIGTERM",
|
||||
stop_timeout=60,
|
||||
reload_timeout=60,
|
||||
stdout=None,
|
||||
stderr=None,
|
||||
startsecs=1,
|
||||
restart_delay=5,
|
||||
):
|
||||
self.name = name
|
||||
self.target = target
|
||||
self.cwd = cwd
|
||||
self.env = dict(env or {})
|
||||
self.stop_signal = stop_signal
|
||||
self.stop_timeout = stop_timeout
|
||||
self.reload_timeout = reload_timeout
|
||||
self.stdout = stdout
|
||||
self.stderr = stderr
|
||||
self.startsecs = startsecs
|
||||
self.restart_delay = restart_delay
|
||||
|
||||
def to_dict(self):
|
||||
return {
|
||||
"name": self.name,
|
||||
"target": self.target,
|
||||
"cwd": self.cwd,
|
||||
"env": self.env,
|
||||
"stop_signal": self.stop_signal,
|
||||
"stop_timeout": self.stop_timeout,
|
||||
"reload_timeout": self.reload_timeout,
|
||||
"stdout": self.stdout,
|
||||
"stderr": self.stderr,
|
||||
"startsecs": self.startsecs,
|
||||
}
|
||||
|
||||
@property
|
||||
def config_hash(self):
|
||||
# Sort keys so dict ordering never changes the digest. A callable
|
||||
# target has no stable repr across runs, so use its qualified name.
|
||||
data = self.to_dict()
|
||||
data["target"] = self._target_key(self.target)
|
||||
payload = json.dumps(data, sort_keys=True, default=str)
|
||||
return hashlib.sha256(payload.encode("utf-8")).hexdigest()
|
||||
|
||||
@staticmethod
|
||||
def _target_key(target):
|
||||
if callable(target):
|
||||
module = getattr(target, "__module__", "")
|
||||
qualified_name = getattr(target, "__qualname__", repr(target))
|
||||
return "%s:%s" % (module, qualified_name)
|
||||
return str(target)
|
||||
|
||||
def __repr__(self):
|
||||
return "<CompanionConfig %s>" % self.name
|
||||
|
||||
|
||||
ALLOWED_SPEC_KEYS = {"name", "target"} | set(FIELD_DEFAULTS)
|
||||
|
||||
|
||||
def _validate_stop_signal(stop_signal, name):
|
||||
"""Reject a stop_signal that does not name a real signal.
|
||||
|
||||
Caught here at build time so a typo like ``"SIGTRM"`` fails loudly when
|
||||
config is loaded or rereaded, rather than crashing the manager later when
|
||||
it tries to send the signal.
|
||||
"""
|
||||
if isinstance(stop_signal, str):
|
||||
valid = stop_signal in signal.Signals.__members__
|
||||
else:
|
||||
valid = stop_signal in set(signal.Signals)
|
||||
if not valid:
|
||||
raise ValueError(
|
||||
"companion %s has unknown stop_signal %r" % (name, stop_signal))
|
||||
|
||||
|
||||
def _load_companion_settings(cfg):
|
||||
"""Return the ``companion_*`` settings from ``companion_config_file``, or
|
||||
``{}`` when no dedicated file is configured."""
|
||||
path = getattr(cfg, "companion_config_file", None)
|
||||
if not path:
|
||||
return {}
|
||||
namespace = {}
|
||||
with open(path) as config_file:
|
||||
# The companion config file is trusted operator input, like the main
|
||||
# Gunicorn config; running it is the point.
|
||||
exec(compile(config_file.read(), path, "exec"), namespace) # pylint: disable=exec-used
|
||||
return {name: value for name, value in namespace.items()
|
||||
if name.startswith("companion_")}
|
||||
|
||||
|
||||
def build_companion_configs(cfg):
|
||||
"""Build a CompanionConfig list from the companion settings.
|
||||
|
||||
Settings come from ``companion_config_file`` when set, otherwise ``cfg``. A
|
||||
spec is rejected if it is missing ``name``/``target`` or carries an unknown
|
||||
key.
|
||||
"""
|
||||
overrides = _load_companion_settings(cfg)
|
||||
|
||||
def setting(name):
|
||||
return overrides.get(name, getattr(cfg, name))
|
||||
|
||||
configs = []
|
||||
for spec in setting("companion_workers"):
|
||||
if "name" not in spec or "target" not in spec:
|
||||
raise ValueError(
|
||||
"each companion worker needs 'name' and 'target': %s" % spec)
|
||||
unknown = set(spec) - ALLOWED_SPEC_KEYS
|
||||
if unknown:
|
||||
raise ValueError(
|
||||
"unknown companion worker key(s) %s in %s"
|
||||
% (sorted(unknown), spec))
|
||||
fields = {field: spec.get(field, setting(global_setting))
|
||||
for field, global_setting in FIELD_DEFAULTS.items()}
|
||||
_validate_stop_signal(fields["stop_signal"], spec["name"])
|
||||
configs.append(CompanionConfig(
|
||||
name=spec["name"], target=spec["target"],
|
||||
restart_delay=setting("companion_restart_delay"), **fields))
|
||||
return configs
|
||||
136
gunicorn/companion/control.py
Normal file
136
gunicorn/companion/control.py
Normal file
@ -0,0 +1,136 @@
|
||||
#
|
||||
# This file is part of gunicorn released under the MIT license.
|
||||
# See the NOTICE for more information.
|
||||
|
||||
|
||||
import json
|
||||
import os
|
||||
import socket
|
||||
|
||||
|
||||
class CommandError(Exception):
|
||||
"""A control request the manager understood but had to reject.
|
||||
|
||||
Raised for malformed input (bad JSON, missing ``cmd``). It is turned into
|
||||
an ``{"ok": false, "error": ...}`` response rather than crashing the
|
||||
manager, so a buggy or hostile client can never take the socket down.
|
||||
"""
|
||||
|
||||
|
||||
def decode_command(line):
|
||||
"""Parse one request line into a command dict.
|
||||
|
||||
The wire protocol is newline-delimited JSON: each request is a single JSON
|
||||
object on its own line, e.g. ``{"cmd": "status"}``. Every request must be a
|
||||
JSON object carrying a string ``cmd``; anything else is a ``CommandError``.
|
||||
"""
|
||||
try:
|
||||
command = json.loads(line)
|
||||
except (ValueError, TypeError):
|
||||
raise CommandError("invalid JSON")
|
||||
if not isinstance(command, dict):
|
||||
raise CommandError("request must be a JSON object")
|
||||
if not isinstance(command.get("cmd"), str):
|
||||
raise CommandError("missing 'cmd'")
|
||||
return command
|
||||
|
||||
|
||||
def encode_response(response):
|
||||
"""Encode a response dict as one newline-terminated JSON line of bytes."""
|
||||
return (json.dumps(response) + "\n").encode("utf-8")
|
||||
|
||||
|
||||
# A control request is a single short JSON line. Cap the unframed buffer so a
|
||||
# client that never sends a newline cannot grow it without bound.
|
||||
MAX_LINE_BYTES = 1 << 20
|
||||
|
||||
|
||||
class ControlServer:
|
||||
"""The manager's Unix-socket control endpoint.
|
||||
|
||||
Owns the listening socket and the request framing only. Turning a decoded
|
||||
command into an action is delegated to ``dispatch`` (wired to the manager's
|
||||
command handlers in a later task); this class just decodes each line, runs
|
||||
it through ``dispatch``, and writes back the encoded reply.
|
||||
|
||||
The socket is created with mode 0o600 and owned by the (non-root) user
|
||||
gunicorn runs as. There is no group-ownership switching.
|
||||
"""
|
||||
|
||||
def __init__(self, dispatch, path, mode=0o600, log=None, backlog=64):
|
||||
self.dispatch = dispatch
|
||||
self.path = path
|
||||
self.mode = mode
|
||||
self.log = log
|
||||
self.backlog = backlog
|
||||
self.listener = None
|
||||
|
||||
def create(self):
|
||||
"""Bind and listen on the Unix socket, replacing any stale one.
|
||||
|
||||
A leftover socket file from a previous manager would make ``bind``
|
||||
fail, so it is unlinked first. Called once before the manager enters
|
||||
its run loop, as clients expect the socket to exist by then.
|
||||
"""
|
||||
if os.path.exists(self.path):
|
||||
os.unlink(self.path)
|
||||
listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
|
||||
listener.bind(self.path)
|
||||
os.chmod(self.path, self.mode)
|
||||
listener.listen(self.backlog)
|
||||
self.listener = listener
|
||||
return listener
|
||||
|
||||
def close(self):
|
||||
"""Close the listening socket and remove its file."""
|
||||
if self.listener is not None:
|
||||
self.listener.close()
|
||||
self.listener = None
|
||||
if os.path.exists(self.path):
|
||||
os.unlink(self.path)
|
||||
|
||||
def handle_line(self, line):
|
||||
"""Run one request line and return the encoded response bytes.
|
||||
|
||||
Decoding and dispatch failures are caught and rendered as an error
|
||||
response, so one bad request never breaks the connection or the
|
||||
manager. CommandError is the expected rejection; any other exception
|
||||
from dispatch is an unexpected handler bug, caught here for the same
|
||||
reason -- it must not escape and kill the manager's run loop.
|
||||
"""
|
||||
try:
|
||||
response = self.dispatch(decode_command(line))
|
||||
except CommandError as error:
|
||||
response = {"ok": False, "error": str(error)}
|
||||
except Exception as error:
|
||||
if self.log is not None:
|
||||
self.log.exception("companion control command failed")
|
||||
response = {"ok": False, "error": "internal error: %s" % error}
|
||||
return encode_response(response)
|
||||
|
||||
def serve_connection(self, connection):
|
||||
"""Serve newline-delimited requests on one accepted connection.
|
||||
|
||||
Reads until the client hangs up, buffering partial reads and answering
|
||||
each complete line as it arrives. A trailing fragment without a newline
|
||||
is ignored. A single line that grows past ``MAX_LINE_BYTES`` without a
|
||||
newline is treated as abuse: the connection is dropped so it cannot pin
|
||||
unbounded memory.
|
||||
"""
|
||||
buffer = b""
|
||||
with connection:
|
||||
while True:
|
||||
chunk = connection.recv(65536)
|
||||
if not chunk:
|
||||
break
|
||||
buffer += chunk
|
||||
while b"\n" in buffer:
|
||||
line, buffer = buffer.split(b"\n", 1)
|
||||
if line.strip():
|
||||
connection.sendall(self.handle_line(line))
|
||||
if len(buffer) > MAX_LINE_BYTES:
|
||||
if self.log is not None:
|
||||
self.log.warning(
|
||||
"companion control line exceeded %d bytes; closing",
|
||||
MAX_LINE_BYTES)
|
||||
break
|
||||
83
gunicorn/companion/ctl.py
Normal file
83
gunicorn/companion/ctl.py
Normal file
@ -0,0 +1,83 @@
|
||||
#
|
||||
# This file is part of gunicorn released under the MIT license.
|
||||
# See the NOTICE for more information.
|
||||
|
||||
"""Command-line client for the companion control socket.
|
||||
|
||||
Speaks the newline-delimited JSON protocol the manager's ControlServer serves:
|
||||
sends one command and prints the manager's reply. Installed as the
|
||||
``gunicorn-companion`` console script.
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import json
|
||||
import os
|
||||
import socket
|
||||
import sys
|
||||
|
||||
# Commands that act on one named companion and so require a name argument.
|
||||
PER_NAME_COMMANDS = ("start", "stop", "restart")
|
||||
COMMANDS = ("status", "reread") + PER_NAME_COMMANDS
|
||||
|
||||
|
||||
def send_command(socket_path, command):
|
||||
"""Send one command dict to the control socket and return the reply dict."""
|
||||
client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
|
||||
try:
|
||||
client.connect(socket_path)
|
||||
client.sendall((json.dumps(command) + "\n").encode("utf-8"))
|
||||
chunks = []
|
||||
while True:
|
||||
chunk = client.recv(65536)
|
||||
if not chunk:
|
||||
break
|
||||
chunks.append(chunk)
|
||||
if b"\n" in chunk:
|
||||
break
|
||||
finally:
|
||||
client.close()
|
||||
return json.loads(b"".join(chunks).decode("utf-8"))
|
||||
|
||||
|
||||
def build_parser():
|
||||
parser = argparse.ArgumentParser(
|
||||
prog="gunicorn-companion",
|
||||
description="Control gunicorn companion processes.")
|
||||
parser.add_argument(
|
||||
"-s", "--socket",
|
||||
default=os.environ.get("GUNICORN_COMPANION_SOCKET"),
|
||||
help="path to the companion control socket "
|
||||
"(defaults to $GUNICORN_COMPANION_SOCKET)")
|
||||
parser.add_argument("command", choices=COMMANDS)
|
||||
parser.add_argument(
|
||||
"name", nargs="?",
|
||||
help="companion name (required for start, stop, restart)")
|
||||
return parser
|
||||
|
||||
|
||||
def run(argv=None):
|
||||
parser = build_parser()
|
||||
args = parser.parse_args(argv)
|
||||
if not args.socket:
|
||||
parser.error("no control socket; pass --socket or set "
|
||||
"GUNICORN_COMPANION_SOCKET")
|
||||
if args.command in PER_NAME_COMMANDS and not args.name:
|
||||
parser.error("%s requires a companion name" % args.command)
|
||||
|
||||
command = {"cmd": args.command}
|
||||
if args.name:
|
||||
command["name"] = args.name
|
||||
|
||||
try:
|
||||
response = send_command(args.socket, command)
|
||||
except OSError as error:
|
||||
print("cannot reach companion socket %s: %s" % (args.socket, error),
|
||||
file=sys.stderr)
|
||||
return 2
|
||||
|
||||
print(json.dumps(response, indent=2))
|
||||
return 0 if response.get("ok") else 1
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(run())
|
||||
682
gunicorn/companion/manager.py
Normal file
682
gunicorn/companion/manager.py
Normal file
@ -0,0 +1,682 @@
|
||||
#
|
||||
# This file is part of gunicorn released under the MIT license.
|
||||
# See the NOTICE for more information.
|
||||
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import ctypes
|
||||
import importlib
|
||||
import os
|
||||
import select
|
||||
import signal
|
||||
import sys
|
||||
import time
|
||||
from typing import TYPE_CHECKING, Callable, Iterable, Union
|
||||
|
||||
from gunicorn import util
|
||||
from gunicorn.companion.control import CommandError
|
||||
from gunicorn.companion.process import CompanionProcess, State
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from gunicorn.companion.config import CompanionConfig
|
||||
|
||||
# prctl option number for "send me this signal when my parent dies".
|
||||
PR_SET_PDEATHSIG = 1
|
||||
|
||||
|
||||
# Signals the arbiter and manager install handlers for; a forked companion
|
||||
# resets them to the default so its stop signal works and the target starts
|
||||
# from a clean slate, the same way a gunicorn worker does.
|
||||
INHERITED_SIGNALS = [
|
||||
"SIGCHLD", "SIGTERM", "SIGINT", "SIGHUP", "SIGQUIT",
|
||||
"SIGUSR1", "SIGUSR2", "SIGWINCH", "SIGTTIN", "SIGTTOU",
|
||||
]
|
||||
|
||||
|
||||
def reset_child_signals() -> None:
|
||||
"""Restore default signal handling in a freshly forked companion.
|
||||
|
||||
The companion inherits the manager's (and arbiter's) handlers across the
|
||||
fork. Without this, a stop signal like SIGTERM would hit the manager's
|
||||
handler -- which just flips a flag -- instead of terminating the companion.
|
||||
"""
|
||||
for name in INHERITED_SIGNALS:
|
||||
number = getattr(signal, name, None)
|
||||
if number is not None:
|
||||
signal.signal(number, signal.SIG_DFL)
|
||||
|
||||
|
||||
def set_parent_death_signal(stop_signal) -> bool:
|
||||
"""Ask the kernel to send ``stop_signal`` when this process's parent dies.
|
||||
|
||||
Uses Linux ``prctl(PR_SET_PDEATHSIG)`` so an orphaned manager or companion
|
||||
is signalled the moment its parent goes away, rather than lingering. Returns
|
||||
True when armed and False on any non-Linux platform or error, so callers can
|
||||
fall back to polling ``os.getppid()``.
|
||||
"""
|
||||
if not sys.platform.startswith("linux"):
|
||||
return False
|
||||
try:
|
||||
libc = ctypes.CDLL("libc.so.6", use_errno=True)
|
||||
return libc.prctl(PR_SET_PDEATHSIG, int(stop_signal), 0, 0, 0) == 0
|
||||
except (OSError, AttributeError):
|
||||
return False
|
||||
|
||||
|
||||
class CompanionManager:
|
||||
"""Forks and supervises companion processes.
|
||||
|
||||
Created by the arbiter after preload. Holds one ``CompanionProcess`` per
|
||||
configured companion and owns the fork lifecycle. This skeleton wires
|
||||
construction and single-companion spawn; reaping, backoff, the control
|
||||
socket, and the run loop arrive in later tasks.
|
||||
"""
|
||||
|
||||
def __init__(self, configs: Iterable[CompanionConfig], log):
|
||||
self.log = log
|
||||
self.pid = os.getpid()
|
||||
self.processes = {c.name: CompanionProcess(c) for c in configs}
|
||||
# Set by the arbiter wiring: a no-arg callable that re-reads and
|
||||
# validates companion config, returning a fresh CompanionConfig list.
|
||||
self.config_loader = None
|
||||
# Set by the arbiter wiring: the ControlServer, or None when no control
|
||||
# socket is configured. Created inside the child by run().
|
||||
self.control = None
|
||||
self.stopping = False
|
||||
self._wakeup_pipe = None
|
||||
self.parent_pid = None
|
||||
|
||||
def run(self) -> None:
|
||||
"""Run the manager's supervision loop. This is the forked child body.
|
||||
|
||||
Installs signal handling, brings up the control socket, starts every
|
||||
companion, then loops servicing the socket and the companions until a
|
||||
SIGTERM or SIGINT asks it to stop, at which point it shuts the
|
||||
companions down and returns. Each tick reaps exited companions,
|
||||
retries any that are backing off, promotes those past ``startsecs``,
|
||||
and kills any that overran their stop deadline.
|
||||
|
||||
If the arbiter dies, the manager stops too: it arms a parent-death
|
||||
signal on Linux and, as a portable fallback, watches ``getppid`` each
|
||||
tick so it never keeps companions running under a dead arbiter.
|
||||
"""
|
||||
# __init__ ran in the arbiter, so refresh pid/parent now that this is
|
||||
# the manager process: the companion parent-death guard compares its
|
||||
# getppid() against self.pid.
|
||||
self.pid = os.getpid()
|
||||
self.parent_pid = os.getppid()
|
||||
self._install_signals()
|
||||
set_parent_death_signal(signal.SIGTERM)
|
||||
if self.control is not None:
|
||||
self.control.create()
|
||||
for process in self.processes.values():
|
||||
self.spawn_process(process)
|
||||
self.log.info("companion manager running (pid %s)", self.pid)
|
||||
try:
|
||||
while not self.stopping:
|
||||
if self._parent_gone():
|
||||
self.log.info("companion manager parent gone, stopping")
|
||||
break
|
||||
self._tick()
|
||||
self._wait()
|
||||
self.stop_all()
|
||||
finally:
|
||||
if self.control is not None:
|
||||
self.control.close()
|
||||
self.log.info("companion manager stopped (pid %s)", self.pid)
|
||||
|
||||
def _parent_gone(self) -> bool:
|
||||
"""True once the arbiter that forked the manager has exited."""
|
||||
return os.getppid() != self.parent_pid
|
||||
|
||||
def _tick(self, now: float = None) -> None:
|
||||
"""One supervision pass over every companion."""
|
||||
now = now or time.time()
|
||||
self.reap_processes()
|
||||
self.retry_backoff(now)
|
||||
self.promote_running(now)
|
||||
self.enforce_deadlines(now)
|
||||
|
||||
def _wait(self, timeout: float = 1.0) -> None:
|
||||
"""Block until a signal or a control request arrives, or we time out.
|
||||
|
||||
The self-pipe carries signal wake-ups; the control listener carries
|
||||
client connections. Either readable (or the timeout) ends the wait, so
|
||||
the next loop pass reacts promptly without busy-spinning.
|
||||
"""
|
||||
readers = [self._wakeup_pipe[0]]
|
||||
if self.control is not None and self.control.listener is not None:
|
||||
readers.append(self.control.listener)
|
||||
try:
|
||||
ready, _, _ = select.select(readers, [], [], timeout)
|
||||
except (InterruptedError, OSError):
|
||||
return
|
||||
for reader in ready:
|
||||
if reader is self._wakeup_pipe[0]:
|
||||
self._drain_wakeup()
|
||||
else:
|
||||
self._accept_control()
|
||||
|
||||
def _accept_control(self) -> None:
|
||||
"""Accept one waiting control connection and answer its requests."""
|
||||
try:
|
||||
connection, _ = self.control.listener.accept()
|
||||
except OSError:
|
||||
return
|
||||
self.control.serve_connection(connection)
|
||||
|
||||
def enforce_deadlines(self, now: float = None) -> None:
|
||||
"""SIGKILL companions that overran their stop deadline.
|
||||
|
||||
A graceful ``stop_signal`` may be ignored or take too long; once the
|
||||
deadline set by stop/restart passes, the companion is force-killed so
|
||||
it cannot wedge the manager. The reaper picks up the exit afterwards.
|
||||
"""
|
||||
now = now or time.time()
|
||||
for process in self.processes.values():
|
||||
if process.state != State.STOPPING or process.stop_deadline is None:
|
||||
continue
|
||||
if now >= process.stop_deadline and process.pid is not None:
|
||||
self._safe_kill(process.pid, signal.SIGKILL)
|
||||
process.kill_count += 1
|
||||
process.stop_deadline = None
|
||||
self.log.warning("companion %s killed after timeout (pid %s)",
|
||||
process.name, process.pid)
|
||||
|
||||
def stop_all(self) -> None:
|
||||
"""Stop every companion and reap them as they exit.
|
||||
|
||||
Sends each one its stop signal, then keeps reaping and enforcing stop
|
||||
deadlines until they are all gone, so the manager exits without leaving
|
||||
orphaned companions behind.
|
||||
"""
|
||||
self.log.info("stopping all companions")
|
||||
for name in list(self.processes):
|
||||
self.stop_process(name)
|
||||
while any(process.pid is not None for process in self.processes.values()):
|
||||
now = time.time()
|
||||
self.enforce_deadlines(now)
|
||||
self.reap_processes()
|
||||
self._wait(timeout=0.2)
|
||||
self.log.info("all companions stopped")
|
||||
|
||||
def _install_signals(self) -> None:
|
||||
"""Set up the self-pipe and signal handlers for the supervision loop."""
|
||||
self._wakeup_pipe = os.pipe()
|
||||
for fd in self._wakeup_pipe:
|
||||
util.set_non_blocking(fd)
|
||||
util.close_on_exec(fd)
|
||||
signal.signal(signal.SIGCHLD, self._wakeup)
|
||||
signal.signal(signal.SIGTERM, self._signal_stop)
|
||||
signal.signal(signal.SIGINT, self._signal_stop)
|
||||
|
||||
def _signal_stop(self, signum, frame) -> None:
|
||||
self.stopping = True
|
||||
self._wakeup()
|
||||
|
||||
def _wakeup(self, signum=None, frame=None) -> None:
|
||||
"""Wake the loop out of ``select`` by writing to the self-pipe."""
|
||||
try:
|
||||
os.write(self._wakeup_pipe[1], b".")
|
||||
except OSError:
|
||||
pass
|
||||
|
||||
def _drain_wakeup(self) -> None:
|
||||
try:
|
||||
while os.read(self._wakeup_pipe[0], 4096):
|
||||
pass
|
||||
except OSError:
|
||||
pass
|
||||
|
||||
def handle_command(self, command: dict) -> dict:
|
||||
"""Route a decoded control command to its action.
|
||||
|
||||
This is the ``dispatch`` the control socket calls. ``status`` returns a
|
||||
snapshot of every companion; ``start``/``stop``/``restart`` act on the
|
||||
one named companion and report ``(ok, message)``. Per-companion
|
||||
commands need a string ``name``, and anything else raises ``CommandError`` so the
|
||||
socket replies with an error envelope.
|
||||
"""
|
||||
command_name = command["cmd"]
|
||||
if command_name == "status":
|
||||
return {"ok": True, "companions": self.status()}
|
||||
if command_name == "reread":
|
||||
if self.config_loader is None:
|
||||
raise CommandError("reread not configured")
|
||||
try:
|
||||
new_configs = self.config_loader()
|
||||
except Exception as error:
|
||||
return {"ok": False, "error": "invalid config: %s" % error,
|
||||
"kept_old_config": True}
|
||||
return self.reread_config(new_configs)
|
||||
|
||||
# Every remaining command acts on one named companion.
|
||||
name = command.get("name")
|
||||
if not isinstance(name, str):
|
||||
raise CommandError("'%s' requires a 'name'" % command_name)
|
||||
if command_name == "start":
|
||||
ok, message = self.start_process(name)
|
||||
elif command_name == "stop":
|
||||
ok, message = self.stop_process(name)
|
||||
elif command_name == "restart":
|
||||
ok, message = self.restart_process(name)
|
||||
else:
|
||||
raise CommandError("unknown command %r" % command_name)
|
||||
return {"ok": ok, "message": message}
|
||||
|
||||
def status(self, now: float = None) -> list:
|
||||
"""Status entry for every companion, for the ``status`` command."""
|
||||
now = now or time.time()
|
||||
return [process.status_dict(now) for process in self.processes.values()]
|
||||
|
||||
def reread_config(self, new_configs) -> dict:
|
||||
"""Transactionally apply a fresh set of companion configs.
|
||||
|
||||
Each companion is compared with the running set by ``config_hash``:
|
||||
a new name is added and started, a missing name is stopped and removed,
|
||||
a changed hash stores the new config and restarts (unless the companion
|
||||
was manually stopped, which keeps it STOPPED with the new config ready
|
||||
for its next start), and an unchanged hash is left alone. Validation
|
||||
runs first, so a bad config touches nothing and the old one stays live.
|
||||
"""
|
||||
try:
|
||||
new_by_name = self._index_configs(new_configs)
|
||||
except CommandError as e:
|
||||
return {"ok": False, "error": str(e), "kept_old_config": True}
|
||||
|
||||
added, removed, restarted, unchanged = [], [], [], []
|
||||
old_names = set(self.processes)
|
||||
new_names = set(new_by_name)
|
||||
|
||||
for name in old_names - new_names:
|
||||
self.stop_process(name)
|
||||
del self.processes[name]
|
||||
removed.append(name)
|
||||
|
||||
for name in new_names - old_names:
|
||||
process = CompanionProcess(new_by_name[name])
|
||||
self.processes[name] = process
|
||||
self.spawn_process(process)
|
||||
added.append(name)
|
||||
|
||||
for name in new_names & old_names:
|
||||
process = self.processes[name]
|
||||
if process.config.config_hash == new_by_name[name].config_hash:
|
||||
unchanged.append(name)
|
||||
continue
|
||||
process.config = new_by_name[name]
|
||||
if process.manual_stop:
|
||||
unchanged.append(name)
|
||||
else:
|
||||
self.restart_process(name)
|
||||
restarted.append(name)
|
||||
|
||||
self.log.info(
|
||||
"companion reread applied: added %s, removed %s, restarted %s, unchanged %s",
|
||||
added, removed, restarted, unchanged)
|
||||
return {"ok": True, "added": added, "removed": removed,
|
||||
"restarted": restarted, "unchanged": unchanged}
|
||||
|
||||
@staticmethod
|
||||
def _index_configs(configs) -> dict:
|
||||
"""Index configs by name, rejecting duplicates."""
|
||||
by_name = {}
|
||||
for config in configs:
|
||||
if config.name in by_name:
|
||||
raise CommandError(
|
||||
"invalid config: duplicate companion name %s" % config.name)
|
||||
by_name[config.name] = config
|
||||
return by_name
|
||||
|
||||
def spawn_process(self, process: CompanionProcess) -> int:
|
||||
"""Fork one companion.
|
||||
|
||||
Parent records the pid and moves the companion to STARTING. Child
|
||||
resolves and runs the target, exiting the worker on any failure so a
|
||||
crashed companion never leaks back into the manager's control flow.
|
||||
|
||||
Spawning is policy-neutral: it does not touch ``manual_stop``. Clearing
|
||||
that flag is the job of the commands that intentionally bring a
|
||||
companion back (:meth:`start_process`, :meth:`restart_process`), and a
|
||||
companion only ever reaches a respawn path with the flag already false.
|
||||
"""
|
||||
pid = os.fork()
|
||||
if pid != 0:
|
||||
process.pid = pid
|
||||
process.state = State.STARTING
|
||||
process.started_at = time.time()
|
||||
self.log.info("companion %s started (pid %s)", process.name, pid)
|
||||
return pid
|
||||
|
||||
try:
|
||||
self._close_manager_fds()
|
||||
reset_child_signals()
|
||||
set_parent_death_signal(signal.SIGTERM)
|
||||
if os.getppid() != self.pid:
|
||||
# Manager already died between fork and arming: do not run.
|
||||
os._exit(0)
|
||||
self._apply_environment(process.config)
|
||||
self._redirect_output(process.config)
|
||||
target = self._resolve_target(process.config.target)
|
||||
target()
|
||||
except SystemExit:
|
||||
raise
|
||||
except BaseException:
|
||||
self.log.exception("companion %s crashed", process.name)
|
||||
os._exit(1)
|
||||
os._exit(0)
|
||||
|
||||
def _close_manager_fds(self) -> None:
|
||||
"""Close the manager's own fds in a freshly forked companion.
|
||||
|
||||
A companion inherits the manager's control socket and wakeup pipe but
|
||||
must not keep them: an open listener would let a companion answer
|
||||
control requests, and the pipe is the manager's private signal path.
|
||||
Both are closed before the target runs.
|
||||
"""
|
||||
if self.control is not None and self.control.listener is not None:
|
||||
self.control.listener.close()
|
||||
if self._wakeup_pipe is not None:
|
||||
for fd in self._wakeup_pipe:
|
||||
try:
|
||||
os.close(fd)
|
||||
except OSError:
|
||||
pass
|
||||
|
||||
def start_process(self, name: str):
|
||||
"""Start a companion by name (the control ``start`` command).
|
||||
|
||||
Follows the supervisor-style rules: a STOPPED or BACKOFF companion
|
||||
clears its ``manual_stop`` flag, drops any pending retry, and is spawned
|
||||
right away. RUNNING and STARTING are already-up, so they report success
|
||||
without doing anything. STOPPING is rejected so the caller polls status
|
||||
and retries once the old child is gone. Returns ``(ok, message)``.
|
||||
"""
|
||||
process = self.processes.get(name)
|
||||
if process is None:
|
||||
return False, "unknown companion %s" % name
|
||||
if process.state in (State.RUNNING, State.STARTING):
|
||||
return True, "%s already %s" % (name, process.state.lower())
|
||||
if process.state == State.STOPPING:
|
||||
return False, "%s is stopping; retry" % name
|
||||
process.manual_stop = False
|
||||
process.next_retry_at = None
|
||||
self.spawn_process(process)
|
||||
return True, "%s started" % name
|
||||
|
||||
def stop_process(self, name: str, now: float = None):
|
||||
"""Stop a companion by name (the control ``stop`` command).
|
||||
|
||||
Sets ``manual_stop`` so the companion will not auto-restart. A live
|
||||
companion (RUNNING or STARTING) is sent its ``stop_signal`` and moved
|
||||
to STOPPING with a ``stop_deadline``; the run loop reaps it, or SIGKILLs
|
||||
it once the deadline passes. BACKOFF just cancels the pending retry and
|
||||
settles in STOPPED. STOPPED and STOPPING are already-there success
|
||||
no-ops. Returns ``(ok, message)``.
|
||||
"""
|
||||
process = self.processes.get(name)
|
||||
if process is None:
|
||||
return False, "unknown companion %s" % name
|
||||
process.manual_stop = True
|
||||
# A stop must win over an in-flight restart: clearing this keeps
|
||||
# handle_exit from respawning a companion the user asked to stop.
|
||||
process.restart_pending = False
|
||||
if process.state in (State.STOPPED, State.STOPPING):
|
||||
return True, "%s already %s" % (name, process.state.lower())
|
||||
if process.state == State.BACKOFF:
|
||||
process.next_retry_at = None
|
||||
process.state = State.STOPPED
|
||||
return True, "%s stopped" % name
|
||||
now = now or time.time()
|
||||
self._safe_kill(process.pid, self._signal_number(process.config.stop_signal))
|
||||
process.state = State.STOPPING
|
||||
process.stop_deadline = now + process.config.stop_timeout
|
||||
self.log.info("companion %s stopping (pid %s)", name, process.pid)
|
||||
return True, "%s stopping" % name
|
||||
|
||||
def restart_process(self, name: str, now: float = None):
|
||||
"""Restart a companion by name (the control ``restart`` command).
|
||||
|
||||
Always clears ``manual_stop`` so the companion comes back. A live
|
||||
companion (RUNNING or STARTING) is asked to stop -- it goes STOPPING
|
||||
with ``restart_pending`` set and a deadline based on ``reload_timeout``,
|
||||
and the reaper respawns it as soon as the old child exits. BACKOFF and
|
||||
STOPPED start again immediately. STOPPING is rejected so the caller
|
||||
retries. This never rereads config. Returns ``(ok, message)``.
|
||||
"""
|
||||
process = self.processes.get(name)
|
||||
if process is None:
|
||||
return False, "unknown companion %s" % name
|
||||
if process.state == State.STOPPING:
|
||||
return False, "%s is stopping; retry" % name
|
||||
process.manual_stop = False
|
||||
if process.state in (State.RUNNING, State.STARTING):
|
||||
now = now or time.time()
|
||||
process.restart_pending = True
|
||||
self._safe_kill(process.pid, self._signal_number(process.config.stop_signal))
|
||||
process.state = State.STOPPING
|
||||
process.stop_deadline = now + process.config.reload_timeout
|
||||
self.log.info("companion %s restarting (pid %s)", name, process.pid)
|
||||
return True, "%s restarting" % name
|
||||
process.next_retry_at = None
|
||||
self.spawn_process(process)
|
||||
return True, "%s started" % name
|
||||
|
||||
@staticmethod
|
||||
def _safe_kill(pid: int, sig) -> None:
|
||||
"""Send ``sig`` to ``pid``, ignoring an already-dead target.
|
||||
|
||||
A companion can exit between the manager deciding to signal it and the
|
||||
kill itself (the exit is only reaped on the next tick). Without this the
|
||||
resulting ``ProcessLookupError`` would escape and take the manager down.
|
||||
"""
|
||||
try:
|
||||
os.kill(pid, sig)
|
||||
except ProcessLookupError:
|
||||
pass
|
||||
|
||||
@staticmethod
|
||||
def _signal_number(stop_signal) -> int:
|
||||
"""Resolve a stop signal to its number, e.g. ``"SIGTERM"`` -> 15.
|
||||
|
||||
Accepts a signal name or a raw number and validates both against the
|
||||
real signal table, so a typo like ``"SIGTRM"`` fails loudly here rather
|
||||
than silently sending the wrong signal (or none).
|
||||
"""
|
||||
try:
|
||||
if isinstance(stop_signal, str):
|
||||
return signal.Signals[stop_signal]
|
||||
return signal.Signals(stop_signal)
|
||||
except (KeyError, ValueError):
|
||||
raise ValueError("unknown stop signal %r" % (stop_signal,))
|
||||
|
||||
def reap_processes(self) -> list:
|
||||
"""Reap any companions that have exited and record their exit info.
|
||||
|
||||
A companion runs as a forked child of the manager, so when it dies the
|
||||
kernel hands its exit status back to us as a zombie until we collect it
|
||||
with ``waitpid``. This method does that collecting, and is meant to be
|
||||
called once per run-loop tick (typically after a ``SIGCHLD``).
|
||||
|
||||
``waitpid(-1, WNOHANG)`` asks the kernel for any one dead child without
|
||||
blocking. It returns ``(pid, status)`` for a child it reaped, or
|
||||
``(0, 0)`` when children are still alive but none have exited. Several
|
||||
companions can die between two ticks, so we loop until one of those two
|
||||
stop conditions is hit: ``(0, 0)`` (nothing more to reap right now) or
|
||||
``ChildProcessError`` (no child processes exist at all).
|
||||
|
||||
For each reaped pid we look up its companion, then in order: record the
|
||||
exit (signal or code, time, count), free the pid, and move it to its
|
||||
next public state via :meth:`handle_exit` -- STOPPED if it was stopped
|
||||
on purpose, otherwise BACKOFF for a later restart. Pids we don't
|
||||
recognise are ignored. Returns the list of companions reaped this call.
|
||||
"""
|
||||
reaped = []
|
||||
while True:
|
||||
try:
|
||||
pid, status = os.waitpid(-1, os.WNOHANG)
|
||||
except ChildProcessError:
|
||||
break
|
||||
if pid == 0:
|
||||
break
|
||||
process = self._process_by_pid(pid)
|
||||
if process is not None:
|
||||
self._record_exit(process, status)
|
||||
self._log_exit(process)
|
||||
self.handle_exit(process)
|
||||
reaped.append(process)
|
||||
return reaped
|
||||
|
||||
def _log_exit(self, process: CompanionProcess) -> None:
|
||||
"""Log how a reaped companion exited, before its fate is decided."""
|
||||
if process.last_exit_signal is not None:
|
||||
self.log.info("companion %s exited on signal %s",
|
||||
process.name, process.last_exit_signal)
|
||||
else:
|
||||
self.log.info("companion %s exited with status %s",
|
||||
process.name, process.last_exit_code)
|
||||
|
||||
def handle_exit(self, process: CompanionProcess, now: float = None) -> None:
|
||||
"""Decide a companion's fate after it exits: restart, stop, or back off.
|
||||
|
||||
A pending restart wins: the old child was asked to stop only so a fresh
|
||||
one could take its place, so it is respawned immediately. Otherwise a
|
||||
companion that was stopped on purpose settles in STOPPED and stays
|
||||
there, and any other exit is unexpected, so it enters BACKOFF and is
|
||||
scheduled to restart after a fixed ``restart_delay`` (no exponential
|
||||
backoff, no retry cap).
|
||||
"""
|
||||
now = now or time.time()
|
||||
if process.restart_pending:
|
||||
process.restart_pending = False
|
||||
process.restart_count += 1
|
||||
self.log.info("companion %s restarting", process.name)
|
||||
self.spawn_process(process)
|
||||
return
|
||||
if process.manual_stop:
|
||||
process.state = State.STOPPED
|
||||
process.next_retry_at = None
|
||||
self.log.info("companion %s stopped", process.name)
|
||||
return
|
||||
process.state = State.BACKOFF
|
||||
process.next_retry_at = now + process.restart_delay
|
||||
self.log.info("companion %s backing off, retrying in %ss",
|
||||
process.name, process.restart_delay)
|
||||
|
||||
def retry_backoff(self, now: float = None) -> list:
|
||||
"""Respawn BACKOFF companions whose fixed retry delay has elapsed.
|
||||
|
||||
Each retry bumps ``restart_count`` and re-forks the companion, which
|
||||
puts it back into STARTING. Returns the companions that were retried.
|
||||
"""
|
||||
now = now or time.time()
|
||||
retried = []
|
||||
for process in self.processes.values():
|
||||
if process.state != State.BACKOFF or process.next_retry_at is None:
|
||||
continue
|
||||
if now >= process.next_retry_at:
|
||||
process.restart_count += 1
|
||||
process.next_retry_at = None
|
||||
self.spawn_process(process)
|
||||
retried.append(process)
|
||||
return retried
|
||||
|
||||
def promote_running(self, now: float = None) -> list:
|
||||
"""Move companions that survived ``startsecs`` from STARTING to RUNNING.
|
||||
|
||||
A freshly spawned companion starts in STARTING. If it stays alive for
|
||||
its ``startsecs`` window it is considered up and becomes RUNNING; if it
|
||||
dies first, reaping handles it instead. Returns the promoted ones.
|
||||
"""
|
||||
now = now or time.time()
|
||||
promoted = []
|
||||
for process in self.processes.values():
|
||||
if process.state != State.STARTING or process.started_at is None:
|
||||
continue
|
||||
if now - process.started_at >= process.config.startsecs:
|
||||
process.state = State.RUNNING
|
||||
self.log.info("companion %s running (pid %s)", process.name, process.pid)
|
||||
promoted.append(process)
|
||||
return promoted
|
||||
|
||||
def _process_by_pid(self, pid: int):
|
||||
for process in self.processes.values():
|
||||
if process.pid == pid:
|
||||
return process
|
||||
return None
|
||||
|
||||
@staticmethod
|
||||
def _record_exit(process: CompanionProcess, status: int) -> None:
|
||||
"""Store how a companion died: signal number or exit code, plus time.
|
||||
|
||||
``status`` is the packed value from ``waitpid``. ``WIFSIGNALED`` tells
|
||||
us a signal killed it, in which case ``WTERMSIG`` gives the signal
|
||||
number; otherwise it exited normally and ``WEXITSTATUS`` gives its exit
|
||||
code. Only one of the two is ever set, so the other is cleared.
|
||||
"""
|
||||
if os.WIFSIGNALED(status):
|
||||
process.last_exit_signal = os.WTERMSIG(status)
|
||||
process.last_exit_code = None
|
||||
else:
|
||||
process.last_exit_code = os.WEXITSTATUS(status)
|
||||
process.last_exit_signal = None
|
||||
process.exited_at = time.time()
|
||||
process.exit_count += 1
|
||||
process.pid = None
|
||||
|
||||
@staticmethod
|
||||
def _apply_environment(config: CompanionConfig) -> None:
|
||||
"""Apply ``cwd`` and ``env`` in the child before running the target.
|
||||
|
||||
cwd is changed first so a relative path in env (or the target itself)
|
||||
resolves against it. env is merged onto the inherited environment, not
|
||||
replaced, so the companion keeps the manager's variables.
|
||||
"""
|
||||
if config.cwd:
|
||||
os.chdir(config.cwd)
|
||||
if config.env:
|
||||
os.environ.update(config.env)
|
||||
|
||||
@staticmethod
|
||||
def _redirect_output(config: CompanionConfig) -> None:
|
||||
"""Send the companion's stdout and stderr to its configured log files.
|
||||
|
||||
By default a companion just inherits the manager's stdout/stderr, so
|
||||
leaving these unset (or ``"inherit"``) keeps that. Give a file path and
|
||||
we append the output there instead. For stderr you can also pass
|
||||
``"stdout"`` to fold the two streams into one file.
|
||||
"""
|
||||
stdout_fd = CompanionManager._open_output(config.stdout)
|
||||
if stdout_fd is not None:
|
||||
os.dup2(stdout_fd, 1)
|
||||
os.close(stdout_fd)
|
||||
if config.stderr == "stdout":
|
||||
os.dup2(1, 2)
|
||||
else:
|
||||
stderr_fd = CompanionManager._open_output(config.stderr)
|
||||
if stderr_fd is not None:
|
||||
os.dup2(stderr_fd, 2)
|
||||
os.close(stderr_fd)
|
||||
|
||||
@staticmethod
|
||||
def _open_output(value):
|
||||
"""Open one log file for writing, or return None to leave the stream
|
||||
as-is when the companion should keep inheriting it."""
|
||||
if value in (None, "inherit"):
|
||||
return None
|
||||
return os.open(value, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
|
||||
|
||||
@staticmethod
|
||||
def _resolve_target(target: Union[Callable, str]) -> Callable:
|
||||
"""Return the zero-arg callable for a companion target.
|
||||
|
||||
Accepts an already-callable target or a ``"module:attr"`` import
|
||||
string, e.g. ``"frappe_companions:start_rq_default"``.
|
||||
"""
|
||||
if callable(target):
|
||||
return target
|
||||
module_name, separator, attribute = target.partition(":")
|
||||
if not separator:
|
||||
raise ValueError("companion target %r must be 'module:callable'" % target)
|
||||
return getattr(importlib.import_module(module_name), attribute)
|
||||
117
gunicorn/companion/process.py
Normal file
117
gunicorn/companion/process.py
Normal file
@ -0,0 +1,117 @@
|
||||
#
|
||||
# This file is part of gunicorn released under the MIT license.
|
||||
# See the NOTICE for more information.
|
||||
|
||||
|
||||
import time
|
||||
from enum import Enum
|
||||
|
||||
from gunicorn.util import format_uptime
|
||||
|
||||
|
||||
class State(str, Enum):
|
||||
"""Public states, mimicking ``supervisorctl status``.
|
||||
|
||||
The manager never exposes EXITED/FATAL/UNKNOWN; an exited companion is
|
||||
either STOPPED (manual) or BACKOFF (waiting to restart). Members subclass
|
||||
``str`` so they compare and JSON-serialize as their plain value.
|
||||
"""
|
||||
|
||||
STOPPED = "STOPPED"
|
||||
STARTING = "STARTING"
|
||||
RUNNING = "RUNNING"
|
||||
BACKOFF = "BACKOFF"
|
||||
STOPPING = "STOPPING"
|
||||
|
||||
|
||||
class CompanionProcess:
|
||||
"""Runtime state for one companion, separate from its static config.
|
||||
|
||||
Holds everything ``status`` needs: current public state, live pid, restart
|
||||
and exit counters, last exit info, and the ``manual_stop`` flag that keeps a
|
||||
user-stopped companion from auto-restarting.
|
||||
"""
|
||||
|
||||
def __init__(self, config):
|
||||
self.config = config
|
||||
self.state = State.STOPPED
|
||||
self.pid = None
|
||||
self.restart_delay = config.restart_delay
|
||||
|
||||
self.started_at = None
|
||||
self.exited_at = None
|
||||
self.next_retry_at = None
|
||||
self.stop_deadline = None
|
||||
|
||||
self.restart_count = 0
|
||||
self.exit_count = 0
|
||||
self.kill_count = 0
|
||||
|
||||
self.last_exit_code = None
|
||||
self.last_exit_signal = None
|
||||
|
||||
self.manual_stop = False
|
||||
self.restart_pending = False
|
||||
|
||||
@property
|
||||
def name(self):
|
||||
return self.config.name
|
||||
|
||||
def uptime(self, now=None):
|
||||
"""Seconds since this companion last started, or ``None`` if not up."""
|
||||
if self.state not in (State.RUNNING, State.STARTING) or self.started_at is None:
|
||||
return None
|
||||
return (now or time.time()) - self.started_at
|
||||
|
||||
def description(self, now=None):
|
||||
"""Human one-liner: state label plus runtime details."""
|
||||
now = now or time.time()
|
||||
label = self.state.lower()
|
||||
detail = self._detail(now)
|
||||
return "%s, %s" % (label, detail) if detail else label
|
||||
|
||||
def _detail(self, now):
|
||||
if self.state == State.RUNNING:
|
||||
return "pid %s, uptime %s" % (
|
||||
self.pid,
|
||||
format_uptime(self.uptime(now) or 0),
|
||||
)
|
||||
if self.state == State.BACKOFF:
|
||||
return self._backoff_detail(now)
|
||||
if self.state == State.STOPPED:
|
||||
return self._stopped_detail()
|
||||
return ""
|
||||
|
||||
def _backoff_detail(self, now):
|
||||
if self.next_retry_at is not None:
|
||||
seconds_left = max(0, int(self.next_retry_at - now))
|
||||
return "exited with %s, retrying in %ds" % (self._exit_status(), seconds_left)
|
||||
return "exited with %s" % self._exit_status()
|
||||
|
||||
def _stopped_detail(self):
|
||||
if self.manual_stop:
|
||||
return "stopped manually"
|
||||
if self.exited_at is not None:
|
||||
return "exited with %s" % self._exit_status()
|
||||
return "not started"
|
||||
|
||||
def _exit_status(self):
|
||||
if self.last_exit_signal is not None:
|
||||
return "signal %s" % self.last_exit_signal
|
||||
return "status %s" % self.last_exit_code
|
||||
|
||||
def status_dict(self, now=None):
|
||||
"""Machine-readable status entry for the JSON control protocol."""
|
||||
backoff = self.state == State.BACKOFF
|
||||
return {
|
||||
"name": self.name,
|
||||
"state": self.state,
|
||||
"pid": self.pid,
|
||||
"description": self.description(now or time.time()),
|
||||
"next_retry_at": self.next_retry_at if backoff else None,
|
||||
"restart_delay": self.restart_delay if backoff else None,
|
||||
"last_exit_code": self.last_exit_code if backoff else None,
|
||||
}
|
||||
|
||||
def __repr__(self):
|
||||
return "<CompanionProcess %s %s>" % (self.name, self.state)
|
||||
@ -354,6 +354,31 @@ def validate_dict(val):
|
||||
return val
|
||||
|
||||
|
||||
def validate_pos_int_or_none(val):
|
||||
if val is None:
|
||||
return None
|
||||
return validate_pos_int(val)
|
||||
|
||||
|
||||
def validate_companion_workers(val):
|
||||
if val is None:
|
||||
return []
|
||||
if not isinstance(val, list):
|
||||
raise TypeError("companion_workers must be a list: %s" % val)
|
||||
for item in val:
|
||||
if not isinstance(item, dict):
|
||||
raise TypeError("each companion worker must be a dict: %s" % item)
|
||||
return val
|
||||
|
||||
|
||||
def validate_octal_or_none(val):
|
||||
if val is None:
|
||||
return None
|
||||
if isinstance(val, str):
|
||||
val = int(val, 8)
|
||||
return int(val)
|
||||
|
||||
|
||||
def validate_pos_int(val):
|
||||
if not isinstance(val, int):
|
||||
val = int(val, 0)
|
||||
@ -2502,3 +2527,199 @@ class HeaderMap(Setting):
|
||||
|
||||
.. versionadded:: 22.0.0
|
||||
"""
|
||||
|
||||
|
||||
class CompanionConfigFile(Setting):
|
||||
name = "companion_config_file"
|
||||
section = "Companion Processes"
|
||||
cli = ["--companion-config"]
|
||||
validator = validate_string
|
||||
default = None
|
||||
desc = """\
|
||||
Dedicated Python file the manager reads for companion specs.
|
||||
|
||||
Lets companions reload independently of the main Gunicorn config.
|
||||
Example: ``companion_config_file = "/home/frappe/bench/companion.conf.py"``.
|
||||
If unset, specs are read from the main Gunicorn config.
|
||||
"""
|
||||
|
||||
|
||||
class CompanionWorkers(Setting):
|
||||
name = "companion_workers"
|
||||
section = "Companion Processes"
|
||||
validator = validate_companion_workers
|
||||
default = []
|
||||
desc = """\
|
||||
List of companion process specs, each a dict.
|
||||
|
||||
A spec requires ``name`` and ``target``; other keys override defaults.
|
||||
Example: ``[{"name": "scheduler", "target": "app:run_scheduler"}]``.
|
||||
"""
|
||||
|
||||
|
||||
class CompanionControlSocket(Setting):
|
||||
name = "companion_control_socket"
|
||||
section = "Companion Processes"
|
||||
cli = ["--companion-control-socket"]
|
||||
validator = validate_string
|
||||
default = None
|
||||
desc = """\
|
||||
Unix socket the manager listens on for control commands.
|
||||
|
||||
Clients send ``status``/``start``/``stop``/``restart``/``reread`` as JSON.
|
||||
Example: ``companion_control_socket = "/run/gunicorn/companion.sock"``.
|
||||
"""
|
||||
|
||||
|
||||
class CompanionControlSocketMode(Setting):
|
||||
name = "companion_control_socket_mode"
|
||||
section = "Companion Processes"
|
||||
validator = validate_octal_or_none
|
||||
default = 0o600
|
||||
desc = """\
|
||||
Octal file mode set on the control socket after it is created.
|
||||
|
||||
Default ``0o600`` lets only the owner connect.
|
||||
Example: ``companion_control_socket_mode = 0o660``.
|
||||
"""
|
||||
|
||||
|
||||
class CompanionStopSignal(Setting):
|
||||
name = "companion_stop_signal"
|
||||
section = "Companion Processes"
|
||||
validator = validate_string
|
||||
default = "SIGTERM"
|
||||
desc = """\
|
||||
Signal sent first when stopping a companion.
|
||||
|
||||
The companion should catch it to drain work before exiting.
|
||||
Example: ``companion_stop_signal = "SIGINT"``.
|
||||
"""
|
||||
|
||||
|
||||
class CompanionStopTimeout(Setting):
|
||||
name = "companion_stop_timeout"
|
||||
section = "Companion Processes"
|
||||
validator = validate_pos_int
|
||||
default = 60
|
||||
desc = """\
|
||||
Seconds to wait after the stop signal before sending SIGKILL.
|
||||
|
||||
Give slow-draining workers a larger value.
|
||||
Example: ``companion_stop_timeout = 300``.
|
||||
"""
|
||||
|
||||
|
||||
class CompanionReloadTimeout(Setting):
|
||||
name = "companion_reload_timeout"
|
||||
section = "Companion Processes"
|
||||
validator = validate_pos_int
|
||||
default = 60
|
||||
desc = """\
|
||||
Seconds to wait for the old child to exit during restart/reread.
|
||||
|
||||
Used instead of ``stop_timeout`` when replacing a companion.
|
||||
Example: ``companion_reload_timeout = 30``.
|
||||
"""
|
||||
|
||||
|
||||
class CompanionStdout(Setting):
|
||||
name = "companion_stdout"
|
||||
section = "Companion Processes"
|
||||
validator = validate_string
|
||||
default = None
|
||||
desc = """\
|
||||
Where to send companion stdout: a file path or ``"inherit"``.
|
||||
|
||||
Files are opened in append mode; ``None`` inherits the manager's stdout.
|
||||
Example: ``companion_stdout = "/var/log/frappe/rq.log"``.
|
||||
"""
|
||||
|
||||
|
||||
class CompanionStderr(Setting):
|
||||
name = "companion_stderr"
|
||||
section = "Companion Processes"
|
||||
validator = validate_string
|
||||
default = None
|
||||
desc = """\
|
||||
Where to send companion stderr: a path, ``"stdout"``, or ``"inherit"``.
|
||||
|
||||
Use ``"stdout"`` to merge stderr into the stdout target.
|
||||
Example: ``companion_stderr = "stdout"``.
|
||||
"""
|
||||
|
||||
|
||||
class CompanionCwd(Setting):
|
||||
name = "companion_cwd"
|
||||
section = "Companion Processes"
|
||||
validator = validate_string
|
||||
default = None
|
||||
desc = """\
|
||||
Directory the child changes into before running the target.
|
||||
|
||||
Example: ``companion_cwd = "/home/frappe/frappe-bench"``.
|
||||
"""
|
||||
|
||||
|
||||
class CompanionEnv(Setting):
|
||||
name = "companion_env"
|
||||
section = "Companion Processes"
|
||||
validator = validate_dict
|
||||
default = {}
|
||||
desc = """\
|
||||
Extra environment variables merged into the child environment.
|
||||
|
||||
Example: ``companion_env = {"QUEUE": "default"}``.
|
||||
"""
|
||||
|
||||
|
||||
class CompanionStartsecs(Setting):
|
||||
name = "companion_startsecs"
|
||||
section = "Companion Processes"
|
||||
validator = validate_pos_int
|
||||
default = 1
|
||||
desc = """\
|
||||
Seconds a freshly forked companion must survive to reach RUNNING.
|
||||
|
||||
Exiting earlier counts as a failed start and triggers BACKOFF.
|
||||
Example: ``companion_startsecs = 5``.
|
||||
"""
|
||||
|
||||
|
||||
class CompanionRestartDelay(Setting):
|
||||
name = "companion_restart_delay"
|
||||
section = "Companion Processes"
|
||||
validator = validate_pos_int
|
||||
default = 5
|
||||
desc = """\
|
||||
Fixed seconds to wait before restarting a crashed companion.
|
||||
|
||||
No exponential backoff; the same delay is used every time.
|
||||
Example: ``companion_restart_delay = 10``.
|
||||
"""
|
||||
|
||||
|
||||
class CompanionManagerShutdownBuffer(Setting):
|
||||
name = "companion_manager_shutdown_buffer"
|
||||
section = "Companion Processes"
|
||||
validator = validate_pos_int
|
||||
default = 10
|
||||
desc = """\
|
||||
Extra seconds added to the slowest companion timeout for the manager.
|
||||
|
||||
Gives the manager headroom to stop all children before Gunicorn kills it.
|
||||
Example: ``companion_manager_shutdown_buffer = 15``.
|
||||
"""
|
||||
|
||||
|
||||
class CompanionManagerStopTimeout(Setting):
|
||||
name = "companion_manager_stop_timeout"
|
||||
section = "Companion Processes"
|
||||
validator = validate_pos_int_or_none
|
||||
default = None
|
||||
desc = """\
|
||||
Seconds Gunicorn waits for the manager to stop during shutdown.
|
||||
|
||||
By default it uses the slowest companion ``stop_timeout`` plus a buffer.
|
||||
Example: ``companion_manager_stop_timeout = 120``.
|
||||
"""
|
||||
|
||||
@ -647,3 +647,15 @@ def bytes_to_str(b):
|
||||
|
||||
def unquote_to_wsgi_str(string):
|
||||
return urllib.parse.unquote_to_bytes(string).decode('latin-1')
|
||||
|
||||
|
||||
def format_uptime(seconds):
|
||||
"""Render a duration like supervisor: ``2 days, 03:12:44`` or ``0:05:12``."""
|
||||
seconds = int(seconds)
|
||||
days, remainder = divmod(seconds, 86400)
|
||||
hours, remainder = divmod(remainder, 3600)
|
||||
minutes, remaining_seconds = divmod(remainder, 60)
|
||||
if days:
|
||||
unit = "day" if days == 1 else "days"
|
||||
return "%d %s, %02d:%02d:%02d" % (days, unit, hours, minutes, remaining_seconds)
|
||||
return "%d:%02d:%02d" % (hours, minutes, remaining_seconds)
|
||||
|
||||
@ -66,6 +66,7 @@ testing = [
|
||||
[project.scripts]
|
||||
# duplicates "python -m gunicorn" handling in __main__.py
|
||||
gunicorn = "gunicorn.app.wsgiapp:run"
|
||||
gunicorn-companion = "gunicorn.companion.ctl:run"
|
||||
|
||||
# note the quotes around "paste.server_runner" to escape the dot
|
||||
[project.entry-points."paste.server_runner"]
|
||||
|
||||
@ -2,11 +2,15 @@
|
||||
# This file is part of gunicorn released under the MIT license.
|
||||
# See the NOTICE for more information.
|
||||
|
||||
import errno
|
||||
import os
|
||||
import signal
|
||||
import time
|
||||
from unittest import mock
|
||||
|
||||
import gunicorn.app.base
|
||||
import gunicorn.arbiter
|
||||
from gunicorn.companion.config import build_companion_configs
|
||||
from gunicorn.config import ReusePort
|
||||
|
||||
|
||||
@ -147,6 +151,244 @@ def test_arbiter_reap_workers(mock_os_waitpid):
|
||||
arbiter.cfg.child_exit.assert_called_with(arbiter, mock_worker)
|
||||
|
||||
|
||||
def test_arbiter_manage_companion_manager_spawns_when_configured():
|
||||
arbiter = gunicorn.arbiter.Arbiter(DummyApplication())
|
||||
arbiter.cfg.set("companion_workers", [{"name": "rq", "target": "pkg:run"}])
|
||||
arbiter.spawn_companion_manager = mock.Mock()
|
||||
arbiter.manage_companion_manager()
|
||||
arbiter.spawn_companion_manager.assert_called_once_with()
|
||||
|
||||
|
||||
def test_arbiter_manage_companion_manager_noop_without_companions():
|
||||
arbiter = gunicorn.arbiter.Arbiter(DummyApplication())
|
||||
arbiter.spawn_companion_manager = mock.Mock()
|
||||
arbiter.manage_companion_manager()
|
||||
arbiter.spawn_companion_manager.assert_not_called()
|
||||
|
||||
|
||||
def test_arbiter_manage_companion_manager_noop_when_running():
|
||||
arbiter = gunicorn.arbiter.Arbiter(DummyApplication())
|
||||
arbiter.cfg.set("companion_workers", [{"name": "rq", "target": "pkg:run"}])
|
||||
arbiter.companion_manager_pid = 4242
|
||||
arbiter.spawn_companion_manager = mock.Mock()
|
||||
arbiter.manage_companion_manager()
|
||||
arbiter.spawn_companion_manager.assert_not_called()
|
||||
|
||||
|
||||
@mock.patch('os.waitpid')
|
||||
def test_arbiter_reap_clears_companion_manager_pid(mock_os_waitpid):
|
||||
mock_os_waitpid.side_effect = [(4242, 0), (0, 0)]
|
||||
arbiter = gunicorn.arbiter.Arbiter(DummyApplication())
|
||||
arbiter.companion_manager_pid = 4242
|
||||
arbiter.reap_workers()
|
||||
assert arbiter.companion_manager_pid == 0
|
||||
|
||||
|
||||
@mock.patch('os.waitpid')
|
||||
def test_arbiter_reap_unexpected_manager_exit_backs_off(mock_os_waitpid):
|
||||
# An unexpected manager exit (no deliberate stop) is an error and arms the
|
||||
# crash backoff so the main loop does not respawn it immediately.
|
||||
mock_os_waitpid.side_effect = [(4242, 0), (0, 0)]
|
||||
arbiter = gunicorn.arbiter.Arbiter(DummyApplication())
|
||||
arbiter.companion_manager_pid = 4242
|
||||
arbiter.log = mock.Mock()
|
||||
arbiter.reap_workers()
|
||||
assert arbiter.companion_manager_pid == 0
|
||||
assert arbiter._companion_manager_failures == 1
|
||||
assert arbiter._companion_manager_respawn_at > 0
|
||||
arbiter.log.error.assert_called_once()
|
||||
arbiter.log.info.assert_not_called()
|
||||
|
||||
|
||||
@mock.patch('os.waitpid')
|
||||
def test_arbiter_reap_deliberate_manager_exit_is_info(mock_os_waitpid):
|
||||
# A deliberate stop (stopping flag set) is an expected exit: logged as info
|
||||
# with no backoff, so a reload respawns without delay.
|
||||
mock_os_waitpid.side_effect = [(4242, 0), (0, 0)]
|
||||
arbiter = gunicorn.arbiter.Arbiter(DummyApplication())
|
||||
arbiter.companion_manager_pid = 4242
|
||||
arbiter._companion_manager_stopping = True
|
||||
arbiter.log = mock.Mock()
|
||||
arbiter.reap_workers()
|
||||
assert arbiter.companion_manager_pid == 0
|
||||
assert arbiter._companion_manager_stopping is False
|
||||
assert arbiter._companion_manager_respawn_at == 0
|
||||
arbiter.log.info.assert_called_once()
|
||||
arbiter.log.error.assert_not_called()
|
||||
|
||||
|
||||
def test_arbiter_manage_companion_manager_waits_during_backoff():
|
||||
# While inside the crash-backoff window the manager is not respawned.
|
||||
arbiter = gunicorn.arbiter.Arbiter(DummyApplication())
|
||||
arbiter.cfg.set("companion_workers", [{"name": "rq", "target": "pkg:run"}])
|
||||
arbiter._companion_manager_respawn_at = time.monotonic() + 60
|
||||
arbiter.spawn_companion_manager = mock.Mock()
|
||||
arbiter.manage_companion_manager()
|
||||
arbiter.spawn_companion_manager.assert_not_called()
|
||||
|
||||
|
||||
def test_stop_companion_manager_marks_expected_and_clears_backoff():
|
||||
arbiter = gunicorn.arbiter.Arbiter(DummyApplication())
|
||||
arbiter.companion_manager_pid = 4242
|
||||
arbiter._companion_manager_failures = 3
|
||||
arbiter._companion_manager_respawn_at = time.monotonic() + 60
|
||||
with mock.patch("os.kill"):
|
||||
arbiter.stop_companion_manager(signal.SIGTERM)
|
||||
assert arbiter._companion_manager_stopping is True
|
||||
assert arbiter._companion_manager_failures == 0
|
||||
assert arbiter._companion_manager_respawn_at == 0
|
||||
|
||||
|
||||
def test_stop_companion_manager_signals_running():
|
||||
arbiter = gunicorn.arbiter.Arbiter(DummyApplication())
|
||||
arbiter.companion_manager_pid = 4242
|
||||
with mock.patch("os.kill") as kill:
|
||||
arbiter.stop_companion_manager(signal.SIGTERM)
|
||||
kill.assert_called_once_with(4242, signal.SIGTERM)
|
||||
|
||||
|
||||
def test_stop_companion_manager_noop_when_not_running():
|
||||
arbiter = gunicorn.arbiter.Arbiter(DummyApplication())
|
||||
with mock.patch("os.kill") as kill:
|
||||
arbiter.stop_companion_manager(signal.SIGTERM)
|
||||
kill.assert_not_called()
|
||||
|
||||
|
||||
def test_stop_companion_manager_clears_pid_when_already_gone():
|
||||
arbiter = gunicorn.arbiter.Arbiter(DummyApplication())
|
||||
arbiter.companion_manager_pid = 4242
|
||||
with mock.patch("os.kill", side_effect=OSError(errno.ESRCH, "no such process")):
|
||||
arbiter.stop_companion_manager(signal.SIGTERM)
|
||||
assert arbiter.companion_manager_pid == 0
|
||||
|
||||
|
||||
@mock.patch('os.waitpid')
|
||||
def test_worker_reap_unaffected_by_companion_manager(mock_os_waitpid):
|
||||
# A worker exit is still reaped normally while a companion manager runs;
|
||||
# the companion reap branch must not swallow worker exits.
|
||||
mock_os_waitpid.side_effect = [(42, 0), (0, 0)]
|
||||
arbiter = gunicorn.arbiter.Arbiter(DummyApplication())
|
||||
arbiter.cfg.settings['child_exit'] = mock.Mock()
|
||||
arbiter.companion_manager_pid = 9999
|
||||
mock_worker = mock.Mock()
|
||||
arbiter.WORKERS = {42: mock_worker}
|
||||
arbiter.reap_workers()
|
||||
mock_worker.tmp.close.assert_called_with()
|
||||
arbiter.cfg.child_exit.assert_called_with(arbiter, mock_worker)
|
||||
assert arbiter.companion_manager_pid == 9999
|
||||
|
||||
|
||||
@mock.patch('os.fork', return_value=77)
|
||||
def test_spawn_worker_unaffected_by_companions(mock_os_fork):
|
||||
# With companions configured, an HTTP worker is still spawned and recorded
|
||||
# exactly as before; companion config does not touch the worker path.
|
||||
arbiter = gunicorn.arbiter.Arbiter(DummyApplication())
|
||||
arbiter.cfg.set("companion_workers", [{"name": "rq", "target": "pkg:run"}])
|
||||
arbiter.pid = 1234
|
||||
arbiter.WORKERS = {} # instance dict, do not mutate the shared class attr
|
||||
mock_worker = mock.Mock()
|
||||
arbiter.worker_class = mock.Mock(return_value=mock_worker)
|
||||
pid = arbiter.spawn_worker()
|
||||
assert pid == 77
|
||||
assert arbiter.WORKERS[77] is mock_worker
|
||||
|
||||
|
||||
def test_close_gunicorn_fds_in_manager_child():
|
||||
arbiter = gunicorn.arbiter.Arbiter(DummyApplication())
|
||||
listener = mock.Mock()
|
||||
worker = mock.Mock()
|
||||
arbiter.LISTENERS = [listener]
|
||||
arbiter.WORKERS = {1: worker}
|
||||
arbiter.PIPE = [7, 8]
|
||||
with mock.patch("os.close") as os_close:
|
||||
arbiter._close_gunicorn_fds()
|
||||
listener.close.assert_called_once_with()
|
||||
worker.tmp.close.assert_called_once_with()
|
||||
assert os_close.call_count == 2
|
||||
|
||||
|
||||
def test_reload_companion_manager_restarts_running():
|
||||
arbiter = gunicorn.arbiter.Arbiter(DummyApplication())
|
||||
arbiter.cfg.set("companion_workers", [{"name": "rq", "target": "pkg:run"}])
|
||||
arbiter.companion_manager_pid = 4242
|
||||
arbiter.stop_companion_manager = mock.Mock()
|
||||
arbiter.spawn_companion_manager = mock.Mock()
|
||||
arbiter.reload_companion_manager()
|
||||
arbiter.stop_companion_manager.assert_called_once_with(signal.SIGTERM)
|
||||
# pid still set (stop is mocked), so no respawn until the old one is reaped
|
||||
arbiter.spawn_companion_manager.assert_not_called()
|
||||
|
||||
|
||||
def test_reload_companion_manager_starts_when_none_running():
|
||||
arbiter = gunicorn.arbiter.Arbiter(DummyApplication())
|
||||
arbiter.cfg.set("companion_workers", [{"name": "rq", "target": "pkg:run"}])
|
||||
arbiter.stop_companion_manager = mock.Mock()
|
||||
arbiter.spawn_companion_manager = mock.Mock()
|
||||
arbiter.reload_companion_manager()
|
||||
arbiter.stop_companion_manager.assert_not_called()
|
||||
arbiter.spawn_companion_manager.assert_called_once_with()
|
||||
|
||||
|
||||
def test_reload_companion_manager_noop_when_config_unchanged():
|
||||
arbiter = gunicorn.arbiter.Arbiter(DummyApplication())
|
||||
arbiter.cfg.set("companion_workers", [{"name": "rq", "target": "pkg:run"}])
|
||||
# The running manager was spawned with this exact config.
|
||||
arbiter._companion_configs = build_companion_configs(arbiter.cfg)
|
||||
arbiter.companion_manager_pid = 4242
|
||||
arbiter.stop_companion_manager = mock.Mock()
|
||||
arbiter.spawn_companion_manager = mock.Mock()
|
||||
arbiter.reload_companion_manager()
|
||||
# A web reload with unchanged companion specs leaves the manager alone.
|
||||
arbiter.stop_companion_manager.assert_not_called()
|
||||
arbiter.spawn_companion_manager.assert_not_called()
|
||||
|
||||
|
||||
def test_reload_companion_manager_restarts_when_field_changed():
|
||||
arbiter = gunicorn.arbiter.Arbiter(DummyApplication())
|
||||
arbiter.cfg.set("companion_workers",
|
||||
[{"name": "rq", "target": "pkg:run", "startsecs": 1}])
|
||||
arbiter._companion_configs = build_companion_configs(arbiter.cfg)
|
||||
arbiter.companion_manager_pid = 4242
|
||||
# Same name, changed field -> different config_hash -> reload.
|
||||
arbiter.cfg.set("companion_workers",
|
||||
[{"name": "rq", "target": "pkg:run", "startsecs": 9}])
|
||||
arbiter.stop_companion_manager = mock.Mock()
|
||||
arbiter.spawn_companion_manager = mock.Mock()
|
||||
arbiter.reload_companion_manager()
|
||||
arbiter.stop_companion_manager.assert_called_once_with(signal.SIGTERM)
|
||||
|
||||
|
||||
@mock.patch('gunicorn.sock.close_sockets')
|
||||
def test_arbiter_stop_signals_companion_manager(close_sockets):
|
||||
arbiter = gunicorn.arbiter.Arbiter(DummyApplication())
|
||||
arbiter.stop_companion_manager = mock.Mock()
|
||||
arbiter.stop()
|
||||
signals = [call.args[0] for call in arbiter.stop_companion_manager.call_args_list]
|
||||
assert signal.SIGTERM in signals
|
||||
assert signal.SIGKILL in signals
|
||||
|
||||
|
||||
def test_companion_manager_stop_timeout_uses_explicit():
|
||||
arbiter = gunicorn.arbiter.Arbiter(DummyApplication())
|
||||
arbiter.cfg.set("companion_manager_stop_timeout", 120)
|
||||
assert arbiter.companion_manager_stop_timeout() == 120
|
||||
|
||||
|
||||
def test_companion_manager_stop_timeout_derives_from_slowest():
|
||||
arbiter = gunicorn.arbiter.Arbiter(DummyApplication())
|
||||
arbiter.cfg.set("companion_workers", [
|
||||
{"name": "rq", "target": "pkg:run", "stop_timeout": 300},
|
||||
{"name": "scheduler", "target": "pkg:sched", "stop_timeout": 30},
|
||||
])
|
||||
arbiter.cfg.set("companion_manager_shutdown_buffer", 10)
|
||||
assert arbiter.companion_manager_stop_timeout() == 310
|
||||
|
||||
|
||||
def test_companion_manager_stop_timeout_zero_without_companions():
|
||||
arbiter = gunicorn.arbiter.Arbiter(DummyApplication())
|
||||
assert arbiter.companion_manager_stop_timeout() == 0
|
||||
|
||||
|
||||
class PreloadedAppWithEnvSettings(DummyApplication):
|
||||
"""
|
||||
Simple application that makes use of the 'preload' feature to
|
||||
|
||||
129
tests/test_companion_config.py
Normal file
129
tests/test_companion_config.py
Normal file
@ -0,0 +1,129 @@
|
||||
#
|
||||
# This file is part of gunicorn released under the MIT license.
|
||||
# See the NOTICE for more information.
|
||||
|
||||
import pytest
|
||||
|
||||
from gunicorn.config import Config, validate_companion_workers
|
||||
from gunicorn.companion.config import CompanionConfig, build_companion_configs
|
||||
|
||||
|
||||
def make_config(workers, **overrides):
|
||||
cfg = Config()
|
||||
cfg.set("companion_workers", workers)
|
||||
for key, value in overrides.items():
|
||||
cfg.set(key, value)
|
||||
return cfg
|
||||
|
||||
|
||||
def test_build_applies_global_defaults():
|
||||
cfg = make_config(
|
||||
[{"name": "rq", "target": "pkg:run"}],
|
||||
companion_stop_signal="SIGINT",
|
||||
companion_startsecs=5)
|
||||
config, = build_companion_configs(cfg)
|
||||
assert config.name == "rq"
|
||||
assert config.target == "pkg:run"
|
||||
assert config.stop_signal == "SIGINT"
|
||||
assert config.startsecs == 5
|
||||
|
||||
|
||||
def test_build_per_spec_overrides_global():
|
||||
cfg = make_config(
|
||||
[{"name": "rq", "target": "pkg:run", "stop_signal": "SIGTERM"}],
|
||||
companion_stop_signal="SIGINT")
|
||||
config, = build_companion_configs(cfg)
|
||||
assert config.stop_signal == "SIGTERM"
|
||||
|
||||
|
||||
def test_build_applies_global_restart_delay():
|
||||
cfg = make_config(
|
||||
[{"name": "rq", "target": "pkg:run"}],
|
||||
companion_restart_delay=42)
|
||||
config, = build_companion_configs(cfg)
|
||||
assert config.restart_delay == 42
|
||||
|
||||
|
||||
def test_build_rejects_unknown_spec_key():
|
||||
with pytest.raises(ValueError):
|
||||
build_companion_configs(
|
||||
make_config([{"name": "rq", "target": "pkg:run", "stop_sigal": "X"}]))
|
||||
|
||||
|
||||
def test_build_rejects_unknown_stop_signal():
|
||||
with pytest.raises(ValueError):
|
||||
build_companion_configs(
|
||||
make_config([{"name": "rq", "target": "pkg:run",
|
||||
"stop_signal": "SIGTRM"}]))
|
||||
|
||||
|
||||
def test_build_rejects_unknown_global_stop_signal():
|
||||
with pytest.raises(ValueError):
|
||||
build_companion_configs(
|
||||
make_config([{"name": "rq", "target": "pkg:run"}],
|
||||
companion_stop_signal="NOPE"))
|
||||
|
||||
|
||||
def test_build_reads_companion_config_file(tmp_path):
|
||||
config_file = tmp_path / "companion.conf.py"
|
||||
config_file.write_text(
|
||||
'companion_workers = [{"name": "scheduler", "target": "app:run"}]\n'
|
||||
'companion_stop_signal = "SIGINT"\n')
|
||||
cfg = make_config([], companion_config_file=str(config_file))
|
||||
config, = build_companion_configs(cfg)
|
||||
assert config.name == "scheduler"
|
||||
assert config.stop_signal == "SIGINT"
|
||||
|
||||
|
||||
def test_build_empty_when_none_configured():
|
||||
assert build_companion_configs(make_config([])) == []
|
||||
|
||||
|
||||
def test_build_requires_name_and_target():
|
||||
with pytest.raises(ValueError):
|
||||
build_companion_configs(make_config([{"name": "rq"}]))
|
||||
with pytest.raises(ValueError):
|
||||
build_companion_configs(make_config([{"target": "pkg:run"}]))
|
||||
|
||||
|
||||
def test_validate_companion_workers_accepts_none_and_list():
|
||||
assert validate_companion_workers(None) == []
|
||||
workers = [{"name": "rq", "target": "pkg:run"}]
|
||||
assert validate_companion_workers(workers) == workers
|
||||
|
||||
|
||||
def test_validate_companion_workers_rejects_non_list():
|
||||
with pytest.raises(TypeError):
|
||||
validate_companion_workers("rq")
|
||||
|
||||
|
||||
def test_validate_companion_workers_rejects_non_dict_item():
|
||||
with pytest.raises(TypeError):
|
||||
validate_companion_workers(["rq"])
|
||||
|
||||
|
||||
def test_config_hash_stable_and_field_sensitive():
|
||||
base = CompanionConfig(name="rq", target="pkg:run")
|
||||
same = CompanionConfig(name="rq", target="pkg:run")
|
||||
changed = CompanionConfig(name="rq", target="pkg:run", stop_timeout=99)
|
||||
assert base.config_hash == same.config_hash
|
||||
assert base.config_hash != changed.config_hash
|
||||
|
||||
|
||||
def test_config_hash_ignores_restart_delay():
|
||||
# restart_delay does not affect the spawned process, so changing it must not
|
||||
# change the hash (and so must not trigger a restart on reread).
|
||||
base = CompanionConfig(name="rq", target="pkg:run", restart_delay=5)
|
||||
changed = CompanionConfig(name="rq", target="pkg:run", restart_delay=30)
|
||||
assert base.config_hash == changed.config_hash
|
||||
|
||||
|
||||
def test_config_hash_keys_callable_target_by_qualified_name():
|
||||
def run():
|
||||
pass
|
||||
|
||||
keyed = CompanionConfig._target_key(run)
|
||||
assert ":" in keyed and keyed.endswith("run")
|
||||
# A callable target hashes stably across CompanionConfig instances.
|
||||
assert (CompanionConfig(name="rq", target=run).config_hash
|
||||
== CompanionConfig(name="rq", target=run).config_hash)
|
||||
185
tests/test_companion_control.py
Normal file
185
tests/test_companion_control.py
Normal file
@ -0,0 +1,185 @@
|
||||
#
|
||||
# This file is part of gunicorn released under the MIT license.
|
||||
# See the NOTICE for more information.
|
||||
|
||||
import json
|
||||
from unittest import mock
|
||||
|
||||
import pytest
|
||||
|
||||
from gunicorn.companion.config import CompanionConfig
|
||||
from gunicorn.companion.control import (
|
||||
MAX_LINE_BYTES,
|
||||
CommandError,
|
||||
ControlServer,
|
||||
decode_command,
|
||||
encode_response,
|
||||
)
|
||||
from gunicorn.companion.manager import CompanionManager
|
||||
|
||||
|
||||
def make_manager(*names):
|
||||
configs = [CompanionConfig(name=name, target=lambda: None) for name in names]
|
||||
return CompanionManager(configs, log=mock.Mock())
|
||||
|
||||
|
||||
def server_for(manager):
|
||||
return ControlServer(dispatch=manager.handle_command, path="/tmp/x.sock")
|
||||
|
||||
|
||||
def test_decode_command_valid():
|
||||
assert decode_command('{"cmd": "status"}') == {"cmd": "status"}
|
||||
|
||||
|
||||
def test_decode_command_bad_json():
|
||||
with pytest.raises(CommandError):
|
||||
decode_command("{not json")
|
||||
|
||||
|
||||
def test_decode_command_not_object():
|
||||
with pytest.raises(CommandError):
|
||||
decode_command("[1, 2, 3]")
|
||||
|
||||
|
||||
def test_decode_command_missing_cmd():
|
||||
with pytest.raises(CommandError):
|
||||
decode_command('{"name": "rq"}')
|
||||
|
||||
|
||||
def test_encode_response_newline_terminated():
|
||||
response = encode_response({"ok": True})
|
||||
assert response.endswith(b"\n")
|
||||
assert json.loads(response) == {"ok": True}
|
||||
|
||||
|
||||
def test_handle_line_dispatches():
|
||||
server = ControlServer(
|
||||
dispatch=lambda command: {"ok": True, "echo": command["cmd"]},
|
||||
path="/tmp/x.sock")
|
||||
response = server.handle_line('{"cmd": "status"}')
|
||||
assert json.loads(response) == {"ok": True, "echo": "status"}
|
||||
|
||||
|
||||
def test_handle_line_bad_json_error_envelope():
|
||||
server = ControlServer(dispatch=lambda command: {"ok": True}, path="/tmp/x.sock")
|
||||
response = json.loads(server.handle_line("garbage"))
|
||||
assert response["ok"] is False and "JSON" in response["error"]
|
||||
|
||||
|
||||
def test_handle_line_dispatch_command_error():
|
||||
def dispatch(command):
|
||||
raise CommandError("unknown command")
|
||||
server = ControlServer(dispatch=dispatch, path="/tmp/x.sock")
|
||||
response = json.loads(server.handle_line('{"cmd": "bogus"}'))
|
||||
assert response["ok"] is False and response["error"] == "unknown command"
|
||||
|
||||
|
||||
def test_handle_line_unexpected_exception_caught():
|
||||
def dispatch(command):
|
||||
raise ValueError("unknown stop signal 'SIGTRM'")
|
||||
server = ControlServer(dispatch=dispatch, path="/tmp/x.sock",
|
||||
log=mock.Mock())
|
||||
response = json.loads(server.handle_line('{"cmd": "stop", "name": "rq"}'))
|
||||
assert response["ok"] is False and "internal error" in response["error"]
|
||||
|
||||
|
||||
def test_create_unlinks_stale_and_chmods():
|
||||
server = ControlServer(dispatch=lambda command: {}, path="/tmp/x.sock",
|
||||
mode=0o600)
|
||||
listener = mock.Mock()
|
||||
with mock.patch("os.path.exists", return_value=True), \
|
||||
mock.patch("os.unlink") as unlink, \
|
||||
mock.patch("socket.socket", return_value=listener), \
|
||||
mock.patch("os.chmod") as chmod:
|
||||
server.create()
|
||||
unlink.assert_called_once_with("/tmp/x.sock")
|
||||
listener.bind.assert_called_once_with("/tmp/x.sock")
|
||||
chmod.assert_called_once_with("/tmp/x.sock", 0o600)
|
||||
listener.listen.assert_called_once()
|
||||
|
||||
|
||||
def test_close_unlinks():
|
||||
server = ControlServer(dispatch=lambda command: {}, path="/tmp/x.sock")
|
||||
server.listener = mock.Mock()
|
||||
with mock.patch("os.path.exists", return_value=True), \
|
||||
mock.patch("os.unlink") as unlink:
|
||||
server.close()
|
||||
unlink.assert_called_once_with("/tmp/x.sock")
|
||||
assert server.listener is None
|
||||
|
||||
|
||||
def test_control_status_command_end_to_end():
|
||||
manager = make_manager("rq")
|
||||
response = json.loads(server_for(manager).handle_line('{"cmd": "status"}'))
|
||||
assert response["ok"] is True
|
||||
assert response["companions"][0]["name"] == "rq"
|
||||
|
||||
|
||||
def test_control_start_command_end_to_end():
|
||||
manager = make_manager("rq")
|
||||
with mock.patch("os.fork", return_value=10):
|
||||
response = json.loads(
|
||||
server_for(manager).handle_line('{"cmd": "start", "name": "rq"}'))
|
||||
assert response["ok"] is True
|
||||
assert "rq" in response["message"]
|
||||
|
||||
|
||||
def test_control_unknown_command_error_envelope():
|
||||
manager = make_manager("rq")
|
||||
response = json.loads(
|
||||
server_for(manager).handle_line('{"cmd": "bogus", "name": "rq"}'))
|
||||
assert response["ok"] is False
|
||||
assert "unknown" in response["error"]
|
||||
|
||||
|
||||
def test_control_missing_name_error_envelope():
|
||||
manager = make_manager("rq")
|
||||
response = json.loads(server_for(manager).handle_line('{"cmd": "start"}'))
|
||||
assert response["ok"] is False
|
||||
assert "name" in response["error"]
|
||||
|
||||
|
||||
def test_control_reread_without_loader_error_envelope():
|
||||
manager = make_manager("rq")
|
||||
response = json.loads(server_for(manager).handle_line('{"cmd": "reread"}'))
|
||||
assert response["ok"] is False
|
||||
assert "reread" in response["error"]
|
||||
|
||||
|
||||
class FakeConnection:
|
||||
"""Minimal connection stand-in for serve_connection: yields preset chunks."""
|
||||
|
||||
def __init__(self, chunks):
|
||||
self._chunks = list(chunks)
|
||||
self.sent = []
|
||||
|
||||
def __enter__(self):
|
||||
return self
|
||||
|
||||
def __exit__(self, *exc):
|
||||
return False
|
||||
|
||||
def recv(self, _size):
|
||||
return self._chunks.pop(0) if self._chunks else b""
|
||||
|
||||
def sendall(self, data):
|
||||
self.sent.append(data)
|
||||
|
||||
|
||||
def test_serve_connection_drops_oversized_line():
|
||||
log = mock.Mock()
|
||||
server = ControlServer(dispatch=lambda command: {"ok": True},
|
||||
path="/tmp/x.sock", log=log)
|
||||
# A flood with no newline: the connection is dropped, nothing dispatched.
|
||||
connection = FakeConnection([b"x" * (MAX_LINE_BYTES + 1)])
|
||||
server.serve_connection(connection)
|
||||
assert connection.sent == []
|
||||
log.warning.assert_called_once()
|
||||
|
||||
|
||||
def test_serve_connection_answers_complete_line():
|
||||
manager = make_manager("rq")
|
||||
connection = FakeConnection([b'{"cmd": "status"}\n'])
|
||||
server_for(manager).serve_connection(connection)
|
||||
assert len(connection.sent) == 1
|
||||
assert json.loads(connection.sent[0])["ok"] is True
|
||||
61
tests/test_companion_ctl.py
Normal file
61
tests/test_companion_ctl.py
Normal file
@ -0,0 +1,61 @@
|
||||
#
|
||||
# This file is part of gunicorn released under the MIT license.
|
||||
# See the NOTICE for more information.
|
||||
|
||||
from unittest import mock
|
||||
|
||||
import pytest
|
||||
|
||||
from gunicorn.companion import ctl
|
||||
|
||||
|
||||
def test_run_status_prints_and_returns_zero(capsys):
|
||||
with mock.patch.object(
|
||||
ctl, "send_command", return_value={"ok": True, "companions": []}
|
||||
) as send:
|
||||
code = ctl.run(["--socket", "/tmp/x.sock", "status"])
|
||||
assert code == 0
|
||||
send.assert_called_once_with("/tmp/x.sock", {"cmd": "status"})
|
||||
assert "ok" in capsys.readouterr().out
|
||||
|
||||
|
||||
def test_run_per_name_command_sends_name():
|
||||
with mock.patch.object(
|
||||
ctl, "send_command", return_value={"ok": True, "message": "x"}
|
||||
) as send:
|
||||
code = ctl.run(["--socket", "/tmp/x.sock", "stop", "ticker"])
|
||||
assert code == 0
|
||||
send.assert_called_once_with("/tmp/x.sock", {"cmd": "stop", "name": "ticker"})
|
||||
|
||||
|
||||
def test_run_failure_response_returns_one():
|
||||
with mock.patch.object(
|
||||
ctl, "send_command", return_value={"ok": False, "error": "bad"}
|
||||
):
|
||||
assert ctl.run(["--socket", "/tmp/x.sock", "status"]) == 1
|
||||
|
||||
|
||||
def test_run_per_name_command_requires_name():
|
||||
with pytest.raises(SystemExit):
|
||||
ctl.run(["--socket", "/tmp/x.sock", "stop"])
|
||||
|
||||
|
||||
def test_run_requires_socket(monkeypatch):
|
||||
monkeypatch.delenv("GUNICORN_COMPANION_SOCKET", raising=False)
|
||||
with pytest.raises(SystemExit):
|
||||
ctl.run(["status"])
|
||||
|
||||
|
||||
def test_run_unreachable_socket_returns_two():
|
||||
with mock.patch.object(ctl, "send_command", side_effect=OSError("nope")):
|
||||
assert ctl.run(["--socket", "/tmp/x.sock", "status"]) == 2
|
||||
|
||||
|
||||
def test_send_command_round_trip():
|
||||
client = mock.Mock()
|
||||
client.recv.side_effect = [b'{"ok": true}\n']
|
||||
with mock.patch("socket.socket", return_value=client):
|
||||
result = ctl.send_command("/tmp/x.sock", {"cmd": "status"})
|
||||
client.connect.assert_called_once_with("/tmp/x.sock")
|
||||
assert client.sendall.call_args.args[0] == b'{"cmd": "status"}\n'
|
||||
assert result == {"ok": True}
|
||||
698
tests/test_companion_manager.py
Normal file
698
tests/test_companion_manager.py
Normal file
@ -0,0 +1,698 @@
|
||||
#
|
||||
# This file is part of gunicorn released under the MIT license.
|
||||
# See the NOTICE for more information.
|
||||
|
||||
import os
|
||||
import signal
|
||||
from unittest import mock
|
||||
|
||||
import pytest
|
||||
|
||||
from gunicorn.companion.control import CommandError
|
||||
from gunicorn.companion.manager import (
|
||||
CompanionManager,
|
||||
reset_child_signals,
|
||||
set_parent_death_signal,
|
||||
)
|
||||
from gunicorn.companion.config import CompanionConfig
|
||||
from gunicorn.companion.process import State
|
||||
|
||||
|
||||
def make_manager(*names):
|
||||
configs = [CompanionConfig(name=n, target=lambda: None) for n in names]
|
||||
return CompanionManager(configs, log=mock.Mock())
|
||||
|
||||
|
||||
def test_manager_builds_one_process_per_config():
|
||||
manager = make_manager("rq", "scheduler")
|
||||
assert set(manager.processes) == {"rq", "scheduler"}
|
||||
assert manager.processes["rq"].state == State.STOPPED
|
||||
|
||||
|
||||
def test_resolve_target_accepts_callable():
|
||||
fn = lambda: None
|
||||
assert CompanionManager._resolve_target(fn) is fn
|
||||
|
||||
|
||||
def test_resolve_target_import_string():
|
||||
# os.getpid is a real "module:attr" target.
|
||||
assert CompanionManager._resolve_target("os:getpid") is __import__("os").getpid
|
||||
|
||||
|
||||
def test_resolve_target_rejects_bad_string():
|
||||
with pytest.raises(ValueError):
|
||||
CompanionManager._resolve_target("no_colon")
|
||||
|
||||
|
||||
def test_log_exit_reports_signal_or_status():
|
||||
manager = make_manager("rq")
|
||||
process = manager.processes["rq"]
|
||||
process.last_exit_signal, process.last_exit_code = 9, None
|
||||
manager._log_exit(process)
|
||||
process.last_exit_signal, process.last_exit_code = None, 1
|
||||
manager._log_exit(process)
|
||||
messages = [call.args[0] for call in manager.log.info.call_args_list]
|
||||
assert any("signal" in message for message in messages)
|
||||
assert any("status" in message for message in messages)
|
||||
|
||||
|
||||
def test_reset_child_signals_restores_defaults():
|
||||
with mock.patch("signal.signal") as signal_signal:
|
||||
reset_child_signals()
|
||||
restored = {call.args[0]: call.args[1] for call in signal_signal.call_args_list}
|
||||
# The stop signal must reach the default disposition, not the manager's
|
||||
# inherited handler, so a forked companion actually terminates on SIGTERM.
|
||||
assert restored[signal.SIGTERM] is signal.SIG_DFL
|
||||
assert restored[signal.SIGINT] is signal.SIG_DFL
|
||||
|
||||
|
||||
def test_set_parent_death_signal_noop_off_linux():
|
||||
with mock.patch("sys.platform", "darwin"):
|
||||
assert set_parent_death_signal(signal.SIGTERM) is False
|
||||
|
||||
|
||||
def test_set_parent_death_signal_arms_on_linux():
|
||||
libc = mock.Mock()
|
||||
libc.prctl.return_value = 0
|
||||
with mock.patch("sys.platform", "linux"), \
|
||||
mock.patch("ctypes.CDLL", return_value=libc):
|
||||
assert set_parent_death_signal(signal.SIGTERM) is True
|
||||
libc.prctl.assert_called_once()
|
||||
|
||||
|
||||
def test_parent_gone_detects_reparenting():
|
||||
manager = make_manager("rq")
|
||||
manager.parent_pid = 4242
|
||||
with mock.patch("os.getppid", return_value=4242):
|
||||
assert manager._parent_gone() is False
|
||||
with mock.patch("os.getppid", return_value=1):
|
||||
assert manager._parent_gone() is True
|
||||
|
||||
|
||||
def test_close_manager_fds_closes_control_and_pipe():
|
||||
manager = make_manager("rq")
|
||||
manager.control = mock.Mock()
|
||||
manager._wakeup_pipe = (7, 8)
|
||||
with mock.patch("os.close") as os_close:
|
||||
manager._close_manager_fds()
|
||||
manager.control.listener.close.assert_called_once_with()
|
||||
assert os_close.call_count == 2
|
||||
|
||||
|
||||
def test_close_manager_fds_noop_when_unset():
|
||||
manager = make_manager("rq")
|
||||
manager._close_manager_fds() # control and pipe are None: must not raise
|
||||
|
||||
|
||||
def test_apply_environment_sets_cwd_and_env():
|
||||
config = CompanionConfig(name="rq", target=lambda: None,
|
||||
cwd="/tmp", env={"COMPANION_X": "1"})
|
||||
with mock.patch("os.chdir") as chdir, \
|
||||
mock.patch.dict("os.environ", {}, clear=False):
|
||||
CompanionManager._apply_environment(config)
|
||||
chdir.assert_called_once_with("/tmp")
|
||||
import os
|
||||
assert os.environ["COMPANION_X"] == "1"
|
||||
|
||||
|
||||
def test_apply_environment_noop_without_cwd_env():
|
||||
config = CompanionConfig(name="rq", target=lambda: None)
|
||||
with mock.patch("os.chdir") as chdir:
|
||||
CompanionManager._apply_environment(config)
|
||||
chdir.assert_not_called()
|
||||
|
||||
|
||||
def test_open_output_inherit_returns_none():
|
||||
assert CompanionManager._open_output(None) is None
|
||||
assert CompanionManager._open_output("inherit") is None
|
||||
|
||||
|
||||
def test_open_output_path_opens_append():
|
||||
with mock.patch("os.open", return_value=9) as open_mock:
|
||||
fd = CompanionManager._open_output("/var/log/rq.log")
|
||||
assert fd == 9
|
||||
flags = open_mock.call_args.args[1]
|
||||
assert flags & os.O_APPEND and flags & os.O_CREAT
|
||||
|
||||
|
||||
def test_redirect_output_files():
|
||||
config = CompanionConfig(name="rq", target=lambda: None,
|
||||
stdout="/o.log", stderr="/e.log")
|
||||
with mock.patch("os.open", side_effect=[10, 11]), \
|
||||
mock.patch("os.dup2") as dup2, \
|
||||
mock.patch("os.close") as close:
|
||||
CompanionManager._redirect_output(config)
|
||||
dup2.assert_any_call(10, 1)
|
||||
dup2.assert_any_call(11, 2)
|
||||
# The opened fds are closed after being duped onto 1/2, no leak.
|
||||
close.assert_any_call(10)
|
||||
close.assert_any_call(11)
|
||||
|
||||
|
||||
def test_redirect_output_stderr_to_stdout():
|
||||
config = CompanionConfig(name="rq", target=lambda: None,
|
||||
stdout="/o.log", stderr="stdout")
|
||||
with mock.patch("os.open", return_value=10), \
|
||||
mock.patch("os.dup2") as dup2, \
|
||||
mock.patch("os.close") as close:
|
||||
CompanionManager._redirect_output(config)
|
||||
dup2.assert_any_call(10, 1)
|
||||
dup2.assert_any_call(1, 2)
|
||||
close.assert_called_once_with(10)
|
||||
|
||||
|
||||
def test_redirect_output_inherit_noop():
|
||||
config = CompanionConfig(name="rq", target=lambda: None)
|
||||
with mock.patch("os.dup2") as dup2:
|
||||
CompanionManager._redirect_output(config)
|
||||
dup2.assert_not_called()
|
||||
|
||||
|
||||
def test_reap_records_exit_code():
|
||||
manager = make_manager("rq")
|
||||
proc = manager.processes["rq"]
|
||||
proc.pid = 4321
|
||||
# exit code 1 -> status 1<<8; second call drains the queue.
|
||||
with mock.patch("os.waitpid", side_effect=[(4321, 1 << 8), (0, 0)]):
|
||||
reaped = manager.reap_processes()
|
||||
assert reaped == [proc]
|
||||
assert proc.last_exit_code == 1
|
||||
assert proc.last_exit_signal is None
|
||||
assert proc.exit_count == 1
|
||||
assert proc.pid is None
|
||||
|
||||
|
||||
def test_reap_records_signal():
|
||||
manager = make_manager("rq")
|
||||
proc = manager.processes["rq"]
|
||||
proc.pid = 4321
|
||||
with mock.patch("os.waitpid", side_effect=[(4321, 9), (0, 0)]):
|
||||
manager.reap_processes()
|
||||
assert proc.last_exit_signal == 9
|
||||
assert proc.last_exit_code is None
|
||||
|
||||
|
||||
def test_reap_no_children():
|
||||
manager = make_manager("rq")
|
||||
with mock.patch("os.waitpid", side_effect=ChildProcessError):
|
||||
assert manager.reap_processes() == []
|
||||
|
||||
|
||||
def test_status_lists_all_companions():
|
||||
manager = make_manager("rq", "scheduler")
|
||||
entries = manager.status(now=100.0)
|
||||
assert {e["name"] for e in entries} == {"rq", "scheduler"}
|
||||
assert all("state" in e and "description" in e for e in entries)
|
||||
|
||||
|
||||
def test_handle_command_status():
|
||||
manager = make_manager("rq")
|
||||
resp = manager.handle_command({"cmd": "status"})
|
||||
assert resp["ok"] is True
|
||||
assert resp["companions"][0]["name"] == "rq"
|
||||
|
||||
|
||||
def test_handle_command_start_routes():
|
||||
manager = make_manager("rq")
|
||||
with mock.patch.object(manager, "start_process",
|
||||
return_value=(True, "rq started")) as start_mock:
|
||||
resp = manager.handle_command({"cmd": "start", "name": "rq"})
|
||||
start_mock.assert_called_once_with("rq")
|
||||
assert resp == {"ok": True, "message": "rq started"}
|
||||
|
||||
|
||||
def test_handle_command_stop_and_restart_route():
|
||||
manager = make_manager("rq")
|
||||
with mock.patch.object(manager, "stop_process", return_value=(True, "s")) as stop_mock, \
|
||||
mock.patch.object(manager, "restart_process", return_value=(True, "r")) as restart_mock:
|
||||
manager.handle_command({"cmd": "stop", "name": "rq"})
|
||||
manager.handle_command({"cmd": "restart", "name": "rq"})
|
||||
stop_mock.assert_called_once_with("rq")
|
||||
restart_mock.assert_called_once_with("rq")
|
||||
|
||||
|
||||
def test_handle_command_missing_name():
|
||||
manager = make_manager("rq")
|
||||
with pytest.raises(CommandError):
|
||||
manager.handle_command({"cmd": "start"})
|
||||
|
||||
|
||||
def test_handle_command_unknown():
|
||||
manager = make_manager("rq")
|
||||
with pytest.raises(CommandError):
|
||||
manager.handle_command({"cmd": "reread"})
|
||||
|
||||
|
||||
def make_config(name, **kwargs):
|
||||
return CompanionConfig(name=name, target=lambda: None, **kwargs)
|
||||
|
||||
|
||||
def test_reread_adds_new():
|
||||
manager = make_manager("rq")
|
||||
new = [make_config("rq"), make_config("scheduler")]
|
||||
with mock.patch("os.fork", return_value=10):
|
||||
result = manager.reread_config(new)
|
||||
assert result["added"] == ["scheduler"]
|
||||
assert "scheduler" in manager.processes
|
||||
assert manager.processes["scheduler"].state == State.STARTING
|
||||
|
||||
|
||||
def test_reread_removes_missing():
|
||||
manager = make_manager("rq", "scheduler")
|
||||
manager.processes["scheduler"].state = State.RUNNING
|
||||
manager.processes["scheduler"].pid = 11
|
||||
with mock.patch("os.kill"):
|
||||
result = manager.reread_config([make_config("rq")])
|
||||
assert result["removed"] == ["scheduler"]
|
||||
assert "scheduler" not in manager.processes
|
||||
|
||||
|
||||
def test_reread_restarts_changed():
|
||||
manager = make_manager("rq")
|
||||
manager.processes["rq"].state = State.RUNNING
|
||||
manager.processes["rq"].pid = 12
|
||||
changed = make_config("rq", env={"X": "1"}) # different hash
|
||||
with mock.patch("os.kill"):
|
||||
result = manager.reread_config([changed])
|
||||
assert result["restarted"] == ["rq"]
|
||||
assert manager.processes["rq"].config is changed
|
||||
assert manager.processes["rq"].state == State.STOPPING
|
||||
|
||||
|
||||
def test_reread_changed_manual_stop_keeps_stopped():
|
||||
manager = make_manager("rq")
|
||||
proc = manager.processes["rq"]
|
||||
proc.manual_stop = True
|
||||
proc.state = State.STOPPED
|
||||
changed = make_config("rq", env={"X": "1"})
|
||||
result = manager.reread_config([changed])
|
||||
assert result["unchanged"] == ["rq"]
|
||||
assert proc.config is changed and proc.state == State.STOPPED
|
||||
|
||||
|
||||
def test_reread_unchanged_noop():
|
||||
manager = make_manager("rq")
|
||||
same = manager.processes["rq"].config
|
||||
result = manager.reread_config([same])
|
||||
assert result["unchanged"] == ["rq"]
|
||||
assert result["restarted"] == []
|
||||
|
||||
|
||||
def test_reread_duplicate_name_keeps_old():
|
||||
manager = make_manager("rq")
|
||||
result = manager.reread_config([make_config("rq"), make_config("rq")])
|
||||
assert result["ok"] is False and result["kept_old_config"] is True
|
||||
assert "duplicate" in result["error"]
|
||||
|
||||
|
||||
def test_reread_validation_failure_mutates_nothing():
|
||||
manager = make_manager("rq")
|
||||
original_config = manager.processes["rq"].config
|
||||
original_names = set(manager.processes)
|
||||
# This batch would change "rq" and add "scheduler", but the duplicate
|
||||
# "scheduler" makes the whole reread invalid: nothing must be applied.
|
||||
bad = [make_config("rq", env={"X": "1"}), make_config("scheduler"),
|
||||
make_config("scheduler")]
|
||||
with mock.patch("os.fork") as fork, mock.patch("os.kill") as kill:
|
||||
result = manager.reread_config(bad)
|
||||
assert result["ok"] is False and result["kept_old_config"] is True
|
||||
assert set(manager.processes) == original_names
|
||||
assert manager.processes["rq"].config is original_config
|
||||
fork.assert_not_called()
|
||||
kill.assert_not_called()
|
||||
|
||||
|
||||
def test_handle_command_reread_no_loader():
|
||||
manager = make_manager("rq")
|
||||
with pytest.raises(CommandError):
|
||||
manager.handle_command({"cmd": "reread"})
|
||||
|
||||
|
||||
def test_handle_command_reread_runs_loader():
|
||||
manager = make_manager("rq")
|
||||
manager.config_loader = lambda: [manager.processes["rq"].config]
|
||||
resp = manager.handle_command({"cmd": "reread"})
|
||||
assert resp["ok"] is True and resp["unchanged"] == ["rq"]
|
||||
|
||||
|
||||
def test_handle_command_reread_bad_config():
|
||||
manager = make_manager("rq")
|
||||
def boom():
|
||||
raise ValueError("duplicate companion name rq")
|
||||
manager.config_loader = boom
|
||||
resp = manager.handle_command({"cmd": "reread"})
|
||||
assert resp["ok"] is False and resp["kept_old_config"] is True
|
||||
|
||||
|
||||
def test_start_process_stopped_spawns():
|
||||
manager = make_manager("rq")
|
||||
proc = manager.processes["rq"]
|
||||
with mock.patch("os.fork", return_value=70) as fork:
|
||||
ok, _ = manager.start_process("rq")
|
||||
fork.assert_called_once()
|
||||
assert ok and proc.state == State.STARTING and proc.manual_stop is False
|
||||
|
||||
|
||||
def test_start_process_backoff_cancels_retry():
|
||||
manager = make_manager("rq")
|
||||
proc = manager.processes["rq"]
|
||||
proc.state = State.BACKOFF
|
||||
proc.next_retry_at = 999.0
|
||||
proc.manual_stop = True
|
||||
with mock.patch("os.fork", return_value=71):
|
||||
ok, _ = manager.start_process("rq")
|
||||
assert ok and proc.state == State.STARTING
|
||||
assert proc.next_retry_at is None and proc.manual_stop is False
|
||||
|
||||
|
||||
def test_start_process_running_is_noop():
|
||||
manager = make_manager("rq")
|
||||
manager.processes["rq"].state = State.RUNNING
|
||||
with mock.patch("os.fork") as fork:
|
||||
ok, _ = manager.start_process("rq")
|
||||
assert ok
|
||||
fork.assert_not_called()
|
||||
|
||||
|
||||
def test_start_process_stopping_rejected():
|
||||
manager = make_manager("rq")
|
||||
manager.processes["rq"].state = State.STOPPING
|
||||
ok, msg = manager.start_process("rq")
|
||||
assert not ok and "stopping" in msg
|
||||
|
||||
|
||||
def test_start_process_unknown():
|
||||
manager = make_manager("rq")
|
||||
ok, _ = manager.start_process("nope")
|
||||
assert not ok
|
||||
|
||||
|
||||
def test_stop_process_running_signals_and_stopping():
|
||||
manager = make_manager("rq")
|
||||
proc = manager.processes["rq"]
|
||||
proc.state = State.RUNNING
|
||||
proc.pid = 80
|
||||
proc.config.stop_timeout = 60
|
||||
with mock.patch("os.kill") as kill:
|
||||
ok, _ = manager.stop_process("rq", now=200.0)
|
||||
kill.assert_called_once_with(80, signal.SIGTERM)
|
||||
assert ok and proc.state == State.STOPPING
|
||||
assert proc.manual_stop is True and proc.stop_deadline == 260.0
|
||||
|
||||
|
||||
def test_stop_process_backoff_to_stopped():
|
||||
manager = make_manager("rq")
|
||||
proc = manager.processes["rq"]
|
||||
proc.state = State.BACKOFF
|
||||
proc.next_retry_at = 999.0
|
||||
with mock.patch("os.kill") as kill:
|
||||
ok, _ = manager.stop_process("rq")
|
||||
kill.assert_not_called()
|
||||
assert ok and proc.state == State.STOPPED
|
||||
assert proc.next_retry_at is None and proc.manual_stop is True
|
||||
|
||||
|
||||
def test_stop_process_already_stopped():
|
||||
manager = make_manager("rq")
|
||||
with mock.patch("os.kill") as kill:
|
||||
ok, _ = manager.stop_process("rq")
|
||||
kill.assert_not_called()
|
||||
assert ok and manager.processes["rq"].manual_stop is True
|
||||
|
||||
|
||||
def test_stop_process_unknown():
|
||||
manager = make_manager("rq")
|
||||
ok, _ = manager.stop_process("nope")
|
||||
assert not ok
|
||||
|
||||
|
||||
def test_stop_during_restart_cancels_pending_restart():
|
||||
manager = make_manager("rq")
|
||||
proc = manager.processes["rq"]
|
||||
proc.state = State.STOPPING
|
||||
proc.pid = 70
|
||||
proc.restart_pending = True
|
||||
with mock.patch("os.kill") as kill:
|
||||
ok, _ = manager.stop_process("rq")
|
||||
kill.assert_not_called()
|
||||
assert ok and proc.manual_stop is True and proc.restart_pending is False
|
||||
# On exit the companion now settles STOPPED instead of being respawned.
|
||||
with mock.patch.object(manager, "spawn_process") as spawn:
|
||||
manager.handle_exit(proc)
|
||||
spawn.assert_not_called()
|
||||
assert proc.state == State.STOPPED
|
||||
|
||||
|
||||
def test_safe_kill_ignores_dead_process():
|
||||
with mock.patch("os.kill", side_effect=ProcessLookupError):
|
||||
CompanionManager._safe_kill(123, signal.SIGTERM) # must not raise
|
||||
|
||||
|
||||
def test_stop_process_survives_dead_companion():
|
||||
manager = make_manager("rq")
|
||||
proc = manager.processes["rq"]
|
||||
proc.state = State.RUNNING
|
||||
proc.pid = 80
|
||||
with mock.patch("os.kill", side_effect=ProcessLookupError):
|
||||
ok, _ = manager.stop_process("rq", now=1.0)
|
||||
assert ok and proc.state == State.STOPPING
|
||||
|
||||
|
||||
def test_signal_number_resolves_name():
|
||||
assert CompanionManager._signal_number("SIGKILL") == signal.SIGKILL
|
||||
assert CompanionManager._signal_number(9) == 9
|
||||
|
||||
|
||||
def test_signal_number_rejects_bad():
|
||||
with pytest.raises(ValueError):
|
||||
CompanionManager._signal_number("SIGTRM")
|
||||
|
||||
|
||||
def test_restart_process_running_stops_with_reload_timeout():
|
||||
manager = make_manager("rq")
|
||||
proc = manager.processes["rq"]
|
||||
proc.state = State.RUNNING
|
||||
proc.pid = 90
|
||||
proc.config.reload_timeout = 30
|
||||
proc.manual_stop = True
|
||||
with mock.patch("os.kill") as kill:
|
||||
ok, _ = manager.restart_process("rq", now=300.0)
|
||||
kill.assert_called_once_with(90, signal.SIGTERM)
|
||||
assert ok and proc.state == State.STOPPING
|
||||
assert proc.restart_pending is True and proc.stop_deadline == 330.0
|
||||
assert proc.manual_stop is False
|
||||
|
||||
|
||||
def test_restart_pending_reap_respawns_immediately():
|
||||
manager = make_manager("rq")
|
||||
proc = manager.processes["rq"]
|
||||
proc.state = State.STOPPING
|
||||
proc.restart_pending = True
|
||||
proc.pid = 91
|
||||
with mock.patch("os.waitpid", side_effect=[(91, 0), (0, 0)]), \
|
||||
mock.patch("os.fork", return_value=92):
|
||||
manager.reap_processes()
|
||||
assert proc.state == State.STARTING
|
||||
assert proc.pid == 92
|
||||
assert proc.restart_pending is False
|
||||
assert proc.restart_count == 1
|
||||
|
||||
|
||||
def test_restart_process_stopped_starts_now():
|
||||
manager = make_manager("rq")
|
||||
proc = manager.processes["rq"]
|
||||
with mock.patch("os.fork", return_value=93), mock.patch("os.kill") as kill:
|
||||
ok, _ = manager.restart_process("rq")
|
||||
kill.assert_not_called()
|
||||
assert ok and proc.state == State.STARTING
|
||||
|
||||
|
||||
def test_restart_process_backoff_starts_now():
|
||||
manager = make_manager("rq")
|
||||
proc = manager.processes["rq"]
|
||||
proc.state = State.BACKOFF
|
||||
proc.next_retry_at = 999.0
|
||||
with mock.patch("os.fork", return_value=94):
|
||||
ok, _ = manager.restart_process("rq")
|
||||
assert ok and proc.state == State.STARTING and proc.next_retry_at is None
|
||||
|
||||
|
||||
def test_restart_process_stopping_rejected():
|
||||
manager = make_manager("rq")
|
||||
manager.processes["rq"].state = State.STOPPING
|
||||
ok, msg = manager.restart_process("rq")
|
||||
assert not ok and "stopping" in msg
|
||||
|
||||
|
||||
def test_manual_stop_preserved_through_exit():
|
||||
# stop a running companion, then reap its child: it must settle in STOPPED
|
||||
# with manual_stop still set so it is not auto-restarted.
|
||||
manager = make_manager("rq")
|
||||
proc = manager.processes["rq"]
|
||||
proc.state = State.RUNNING
|
||||
proc.pid = 60
|
||||
with mock.patch("os.kill"):
|
||||
manager.stop_process("rq", now=10.0)
|
||||
with mock.patch("os.waitpid", side_effect=[(60, 0), (0, 0)]), \
|
||||
mock.patch("os.fork") as fork:
|
||||
manager.reap_processes()
|
||||
fork.assert_not_called()
|
||||
assert proc.state == State.STOPPED and proc.manual_stop is True
|
||||
|
||||
|
||||
def test_start_clears_manual_stop():
|
||||
manager = make_manager("rq")
|
||||
proc = manager.processes["rq"]
|
||||
proc.manual_stop = True
|
||||
with mock.patch("os.fork", return_value=61):
|
||||
manager.start_process("rq")
|
||||
assert proc.manual_stop is False
|
||||
|
||||
|
||||
def test_spawn_does_not_touch_manual_stop():
|
||||
manager = make_manager("rq")
|
||||
proc = manager.processes["rq"]
|
||||
proc.manual_stop = True
|
||||
with mock.patch("os.fork", return_value=62):
|
||||
manager.spawn_process(proc)
|
||||
assert proc.manual_stop is True
|
||||
|
||||
|
||||
def test_handle_exit_unexpected_backoff():
|
||||
manager = make_manager("rq")
|
||||
proc = manager.processes["rq"]
|
||||
proc.restart_delay = 5
|
||||
manager.handle_exit(proc, now=100.0)
|
||||
assert proc.state == State.BACKOFF
|
||||
assert proc.next_retry_at == 105.0
|
||||
|
||||
|
||||
def test_handle_exit_manual_stop_stays_stopped():
|
||||
manager = make_manager("rq")
|
||||
proc = manager.processes["rq"]
|
||||
proc.manual_stop = True
|
||||
manager.handle_exit(proc, now=100.0)
|
||||
assert proc.state == State.STOPPED
|
||||
assert proc.next_retry_at is None
|
||||
|
||||
|
||||
def test_retry_backoff_respawns_when_due():
|
||||
manager = make_manager("rq")
|
||||
proc = manager.processes["rq"]
|
||||
proc.state = State.BACKOFF
|
||||
proc.next_retry_at = 100.0
|
||||
with mock.patch("os.fork", return_value=555):
|
||||
retried = manager.retry_backoff(now=101.0)
|
||||
assert retried == [proc]
|
||||
assert proc.restart_count == 1
|
||||
assert proc.state == State.STARTING
|
||||
assert proc.pid == 555
|
||||
|
||||
|
||||
def test_retry_backoff_waits_until_due():
|
||||
manager = make_manager("rq")
|
||||
proc = manager.processes["rq"]
|
||||
proc.state = State.BACKOFF
|
||||
proc.next_retry_at = 100.0
|
||||
assert manager.retry_backoff(now=99.0) == []
|
||||
assert proc.state == State.BACKOFF
|
||||
|
||||
|
||||
def test_reap_unexpected_exit_enters_backoff():
|
||||
manager = make_manager("rq")
|
||||
proc = manager.processes["rq"]
|
||||
proc.pid = 4321
|
||||
with mock.patch("os.waitpid", side_effect=[(4321, 1 << 8), (0, 0)]):
|
||||
manager.reap_processes()
|
||||
assert proc.state == State.BACKOFF
|
||||
assert proc.next_retry_at is not None
|
||||
|
||||
|
||||
def test_promote_running_after_startsecs():
|
||||
manager = make_manager("rq")
|
||||
proc = manager.processes["rq"]
|
||||
proc.config.startsecs = 1
|
||||
proc.state = State.STARTING
|
||||
proc.started_at = 100.0
|
||||
promoted = manager.promote_running(now=101.5)
|
||||
assert promoted == [proc]
|
||||
assert proc.state == State.RUNNING
|
||||
|
||||
|
||||
def test_promote_running_too_early():
|
||||
manager = make_manager("rq")
|
||||
proc = manager.processes["rq"]
|
||||
proc.config.startsecs = 5
|
||||
proc.state = State.STARTING
|
||||
proc.started_at = 100.0
|
||||
assert manager.promote_running(now=102.0) == []
|
||||
assert proc.state == State.STARTING
|
||||
|
||||
|
||||
def test_promote_running_ignores_non_starting():
|
||||
manager = make_manager("rq")
|
||||
proc = manager.processes["rq"]
|
||||
proc.state = State.BACKOFF
|
||||
proc.started_at = 100.0
|
||||
assert manager.promote_running(now=999.0) == []
|
||||
assert proc.state == State.BACKOFF
|
||||
|
||||
|
||||
def test_spawn_parent_records_pid_and_starting():
|
||||
manager = make_manager("rq")
|
||||
proc = manager.processes["rq"]
|
||||
with mock.patch("os.fork", return_value=4321):
|
||||
pid = manager.spawn_process(proc)
|
||||
assert pid == 4321
|
||||
assert proc.pid == 4321
|
||||
assert proc.state == State.STARTING
|
||||
assert proc.started_at is not None
|
||||
assert proc.manual_stop is False
|
||||
|
||||
|
||||
def test_lifecycle_running_crash_backoff_retry():
|
||||
manager = make_manager("rq")
|
||||
process = manager.processes["rq"]
|
||||
assert process.state == State.STOPPED
|
||||
with mock.patch("os.fork", return_value=100):
|
||||
manager.spawn_process(process)
|
||||
assert process.state == State.STARTING
|
||||
manager.promote_running(now=process.started_at + process.config.startsecs)
|
||||
assert process.state == State.RUNNING
|
||||
manager.handle_exit(process, now=1000.0)
|
||||
assert process.state == State.BACKOFF
|
||||
assert process.next_retry_at == 1000.0 + process.restart_delay
|
||||
with mock.patch("os.fork", return_value=101):
|
||||
manager.retry_backoff(now=process.next_retry_at)
|
||||
assert process.state == State.STARTING
|
||||
|
||||
|
||||
def test_lifecycle_stop_to_stopped():
|
||||
manager = make_manager("rq")
|
||||
process = manager.processes["rq"]
|
||||
with mock.patch("os.fork", return_value=200):
|
||||
manager.spawn_process(process)
|
||||
manager.promote_running(now=process.started_at + process.config.startsecs)
|
||||
with mock.patch("os.kill") as kill:
|
||||
manager.stop_process("rq", now=500.0)
|
||||
assert process.state == State.STOPPING
|
||||
assert process.manual_stop is True
|
||||
kill.assert_called_once()
|
||||
manager.handle_exit(process, now=501.0)
|
||||
assert process.state == State.STOPPED
|
||||
|
||||
|
||||
def test_lifecycle_restart_respawns_after_exit():
|
||||
manager = make_manager("rq")
|
||||
process = manager.processes["rq"]
|
||||
with mock.patch("os.fork", return_value=300):
|
||||
manager.spawn_process(process)
|
||||
manager.promote_running(now=process.started_at + process.config.startsecs)
|
||||
with mock.patch("os.kill"):
|
||||
manager.restart_process("rq", now=600.0)
|
||||
assert process.state == State.STOPPING
|
||||
assert process.restart_pending is True
|
||||
with mock.patch("os.fork", return_value=301):
|
||||
manager.handle_exit(process, now=601.0)
|
||||
assert process.state == State.STARTING
|
||||
assert process.restart_pending is False
|
||||
Loading…
x
Reference in New Issue
Block a user