A typo'd companion_stop_signal (e.g. "SIGTRM") passed validate_string
but raised ValueError in _signal_number when the manager later tried to
send it -- propagating past handle_line and killing the run loop.
Validate stop_signal at config-build time so a bad value fails loudly
on load and reread. As defense-in-depth, catch unexpected exceptions in
ControlServer.handle_line so no handler bug can escape and kill the
manager; such exceptions are now returned as an error envelope.
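
A minimal sketch of both changes, not the actual implementation. The
function bodies, the JSON request shape, and the envelope fields ("ok",
"result", "error") are assumptions made for illustration only:

```python
import json
import signal


def validate_stop_signal(value):
    """Return the signal.Signals member for value, or raise ValueError.

    Hypothetical config-build-time check: a typo such as "SIGTRM" fails
    here instead of later, when the signal is about to be sent.
    """
    if isinstance(value, int):
        return signal.Signals(value)  # ValueError for unknown numbers
    name = str(value).strip().upper()
    if not name.startswith("SIG"):
        name = "SIG" + name
    try:
        return signal.Signals[name]
    except KeyError:
        raise ValueError(f"invalid stop_signal: {value!r}") from None


def handle_line(line, handlers):
    """Dispatch one request line; handler exceptions never escape.

    The wire format (one JSON object per line with a "command" key) is
    assumed here, not taken from the real ControlServer.
    """
    try:
        request = json.loads(line)
        handler = handlers[request["command"]]
        return {"ok": True, "result": handler(request)}
    except Exception as exc:  # defense-in-depth: report, keep the loop alive
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
```

With this shape, a handler that raises yields an envelope with "ok"
set to False rather than an exception in the caller's loop.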
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Three companion settings were documented and configurable but never had any
effect. companion_restart_delay was ignored because CompanionProcess hardcoded
a 5s delay; it is now read from config and kept out of config_hash, since it
does not affect the spawned process and so must not trigger a restart on
reread. companion_config_file was never read; the manager now loads its
companion settings from that dedicated file when set, instead of always reading
the main gunicorn config. companion_manager_stop_timeout was unused, so
shutdown waited only graceful_timeout before SIGKILLing the manager, cutting
short long-draining companions. Stop now waits for the larger of
graceful_timeout and the manager stop timeout; when the latter is not set
explicitly, it is derived from the slowest companion stop_timeout plus the
buffer.
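
The timeout rule above can be sketched as follows. The function names
and the default buffer of 5 seconds are assumptions for illustration;
the real buffer value is not stated here:

```python
def manager_stop_timeout(companion_stop_timeouts, explicit=None, buffer=5.0):
    """Timeout the manager needs to stop all of its companions.

    An explicit setting wins; otherwise use the slowest companion
    stop_timeout plus a buffer (buffer value is hypothetical).
    """
    if explicit is not None:
        return explicit
    return max(companion_stop_timeouts, default=0.0) + buffer


def shutdown_wait(graceful_timeout, companion_stop_timeouts,
                  explicit=None, buffer=5.0):
    """How long to wait before SIGKILLing the manager."""
    return max(
        graceful_timeout,
        manager_stop_timeout(companion_stop_timeouts, explicit, buffer),
    )
```

For example, with graceful_timeout=30 and companion stop timeouts of 10
and 60 seconds, the wait becomes 65 seconds rather than 30.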
Worker specs now reject unknown keys so a typo fails loudly instead of silently
falling back to a default. Also correct the spawn_companion_manager docstring,
drop its unused return value, and fix the README config-file description.
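
A sketch of the unknown-key check, assuming a hypothetical set of
allowed keys and a hypothetical helper name (the real spec fields may
differ):

```python
# Hypothetical set of keys a worker spec may contain.
ALLOWED_KEYS = frozenset(
    {"name", "target", "stop_signal", "stop_timeout", "restart_delay"}
)


def check_worker_spec(spec):
    """Reject keys outside ALLOWED_KEYS so typos fail loudly."""
    unknown = sorted((k for k in spec if k not in ALLOWED_KEYS), key=str)
    if unknown:
        names = ", ".join(map(str, unknown))
        raise ValueError(f"unknown companion worker key(s): {names}")
    return spec
```

A spec containing "stop_timout" then raises instead of silently using
the default stop_timeout.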
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Cover validate_companion_workers (None becomes empty, non-list and non-dict
items rejected) and CompanionConfig.config_hash (stable for equal configs,
changes when a field changes, callable target keyed by qualified name and
hashed stably).
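
The behaviour under test can be illustrated with a stand-in model. This
is a sketch under assumptions: the field list, the use of SHA-256 over
JSON, and the TypeError choice are illustrative, not the real
CompanionConfig or validator:

```python
import hashlib
import json
from dataclasses import dataclass


def validate_companion_workers(value):
    """None becomes an empty list; non-lists and non-dict items fail."""
    if value is None:
        return []
    if not isinstance(value, list):
        raise TypeError("companion_workers must be a list")
    for item in value:
        if not isinstance(item, dict):
            raise TypeError("each companion worker must be a dict")
    return value


@dataclass(frozen=True)
class CompanionConfig:
    # Stand-in fields; the real class likely has more.
    name: str
    target: object  # import string or callable
    stop_signal: str = "SIGTERM"
    stop_timeout: float = 30.0

    def config_hash(self):
        target = self.target
        if callable(target):
            # repr() of a function embeds a memory address, so key the
            # hash on the qualified name to keep it stable across runs.
            target = f"{target.__module__}.{target.__qualname__}"
        payload = json.dumps(
            {
                "name": self.name,
                "target": target,
                "stop_signal": self.stop_signal,
                "stop_timeout": self.stop_timeout,
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```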
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Run the companion manager as a single arbiter child with its own
supervision loop, and host the config model with its loader.
config.py holds CompanionConfig (moved from process.py) and
build_companion_configs(cfg), which expands each companion_workers entry into
a CompanionConfig, filling omitted fields from the global companion_* settings.
build_companion_configs is also the config_loader used on reread.
process.py keeps State and CompanionProcess.
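
A rough sketch of the default-filling step. It returns plain dicts
rather than CompanionConfig objects, and the field list and the
companion_stop_timeout setting name are assumptions:

```python
from types import SimpleNamespace

# Hypothetical per-worker fields that fall back to companion_<field>.
DEFAULTED_FIELDS = ("stop_signal", "stop_timeout", "restart_delay")


def build_companion_configs(cfg):
    """Expand each companion_workers entry, filling omitted fields."""
    configs = []
    for spec in cfg.companion_workers:
        merged = dict(spec)  # copy so the caller's spec is not mutated
        for field in DEFAULTED_FIELDS:
            merged.setdefault(field, getattr(cfg, f"companion_{field}"))
        configs.append(merged)
    return configs


# Usage with a stand-in settings object:
example_cfg = SimpleNamespace(
    companion_workers=[{"name": "a", "target": "m:f", "stop_timeout": 60}],
    companion_stop_signal="SIGTERM",
    companion_stop_timeout=30,
    companion_restart_delay=5,
)
example = build_companion_configs(example_cfg)
```

Here the entry keeps its own stop_timeout of 60 and picks up
stop_signal and restart_delay from the global settings.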
CompanionManager.run() is the forked-child body: it installs SIGCHLD/SIGTERM/SIGINT
handlers via a self-pipe, brings up the control socket, starts every companion, then
select-waits on the socket and the pipe. Each tick reaps exits, retries backoff,
promotes past startsecs, and SIGKILLs companions past their stop deadline.
On SIGTERM/SIGINT it stops all companions and returns.
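
The self-pipe wait can be sketched as below, assuming a POSIX platform.
The SelfPipe class and its method names are hypothetical and only show
the technique, not the manager's actual code:

```python
import os
import select
import signal


class SelfPipe:
    """Turn asynchronous signals into a readable file descriptor."""

    def __init__(self):
        self.read_fd, self.write_fd = os.pipe()
        # Non-blocking on both ends: the handler must never block, and
        # draining must stop cleanly once the pipe is empty.
        os.set_blocking(self.read_fd, False)
        os.set_blocking(self.write_fd, False)
        self.pending = []

    def handler(self, signum, frame):
        # Record the signal and write one byte to wake select().
        self.pending.append(signum)
        try:
            os.write(self.write_fd, b"\0")
        except BlockingIOError:
            pass  # pipe is full, so a wakeup is already queued

    def install(self, signums=(signal.SIGCHLD, signal.SIGTERM, signal.SIGINT)):
        for signum in signums:
            signal.signal(signum, self.handler)

    def wait(self, extra_fds=(), timeout=1.0):
        """Wait for a signal or a ready fd; return (signals, ready_fds)."""
        ready, _, _ = select.select(
            [self.read_fd, *extra_fds], [], [], timeout
        )
        if self.read_fd in ready:
            try:
                while os.read(self.read_fd, 4096):
                    pass
            except BlockingIOError:
                pass  # pipe drained
        signals, self.pending = self.pending, []
        return signals, [fd for fd in ready if fd != self.read_fd]

    def close(self):
        os.close(self.read_fd)
        os.close(self.write_fd)
```

A loop would call install() once, then call wait() with the control
socket's fd in extra_fds and run its per-tick work after each return.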
Arbiter gains companion_manager_pid, manage_companion_manager (respawns the
manager when it is gone and companions are configured), spawn_companion_manager
(fork; child runs the loop), and reap detection that clears the pid on exit.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>