4 Commits

Author SHA1 Message Date
Tanmoy Sarkar
3b972fe310 fix(companion): Validate stop_signal and harden control dispatch
A typo'd companion_stop_signal (e.g. "SIGTRM") passed validate_string
but raised ValueError in _signal_number when the manager later tried to
send it -- propagating past handle_line and killing the run loop.

Validate stop_signal at config-build time so a bad value fails loudly
on load and reread. As defense-in-depth, catch unexpected exceptions in
ControlServer.handle_line so no handler bug can escape and kill the
manager; they now return an error envelope.
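
A minimal sketch of both changes, not the actual patch. It assumes stop_signal is a full signal name such as "SIGTERM"; the function names, the JSON request shape, and the error envelope fields ("ok", "error") are hypothetical.

```python
import json
import signal


def validate_stop_signal(name):
    """Fail at config-build time if name is not a known signal name."""
    if not isinstance(name, str):
        raise ValueError(f"stop_signal must be a string, got {type(name).__name__}")
    # signal.Signals is an enum keyed by names such as "SIGTERM".
    if name not in signal.Signals.__members__:
        raise ValueError(f"unknown stop_signal: {name!r}")
    return name


def handle_line(line, handlers):
    """Dispatch one JSON control line and always return an envelope dict."""
    try:
        request = json.loads(line)
        handler = handlers[request["command"]]
        return {"ok": True, "result": handler(request)}
    except Exception as exc:
        # Defense-in-depth: a handler bug becomes an error reply
        # instead of propagating into the run loop.
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
```

With this shape, "SIGTRM" is rejected when the config is built, and a handler that still raises produces an error envelope rather than ending the loop.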

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 23:25:26 +05:30
Tanmoy Sarkar
642387dd0e fix(companion): Apply dead config settings and validate specs
Three companion settings were documented and configurable but never had any
effect. companion_restart_delay was ignored because CompanionProcess hardcoded
a 5s delay; it is now read from config and kept out of config_hash, since it
does not affect the spawned process and so must not trigger a restart on
reread. companion_config_file was never read; the manager now loads its
companion settings from that dedicated file when set, instead of always reading
the main gunicorn config. companion_manager_stop_timeout was unused, so
shutdown waited only graceful_timeout before SIGKILLing the manager, cutting
short long-draining companions. Stop now waits the larger of graceful_timeout
and the manager stop timeout; when that timeout is not set explicitly, it is
derived from the slowest companion stop_timeout plus the buffer.
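
The shutdown wait described above can be sketched as follows. This is an illustration under assumptions: the function names and the 5-second buffer value are hypothetical, and the real code may structure this differently.

```python
STOP_BUFFER = 5  # hypothetical buffer in seconds; the real value is not stated here


def manager_stop_timeout(explicit, companion_stop_timeouts, buffer=STOP_BUFFER):
    """Use the explicit setting, else the slowest companion stop_timeout plus buffer."""
    if explicit is not None:
        return explicit
    return max(companion_stop_timeouts, default=0) + buffer


def shutdown_wait(graceful_timeout, explicit, companion_stop_timeouts, buffer=STOP_BUFFER):
    """Seconds to wait before SIGKILLing the manager: the larger of the two timeouts."""
    derived = manager_stop_timeout(explicit, companion_stop_timeouts, buffer)
    return max(graceful_timeout, derived)
```

For example, with graceful_timeout=30 and companions whose stop timeouts are 10 and 120 seconds, the derived wait is 125 seconds, so a long-draining companion is no longer cut off at 30.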

Worker specs now reject unknown keys so a typo fails loudly instead of silently
falling back to a default. Also correct the spawn_companion_manager docstring,
drop its unused return value, and fix the README config-file description.
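
Rejecting unknown keys could look roughly like this. The set of allowed keys and the function name are hypothetical; only the fail-on-typo behavior comes from the description above.

```python
# Hypothetical key set, for illustration only.
ALLOWED_KEYS = frozenset({"name", "target", "stop_signal", "stop_timeout"})


def validate_worker_spec(spec):
    """Return spec unchanged, or raise if it contains keys outside ALLOWED_KEYS."""
    if not isinstance(spec, dict):
        raise TypeError(f"worker spec must be a dict, got {type(spec).__name__}")
    unknown = sorted(str(key) for key in set(spec) - ALLOWED_KEYS)
    if unknown:
        raise ValueError("unknown companion worker key(s): " + ", ".join(unknown))
    return spec
```

A spec containing "stop_timout" then raises ValueError naming the bad key, instead of quietly using the default stop timeout.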

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-12 22:51:19 +05:30
Tanmoy Sarkar
e780484d24 test(companion): Add config validation tests
Cover validate_companion_workers (None becomes empty, non-list and non-dict
items rejected) and CompanionConfig.config_hash (stable for equal configs,
changes when a field changes, callable target keyed by qualified name and
hashed stably).
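
The hash properties these tests cover can be illustrated with a small stand-in class. SketchConfig and its fields are hypothetical and are not the real CompanionConfig; the sketch only shows one way to key a callable target by its qualified name so the hash is stable.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in for a companion config."""
    name: str
    target: object  # an import string or a callable
    stop_timeout: int = 30

    def config_hash(self):
        target = self.target
        if callable(target):
            # Key callables by module and qualified name, not by repr(),
            # which embeds a memory address and changes between runs.
            target = f"{target.__module__}.{target.__qualname__}"
        payload = json.dumps(
            {"name": self.name, "target": target, "stop_timeout": self.stop_timeout},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Two configs built from equal fields hash identically, and changing any hashed field yields a different digest.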

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 23:03:43 +05:30
Tanmoy Sarkar
457bc5a69a feat(companion): Spawn and reap the manager from the arbiter
Run the companion manager as a single arbiter child with its own
supervision loop, and host the config model with its loader.

config.py holds CompanionConfig (moved from process.py) and
build_companion_configs(cfg), which expands each companion_workers entry into
a CompanionConfig, filling omitted fields from the global companion_* settings.
build_companion_configs also serves as the config_loader used on reread.
process.py keeps State and CompanionProcess.
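
The expansion step can be sketched with plain dicts. The function name and the setting names are hypothetical, and the real build_companion_configs returns CompanionConfig objects rather than dicts.

```python
def expand_worker_specs(worker_specs, global_settings):
    """Fill fields omitted from each worker spec from the global settings."""
    expanded = []
    for spec in worker_specs or []:
        merged = dict(global_settings)  # start from the global defaults
        merged.update(spec)             # per-worker values take precedence
        expanded.append(merged)
    return expanded
```

Here a worker that sets only its name inherits every global value, while one that sets stop_timeout keeps its own.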

CompanionManager.run() is the forked-child body: installs SIGCHLD/SIGTERM/SIGINT
via a self-pipe, brings up the control socket, starts every companion, then
select-waits on the socket and the pipe. Each tick reaps exits, retries backoff,
promotes past startsecs, and SIGKILLs companions past their stop deadline.
SIGTERM/SIGINT stop all companions and return.
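
The self-pipe pattern named above can be sketched as follows for Unix systems. This is a generic illustration of the technique, not the manager's code; all function names are hypothetical, and the control socket and companion handling are left out.

```python
import os
import select
import signal


def make_self_pipe():
    """Create a non-blocking pipe used to wake select() from signal handlers."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)
    os.set_blocking(write_fd, False)
    return read_fd, write_fd


def install_handlers(write_fd, signums):
    """Install handlers that only record the signal number in the pipe."""
    def _handler(signum, frame):
        try:
            os.write(write_fd, bytes([signum]))
        except BlockingIOError:
            pass  # pipe is full, so a wakeup is already pending

    for signum in signums:
        signal.signal(signum, _handler)


def wait_for_signals(read_fd, timeout):
    """Block in select() up to timeout seconds; return the signals received."""
    ready, _, _ = select.select([read_fd], [], [], timeout)
    if not ready:
        return []
    try:
        data = os.read(read_fd, 64)
    except BlockingIOError:
        return []
    return [signal.Signals(number) for number in data]
```

A loop built on this would add the control socket to the same select() call and run its per-tick work (reaping, backoff, deadlines) each time select() returns, whether from a signal, socket activity, or the timeout.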

Arbiter gains companion_manager_pid, manage_companion_manager (respawns the
manager when it is gone and companions are configured), spawn_companion_manager
(fork; child runs the loop), and reap detection that clears the pid on exit.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 22:24:53 +05:30