Architecture Diagram
High-level runtime topology of an Animus install. This diagram is the fastest way to orient a new operator or plugin author. For the full narrative see docs/architecture/full-system-architecture.md; for the crate-level dependency graph see docs/architecture/index.md; for the inbound control surface in detail see docs/architecture/control-protocol.md; for the outbound plugin contracts see docs/architecture/subject-backend-plugins.md and the companion *-plugins.md series.
Runtime topology
How to read it
- Blue nodes are operators or the CLI surface they invoke.
- Charcoal-with-blue-border nodes are inside the long-lived
animus daemonprocess. - Charcoal-with-green-border nodes are plugin processes the daemon spawns and supervises.
- Charcoal-with-amber-border nodes are external systems plugins talk to.
- Dark nodes are on-disk state.
Arrows point in the direction of the call. JSON-RPC over stdio is the contract for every plugin edge. The Unix socket is the contract for every operator-facing edge into the daemon.
Design rationale
Why plugin-first
Animus deletes its own bundled providers, subject backends, and web stack as of v0.4.12. The reasoning is:
- Vendoring rots. A bundled claude integration drifts behind the upstream CLI's release cadence. A plugin pinned to a specific upstream version is the customer's choice and the plugin author's responsibility.
- One contract. When the in-tree
taskbackend, the in-tree Linear adapter, and a third-party Jira plugin all satisfySubjectBackend, the daemon stops needing per-backend code paths. - Replaceable parts. A bug in
animus-provider-claudedoesn't require a daemon release. Pin to the previous plugin tag, fix forward at your own cadence.
Why stdio
JSON-RPC over stdin/stdout is the lowest-common-denominator inter-process contract:
- No port allocation, no auth, no TLS, no service discovery.
- Works on every platform Animus targets without conditionals.
- Plugin authors can write in any language that can read a line and write a line.
- The daemon supervises the process directly (signals, exit codes, stderr capture, restart budget).
Why Unix socket for control
The operator-facing inbound surface uses a Unix domain socket at ~/.animus/<scope>/control.sock:
- No port to claim. Two projects on the same machine don't collide.
- Filesystem-scoped auth. Unix permissions on the socket file decide who can connect — no per-call token negotiation.
- Cheap connect. CLI invocations connect, send one request, exit. The daemon doesn't pay TLS handshake costs per command.
Transport plugins (animus-transport-http, animus-transport-graphql) bridge to network-addressable transports when they're needed. The core daemon never opens a port itself.
Why streaming notifications
Three flows are inherently streaming:
agent/runtoken output (TextDelta,Thinking,ToolCall,ToolResult).WorkflowEventfan-out to the web UI (workflow started, phase completed, decision needed).subject/changedandtrigger/eventpush from external systems.
JSON-RPC notifications (id-less frames) carry all three. The plugin host's reader task routes responses to per-call oneshot::Senders and notifications to a broadcast::Sender so any number of subscribers fan out from one process. Channel capacity defaults to 256; plugins that emit bursts declare notification_buffer_size in their manifest. See docs/architecture/plugin-host-concurrency.md for the full contract.
How durability works
- Phase sessions persist as
<phase>.session.jsonunder~/.animus/<scope>/runs/<run-id>/. On daemon restart the scheduler attemptsprovider.resume_agentagainst the originating plugin. - Idempotency annotations on workflow phases gate auto-resume.
idempotentretries silently;sideeffectingandunknownblock with an actionable hint so the operator decides. See the v0.4.12 CHANGELOG entry anddocs/migration/v0.4.11-to-v0.4.12.md. - Event log ships to
events.jsonlunder the scoped state root via the in-treeorchestrator-loggingfallback, or to whicheverlog_storage_backendplugin is installed. - Plugin restart budgets (5 attempts under exponential backoff) cap the blast radius of a flaky plugin without taking the daemon down with it.
Known gaps
- Subprocess workflow runner events. Workflow events emitted by workflow runners spawned as subprocesses are not yet plumbed through the long-lived plugin host to control subscribers. Single-process workflows stream fine; the subprocess path is tracked for v0.5. See
docs/architecture/plugin-host-concurrency.md. - Provider supervisor. The trigger supervisor's exponential-backoff restart loop has not yet been generalized to providers. Provider crashes return
ConnectionLostto the caller today; the long-lived provider host + supervisor migration is also v0.5. agent/cancelrouting. Cancel today spawns a fresh plugin process that doesn't know about the live session. The fix lands with the long-lived host migration; until then cancel against a long-running stream is best-effort.
Related docs
- Crate Map — workspace-level layout
- Control Protocol — inbound RPC contract
- Plugin Host Concurrency — reader/router model + cancel
- Subject Backend Plugins — outbound subject contract
- Plugin Signing — cosign policy
- Naming Contract —
animus.*everywhere - Plugin Author Guide — write your own
- Operator Runbook — run it in production