Unified protocol for Claude Code to work autonomously — decisions logged, user notified, reviews tracked.
Claude Code sessions often receive big tasks — "refactor this module", "add tests for that system", "audit these imports." The user fires it off and walks away. But today there's no standard for how decisions get logged, how the user gets notified, or how completed work gets reviewed.
Whether triggered by /autonomous, a trailing &, or the supervisor queue — the worker session needs the same protocol.
Each autonomous work item gets a review file at `~/.claude/autonomous/reviews/{id}/review.md` with YAML frontmatter (machine-parseable metadata) and a markdown body (human-readable decisions, flags, outcomes).
| Trigger | How | Protocol |
|---|---|---|
| `/autonomous "task"` | User invokes skill | Same review file format, same prompt, same notifications |
| `do thing &` | Trailing `&` convention | Same |
| Supervisor queue | `queue.yaml` → `engine.py` | Same |
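However a task arrives, it can be normalized into one work-item shape before the worker session starts, so everything downstream is trigger-agnostic. A minimal sketch — the field names here are illustrative, not the supervisor's actual schema:

```python
from dataclasses import dataclass


@dataclass
class WorkItem:
    """One autonomous task, regardless of how it was launched."""
    task: str
    launched_from: str  # "skill" | "ampersand" | "queue"


def from_skill(task: str) -> WorkItem:
    # /autonomous "task" — the user invoked the skill directly
    return WorkItem(task, "skill")


def from_queue(entry: dict) -> WorkItem:
    # queue.yaml entries are assumed to carry at least a `task` key
    return WorkItem(entry["task"], "queue")


# Both entry points converge on the same WorkItem, so the review-file
# protocol downstream never cares how the task arrived.
assert from_skill("audit lib/llm imports").task == \
    from_queue({"task": "audit lib/llm imports"}).task
```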
Decisions and flags are tiered by risk:

| Risk | Covers |
|---|---|
| low | Style, naming, file organization, which library for a simple task |
| medium | Architecture choices, API signatures, splitting/merging files, adding dependencies |
| high | Shared code deletion, public API changes, anything irreversible or external-facing |
| blocked | Genuinely stuck — unclear requirements, conflicting constraints, missing info |
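The tiers map naturally onto a small enum that each logged decision carries. A hypothetical sketch (names are illustrative, not the supervisor's actual types):

```python
from dataclasses import dataclass
from enum import Enum


class Risk(str, Enum):
    """Risk tiers for autonomous decisions, as in the table above."""
    LOW = "low"          # style, naming, file organization
    MEDIUM = "medium"    # architecture choices, API signatures
    HIGH = "high"        # shared-code deletion, irreversible changes
    BLOCKED = "blocked"  # genuinely stuck; needs human input


@dataclass
class Decision:
    risk: Risk
    summary: str

    def render(self) -> str:
        # Matches the review-file bullet format: "- [low] ..."
        return f"- [{self.risk.value}] {self.summary}"


d = Decision(Risk.MEDIUM, "Changed resolve() signature from sync to async")
assert d.render() == "- [medium] Changed resolve() signature from sync to async"
```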
Example `review.md`:

```markdown
---
id: auto-20260219-143022
title: "Refactor brain/engine resolve.py for async consistency"
started: 2026-02-19T14:30:22-0800
ended: 2026-02-19T15:12:45-0800
duration_minutes: 42
status: needs_review
launched_by: session-a1b2c3
launched_from: pane
project: rivus
risk_summary:
  low: 3
  medium: 1
  high: 0
  blocked: 0
decisions_count: 4
flags_count: 1
---

## Task

Refactor brain/engine/resolve.py — make all data resolution
paths consistently async, remove sync wrappers.

## Decisions Made

- [low] Kept httpx.AsyncClient session-scoped — matches existing pattern
- [low] Used asyncio.gather() for parallel URL resolution
- [medium] Changed resolve() signature from sync to async

## Flagged for Review

- [medium] Removed sync resolve_data() wrapper entirely —
  verify no external callers

## Files Changed

- brain/engine/resolve.py (major)
- brain/vario/ui_gen.py (caller update)
- brain/engine/tests/test_resolve.py (new tests)
```
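The frontmatter/body split keeps the file trivially machine-readable. A minimal stdlib-only sketch of splitting a review file — a real implementation would use a proper YAML parser for nested keys like `risk_summary`, so the flat `key: value` handling here is an assumption for illustration:

```python
import re

REVIEW = """\
---
id: auto-20260219-143022
status: needs_review
duration_minutes: 42
---
## Task
Refactor brain/engine/resolve.py
"""


def parse_review(text: str) -> tuple[dict, str]:
    """Split a review.md into (frontmatter dict, markdown body).

    Handles only flat `key: value` pairs; nested mappings would
    need a YAML parser."""
    m = re.match(r"---\n(.*?)\n---\n(.*)", text, re.DOTALL)
    if not m:
        raise ValueError("no frontmatter block found")
    meta = {}
    for line in m.group(1).splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, m.group(2)


meta, body = parse_review(REVIEW)
assert meta["status"] == "needs_review"
assert body.startswith("## Task")
```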
On completion (or block), three channels fire — tiered by severity:
| Channel | Completed | Blocked |
|---|---|---|
| Badge (glanceable) | `REVIEW: {title} ✅` | `BLOCKED: {title} ⚠️` |
| TODO.md (persistent) | Entry under `## Autonomous Review Queue` | `[BLOCKED]` entry with the question |
| macOS / Pushover (urgent) | `notify("complete", level="info")` | `notify("blocked", level="warning")` |
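The fan-out can be sketched as a pure function that yields (channel, message) pairs; actual delivery through the supervisor's `notify.py` is assumed rather than shown here, and the channel names are illustrative:

```python
def fan_out(title: str, blocked: bool) -> list[tuple[str, str]]:
    """Build the three-channel notification set from the table above.

    Returns (channel, message) pairs; a real implementation would
    hand each pair to its delivery backend (badge, TODO.md, push)."""
    if blocked:
        return [
            ("badge", f"BLOCKED: {title} ⚠️"),
            ("todo", f"[BLOCKED] {title}"),
            ("push", "blocked:warning"),
        ]
    return [
        ("badge", f"REVIEW: {title} ✅"),
        ("todo", f"Autonomous Review Queue: {title}"),
        ("push", "complete:info"),
    ]


msgs = dict(fan_out("Refactor resolve.py", blocked=False))
assert msgs["badge"] == "REVIEW: Refactor resolve.py ✅"
```

Keeping message construction separate from delivery makes the severity tiering easy to test without touching macOS or Pushover.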
```
$ /autonomous list
auto-20260219-143022  needs_review  42min  3L 1M 0H  Refactor resolve.py
sup-007               blocked       12min  0L 0M 1H  Audit lib/llm imports
auto-20260218-091500  reviewed      28min  2L 0M 0H  Add tests for billing

$ /autonomous review auto-20260219-143022
ID:       auto-20260219-143022
Title:    Refactor resolve.py
Status:   needs_review
Duration: 42min
Decisions:
  [low]    Kept httpx.AsyncClient session-scoped
  [medium] Changed resolve() signature from sync to async
Flagged for Review:
  [medium] Removed sync resolve_data() wrapper — verify no external callers

$ /autonomous approve-low
Reviewed: auto-20260218-091500 — Add tests for billing
1 item(s) approved
```
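The semantics of `approve-low` follow from the risk tiers: approve only items whose counts show nothing above low and nothing blocked. A hypothetical in-memory sketch (the real CLI reads these counts from each review file's frontmatter on disk):

```python
def approve_low(reviews: list[dict]) -> list[str]:
    """Mark a review as `reviewed` only when every decision was
    low-risk and nothing was flagged medium/high/blocked."""
    approved = []
    for r in reviews:
        counts = r["risk_summary"]
        if (r["status"] == "needs_review"
                and counts["medium"] == 0
                and counts["high"] == 0
                and counts["blocked"] == 0):
            r["status"] = "reviewed"
            approved.append(r["id"])
    return approved


queue = [
    {"id": "auto-20260218-091500", "status": "needs_review",
     "risk_summary": {"low": 2, "medium": 0, "high": 0, "blocked": 0}},
    {"id": "auto-20260219-143022", "status": "needs_review",
     "risk_summary": {"low": 3, "medium": 1, "high": 0, "blocked": 0}},
]
# Only the all-low item is approved; the one with a medium flag stays queued.
assert approve_low(queue) == ["auto-20260218-091500"]
```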
```
supervisor/autonomous/
  review.py      — ReviewFile model + CRUD (22 tests)
  prompts.py     — + build_autonomous_prompt() (7 tests)
  completion.py  — notify_review_needed + TODO.md append (4 tests)
  cli.py         — + list-reviews, review, approve-low (7 tests)
  engine.py      — wired to create/update review files (1 test)
  todo.py        — existing queue model (unchanged)
  notify.py      — existing badge notifications (unchanged)

~/.claude/
  skills/autonomous/
    SKILL.md     — /autonomous skill definition
  autonomous/reviews/
    {id}/
      review.md  — per work-item review file
```
- `/autonomous` skill — `SKILL.md` created (plan under `docs/plans/`)
- Run `/autonomous` on a real task, inspect the `review.md` output, verify notifications fire

Normal usage IS the test. When output passes through human review as part of its normal operation, that review IS the test suite — no separate formal testing needed. The review step is the feedback loop.
Every human correction should feed back to tighten future behavior. Reviews aren't just checkpoints — they're training data for the next run. The system should converge toward the user's actual judgment over time.
Built 2026-02-19 · rivus/supervisor/autonomous · 41 tests passing