Unified protocol for Claude Code to work autonomously — decisions logged, user notified, reviews tracked.
Claude Code sessions often receive big tasks — "refactor this module", "add tests for that system", "audit these imports." The user fires it off and walks away. But today there's no standard for how decisions get logged, how the user gets notified, or how completed work gets reviewed.
Whether triggered by /autonomous, a trailing &, or the supervisor queue — the worker session needs the same protocol.
Each autonomous work item gets a review file at `~/.claude/autonomous/reviews/{id}/review.md` with YAML frontmatter (machine-parseable metadata) and a markdown body (human-readable decisions, flags, outcomes).
| Trigger | How | Protocol |
|---|---|---|
| `/autonomous "task"` | User invokes skill | Same review file format, same prompt, same notifications |
| `do thing &` | Trailing `&` convention | Same |
| Supervisor queue | `queue.yaml` → `engine.py` | Same |
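However a task arrives, it can be normalized into one work-item shape before the worker session starts, so everything downstream is trigger-agnostic. A minimal sketch — the field names here are illustrative, not the supervisor's actual schema:

```python
from dataclasses import dataclass


@dataclass
class WorkItem:
    """One autonomous task, regardless of how it was launched."""
    task: str
    launched_from: str  # "skill" | "ampersand" | "queue"


def from_skill(task: str) -> WorkItem:
    # /autonomous "task" — the user invoked the skill directly
    return WorkItem(task, "skill")


def from_queue(entry: dict) -> WorkItem:
    # queue.yaml entries are assumed to carry at least a `task` key
    return WorkItem(entry["task"], "queue")


# Both entry points converge on the same WorkItem, so the review-file
# protocol downstream never cares how the task arrived.
assert from_skill("audit lib/llm imports").task == \
    from_queue({"task": "audit lib/llm imports"}).task
```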
Decisions and flags are tiered by risk:

| Risk | Covers |
|---|---|
| low | Style, naming, file organization, which library for a simple task |
| medium | Architecture choices, API signatures, splitting/merging files, adding dependencies |
| high | Shared code deletion, public API changes, anything irreversible or external-facing |
| blocked | Genuinely stuck — unclear requirements, conflicting constraints, missing info |
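The tiers map naturally onto a small enum that each logged decision carries. A hypothetical sketch (names are illustrative, not the supervisor's actual types):

```python
from dataclasses import dataclass
from enum import Enum


class Risk(str, Enum):
    """Risk tiers for autonomous decisions, as in the table above."""
    LOW = "low"          # style, naming, file organization
    MEDIUM = "medium"    # architecture choices, API signatures
    HIGH = "high"        # shared-code deletion, irreversible changes
    BLOCKED = "blocked"  # genuinely stuck; needs human input


@dataclass
class Decision:
    risk: Risk
    summary: str

    def render(self) -> str:
        # Matches the review-file bullet format: "- [low] ..."
        return f"- [{self.risk.value}] {self.summary}"


d = Decision(Risk.MEDIUM, "Changed resolve() signature from sync to async")
assert d.render() == "- [medium] Changed resolve() signature from sync to async"
```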
Example `review.md`:

```markdown
---
id: auto-20260219-143022
title: "Refactor brain/engine resolve.py for async consistency"
started: 2026-02-19T14:30:22-0800
ended: 2026-02-19T15:12:45-0800
duration_minutes: 42
status: needs_review
launched_by: session-a1b2c3
launched_from: pane
project: rivus
risk_summary:
  low: 3
  medium: 1
  high: 0
  blocked: 0
decisions_count: 4
flags_count: 1
---

## Task

Refactor brain/engine/resolve.py — make all data resolution
paths consistently async, remove sync wrappers.

## Decisions Made

- [low] Kept httpx.AsyncClient session-scoped — matches existing pattern
- [low] Used asyncio.gather() for parallel URL resolution
- [medium] Changed resolve() signature from sync to async

## Flagged for Review

- [medium] Removed sync resolve_data() wrapper entirely —
  verify no external callers

## Files Changed

- brain/engine/resolve.py (major)
- brain/vario/ui_gen.py (caller update)
- brain/engine/tests/test_resolve.py (new tests)
```
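The frontmatter/body split keeps the file trivially machine-readable. A minimal stdlib-only sketch of splitting a review file — a real implementation would use a proper YAML parser for nested keys like `risk_summary`, so the flat `key: value` handling here is an assumption for illustration:

```python
import re

REVIEW = """\
---
id: auto-20260219-143022
status: needs_review
duration_minutes: 42
---
## Task
Refactor brain/engine/resolve.py
"""


def parse_review(text: str) -> tuple[dict, str]:
    """Split a review.md into (frontmatter dict, markdown body).

    Handles only flat `key: value` pairs; nested mappings would
    need a YAML parser."""
    m = re.match(r"---\n(.*?)\n---\n(.*)", text, re.DOTALL)
    if not m:
        raise ValueError("no frontmatter block found")
    meta = {}
    for line in m.group(1).splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, m.group(2)


meta, body = parse_review(REVIEW)
assert meta["status"] == "needs_review"
assert body.startswith("## Task")
```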
On completion (or block), three channels fire — tiered by severity:
| Channel | Completed | Blocked |
|---|---|---|
| Badge (glanceable) | `REVIEW: {title} ✅` | `BLOCKED: {title} ⚠️` |
| TODO.md (persistent) | Entry under `## Autonomous Review Queue` | `[BLOCKED]` entry with the question |
| macOS / Pushover (urgent) | `notify("complete", level="info")` | `notify("blocked", level="warning")` |
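The fan-out can be sketched as a pure function that yields (channel, message) pairs; actual delivery through the supervisor's `notify.py` is assumed rather than shown here, and the channel names are illustrative:

```python
def fan_out(title: str, blocked: bool) -> list[tuple[str, str]]:
    """Build the three-channel notification set from the table above.

    Returns (channel, message) pairs; a real implementation would
    hand each pair to its delivery backend (badge, TODO.md, push)."""
    if blocked:
        return [
            ("badge", f"BLOCKED: {title} ⚠️"),
            ("todo", f"[BLOCKED] {title}"),
            ("push", "blocked:warning"),
        ]
    return [
        ("badge", f"REVIEW: {title} ✅"),
        ("todo", f"Autonomous Review Queue: {title}"),
        ("push", "complete:info"),
    ]


msgs = dict(fan_out("Refactor resolve.py", blocked=False))
assert msgs["badge"] == "REVIEW: Refactor resolve.py ✅"
```

Keeping message construction separate from delivery makes the severity tiering easy to test without touching macOS or Pushover.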
```
$ /autonomous list
auto-20260219-143022  needs_review  42min  3L 1M 0H  Refactor resolve.py
sup-007               blocked       12min  0L 0M 1H  Audit lib/llm imports
auto-20260218-091500  reviewed      28min  2L 0M 0H  Add tests for billing

$ /autonomous review auto-20260219-143022
ID:       auto-20260219-143022
Title:    Refactor resolve.py
Status:   needs_review
Duration: 42min
Decisions:
  [low]    Kept httpx.AsyncClient session-scoped
  [medium] Changed resolve() signature from sync to async
Flagged for Review:
  [medium] Removed sync resolve_data() wrapper — verify no external callers

$ /autonomous approve-low
Reviewed: auto-20260218-091500 — Add tests for billing
1 item(s) approved
```
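The semantics of `approve-low` follow from the risk tiers: approve only items whose counts show nothing above low and nothing blocked. A hypothetical in-memory sketch (the real CLI reads these counts from each review file's frontmatter on disk):

```python
def approve_low(reviews: list[dict]) -> list[str]:
    """Mark a review as `reviewed` only when every decision was
    low-risk and nothing was flagged medium/high/blocked."""
    approved = []
    for r in reviews:
        counts = r["risk_summary"]
        if (r["status"] == "needs_review"
                and counts["medium"] == 0
                and counts["high"] == 0
                and counts["blocked"] == 0):
            r["status"] = "reviewed"
            approved.append(r["id"])
    return approved


queue = [
    {"id": "auto-20260218-091500", "status": "needs_review",
     "risk_summary": {"low": 2, "medium": 0, "high": 0, "blocked": 0}},
    {"id": "auto-20260219-143022", "status": "needs_review",
     "risk_summary": {"low": 3, "medium": 1, "high": 0, "blocked": 0}},
]
# Only the all-low item is approved; the one with a medium flag stays queued.
assert approve_low(queue) == ["auto-20260218-091500"]
```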
```
supervisor/autonomous/
  review.py      — ReviewFile model + CRUD (22 tests)
  prompts.py     — + build_autonomous_prompt() (7 tests)
  completion.py  — notify_review_needed + TODO.md append (4 tests)
  cli.py         — + list-reviews, review, approve-low (7 tests)
  engine.py      — wired to create/update review files (1 test)
  todo.py        — existing queue model (unchanged)
  notify.py      — existing badge notifications (unchanged)

~/.claude/
  skills/autonomous/
    SKILL.md     — /autonomous skill definition
  autonomous/reviews/
    {id}/
      review.md  — per work-item review file
```
- `/autonomous` skill — `SKILL.md` created (plan under `docs/plans/`)
- Run `/autonomous` on a real task, inspect the `review.md` output, verify notifications fire

Normal usage IS the test. When output passes through human review as part of its normal operation, that review IS the test suite — no separate formal testing needed. The review step is the feedback loop.
Every human correction should feed back to tighten future behavior. Reviews aren't just checkpoints — they're training data for the next run. The system should converge toward the user's actual judgment over time.
Built 2026-02-19 · rivus/supervisor/autonomous · 41 tests passing