# Autonomous Planner — TODO Mining, Intelligent Queue, Parallel Execution

**Date**: 2026-02-24
**Priority**: High — closes the loop between intent (TODO.md) and execution (autonomous engine)
**Status**: Design approved, implementation pending
**Approach**: A with guardrails (LLM-based classification, conservative defaults)

## Problem

The autonomous engine (`supervisor/autonomous/engine.py`) can fork sessions, monitor them, pause on user return, and track completion. But the queue (`supervisor/todos/queue.yaml`) is hand-populated. There's a gap between the 37 TODO.md files scattered across rivus — which contain the actual intent — and the queue entries that the engine can execute.

Today's workflow:
```
Human writes TODO.md → Human reads TODO.md → Human writes queue.yaml → Engine executes
```

Target workflow:
```
Human writes TODO.md → Planner scans → Planner classifies/prioritizes → Queue populates → Engine executes
```

The planner also needs to handle what the engine currently can't:
- **Parallel execution** — run independent tasks concurrently
- **Smart pause policies** — pause long research on user return, but let a 2-minute doc fix finish
- **Dependency resolution** — don't start "run baseline evals" before "consolidate benchmarks"
- **UUID-style IDs** — replace sequential `sup-001` with collision-free IDs

## Current Infrastructure

| Component | Status | Used by planner? |
|-----------|--------|-----------------|
| Queue system (`todo.py`) | Built | Yes — output target |
| Engine loop (`engine.py`) | Built | Yes — executes planned items |
| Scanner (`scanner.py`) | Built | Yes — issue discovery feeds planner |
| Policy config (`autonomous.yaml`) | Built | Yes — constraints on execution |
| Docker sandbox (`sandbox_replay.py`) | Built | Pattern reused for `code_change` isolation |
| Dashboard (`autonomous_view.py`) | Built | Extended with action buttons |
| `it2 fork claude` | Built | Session spawning |
| `kick` command | Built | Manual override |

## Design

### 1. TODO.md Mining

**Input**: All `**/TODO.md` files in rivus (~37 today).

**Process**:
1. Glob for TODO.md files, read each
2. Parse into item list (handle checkboxes, bullets, headings, prose)
3. Fingerprint each item (hash of normalized text) for change detection
4. Store in `~/.coord/planner/todo_index.yaml` — maps fingerprint → {file, line, text, last_seen, queue_id}
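
Step 3 could be sketched as below. The design only specifies "hash of normalized text" (SHA-256 per the data model), so the normalization rules here (strip a leading bullet and checkbox marker, collapse whitespace, lowercase) and the helper names `normalize` and `fingerprint` are assumptions for illustration.

```python
import hashlib
import re

# Assumed normalization: a leading bullet ("-", "*", "+", "1.") followed by
# whitespace, then an optional checkbox marker. Not specified by the design.
_MARKER = re.compile(r"^\s*(?:(?:[-*+]|\d+\.)\s+)?(?:\[[ xX]\]\s*)?")


def normalize(text: str) -> str:
    """Strip list/checkbox markers, collapse whitespace, lowercase."""
    stripped = _MARKER.sub("", text, count=1)
    return re.sub(r"\s+", " ", stripped).strip().lower()


def fingerprint(text: str) -> str:
    """SHA-256 hex digest of the normalized item text."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
```

Under these rules a checked and an unchecked copy of the same item share a fingerprint, so ticking a box does not register as a new item. Whether that is desirable is a design choice this sketch does not settle.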

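**Deduplication**: The same item appearing in multiple TODO.md files (cross-references) gets one queue entry. Because the fingerprint hashes normalized text, it matches only copies that differ in what normalization strips; exact match on the fingerprint is the baseline, and fuzzy matching of reworded items is a stretch goal.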

**Staleness**: Items not seen on rescan get flagged (not deleted) — the TODO.md may have been cleaned up but the queue item may still be in progress.
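
The flag-but-keep behavior might look like the following minimal sketch. The function name `mark_stale` and the in-memory shape (a dict keyed by fingerprint) are hypothetical; the `last_seen` and `status` fields follow the planner index format.

```python
def mark_stale(index: dict[str, dict], seen: set[str], now: str) -> list[str]:
    """Refresh last_seen for fingerprints found in this scan; flag the rest.

    Entries are never deleted, because the matching queue item may still be
    in progress. Returns the fingerprints that became stale in this pass.
    """
    newly_stale = []
    for fp, entry in index.items():
        if fp in seen:
            entry["last_seen"] = now
        elif entry.get("status") != "stale":
            entry["status"] = "stale"
            newly_stale.append(fp)
    return newly_stale
```

This sketch does not handle an item that reappears after being flagged; how to un-flag it is left open here.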

### 2. LLM Classification (the "A with guardrails" approach)

Mined items are classified by an LLM in batches: one call covers 10-20 items (haiku, for cost) and returns one record per item:

```yaml
# LLM output per item
tier: safe_always | code_change        # Does it change code or just read/report?
type: research | doc_refresh | code_scrutiny | failure_analysis | feature
effort: small | medium | large         # small=<15min, medium=15-60min, large=60min+
risk: low | medium | high              # What could go wrong?
project: supervisor | vario | learning | ... # Which project owns this?
dependencies: [fingerprint_or_id, ...]  # What must finish first?
pause_policy: finish | pause | checkpoint  # On user return: finish (short), pause (long), checkpoint (save state + pause)
parallel_group: string | null           # Items in same group can run concurrently
priority: 1-5                           # Urgency (1=highest)
```

**Conservative defaults** (guardrails):
- Unknown tier → `code_change` (more restrictive)
- Unknown risk → `medium` (requires approval)
- Unknown effort → `medium` (30 min timeout)
- `auto_start: false` for anything LLM marks as `risk: medium` or `risk: high`
- `large` effort items get flagged for human review, never auto-started

**Validation**: LLM output is schema-validated. Any field that fails validation falls back to conservative default. No silent acceptance of bad data.
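
A minimal sketch of the fallback logic, covering only the enumerated fields. The function name `apply_guardrails`, the `needs_review` key, and the `pause` default for `pause_policy` (borrowed from the dataclass default) are assumptions; a real implementation would also validate `type`, `project`, `dependencies`, and `priority`.

```python
ALLOWED = {
    "tier": {"safe_always", "code_change"},
    "effort": {"small", "medium", "large"},
    "risk": {"low", "medium", "high"},
    "pause_policy": {"finish", "pause", "checkpoint"},
}
# Conservative defaults: always the more restrictive value.
DEFAULTS = {
    "tier": "code_change",
    "effort": "medium",
    "risk": "medium",
    "pause_policy": "pause",  # assumption: mirrors the Todo dataclass default
}


def apply_guardrails(raw: dict) -> dict:
    """Replace any missing or invalid field with its conservative default."""
    item = {}
    for field, allowed in ALLOWED.items():
        value = raw.get(field)
        valid = isinstance(value, str) and value in allowed
        item[field] = value if valid else DEFAULTS[field]
    # Guardrails only ever switch auto-start off; True here means
    # "not blocked by these rules", not "approved to run".
    item["auto_start"] = item["risk"] == "low" and item["effort"] != "large"
    item["needs_review"] = item["risk"] == "high" or item["effort"] == "large"
    return item
```

Validating field by field means one bad value does not discard the rest of an otherwise usable record.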

### 3. Queue Generation

Classified items become queue entries. The planner:

1. Checks the existing queue and skips items already present (matched by fingerprint or title similarity)
2. Creates new entries with UUID-style IDs: `plan-{8-char-hex}` (e.g., `plan-a7f3b2c1`)
3. Sets `source: planner` field to distinguish from hand-written entries
4. Preserves hand-written entries — planner never modifies `sup-*` or manually-created items
5. Writes batch to queue.yaml atomically (existing file-lock mechanism)

**ID format change**: New planner-created items use `plan-{hex8}`. Existing `sup-*` items are untouched. The `kick` command and engine work with any string ID.
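
Steps 1 to 4 could be sketched roughly as follows, with queue entries modeled as plain dicts. The helper names `new_plan_id` and `build_entries` are hypothetical, and title-similarity matching is left out; only fingerprint matching is shown.

```python
import secrets


def new_plan_id(existing_ids: set[str]) -> str:
    """Return a fresh plan-{hex8} ID, retrying on the rare collision."""
    while True:
        candidate = f"plan-{secrets.token_hex(4)}"  # 4 random bytes -> 8 hex chars
        if candidate not in existing_ids:
            return candidate


def build_entries(classified: list[dict], queue: list[dict]) -> list[dict]:
    """Create entries for items whose fingerprint is not already queued.

    Existing entries, including hand-written sup-* ones, are only read.
    """
    known_fps = {e["fingerprint"] for e in queue if e.get("fingerprint")}
    ids = {e["id"] for e in queue}
    new_entries = []
    for item in classified:
        if item["fingerprint"] in known_fps:
            continue  # already queued, or duplicated within this batch
        entry = {**item, "id": new_plan_id(ids), "source": "planner"}
        ids.add(entry["id"])
        known_fps.add(item["fingerprint"])
        new_entries.append(entry)
    return new_entries
```

Eight hex characters give about 4.3 billion possible IDs, so collisions are unlikely but not impossible; checking against the existing IDs is what makes them unique within one queue.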

### 4. Dependency Resolution

The `prerequisites` field already exists in Todo. The planner populates it from the classifier's `dependencies` output, with each reference resolved to a queue ID:

```yaml
- id: plan-a7f3b2c1
  title: "Run baseline evals for vario"
  prerequisites: [plan-b8e4c3d2]   # "Consolidate vario benchmarks"
```

**Selection enhancement** in `select_next_todo()`:
```python
def is_eligible(self) -> bool:
    if self.status != "pending":
        return False
    # Every prerequisite must be in the queue and marked done.
    # A prerequisite ID that is missing from the queue counts as unmet.
    if self.prerequisites:
        done_ids = {t.id for t in load_queue() if t.status == "done"}
        if not all(pid in done_ids for pid in self.prerequisites):
            return False
    # ... existing tier/risk checks
```

**Circular dependency detection**: Before writing the queue, the planner builds the dependency graph and checks it for cycles. Any cycle is flagged for human review and is not enqueued.

### 5. Parallel Execution

Currently `max_concurrent_tasks: 1`. The planner enables safe parallelism:

**Policy change** in `autonomous.yaml`:
```yaml
max_concurrent_tasks: 3              # Up from 1
parallel_rules:
  - same_project: false              # Don't run 2 tasks in same project simultaneously
  - same_files: false                # Don't run tasks that touch overlapping file sets
  - safe_always_unlimited: true      # Read-only tasks don't count toward limit
```

**Engine changes**: `check_autonomous()` already iterates `active_tasks` and checks capacity. Enhancement:
- Before starting a new task, check it doesn't conflict with running tasks (same project, overlapping paths)
- `safe_always` tasks bypass the project-conflict check (they're read-only)
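
The start-time check could be sketched as below. The function name `can_start` and the task shape (`tier`, `project`, and a `paths` set) are hypothetical, and path overlap is modeled as exact set intersection; directory-prefix overlap would need extra handling.

```python
def can_start(candidate: dict, running: list[dict], max_concurrent_tasks: int = 3) -> bool:
    """Decide whether `candidate` may start alongside the running tasks.

    Assumed task shape: {"tier": str, "project": str, "paths": set[str]}.
    """
    if candidate["tier"] == "safe_always":
        return True  # safe_always_unlimited: read-only tasks are not capped
    # Read-only tasks neither count toward the limit nor block a project.
    counted = [t for t in running if t["tier"] != "safe_always"]
    if len(counted) >= max_concurrent_tasks:
        return False
    return not any(
        t["project"] == candidate["project"] or candidate["paths"] & t["paths"]
        for t in counted
    )
```

Keeping the decision in one pure function makes it easy to test the rules without a live engine state.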

### 6. Smart Pause Policies

Currently: all non-manual tasks pause when `idle_minutes() < 2`. The planner assigns per-task pause policies:

| Policy | Behavior on user return | When to use |
|--------|------------------------|-------------|
| `finish` | Let it complete (already short) | effort=small, <5 min remaining |
| `pause` | Pause immediately | effort=medium/large, code_change |
| `checkpoint` | Save progress, then pause | Long research tasks with intermediate output |

**Engine change** in `_monitor_task()`:
```python
# Reached when the user has returned; manual tasks are never auto-paused.
if not task.get("manual"):
    policy = task.get("pause_policy", "pause")
    elapsed = ...
    max_min = task.get("max_minutes", 30)
    remaining = max_min - elapsed

    if policy == "finish" and remaining < 5:
        pass  # Short task that is nearly done: let it finish
    elif policy == "checkpoint":
        # Signal worker to save checkpoint (write to report dir), then pause
        _signal_checkpoint(task)
        update_status(todo_id, "paused")
        _remove_active_task(state, todo_id)
    else:
        # Default, including "finish" tasks with 5+ minutes left: pause immediately
        update_status(todo_id, "paused")
        ...
```

### 7. Docker Isolation for code_change

Reuse the `sandbox_replay.py` pattern for code-changing autonomous tasks:

```
code_change task
  → git stash / clean worktree
  → docker run with mounted repo (or cloned at HEAD)
  → Claude runs inside container
  → On completion: extract git diff from container
  → Create PR branch with the diff
  → Flag for human review
```

**Not in v1**: Docker isolation is deferred to Phase 3. Phase 1 uses `it2 fork` with project path constraints from `autonomous.yaml` as the safety boundary.

### 8. Dashboard Actions

Add interactive controls to `autonomous_view.py`:

**New API endpoints** (in supervisor's watch server):
- `POST /autonomous/kick/<id>` — start a task immediately (calls existing `kick` logic)
- `POST /autonomous/reorder` — `{id, new_priority}` — change priority
- `POST /autonomous/cancel/<id>` — cancel a running or pending task
- `POST /autonomous/attach/<id>` — open the worker's iTerm2 tab (`it2 fork goto`)

**UI additions**:
- "Kick" button per pending queue item
- Priority up/down arrows (or drag handle)
- "Cancel" button for active/pending items
- "Attach" link for active items (opens iTerm2 session)
- Filter/sort controls (by project, tier, status)

### 9. CLI: `sup` shortcut

Create `~/.local/bin/sup`:
```bash
#!/bin/bash
python -m supervisor.cli auto "$@"
```

Commands become: `sup queue`, `sup kick <id>`, `sup status`, `sup scan`, etc.

## Data Model Changes

### Todo dataclass additions
```python
@dataclass
class Todo:
    # ... existing fields ...
    # New planner fields
    source: str = "manual"              # manual | planner | scanner
    fingerprint: str | None = None      # SHA256 of normalized text
    source_file: str | None = None      # Which TODO.md this came from
    pause_policy: str = "pause"         # finish | pause | checkpoint
    parallel_group: str | None = None   # Items in same group can run concurrently
    effort: str = "medium"              # small | medium | large
```

### Planner index (`~/.coord/planner/todo_index.yaml`)
```yaml
last_scan: "2026-02-24T10:00:00-08:00"
items:
  - fingerprint: "a7f3b2c1..."
    file: "vario/TODO.md"
    line: 15
    text: "Run baseline evals against benchmark suite"
    first_seen: "2026-02-24T10:00:00-08:00"
    last_seen: "2026-02-24T10:00:00-08:00"
    queue_id: "plan-a7f3b2c1"        # null if not yet enqueued
    status: enqueued | skipped | stale
```

## Implementation Phases

### Phase 1: Core Planner (v1)
1. TODO.md scanner + fingerprint index
2. LLM classification (haiku, batched)
3. Queue generation with `plan-{hex8}` IDs
4. Dependency resolution in `select_next_todo()`
5. `sup` CLI shortcut
6. `sup plan` command — run planner manually, show what would be enqueued

### Phase 2: Execution Enhancements
1. Smart pause policies (finish/pause/checkpoint)
2. Parallel execution (`max_concurrent_tasks: 3`, conflict detection)
3. Dashboard action buttons (kick, reorder, cancel, attach)

### Phase 3: Isolation & Autonomy
1. Docker isolation for code_change tasks
2. Auto-PR creation on code_change completion
3. Planner runs on schedule (recurring safe_always todo) — daily rescan
4. Feedback loop: completed task outcomes improve classification

## Safety Considerations

- **Planner never auto-enqueues `risk: high`** — always flags for review
- **`large` effort items** are enqueued but `auto_start: false`
- **Scanner output** (issues) is separate from planner output (TODO items) — both feed the queue but with different `source` values
- **Hand-written queue entries** (`sup-*`) are never modified by the planner
- **LLM classification failures** fall back to conservative defaults, never to permissive ones
- **Parallel execution** respects project isolation — no two code_change tasks in the same project
- **Phase 1 has no Docker** — path constraints in `autonomous.yaml` are the safety boundary

## Resolved Questions

1. **Planner trigger**: Both manual and scheduled. `sup plan` covers manual runs, and a recurring `safe_always` todo (`plan-rescan`) runs daily on idle. The planner is its own first customer.

## Open Questions

1. **Large items**: Split into sub-items automatically, or just flag for human decomposition?
2. **Feedback loop**: How should task outcomes (success/fail/timeout) feed back to improve future classifications?
