Autodo

An autonomous work engine that finds issues, builds features, and improves the codebase — all while you're away. Not CI. Not a linter. A teammate that works your idle hours.

What It Is (and Isn't)

Autodo is proactive. It doesn't wait for a push or a PR — it looks at the whole codebase, decides what's worth doing, queues it by priority, and executes in forked Claude sessions with full LLM reasoning.

Work items span the full spectrum: fixing broken links, yes — but also writing new test suites, implementing small features, researching design options, and producing structured reports.

How It's Different from CI

| | CI / CD | Autodo |
|---|---|---|
| Trigger | Code push / PR | User idle time |
| Question | "Did this change break anything?" | "What should improve next?" |
| Scope | The diff | The entire codebase |
| Output | Pass / fail gate | Reports, commits, filed work items |
| Agency | Reactive: runs a fixed script | Proactive: chooses what to work on |
| Reasoning | Deterministic checks | LLM judgment + multi-model review |
| Catches | Regressions in changed code | Rot, drift, missing tests, opportunity |
| Posture | Gatekeeper | Contributor |

CI answers "is this PR safe to merge?" Autodo answers "what should I build, fix, or investigate while you're at lunch?" They're complementary — CI gates changes, autodo generates them.

Three Kinds of Work

Not just bug fixes. The queue holds three distinct categories:

Fix

Repair & Maintain

Broken links, convention violations, error handling gaps, stale docs, dead code. The stuff that accumulates when you're focused on features.

Feat

Build & Extend

Add a missing test suite. Implement a small feature from a TODO. Write a CLI command. Create a health endpoint. Things that move the project forward.

Research

Investigate & Report

Evaluate a library. Audit a subsystem. Compare approaches. Produce a structured report with findings, tables, and recommendations.

Where Work Comes From

The queue is fed by multiple sources, expanding over time:

Scanner

10 automated checks (ripgrep, no LLM) run periodically. Broken links, conventions, error swallowing, dead code, logging gaps. Cheap and idempotent.

TODO Mining

The planner reads every TODO.md, deduplicates, and classifies items via a fast LLM. Each item gets a tier, risk, effort, and priority. Existing items auto-resolve when checked off.

Manual Filing

User or another session files a work item directly via autodo run or the jobs DB. Bypasses the scanner — for ad-hoc tasks or one-off investigations.

Periodic Tests

Run test suites on a schedule to catch rot: dependency breakage, time-sensitive tests, environment drift, flaky tests. Things CI misses because CI only runs on push.

Code Review Findings

Planned. After reviewing a PR or branch, file follow-up items: edge cases to test, docs to update, patterns to extract. The review generates work, not just approval.

Feature Brainstorming

Planned. Product ideation sessions produce concrete tasks: prototypes to build, alternatives to evaluate, user flows to test. Ideas go straight to the queue with priority and scope.

The Pipeline

Source → Triage → Queue → Execute → Review

1. Sourcing

Scanner — Ripgrep-based static checks run in-process. No LLM, no fork. Returns structured ScanIssue objects. Adding a new check is ~30 lines of Python plus a YAML entry.
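A check of this shape might look like the following sketch. The `ScanIssue` field names and the `check_bare_except` helper are illustrative assumptions, not the real API; this version walks files directly rather than shelling out to ripgrep.

```python
# Sketch of a scanner check; ScanIssue fields are assumed, not the real schema.
import re
from dataclasses import dataclass
from pathlib import Path

@dataclass
class ScanIssue:
    check: str     # which check fired
    path: str      # file containing the finding
    line: int      # 1-based line number
    message: str   # human-readable description

def check_bare_except(root: str) -> list[ScanIssue]:
    """Flag `except:` with no exception class (one of the convention patterns)."""
    pattern = re.compile(r"^\s*except\s*:")
    issues = []
    for py_file in Path(root).rglob("*.py"):
        for lineno, text in enumerate(py_file.read_text().splitlines(), start=1):
            if pattern.match(text):
                issues.append(ScanIssue(
                    check="conventions",
                    path=str(py_file),
                    line=lineno,
                    message="bare except catches everything, including KeyboardInterrupt",
                ))
    return issues
```

Because the check is a pure function over the tree, rerunning it is idempotent: the same codebase always yields the same findings.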

Planner — Mines TODO.md files for unchecked items. Deduplicates by text fingerprint, classifies in batches via a fast LLM. Each item gets: tier, risk, effort, priority 1–5.
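Text fingerprinting for deduplication could be as simple as hashing a normalized form of each TODO line. The normalization rules below (lowercase, collapse whitespace) are an assumption; the real planner may use different ones.

```python
# Illustrative text-fingerprint dedup; normalization rules are assumed.
import hashlib
import re

def fingerprint(todo_text: str) -> str:
    """Stable key for a TODO line: lowercase, collapse runs of whitespace."""
    normalized = re.sub(r"\s+", " ", todo_text.strip().lower())
    return hashlib.sha256(normalized.encode()).hexdigest()[:16]

def dedupe(items: list[str]) -> dict[str, str]:
    """Keep the first occurrence of each fingerprint."""
    seen: dict[str, str] = {}
    for item in items:
        seen.setdefault(fingerprint(item), item)
    return seen
```

Two TODOs that differ only in spacing or casing then collapse to one work item.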

Test runner — Executes test suites periodically. Failures are filed as work items with the traceback and affected module. Catches environmental rot that no code change triggered.

2. Triage & Suppression

Not every finding should become a work item. scanner_allow.yaml lets you suppress findings with a documented reason — the autodo equivalent of # noqa. File-level or line-specific.
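A suppression file might look something like this. The schema shown here is hypothetical; only the filename and the file-level/line-level distinction come from the doc.

```yaml
# Hypothetical scanner_allow.yaml entries; the actual schema may differ.
broken_links:
  - path: docs/archive/old-roadmap.md   # file-level suppression
    reason: archived doc, links intentionally stale
conventions:
  - path: scripts/oneoff_migration.py
    line: 42                            # line-specific suppression
    reason: bare except is deliberate; the migration must never abort
```

Requiring a reason string keeps suppressions auditable, the same way a bare # noqa with no explanation tends to rot.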

3. Queue

All items land in the jobs database as work_items under job_id=codebase_maintenance. Deduplication by deterministic key. Items that vanish from subsequent scans are auto-resolved.
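The scan-to-queue reconciliation can be sketched as an upsert plus an auto-resolve pass. The key format and status values below are assumptions for illustration.

```python
# Sketch of queue reconciliation; key format and statuses are assumed.
def reconcile(queue: dict[str, dict], scan_keys: set[str]) -> None:
    """Upsert current findings; auto-resolve open items absent from the latest scan."""
    for key in scan_keys:
        queue.setdefault(key, {"status": "open"})
    for key, item in queue.items():
        if item["status"] == "open" and key not in scan_keys:
            item["status"] = "resolved"  # vanished from the scan: assume fixed
```

Deterministic keys are what make this safe: the same finding always maps to the same queue entry, so reruns never duplicate work.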

4. Execution

The helm loop calls check_autonomous() every tick. If the user is idle, it selects the highest-priority eligible item and forks a session via it2 fork claude (falls back to tmux).

Each session gets a tailored prompt: safety rules, scope limits (file count, line count, time budget), project context, and type-specific guidance. Code-change workers use multi-model self-review.

5. Review

Workers log every decision with a severity tag: [low], [medium], [high], [blocked]. These flow into review files at ~/.claude/autodo/{id}/review.md. Low-risk-only reviews can be auto-approved; anything else requires human judgment.
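The auto-approve rule reduces to scanning the review for severity tags. This is a minimal sketch; the real review format may carry more structure than bracketed tags.

```python
# Sketch of the low-risk auto-approve rule; review format is assumed.
import re

TAG_RE = re.compile(r"\[(low|medium|high|blocked)\]")

def can_auto_approve(review_text: str) -> bool:
    """True only when the review has tags and every one of them is [low]."""
    tags = TAG_RE.findall(review_text)
    return bool(tags) and all(t == "low" for t in tags)
```

Note the conservative default: a review with no tags at all is not auto-approved, since absence of evidence is not low risk.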

Execution Tiers

Two trust levels determine what the forked session can do:

Safe Always

Read-Only Analysis

No file modifications. Produces structured reports: audits, research, evaluations. Always eligible for autonomous start. Output goes to ~/.claude/autodo/{id}/report.md.

Code Change

Modify on Branch

Creates branch auto/autodo/{date}/{id}, commits with [auto] prefix, tags checkpoints as autodo-checkpoint-N. Only auto-starts for low-risk items.

Safety Guardrails

Three mechanisms prevent runaway work:

User-return detection — If user idle time drops below 2 minutes, active workers are paused. Your keystrokes always take priority.

Timeout enforcement — Each item has a max_minutes budget (10–45 min depending on scope). The worker is marked complete when its budget expires.

Session liveness — Engine polls whether the forked session is still alive. Dead sessions are marked complete automatically.
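The three guardrails above can be sketched as one pass over active workers. The 2-minute return threshold comes from the doc; the worker fields and state names are assumptions.

```python
# Illustrative guardrail pass; worker fields and state names are assumed.
RETURN_THRESHOLD = 120  # seconds: user counts as "back" below this idle time

def enforce_guardrails(workers: list[dict], idle_seconds: float, now: float) -> None:
    for w in workers:
        if idle_seconds < RETURN_THRESHOLD:
            w["state"] = "paused"        # user-return detection
        elif now - w["started_at"] > w["max_minutes"] * 60:
            w["state"] = "complete"      # timeout enforcement
        elif not w["session_alive"]:
            w["state"] = "complete"      # session liveness
```

Ordering matters: the user-return check runs first, so keystrokes always win even for a worker that is about to time out.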

Scanner Checks

| Check | Finds | Frequency |
|---|---|---|
| broken_links | Markdown links to non-existent paths | Weekly |
| port_mismatches | Doc port numbers that don't match Caddyfile | Weekly |
| conventions | 8 rg patterns: bare except, Optional[], logging import, etc. | Weekly |
| error_swallowing | try/except returning falsy sentinels or bare pass | Monthly |
| registry_consistency | Service registry vs plist vs tasks.py cross-check | Weekly |
| logging_context | Error logs missing structured kwargs (url=, key=) | Weekly |
| dead_code | Unused imports and uncalled functions | Monthly |
| duplication | Known copy-paste anti-patterns across files | Monthly |
| logging_thoroughness | Full-codebase logging audit (skips startup/debug) | Weekly |
| test_health | Test suites that fail, flake, or haven't run recently | Daily |

Scale

10 scanner checks · 6 work sources · ~4,700 lines of code

Each scanner check is a function returning structured results. Adding a new source or check is the primary extension point — the queue, execution, and review machinery is shared.