An autonomous work engine that finds issues, builds features, and improves the codebase — all while you're away. Not CI. Not a linter. A teammate that works your idle hours.
Autodo is proactive. It doesn't wait for a push or a PR — it looks at the whole codebase, decides what's worth doing, queues it by priority, and executes in forked Claude sessions with full LLM reasoning.
Work items span the full spectrum: fixing broken links, yes — but also writing new test suites, implementing small features, researching design options, and producing structured reports.
| | CI / CD | Autodo |
|---|---|---|
| Trigger | Code push / PR | User idle time |
| Question | "Did this change break anything?" | "What should improve next?" |
| Scope | The diff | The entire codebase |
| Output | Pass / fail gate | Reports, commits, filed work items |
| Agency | Reactive — runs a fixed script | Proactive — chooses what to work on |
| Reasoning | Deterministic checks | LLM judgment + multi-model review |
| Catches | Regressions in changed code | Rot, drift, missing tests, opportunity |
| Posture | Gatekeeper | Contributor |
CI answers "is this PR safe to merge?" Autodo answers "what should I build, fix, or investigate while you're at lunch?" They're complementary — CI gates changes, Autodo generates them.
Not just bug fixes. The queue holds three distinct categories:
Broken links, convention violations, error handling gaps, stale docs, dead code. The stuff that accumulates when you're focused on features.
Add a missing test suite. Implement a small feature from a TODO. Write a CLI command. Create a health endpoint. Things that move the project forward.
Evaluate a library. Audit a subsystem. Compare approaches. Produce a structured report with findings, tables, and recommendations.
The queue is fed by multiple sources, expanding over time:
9 automated checks (ripgrep, no LLM) run periodically. Broken links, conventions, error swallowing, dead code, logging gaps. Cheap and idempotent.
Planner reads every TODO.md, deduplicates, and classifies items via a fast LLM. Each item gets a tier, risk, effort, and priority. Existing items auto-resolve when checked off.
User or another session files a work item directly via autodo run or the jobs DB. Bypasses the scanner — for ad-hoc tasks or one-off investigations.
Run test suites on a schedule to catch rot: dependency breakage, time-sensitive tests, environment drift, flaky tests. Things CI misses because CI only runs on push.
Planned. After reviewing a PR or branch, file follow-up items: edge cases to test, docs to update, patterns to extract. The review generates work, not just approval.
Planned. Product ideation sessions produce concrete tasks: prototypes to build, alternatives to evaluate, user flows to test. Ideas go straight to the queue with priority and scope.
Scanner — Ripgrep-based static checks run in-process. No LLM, no fork. Returns structured ScanIssue objects. Adding a new check is ~30 lines of Python plus a YAML entry.
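A new check could look roughly like this. This is a minimal sketch, not the real API: the `ScanIssue` name comes from the text above, but its fields, the check's signature, and the `stray_prints` check itself are all assumptions for illustration.

```python
# Hypothetical scanner check; ScanIssue's fields and the function
# signature are illustrative, not autodo's actual interface.
import re
from dataclasses import dataclass
from pathlib import Path

@dataclass
class ScanIssue:
    check: str       # which check produced this finding
    path: str        # file the issue lives in
    line: int        # 1-based line number
    message: str     # human-readable description

def check_stray_prints(root: Path) -> list[ScanIssue]:
    """Flag stray print() calls in library code (illustrative check)."""
    issues = []
    for path in sorted(root.rglob("*.py")):
        for lineno, text in enumerate(path.read_text().splitlines(), start=1):
            if re.search(r"^\s*print\(", text):
                issues.append(ScanIssue(
                    check="stray_prints",
                    path=str(path),
                    line=lineno,
                    message="print() in library code; use logging instead",
                ))
    return issues
```

Because the check is pure (scan in, structured issues out), it is cheap to run repeatedly and trivially idempotent, which is what lets the engine re-run the whole battery on a schedule.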
Planner — Mines TODO.md files for unchecked items. Deduplicates by text fingerprint, classifies in batches via a fast LLM. Each item gets: tier, risk, effort, priority 1–5.
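Text-fingerprint deduplication can be sketched as below. The normalization rules (lowercase, collapse whitespace, strip punctuation) are assumptions; the planner's actual fingerprint may differ.

```python
# Illustrative text-fingerprint dedup for TODO items; normalization
# rules here are assumptions, not the planner's exact scheme.
import hashlib
import re

def fingerprint(todo_text: str) -> str:
    """Normalize a TODO line so trivial edits don't create duplicates."""
    norm = re.sub(r"\s+", " ", todo_text.strip().lower())
    norm = re.sub(r"[^\w ]", "", norm)   # drop punctuation
    return hashlib.sha256(norm.encode()).hexdigest()[:16]

def dedupe(items: list[str]) -> dict[str, str]:
    """Keep the first item seen for each fingerprint."""
    seen: dict[str, str] = {}
    for item in items:
        seen.setdefault(fingerprint(item), item)
    return seen
```

The point of fingerprinting before the LLM call is cost: only genuinely new items need classification, and a reworded TODO doesn't get filed twice.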
Test runner — Executes test suites periodically. Failures are filed as work items with the traceback and affected module. Catches environmental rot that no code change triggered.
Not every finding should become a work item. scanner_allow.yaml lets you suppress findings with a documented reason — the autodo equivalent of # noqa. File-level or line-specific.
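A plausible shape for those entries is sketched below. The field names (`allow`, `check`, `path`, `line`, `reason`) are assumptions for illustration; only the file name and the file-level vs line-specific distinction come from the text above.

```yaml
# Hypothetical scanner_allow.yaml entries; field names are illustrative.
allow:
  - check: error_swallowing
    path: src/net/retry.py          # file-level suppression
    reason: "Intentional: retry loop swallows transient errors by design"
  - check: conventions
    path: scripts/legacy_import.py
    line: 42                        # line-specific suppression
    reason: "Bare except kept until the legacy script is retired"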
All items land in the jobs database as work_items under job_id=codebase_maintenance. Deduplication by deterministic key. Items that vanish from subsequent scans are auto-resolved.
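The dedupe-and-auto-resolve cycle can be sketched like this. The key scheme (hash of check, path, and message) and the dict-backed store are assumptions; the real jobs DB is presumably a proper database.

```python
# Sketch of deterministic-key dedup and auto-resolution; the key
# composition and the dict-as-DB are illustrative assumptions.
import hashlib

def work_item_key(check: str, path: str, message: str) -> str:
    raw = f"{check}|{path}|{message}"
    return hashlib.sha256(raw.encode()).hexdigest()[:20]

def upsert(db: dict, check: str, path: str, message: str) -> bool:
    """File a work item unless the same key already exists.
    Returns True if a new item was filed."""
    key = work_item_key(check, path, message)
    if key in db:
        return False
    db[key] = {"check": check, "path": path, "message": message,
               "status": "open"}
    return True

def auto_resolve(db: dict, current_keys: set) -> None:
    """Mark open items no longer reported by the latest scan as resolved."""
    for key, item in db.items():
        if key not in current_keys and item["status"] == "open":
            item["status"] = "resolved"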
The helm loop calls check_autonomous() every tick. If the user is idle, it selects the highest-priority eligible item and forks a session via it2 fork claude (falls back to tmux).
Each session gets a tailored prompt: safety rules, scope limits (file count, line count, time budget), project context, and type-specific guidance. Code-change workers use multi-model self-review.
Workers log every decision with a severity tag: [low], [medium], [high], [blocked]. These flow into review files at ~/.claude/autodo/{id}/review.md. Low-risk-only reviews can be auto-approved; anything else requires human judgment.
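The auto-approve rule reduces to a simple predicate over those tags. This sketch assumes each logged decision starts its line with the severity tag; the real review.md layout may differ.

```python
# Sketch of the auto-approve rule, assuming one decision per line with
# a leading severity tag; the real review format may differ.
import re

def parse_severities(review_text: str) -> list:
    """Collect the severity tag from each logged decision line."""
    tags = []
    for line in review_text.splitlines():
        m = re.match(r"\[(low|medium|high|blocked)\]", line.strip())
        if m:
            tags.append(m.group(1))
    return tags

def can_auto_approve(review_text: str) -> bool:
    """Auto-approve only when every logged decision is low severity."""
    tags = parse_severities(review_text)
    return bool(tags) and all(t == "low" for t in tags)
```

Note the empty-review case: a review with no parseable decisions is not auto-approved, so a worker that logged nothing still lands on a human.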
Two trust levels determine what the forked session can do:
No file modifications. Produces structured reports: audits, research, evaluations. Always eligible for autonomous start. Output goes to ~/.claude/autodo/{id}/report.md.
Creates branch auto/autodo/{date}/{id}, commits with [auto] prefix, tags checkpoints as autodo-checkpoint-N. Only auto-starts for low-risk items.
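The naming conventions above are mechanical enough to pin down in a few lines; the ISO date format is an assumption.

```python
# Branch and checkpoint-tag naming from the conventions above;
# the date format (ISO yyyy-mm-dd) is an assumption.
from datetime import date

def branch_name(item_id: str, today: date = None) -> str:
    d = (today or date.today()).isoformat()
    return f"auto/autodo/{d}/{item_id}"

def checkpoint_tag(n: int) -> str:
    return f"autodo-checkpoint-{n}"
```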
Three mechanisms prevent runaway work:
User-return detection — If user idle time drops below 2 minutes, active workers are paused. Your keystrokes always take priority.
Timeout enforcement — Each item has a max_minutes budget (10–45 min depending on scope). Worker is marked complete when time expires.
Session liveness — Engine polls whether the forked session is still alive. Dead sessions are marked complete automatically.
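The three safeguards compose into one state decision per poll. This sketch follows the thresholds stated above (2-minute return window, per-item `max_minutes` budget); the worker record's shape and the function itself are illustrative.

```python
# Combined sketch of the three safeguards; the worker record's fields
# and this API are illustrative, thresholds follow the text above.
import time

RETURN_THRESHOLD_S = 120  # pause workers if idle drops below 2 minutes

def apply_safeguards(worker: dict, idle_seconds: float,
                     session_alive: bool, now: float = None) -> dict:
    now = time.time() if now is None else now
    # 1. User-return detection: keystrokes always take priority.
    if idle_seconds < RETURN_THRESHOLD_S:
        worker["state"] = "paused"
        return worker
    # 2. Timeout enforcement: each item carries a max_minutes budget.
    if now - worker["started_at"] > worker["max_minutes"] * 60:
        worker["state"] = "complete"
        return worker
    # 3. Session liveness: dead sessions are closed out automatically.
    if not session_alive:
        worker["state"] = "complete"
        return worker
    worker["state"] = "running"
    return worker
```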
| Check | Finds | Freq |
|---|---|---|
| broken_links | Markdown links to non-existent paths | Weekly |
| port_mismatches | Doc port numbers that don't match Caddyfile | Weekly |
| conventions | 8 rg patterns: bare except, Optional[], logging import, etc. | Weekly |
| error_swallowing | try/except returning falsy sentinels or bare pass | Monthly |
| registry_consistency | Service registry vs plist vs tasks.py cross-check | Weekly |
| logging_context | Error logs missing structured kwargs (url=, key=) | Weekly |
| dead_code | Unused imports and uncalled functions | Monthly |
| duplication | Known copy-paste anti-patterns across files | Monthly |
| logging_thoroughness | Full-codebase logging audit (skips startup/debug) | Weekly |
| test_health | Test suites that fail, flake, or haven't run recently | Daily |
Each scanner check is a function returning structured results. Adding a new source or check is the primary extension point — the queue, execution, and review machinery is shared.