An autonomous work engine that finds issues, builds features, and improves the codebase — all while you're away. Not CI. Not a linter. A teammate that works your idle hours.
Autodo is proactive. It doesn't wait for a push or a PR — it looks at the whole codebase, decides what's worth doing, queues it by priority, and executes in forked Claude sessions with full LLM reasoning.
Work items span the full spectrum: fixing broken links, yes — but also writing new test suites, implementing small features, researching design options, and producing structured reports.
| | CI / CD | Autodo |
|---|---|---|
| Trigger | Code push / PR | User idle time |
| Question | "Did this change break anything?" | "What should improve next?" |
| Scope | The diff | The entire codebase |
| Output | Pass / fail gate | Reports, commits, filed work items |
| Agency | Reactive — runs a fixed script | Proactive — chooses what to work on |
| Reasoning | Deterministic checks | LLM judgment + multi-model review |
| Catches | Regressions in changed code | Rot, drift, missing tests, opportunity |
| Posture | Gatekeeper | Contributor |
CI answers "is this PR safe to merge?" Autodo answers "what should I build, fix, or investigate while you're at lunch?" They're complementary — CI gates changes, Autodo generates them.
Not just bug fixes. The queue holds three distinct categories:
Broken links, convention violations, error handling gaps, stale docs, dead code. The stuff that accumulates when you're focused on features.
Add a missing test suite. Implement a small feature from a TODO. Write a CLI command. Create a health endpoint. Things that move the project forward.
Evaluate a library. Audit a subsystem. Compare approaches. Produce a structured report with findings, tables, and recommendations.
The queue is fed by multiple sources, expanding over time:
9 automated checks (ripgrep, no LLM) run periodically. Broken links, conventions, error swallowing, dead code, logging gaps. Cheap and idempotent.
Planner reads every TODO.md, deduplicates, and classifies items via a fast LLM. Each item gets a tier, risk, effort, and priority. Existing items auto-resolve when checked off.
User or another session files a work item directly via autodo run or the jobs DB. Bypasses the scanner — for ad-hoc tasks or one-off investigations.
Run test suites on a schedule to catch rot: dependency breakage, time-sensitive tests, environment drift, flaky tests. Things CI misses because CI only runs on push.
Planned. After reviewing a PR or branch, file follow-up items: edge cases to test, docs to update, patterns to extract. The review generates work, not just approval.
Planned. Product ideation sessions produce concrete tasks: prototypes to build, alternatives to evaluate, user flows to test. Ideas go straight to the queue with priority and scope.
Scanner — Ripgrep-based static checks run in-process. No LLM, no fork. Returns structured ScanIssue objects. Adding a new check is ~30 lines of Python plus a YAML entry.
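A new check could look roughly like this. This is a minimal sketch, not the real API: the `ScanIssue` name comes from the text above, but its fields, the check's signature, and the `stray_prints` check itself are all assumptions for illustration.

```python
# Hypothetical scanner check; ScanIssue's fields and the function
# signature are illustrative, not autodo's actual interface.
import re
from dataclasses import dataclass
from pathlib import Path

@dataclass
class ScanIssue:
    check: str       # which check produced this finding
    path: str        # file the issue lives in
    line: int        # 1-based line number
    message: str     # human-readable description

def check_stray_prints(root: Path) -> list[ScanIssue]:
    """Flag stray print() calls in library code (illustrative check)."""
    issues = []
    for path in sorted(root.rglob("*.py")):
        for lineno, text in enumerate(path.read_text().splitlines(), start=1):
            if re.search(r"^\s*print\(", text):
                issues.append(ScanIssue(
                    check="stray_prints",
                    path=str(path),
                    line=lineno,
                    message="print() in library code; use logging instead",
                ))
    return issues
```

Because the check is pure (scan in, structured issues out), it is cheap to run repeatedly and trivially idempotent, which is what lets the engine re-run the whole battery on a schedule.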
Planner — Mines TODO.md files for unchecked items. Deduplicates by text fingerprint, classifies in batches via a fast LLM. Each item gets: tier, risk, effort, priority 1–5.
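Text-fingerprint deduplication can be sketched as below. The normalization rules (lowercase, collapse whitespace, strip punctuation) are assumptions; the planner's actual fingerprint may differ.

```python
# Illustrative text-fingerprint dedup for TODO items; normalization
# rules here are assumptions, not the planner's exact scheme.
import hashlib
import re

def fingerprint(todo_text: str) -> str:
    """Normalize a TODO line so trivial edits don't create duplicates."""
    norm = re.sub(r"\s+", " ", todo_text.strip().lower())
    norm = re.sub(r"[^\w ]", "", norm)   # drop punctuation
    return hashlib.sha256(norm.encode()).hexdigest()[:16]

def dedupe(items: list[str]) -> dict[str, str]:
    """Keep the first item seen for each fingerprint."""
    seen: dict[str, str] = {}
    for item in items:
        seen.setdefault(fingerprint(item), item)
    return seen
```

The point of fingerprinting before the LLM call is cost: only genuinely new items need classification, and a reworded TODO doesn't get filed twice.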
Test runner — Executes test suites periodically. Failures are filed as work items with the traceback and affected module. Catches environmental rot that no code change triggered.
Not every finding should become a work item. scanner_allow.yaml lets you suppress findings with a documented reason — the autodo equivalent of # noqa. File-level or line-specific.
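A plausible shape for those entries is sketched below. The field names (`allow`, `check`, `path`, `line`, `reason`) are assumptions for illustration; only the file name and the file-level vs line-specific distinction come from the text above.

```yaml
# Hypothetical scanner_allow.yaml entries; field names are illustrative.
allow:
  - check: error_swallowing
    path: src/net/retry.py          # file-level suppression
    reason: "Intentional: retry loop swallows transient errors by design"
  - check: conventions
    path: scripts/legacy_import.py
    line: 42                        # line-specific suppression
    reason: "Bare except kept until the legacy script is retired"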
All items land in the jobs database as work_items under job_id=codebase_maintenance. Deduplication by deterministic key. Items that vanish from subsequent scans are auto-resolved.
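The dedupe-and-auto-resolve cycle can be sketched like this. The key scheme (hash of check, path, and message) and the dict-backed store are assumptions; the real jobs DB is presumably a proper database.

```python
# Sketch of deterministic-key dedup and auto-resolution; the key
# composition and the dict-as-DB are illustrative assumptions.
import hashlib

def work_item_key(check: str, path: str, message: str) -> str:
    raw = f"{check}|{path}|{message}"
    return hashlib.sha256(raw.encode()).hexdigest()[:20]

def upsert(db: dict, check: str, path: str, message: str) -> bool:
    """File a work item unless the same key already exists.
    Returns True if a new item was filed."""
    key = work_item_key(check, path, message)
    if key in db:
        return False
    db[key] = {"check": check, "path": path, "message": message,
               "status": "open"}
    return True

def auto_resolve(db: dict, current_keys: set) -> None:
    """Mark open items no longer reported by the latest scan as resolved."""
    for key, item in db.items():
        if key not in current_keys and item["status"] == "open":
            item["status"] = "resolved"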
The helm loop calls check_autonomous() every tick. If the user is idle, it selects the highest-priority eligible item and forks a session via it2 fork claude (falls back to tmux).
Each session gets a tailored prompt: safety rules, scope limits (file count, line count, time budget), project context, and type-specific guidance. Code-change workers use multi-model self-review.
Workers log every decision with a severity tag: [low], [medium], [high], [blocked]. These flow into review files at ~/.claude/autodo/{id}/review.md. Low-risk-only reviews can be auto-approved; anything else requires human judgment.
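The auto-approve rule reduces to a simple predicate over those tags. This sketch assumes each logged decision starts its line with the severity tag; the real review.md layout may differ.

```python
# Sketch of the auto-approve rule, assuming one decision per line with
# a leading severity tag; the real review format may differ.
import re

def parse_severities(review_text: str) -> list:
    """Collect the severity tag from each logged decision line."""
    tags = []
    for line in review_text.splitlines():
        m = re.match(r"\[(low|medium|high|blocked)\]", line.strip())
        if m:
            tags.append(m.group(1))
    return tags

def can_auto_approve(review_text: str) -> bool:
    """Auto-approve only when every logged decision is low severity."""
    tags = parse_severities(review_text)
    return bool(tags) and all(t == "low" for t in tags)
```

Note the empty-review case: a review with no parseable decisions is not auto-approved, so a worker that logged nothing still lands on a human.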
Two trust levels determine what the forked session can do:
No file modifications. Produces structured reports: audits, research, evaluations. Always eligible for autonomous start. Output goes to ~/.claude/autodo/{id}/report.md.
Creates branch auto/autodo/{date}/{id}, commits with [auto] prefix, tags checkpoints as autodo-checkpoint-N. Only auto-starts for low-risk items.
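The naming conventions above are mechanical enough to pin down in a few lines; the ISO date format is an assumption.

```python
# Branch and checkpoint-tag naming from the conventions above;
# the date format (ISO yyyy-mm-dd) is an assumption.
from datetime import date

def branch_name(item_id: str, today: date = None) -> str:
    d = (today or date.today()).isoformat()
    return f"auto/autodo/{d}/{item_id}"

def checkpoint_tag(n: int) -> str:
    return f"autodo-checkpoint-{n}"
```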
Three mechanisms prevent runaway work:
User-return detection — If user idle time drops below 2 minutes, active workers are paused. Your keystrokes always take priority.
Timeout enforcement — Each item has a max_minutes budget (10–45 min depending on scope). Worker is marked complete when time expires.
Session liveness — Engine polls whether the forked session is still alive. Dead sessions are marked complete automatically.
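The three safeguards compose into one state decision per poll. This sketch follows the thresholds stated above (2-minute return window, per-item `max_minutes` budget); the worker record's shape and the function itself are illustrative.

```python
# Combined sketch of the three safeguards; the worker record's fields
# and this API are illustrative, thresholds follow the text above.
import time

RETURN_THRESHOLD_S = 120  # pause workers if idle drops below 2 minutes

def apply_safeguards(worker: dict, idle_seconds: float,
                     session_alive: bool, now: float = None) -> dict:
    now = time.time() if now is None else now
    # 1. User-return detection: keystrokes always take priority.
    if idle_seconds < RETURN_THRESHOLD_S:
        worker["state"] = "paused"
        return worker
    # 2. Timeout enforcement: each item carries a max_minutes budget.
    if now - worker["started_at"] > worker["max_minutes"] * 60:
        worker["state"] = "complete"
        return worker
    # 3. Session liveness: dead sessions are closed out automatically.
    if not session_alive:
        worker["state"] = "complete"
        return worker
    worker["state"] = "running"
    return worker
```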
| Check | Finds | Freq |
|---|---|---|
| broken_links | Markdown links to non-existent paths | Weekly |
| port_mismatches | Doc port numbers that don't match Caddyfile | Weekly |
| conventions | 8 rg patterns: bare except, Optional[], logging import, etc. | Weekly |
| error_swallowing | try/except returning falsy sentinels or bare pass | Monthly |
| registry_consistency | Service registry vs plist vs tasks.py cross-check | Weekly |
| logging_context | Error logs missing structured kwargs (url=, key=) | Weekly |
| dead_code | Unused imports and uncalled functions | Monthly |
| duplication | Known copy-paste anti-patterns across files | Monthly |
| logging_thoroughness | Full-codebase logging audit (skips startup/debug) | Weekly |
| test_health | Test suites that fail, flake, or haven't run recently | Daily |
Each scanner check is a function returning structured results. Adding a new source or check is the primary extension point — the queue, execution, and review machinery is shared.