# Jobs Framework Review Findings

**Date**: 2026-02-26
**Status**: Complete — deep review + Vario maxthink critique done

## Source

Two deep-review agents analyzed the full jobs framework; the Vario maxthink critique is now complete.

## Architecture Overview

```
jobs/
├── runner.py      (1088 lines) — async main loop, stage workers, discovery, guards
├── lib/
│   ├── tracker.py (1006 lines) — SQLite schema + CRUD, 6 tables
│   ├── discovery.py (1750 lines) — 17+ pluggable discovery strategies
│   ├── doctor.py   (462 lines) — LLM error classification + circuit breaker
│   ├── job.py      (256 lines) — 8 dataclasses for config, YAML loading
│   ├── executor.py  (31 lines) — importlib handler resolution
│   └── pacer.py     (35 lines) — simple rate limiter
├── ctl.py         (608 lines) — CLI control (wake/pause/status/add-url/reprocess)
├── jobs.yaml      (1260 lines) — all job definitions
└── handlers/      — per-job handler modules
```

## Runner.py Key Findings

### Essential (keep)
- Concurrent discovery + stage workers (truly parallel, not round-robin)
- Doctor error intelligence (per-error LLM classification)
- Singleton lock (prevents multiple runners)
- Wake file mechanism (2s responsiveness)
- Config hot-reload via mtime + SIGHUP
- Orphan reclaim (handles crashes)
- Resource registry (cross-job semaphores)
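
The wake-file mechanism listed above can be sketched as follows. The file path and function names are assumptions for illustration, not runner.py's actual code; the point is that `ctl.py wake` only has to touch a file, and the runner notices on its next ~2s poll tick.

```python
import asyncio
import tempfile
from pathlib import Path

# Hypothetical location; the real runner's wake-file path isn't shown in
# this review, so a temp-dir stand-in is used here.
WAKE_FILE = Path(tempfile.gettempdir()) / "jobs.wake"

async def wait_for_work(poll_interval: float = 2.0, max_wait: float = 60.0) -> str:
    """Sleep up to max_wait seconds, returning early if the wake file appears."""
    waited = 0.0
    while waited < max_wait:
        if WAKE_FILE.exists():
            WAKE_FILE.unlink()          # consume the wake signal
            return "woken"
        await asyncio.sleep(poll_interval)
        waited += poll_interval
    return "timeout"
```

The 2s responsiveness comes straight from `poll_interval`: the runner never waits longer than one poll tick to react to a wake request.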

### Over-complicated
- RetryLaterError circuit breaker + Doctor both exist (redundant)
- BackoffTimer hardcoded 1-5s (not configurable per stage)
- Nested process_one() closure (100+ lines deep)
- Backfill 1h slowdown doesn't reset on reload
- Job startup cleanup (90 lines) duplicated in orphan reclaim
- Double-connection pattern (open_raw_db vs open_db inconsistent)

### Fragile
- once_watchdog 9s threshold (handlers taking >9s cause premature exit)
- Discovery interval slowdown persists through config reload
- RetryLaterError threshold hardcoded at 5 (not per-job)
- active_items set keyed by item_id not (job_id, item_id)
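
The `active_items` keying hazard is easy to demonstrate. The function names below are hypothetical; the sketch just shows why two jobs that happen to share an `item_id` collide under the current scheme:

```python
# Buggy scheme: a set keyed by item_id alone means one job's in-flight
# item spuriously blocks another job's item with the same id.
active_items: set[str] = set()

def try_claim_buggy(job_id: str, item_id: str) -> bool:
    if item_id in active_items:        # collides across jobs
        return False
    active_items.add(item_id)
    return True

# Fix: key by the (job_id, item_id) pair instead.
active_pairs: set[tuple[str, str]] = set()

def try_claim_fixed(job_id: str, item_id: str) -> bool:
    key = (job_id, item_id)
    if key in active_pairs:
        return False
    active_pairs.add(key)
    return True
```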

### Radical cleanup estimate: ~30% LOC reduction by extracting functions + removing duplication

## Lib Modules Key Findings

### doctor.py (462 lines) — 90% essential
- LLM error classification is elegant and captures domain knowledge
- Circuit breaker buffers pause actions with per-class thresholds
- Minor: markdown fence stripping unnecessary, thresholds should be configurable

### discovery.py (1750 lines) — biggest cleanup target
- 17 strategies with significant copy-paste
- 3 YouTube strategies share yt-dlp subprocess patterns
- URL hashing (md5[:10]) repeated 5 times
- Could extract: yt-dlp wrapper, URL hash util, async subprocess helper
- Could merge: company_sources + company_watchlist + static_ein_list
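
The suggested URL-hash extraction is a one-liner; `url_hash` is a proposed name for the helper, not an existing function in discovery.py:

```python
import hashlib

def url_hash(url: str) -> str:
    """Stable 10-char identifier for a URL (md5 truncated to 10 hex chars).

    The five in-place md5[:10] copies in discovery.py would all call this.
    """
    return hashlib.md5(url.encode("utf-8")).hexdigest()[:10]
```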

### job.py (256 lines) — right-sized
- Clean dataclass hierarchy
- Dead field: PacingConfig.circuit_breaker (moved to doctor.py)
- YAML parsing handles backward compat well

### executor.py (31 lines) — minimal, keep as-is
### pacer.py (35 lines) — possibly unused, too simple for batch workloads

### ctl.py (608 lines) — mixes concerns
- Business logic (add_urls, reprocess_fast) should be in lib modules
- Could drop to ~200 lines of pure CLI dispatch

## Tables Assessment

| Table | Shape | Verdict |
|-------|-------|---------|
| work_items | core | **Keep** — drop result_path (dead) |
| results | core | **Keep as-is** — cleanest table |
| job_state | 8 cols | **Simplify** — 5 cols sufficient |
| job_runs | 5 cols | **Drop** — absorbed by events |
| job_cost_log | 5 cols | **Drop** — absorbed by events |
| job_events | 7 cols | **Rename to events**, absorb cost_log + runs |
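
A consolidated `events` table might look like the sketch below. Every column name here is an assumption, not tracker.py's actual DDL; the idea is that run lifecycle and cost entries become event kinds rather than separate tables.

```python
import sqlite3

# Hypothetical schema for the merged table. Run starts/ends (job_runs) and
# cost entries (job_cost_log) become rows distinguished by `kind`.
EVENTS_DDL = """
CREATE TABLE IF NOT EXISTS events (
    id         INTEGER PRIMARY KEY,
    job_id     TEXT NOT NULL,
    kind       TEXT NOT NULL,   -- e.g. 'run_start', 'run_end', 'cost', 'error'
    item_id    TEXT,            -- nullable: run/cost events may be job-level
    payload    TEXT,            -- JSON blob for kind-specific fields
    created_at TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_events_job_kind ON events (job_id, kind);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(EVENTS_DDL)
# A cost entry that previously would have lived in job_cost_log:
conn.execute(
    "INSERT INTO events (job_id, kind, payload) VALUES (?, ?, ?)",
    ("earnings", "cost", '{"usd": 0.012, "model": "gpt"}'),
)
```

The `(job_id, kind)` index keeps the common queries ("all cost rows for job X", "last run of job Y") cheap after the merge.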

## Core vs Plugin (for generic extraction)

### Core (reusable by any project)
- SQLite tracker (work_items + results + events + job_state)
- Stage state machine (stages JSON + status sync)
- Version hashing + stale detection
- Error classification framework (doctor pattern)
- Discovery strategy registry
- Stage worker loop (poll → process → backoff)
- Wake file mechanism
- Config hot-reload
- Pacer/rate limiting
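
The poll → process → backoff loop above can be sketched as follows. This is a testable toy, not runner.py's implementation: the real loop runs until the job is paused, so this version exits after a few consecutive empty polls to stay runnable in isolation.

```python
import asyncio

async def stage_worker(poll, process, base=0.01, cap=0.05, max_idle_polls=3):
    """Minimal poll → process → backoff loop.

    `poll` returns the next work item or None; `process` handles one item.
    On an empty queue we back off exponentially (capped) instead of spinning,
    and reset the backoff as soon as real work appears.
    """
    delay, idle, done = base, 0, 0
    while idle < max_idle_polls:
        item = await poll()
        if item is None:
            idle += 1
            await asyncio.sleep(delay)
            delay = min(delay * 2, cap)   # exponential backoff, capped
            continue
        idle, delay = 0, base             # reset on real work
        await process(item)
        done += 1
    return done
```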

### Plugin (rivus-specific)
- All 17 discovery strategies (finnhub, youtube, etc.)
- All handlers (earnings, transcripts, supply chain)
- Dashboard (Gradio UI)
- ctl.py business logic (add-url routing, newsflow)
- jobs.yaml job definitions
- Autodo scanner + CLI

## Open Questions for Vario Review
1. Should stages JSON stay in work_items or normalize to a stages table?
2. Is the 6→4 table simplification right, or should we go further?
3. What's the right boundary between core and plugin?
4. Is the upsert_items merge logic justified or over-engineered?
