# Stale-by-Default Versioning with LLM Triage

Date: 2026-03-05
Status: Implemented

## Problem

Current model is **valid-by-default**: code changes produce a new hash, dashboard shows a stale count, user must manually `jobctl reset`. Risks:

1. **Forget to reset** — stale results silently served as valid
2. **False positives** — log/comment changes trigger staleness, creating noise that trains users to ignore stale counts
3. **VERSION_DEPS must be exhaustive** — missing a dep means invisible staleness

## Design

Invert the default: hash mismatches are **stale-by-default**. An LLM (Opus, high reasoning) triages each change to auto-bless cosmetic ones and queue semantic ones for user decision.

### Flow

```
Code deployed → Runner starts → computes stage hashes
  ↓
Hash mismatch detected for job/stage
  ↓
Load old source from version_snapshots + new source via inspect.getsource()
  ↓
Send diff to Opus (maxthink) → "cosmetic or semantic?"
  ↓
┌─ Cosmetic → auto-bless (write equivalence, log reason)
│              Items stay "done". No user action.
│
└─ Semantic → queue for user decision
               Pushover notification: "vic_ideas/extract changed (semantic)"
               Items treated as stale (not auto-reprocessed)
               ↓
               User runs `jobctl stale` → sees pending decisions
               ↓
               `jobctl bless vic_ideas extract` → equivalence recorded, items stay done
               `jobctl reset vic_ideas extract` → items reset to pending for reprocessing
```

### Why not auto-reprocess semantic changes?

Code changes frequently during development. Auto-reprocessing 1000 items on every semantic change causes thrashing and wasted API spend. The user should decide when results are worth re-running.

### Components

#### 1. `version_snapshots` table

Stores the source code that produced each hash, enabling real diffs when the hash changes.

```sql
CREATE TABLE version_snapshots (
    id INTEGER PRIMARY KEY,
    job_id TEXT NOT NULL,
    stage TEXT NOT NULL,
    version_hash TEXT NOT NULL,
    source_text TEXT NOT NULL,      -- inspect.getsource() output
    deps_text TEXT,                 -- VERSION_DEPS sources (if any)
    created_at TEXT DEFAULT (datetime('now')),
    UNIQUE(job_id, stage, version_hash)
);
```

**Write path**: Runner stores snapshot on first use of a new hash (INSERT OR IGNORE).
**Read path**: On hash mismatch, load old snapshot for diff.
**Size**: ~1-5 KB per snapshot. Hundreds of snapshots = trivial.

#### 2. `version_equivalences` table

Records that two hashes produce equivalent output (blessed by LLM or user).

```sql
CREATE TABLE version_equivalences (
    id INTEGER PRIMARY KEY,
    job_id TEXT NOT NULL,
    stage TEXT NOT NULL,
    old_hash TEXT NOT NULL,
    new_hash TEXT NOT NULL,
    blessed_by TEXT NOT NULL,       -- 'llm_auto' | 'user_manual'
    reason TEXT,                    -- LLM's reasoning or user's --reason
    created_at TEXT DEFAULT (datetime('now')),
    UNIQUE(job_id, stage, old_hash, new_hash)
);
```

#### 3. Staleness check (modified)

Current check: `handler_version IS NULL OR handler_version != current_version`

New check: same, but **exclude blessed equivalences**. An item is stale if:
- `handler_version != current_version` AND
- No equivalence chain connects `handler_version` → `current_version`

For simplicity, start with **direct equivalence only** (no transitive chains). If A was blessed to B, and B changes to C, C is evaluated fresh. Transitive lookup adds complexity for minimal gain — version changes are rare and sequential.

```sql
-- Item is stale if version mismatches AND no equivalence exists
SELECT COUNT(*) FROM results r
WHERE r.job_id = ? AND r.stage = ?
  AND r.handler_version IS NOT NULL
  AND r.handler_version != ?
  AND NOT EXISTS (
    SELECT 1 FROM version_equivalences ve
    WHERE ve.job_id = r.job_id AND ve.stage = r.stage
      AND ve.old_hash = r.handler_version AND ve.new_hash = ?
  )
```

#### 4. LLM triage

Called once per (job, stage) hash change, not per item. Prompt:

```
You are reviewing a code change to a data processing stage.
The stage "{stage}" in job "{job_id}" has changed.

OLD SOURCE:
{old_source}

NEW SOURCE:
{new_source}

Would this change affect the output data produced by this function?

- "cosmetic" = only logging, comments, formatting, variable renames, type hints,
  error messages, or other changes that don't affect the returned data
- "semantic" = prompt changes, logic changes, model changes, new/removed fields,
  different API calls, changed thresholds — anything that could produce different output

Answer with a JSON object:
{"verdict": "cosmetic" | "semantic", "reason": "one sentence explanation"}
```

Model: `claude-opus-4-6` with `reasoning_effort="high"`.
Cost: ~$0.05-0.15 per call. Frequency: a few times per deploy at most.

#### 5. CLI commands

**`jobctl stale`** — Show all stages with unresolved semantic changes:

```
Job                Stage        Stale Items  Changed
vic_ideas          extract      340          2 min ago
earnings_backfill  price        89           1 hour ago

Auto-blessed (last 24h):
vic_ideas          score        0            cosmetic: "only added debug logging"
```

**`jobctl bless JOB STAGE`** — Mark current hash equivalent to stored:

```bash
jobctl bless vic_ideas extract                    # bless with no reason
jobctl bless vic_ideas extract --reason "only added type hints"
jobctl bless vic_ideas --all                      # bless all stages
```

**`jobctl reset JOB STAGE`** — Already exists. Resets stale items to pending.

#### 6. Runner integration

At startup and on config reload, for each (job, stage):

1. Compute current hash
2. Store snapshot (INSERT OR IGNORE)
3. Compare with hashes stored in results table
4. If mismatch and no equivalence exists:
   a. Load old snapshot
   b. Call LLM triage
   c. Cosmetic → insert equivalence (blessed_by='llm_auto')
   d. Semantic → log warning, send Pushover
5. Continue running — stale items are NOT auto-reprocessed

Runner does NOT block on LLM triage. It can run the triage async and continue processing. Items with stale versions are skipped by the stage worker (they're "done" but won't be picked up again until explicitly reset).

#### 7. Notification

On semantic change detection:
```
Pushover: "jobs: vic_ideas/extract changed (semantic) — 340 items stale"
```

On auto-bless:
```
Log only: "auto-blessed vic_ideas/score: cosmetic (only added debug logging)"
```

### What this does NOT change

- **Result storage**: Results still stored with handler_version hash
- **VERSION_DEPS**: Still works the same — contributes to hash computation
- **HANDLER_VERSION overrides**: Still works — explicit versions bypass source hashing
- **Cascade reprocessing**: `jobctl reset` still cascades to downstream stages
- **Dashboard "Reprocess Stale" button**: Still works, now excludes blessed equivalences

### Migration

1. Create `version_snapshots` and `version_equivalences` tables
2. Modify `get_stale_count()` to check equivalences
3. Add snapshot storage to runner's version computation
4. Add LLM triage call on hash mismatch
5. Add `jobctl stale` and `jobctl bless` commands
6. Add Pushover notification for semantic changes

Existing results with `handler_version IS NULL` remain treated as stale (same as today).

### Handler disk-cache clarification

Separate from versioning, handlers fall into two categories:

| Pattern | Example | Keep? |
|---------|---------|-------|
| **Raw content cache** (fetch saves HTML/JSON/audio to disk) | vic_ideas/fetch, vic_wayback/fetch | Yes — "Raw Data First" convention |
| **Result cache** (stage checks if output file exists, returns early) | company_analysis (removed) | No — shadows framework |

Remaining handlers with raw content caches (vic_ideas/fetch, vic_wayback/fetch, earnings_backfill/ib) are correct. The `force=True` mechanism for re-fetching is a separate, narrow feature for when you explicitly want to re-download from external sources.
