# Evolving Research Jobs

Research jobs are different from content pipelines. A YouTube channel job has a
fixed schema: fetch transcript, extract metadata, score. The schema stabilizes
quickly and rarely changes. A research job — company assessment, competitive
analysis, supply chain mapping — is fundamentally iterative. Your understanding
of what to measure, how to measure it, and what matters evolves with every batch.

This document captures the pattern for building jobs that evolve gracefully.

## The Core Loop

```
Assess (broad, shallow)
  → Review results, notice gaps
  → Refine prompts / add stages / adjust scoring
  → Re-assess (stale detection picks up changes automatically)
  → Review again
  → Repeat until understanding stabilizes
```

This is not a bug — it's the intended workflow. The jobs framework supports it
through VERSION_DEPS, stale detection, and cascade reprocessing.
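
A minimal sketch of the stale check behind this loop, assuming each stored result keeps a hash of the VERSION_DEPS it was produced with (function and field names here are illustrative, not the tracker's real API):

```python
import hashlib

def deps_hash(deps: list[str]) -> str:
    """Fingerprint a stage's VERSION_DEPS (prompts, model strings, scoring code)."""
    return hashlib.sha256("\n".join(deps).encode("utf-8")).hexdigest()

def is_stale(stored_hash: str, current_deps: list[str]) -> bool:
    """A result is stale when its stored fingerprint no longer matches the current
    deps, i.e. a prompt, model, or scoring change happened since it was produced."""
    return stored_hash != deps_hash(current_deps)
```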

## What Evolves

| Dimension         | Example                                                    | Mechanism              |
|-------------------|------------------------------------------------------------|------------------------|
| **Prompts**       | "Rate innovation" → "Rate innovation pace AND trajectory"  | VERSION_DEPS hash change → stale |
| **Stages**        | Add `patent_analysis` after realizing patents matter        | New stage, existing items start as pending |
| **Scoring**       | Simple rating → weighted composite with normalization       | VERSION_DEPS on scoring function |
| **Universe**      | Start with 50 companies, expand to 200 based on findings   | Discovery adds items, stages run on new items |
| **Models**        | Start with flash for speed, upgrade to pro for depth        | Model string in VERSION_DEPS |
| **Tier depth**    | Broad pass on all → deep pass on top quartile              | Priority-based processing, stage concurrency |

## Design Principles

### 1. Prompts Are Code

Prompts are the most-changed part of a research job. Treat them accordingly:

- **Store prompts as module-level constants**, not inline strings
- **Include prompts in VERSION_DEPS** — prompt change = stale results
- **Version prompts explicitly** when the change is semantic (not just formatting)

```python
_INNOVATION_PROMPT = """Analyze the innovation pace of "{name}"...
Focus on PACE and TRAJECTORY — are they getting faster or slower?
..."""

VERSION_DEPS = {
    "innovation_pace": [_INNOVATION_PROMPT, str(MODEL)],
}
```

When you edit `_INNOVATION_PROMPT`, every company previously assessed with the old
prompt shows as stale in the dashboard. One click to reprocess.

### 2. Stages Are Additive

Never remove a stage from a running job — add new ones alongside. Old results
remain queryable. If a stage becomes irrelevant, stop running it (set concurrency
to 0 or remove from the active stage list) but keep the results table intact.

When you add a new stage:
- Existing items get `pending` status for the new stage automatically
- The runner processes them through the new stage on the next pass
- No manual migration needed
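
A minimal sketch of why no migration is needed, assuming the tracker keeps one status row per (item, stage) and treats a missing row as pending (table and column names are illustrative):

```python
import sqlite3

def stage_status(tracker: sqlite3.Connection, item_id: int, stage: str) -> str:
    """Items that predate a newly added stage have no status row for it,
    so they report as pending and get picked up on the runner's next pass."""
    row = tracker.execute(
        "SELECT status FROM stage_status WHERE item_id = ? AND stage = ?",
        (item_id, stage),
    ).fetchone()
    return row[0] if row else "pending"
```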

### 3. Re-running Is First-Class

Every stage must be **idempotent and re-runnable**:

- Stage reads from DB/prior results, not from external fetches (fetch once, analyze many times)
- Stage writes via UPSERT, not INSERT — safe to run twice
- Stage result includes enough context to understand without re-running

The `reprocess_stale` function in the tracker handles the cascade:
reprocessing stage N also marks stages N+1, N+2, ... as pending.
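
A minimal sketch of an idempotent write, assuming a results table with a UNIQUE(company_id, stage) constraint (the real schema will differ):

```python
import json
import sqlite3

def write_stage_result(db: sqlite3.Connection, company_id: int, stage: str, result: dict) -> None:
    """UPSERT the stage result: running the stage twice replaces the previous
    row instead of duplicating it or raising a constraint error."""
    db.execute(
        """
        INSERT INTO stage_results (company_id, stage, result_json)
        VALUES (?, ?, ?)
        ON CONFLICT (company_id, stage)
        DO UPDATE SET result_json = excluded.result_json
        """,
        (company_id, stage, json.dumps(result)),
    )
    db.commit()
```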

### 4. Scoring Is Derived, Not Stored Inline

Assessment stages produce raw findings (JSON with evidence, ratings, quotes).
Scoring is a **separate computation** that reads raw findings and produces numbers.

Why separate:
- You can re-score without re-assessing (cheaper, faster)
- Scoring formula evolves independently of assessment prompts
- You can A/B test scoring formulas on the same raw data

```
innovation_pace stage → raw findings (pace, trajectory, evidence, patents)
     ↓
scoring function → innovation_score (0-10), stored in company_scores table
```

The scoring function lives in the handler but is tracked separately in VERSION_DEPS.
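
A minimal sketch of the derived-scoring split, assuming raw findings sit in a `stage_results` table as JSON and scores land in `company_scores` (field names like `pace_rating` and `trajectory` are illustrative):

```python
import json
import sqlite3

_TRAJECTORY_WEIGHTS = {"accelerating": 1.0, "steady": 0.6, "slowing": 0.3}

def score_innovation(findings: dict) -> float:
    """Turn raw findings into a 0-10 score. Re-scoring is cheap: no model calls,
    just arithmetic over JSON that is already stored."""
    base = float(findings.get("pace_rating", 5))
    weight = _TRAJECTORY_WEIGHTS.get(findings.get("trajectory", "steady"), 0.6)
    return round(min(10.0, base * (0.5 + weight / 2)), 1)

def rescore_all(db: sqlite3.Connection) -> None:
    """Re-score every company from stored findings, without re-assessing."""
    rows = db.execute(
        "SELECT company_id, result_json FROM stage_results WHERE stage = 'innovation_pace'"
    ).fetchall()
    for company_id, raw in rows:
        db.execute(
            "UPDATE company_scores SET innovation_score = ? WHERE company_id = ?",
            (score_innovation(json.loads(raw)), company_id),
        )
    db.commit()
```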

### 5. Tiered Depth

Not every company deserves the same analysis depth. Use priority to control this:

| Tier      | Criteria                                  | Analysis                                   | Cost           |
|-----------|-------------------------------------------|--------------------------------------------|----------------|
| Broad     | All companies in universe                 | Flash model, 5 stages                      | ~$0.05/company |
| Deep      | quality_score > 70 AND interesting roles  | Pro model, 5 stages + follow-ups           | ~$0.50/company |
| Spotlight | Manual selection or top-10 by composite   | Pro model + SEC filings + patent deep-dive | ~$2/company    |

Implement via priority field: broad items get priority=100 (processed last),
deep items get priority=10, spotlight gets priority=1.

Or: separate jobs in a pipeline (`semi_assessment` → `semi_deep_dive`), where
the deep dive job's discovery strategy is `tracker_query` reading from the
assessment job's done items filtered by score.
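
A minimal sketch of tier assignment via the priority field, assuming lower values are processed first (the thresholds are illustrative):

```python
def assign_priority(quality_score: float, spotlight: bool = False) -> int:
    """Spotlight items run first (1), deep items next (10),
    the broad remainder last (100)."""
    if spotlight:
        return 1
    if quality_score > 70:
        return 10
    return 100
```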

### 6. Cross-Pollination

Research on one company informs assessment of another:

- Innovation pace of competitors provides context ("fast vs peers" needs peer data)
- Supply chain position (from supplychain graph) affects bottleneck scoring
- Industry trends discovered in one assessment apply to the whole sector

The unified DB enables this: the `company_scores` table and `relationships` table
are in the same DB, so a scoring function can JOIN them.
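
A minimal sketch of such a cross-table query, assuming `relationships` links companies by id and tags each edge with a relation type (the real column names may differ):

```python
import sqlite3

def peer_innovation_scores(db: sqlite3.Connection, company_id: int) -> list[float]:
    """Fetch competitors' innovation scores so "fast vs peers" is judged
    against actual peer data, not in isolation."""
    rows = db.execute(
        """
        SELECT peer.innovation_score
        FROM relationships r
        JOIN company_scores peer ON peer.company_id = r.target_id
        WHERE r.source_id = ? AND r.relation = 'competitor'
        """,
        (company_id,),
    ).fetchall()
    return [r[0] for r in rows if r[0] is not None]
```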

## Practical Workflow

### First Pass: Broad Assessment

```bash
# Seed from interesting universe
python tools/supplychain/export_universe.py  # generates CSV
# Add top 50 to the job
cat tools/supplychain/data/interesting_universe.csv | \
  python -c "import csv,sys; [print(r['name']) for r in csv.DictReader(sys.stdin)]" | \
  head -50 | while IFS= read -r name; do
    jobctl add semi_assessment "$name"
  done

# Run first batch
jobctl run semi_assessment -n 10
```

### Review & Refine

```bash
# Check results in dashboard
inv jobs.server  # → jobs.localhost → semi_assessment

# Or query directly
python -c "
import sqlite3
db = sqlite3.connect('intel/companies/data/companies.db')
for r in db.execute('''
    SELECT c.name, cs.innovation_pace, cs.leadership_rating, cs.interestingness_score
    FROM company_scores cs JOIN companies c ON c.id = cs.company_id
    ORDER BY cs.interestingness_score DESC LIMIT 20
'''):
    print(r)
"
```

Look for:
- Are the ratings discriminating? (If everyone is "strong", the prompt is too generous)
- Is evidence specific? (Vague evidence = prompt needs more grounding pressure)
- Are there surprises? (A "stagnant" company you expected to be innovative = dig deeper)

### Refine Prompts

Edit the prompt constant in the handler. The VERSION_DEPS hash changes automatically.

```bash
# See what's stale
jobctl status semi_assessment

# Dashboard shows "N items stale" per stage
# Click "Reprocess Stale" or:
jobctl reset semi_assessment innovation_pace
jobctl run semi_assessment
```

### Add a Stage

Edit `jobs.yaml` to add a new stage (e.g., `patent_analysis`). Add the handler
function. Existing items automatically get `pending` for the new stage.
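
A hypothetical handler sketch for the new stage; the real signature depends on the framework, but the shape is: format the prompt from data earlier stages already stored, ask the model, return raw findings. The model-call parameter is a stand-in, not a real API:

```python
from typing import Callable

_PATENT_PROMPT = """Summarize the patent activity of "{name}" over the last 5 years.
Focus on filing velocity and whether filings cluster around new product areas."""

def handle_patent_analysis(item: dict, ask_model: Callable[[str], str]) -> dict:
    """Hypothetical new-stage handler: no external fetches, just the prompt
    plus whatever earlier stages stored on the item."""
    prompt = _PATENT_PROMPT.format(name=item["name"])
    return {"stage": "patent_analysis", "findings": ask_model(prompt)}
```

Add `_PATENT_PROMPT` (plus the model string) to `VERSION_DEPS` under `patent_analysis`, so later edits to the prompt mark its results stale.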

### Expand Universe

```bash
# After first pass, you realize materials companies are underrepresented
# Re-run export with adjusted filters, seed more companies
jobctl add semi_assessment "JSR Corporation" --data '{"ticker":"4185.T","country":"Japan"}'
```

## Anti-Patterns

| Anti-Pattern | Why It's Bad | Instead |
|---|---|---|
| Hardcoded company list in Python | Can't evolve without code changes | Discovery from DB or CSV |
| Inline prompt strings | Can't track changes via VERSION_DEPS | Module-level constants |
| One mega-stage that does everything | Can't re-run parts, can't add granularity | Small focused stages |
| Storing scores in the result dict only | Not queryable across companies | Write to `company_scores` table |
| Deleting old results when re-running | Lose ability to compare versions | Keep old results, write new version |
| Same depth for all companies | Wastes budget on uninteresting companies | Tiered depth via priority |

## Relationship to Other Jobs

```
supplychain_anchors/expand  →  builds the graph (who exists, who relates to whom)
         ↓ (unified DB)
semi_assessment             →  assesses quality (how innovative, how resilient)
         ↓ (company_scores)
company_analysis            →  investment thesis (bull/bear, sentiment, valuation)
         ↓ (reports)
human review                →  investment decisions
```

Each job reads from the previous job's output via the shared DB. No direct job
dependencies — just shared data. This means each can evolve independently.
