# lib/tune TODO

```yaml
# Defaults: status=open, needs=autonomous, effort=M
items:
  # @tune Integration
  - id: tune-test-groq-latency
    title: "Test Groq latency for badge-type racing tasks"
    meta: {tags: [research, infra], effort: S, scope: [lib/llm/race.py]}
  - id: tune-test-cerebras-latency
    title: "Test Cerebras Llama 3.3 70B latency — $0.60/$0.60, ~2000 tok/s claimed"
    meta: {tags: [research, infra], effort: S, scope: [lib/llm/race.py]}
  - id: tune-vario-generate-one
    title: "Decorate vario _generate_one() with @tune for model selection observation"
    meta: {tags: [infra], effort: S, scope: [vario/blocks/produce.py]}
  - id: tune-vario-critique
    title: "Decorate vario critique() with @tune — is haiku sufficient as default critic?"
    meta: {tags: [infra, research], effort: S, scope: [vario/blocks/score.py]}
  - id: tune-jobs-doctor-classify
    title: "Decorate jobs _llm_classify() with @tune — highest volume, could be cheaper"
    meta: {tags: [infra], effort: S, scope: [jobs/lib/doctor.py]}
  - id: tune-jobs-score-llm
    title: "Decorate jobs _yt_stage_score() call_llm_json with @tune — per-item across all jobs"
    meta: {tags: [infra], effort: S, scope: [jobs/lib/stages.py]}
  - id: tune-jobs-company-analysis
    title: "Decorate company_analysis _llm_json() and _llm_markdown() with @tune"
    meta: {tags: [infra], effort: S, scope: [jobs/handlers/company_analysis.py]}
  - id: tune-jobs-supplychain
    title: "Decorate supplychain handler _llm_json() with @tune"
    meta: {tags: [infra], effort: S, scope: [jobs/handlers/supplychain.py]}
  - id: tune-jobs-codebase-maintenance
    title: "Decorate codebase_maintenance _stage_classify() and _stage_review() with @tune"
    meta: {tags: [infra], effort: S, scope: [jobs/handlers/codebase_maintenance.py]}
  - id: tune-jobs-earnings-backfill
    title: "Decorate earnings_backfill _llm_search_replay() with @tune"
    meta: {tags: [infra], effort: S, scope: [jobs/handlers/earnings_backfill.py]}
  - id: tune-jobs-vic-ideas
    title: "Decorate vic_ideas _llm_sanity_check() with @tune"
    meta: {tags: [infra], effort: S, scope: [jobs/handlers/vic_ideas.py]}
  - id: tune-intel-companies
    title: "Decorate intel/companies analyze _llm_json() with @tune — could race models"
    meta: {tags: [infra], effort: S, scope: [intel/companies/analyze.py]}
  - id: tune-helm-recap
    title: "Decorate helm recap with @tune for quality vs speed tradeoff"
    meta: {tags: [infra], effort: S, scope: [helm/recap.py]}
  - id: tune-ingest-fetch-strategy
    title: "Track direct vs browser vs proxy fetch success rates with @tune"
    meta: {tags: [infra], effort: S, scope: [lib/fetch/fetcher.py]}
    notes: >
      Broader pattern: every paid API call (BD unlocker, Serper, Deepgram, Finnhub) should
      get caller attribution like LLM calls do via lib/cost_log.py. @tune is the natural
      home — observe cost, success rate, and caller per paid endpoint. The fetch strategy
      layer (see lib/fetch/TODO.md#fetch-strategy-layer) needs this to answer "which handler
      is burning $371/mo on unlocker?" Currently we know aggregate BD zone cost but can't
      attribute it to callers. See principle: observability/attribute-costs-at-consumption.
  - id: tune-wrap-race-llm
    title: "Add @tune(observers=[CostObserver()]) on race_llm() to track which model wins per label"
    meta: {tags: [infra], effort: S, scope: [lib/llm/race.py]}
  - id: tune-vario-advice-quality
    title: "Build offline judge pass scoring vario model responses for advice quality — unique catches get extra weight"
    meta: {tags: [research, feature], effort: L, scope: [lib/tune/, vario/], depends: [tune-vario-generate-one]}
```

## Vision expansion

Current tune: **decorator for function-level observation + optimization.** You `@tune` a function you own; it tracks timing/errors/metrics and can optionally pick among choices via a bandit.

The items below expand this to **codebase behavior observability** — understanding how shared utilities are used across many callers, not just optimizing one function. The decorator pattern doesn't fit shared utilities (they're too cheap to wrap, and the interesting data is *who calls* them, not *what runs*). This needs:

1. **Caller-aware observation** — record who calls a utility, not just that it ran
2. **Ad-hoc observation** — context managers / inline observe, not just decorators
3. **Aggregate insights** — "which models produce fenced output?" is a cross-caller question

This is a meaningful scope expansion: tune becomes the observability layer for understanding code behavior patterns system-wide, not just a function-level optimization tool.

## Call stack observation

`@tune` currently records `caller` as `{module, fn}` from the decorated function. But for shared utilities like `strip_fences`, the interesting question is **who called it** — the call stack, not the function itself.

Options:
- `inspect.stack()` is expensive (~1ms), so only sample it (1 in N calls).
- Frame walking (`sys._getframe(1)`) is cheap (~1μs). Record `f_code.co_filename:co_name`.
- Key by caller module, aggregate counts: `{caller: {stripped: N, noop: M}}`.

Use case: `strip_fences` — know which models/prompts produce fenced output despite "no fences" instruction. Also which callers never need stripping (prompt is effective there).

## Observer/logger object return

Consider having `@tune` return (or make available) an observer object that callers can use to add context after the call. Example:

```python
result, obs = strip_fences(raw, _observe=True)
obs.note(model=model, prompt_said_no_fences=True)
```

Or a context manager pattern:

```python
with tune.observe("strip_fences") as obs:
    result = strip_fences(raw)
    obs.note(model=model, caller="design")
```

This avoids decorating simple functions while still getting rich observation data.
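The context-manager shape could be as small as a dataclass plus `contextlib.contextmanager`; a sketch, where `Observation` and `observe` are hypothetical names and the persistence step is stubbed:

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field

@dataclass
class Observation:
    """Accumulates context notes attached during an observed span."""
    label: str
    notes: dict = field(default_factory=dict)
    elapsed: float = 0.0

    def note(self, **kwargs) -> None:
        """Attach arbitrary key/value context after (or during) the call."""
        self.notes.update(kwargs)

@contextmanager
def observe(label: str):
    """Hypothetical tune.observe(): yields an Observation the caller can
    annotate; elapsed time is captured on exit even if the body raises."""
    obs = Observation(label)
    start = time.perf_counter()
    try:
        yield obs
    finally:
        obs.elapsed = time.perf_counter() - start
        # a real implementation would persist obs here (e.g. append to a log)
```

The `try/finally` matters: a failed `strip_fences` call still produces a timed observation, which is itself useful signal.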

## Candidates for @tune observation

- `strip_fences` — did it strip? who called? which model produced fenced output?
- `call_llm` — already partially observed via billing, but tune could track latency distributions per model
- `resolve_model` — track alias usage frequency
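The `strip_fences` question ("which models produce fenced output?") reduces to a group-by over recorded observations; a minimal sketch assuming each record carries `model` and `stripped` fields (the record shape is an assumption, not the stored format):

```python
from collections import Counter

def fenced_rate_by_model(records: list[dict]) -> dict[str, float]:
    """Fraction of calls per model where stripping was needed.
    Assumes each record has 'model' and 'stripped' keys."""
    total: Counter = Counter()
    fenced: Counter = Counter()
    for r in records:
        total[r["model"]] += 1
        fenced[r["model"]] += bool(r["stripped"])
    return {m: fenced[m] / total[m] for m in total}
```

A rate near 1.0 for a model means the "no fences" prompt instruction is being ignored there; near 0.0 means the stripping (and the observation) is pure overhead for that caller.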
