# Draft Review Recipe — Design

**Date**: 2026-02-27
**Status**: Draft

## Context

Vario is a dataflow platform for model-based analysis with meta-reflection and iterative self-improvement capabilities. Building blocks (generate, critique, score, refine, vote, synthesize, ...) are the atomic units; recipes organize them into multi-step workflows that can loop, branch, reflect on their own state, and redirect. Each block is specialized by its prompt (what to analyze) and model configuration (which models, how many, what temperature) — the same block type serves different purposes depending on how it's instantiated. Blocks and recipes are used, adapted, and evolved over time. A workflow can fan out a block to N models in parallel, reduce their outputs through debate or voting, iterate until convergence, and adapt strategy mid-run.

## Problem

Vario has individual review prompts (linguistic_cleanup, logic_critic, readability_critic) and multi-model generation, but no orchestrated workflow that:
- Runs multiple review **lenses** across multiple **models** in one pass
- Synthesizes findings into a unified verdict with strength-of-need score
- Produces a tracking HTML report
- Supports iterative refinement (apply suggestions → re-review)

The `critique` config in `extract_prompts.yaml` is the closest existing feature — it runs 4 prompts against one model sequentially. But it's a simple prompt list, not a recipe: no multi-model, no synthesis, no staging. The review recipe replaces it with N lenses × M models in parallel, two-stage synthesis, and a report. Once review works, the `critique` config becomes redundant and should be removed.

(Other configs in `extract_prompts.yaml` — `facts`, `research`, `event`, `cleanup` — are simple single-model extraction tasks. They don't need recipe machinery and stay where they are.)

---

## I. Content — What to Review

### Review Lenses

#### MVP Lenses (4)

| Lens | Source | Prompt |
|------|--------|--------|
| **Language** | existing `linguistic_cleanup` | Grammar, voice, phrasing, parallelism |
| **Logic & Claims** | extended `logic_critic` | Fallacies, consistency, blind spots + claims calibration (see below) |
| **Readability** | existing `readability_critic` | Flow, engagement, signposting, clarity |
| **Structure** | NEW | Section ordering, hierarchy, balance, completeness |

#### V2 Lenses

| Lens | Notes |
|------|-------|
| **Completeness** | Missing sections, unanswered questions, gaps |
| **Audience** | Tone fit, assumed knowledge, jargon level |
| **Diagrams** | Clarity, accuracy, labeling (skip if no diagrams) |

### Logic & Claims Lens (extended from `logic_critic`)

The MVP logic lens extends the existing `logic_critic` prompt with claims calibration:

- **Existing coverage** (from `logic_critic`): dubious claims, internal inconsistencies, irrelevant material, missing arguments, blind spots, formal fallacies
- **Added — Overclaiming**: assertions stronger than evidence warrants ("proves" when it "suggests", generalizing from limited examples)
- **Added — Underclaiming**: burying significant findings, hedging too much, missing obvious conclusions
- **Added — Evidence-claim alignment**: claims without cited evidence, evidence cited but not connected to a claim
- **Added — Adversarial critique**: strongest objection a skeptical expert would raise

One combined lens, not two. A good logic review naturally asks both "is the reasoning valid?" and "are the claims proportionate?"

### V2: Structure Sub-Lenses

Decompose "structure" into specialized checks: TOC/outline, overview/intro, examples, abstract/claims, conclusion, references. MVP: single structure prompt. V2: break out if depth needed.

### V2: Attention & Emotion Management Lens

For reports/presentations meant to persuade: reading experience map, payoff points, cognitive load, scanability. See [considerations doc](https://static.localhost/docs/plans/2026-02-27-draft-review-considerations.md) §6.

### Lens Definition Format

Each lens is a dict with `name`, `prompt`, and optional `skip_if` predicate:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ReviewLens:
    name: str
    prompt: str
    skip_if: Callable[[str], bool] | None = None  # e.g., skip the diagrams lens when the artifact has no diagrams
```

MVP lenses are defined in Python (not YAML), since they reference existing prompts from `extract_prompts.yaml` and add the structured output format wrapper. V2: move to `vario/review_lenses.yaml` if the set grows.

### Structured Output Format

Each lens prompt gets this suffix appended by the orchestrator:

```
## Output Format

Return your findings as a JSON array. Each finding:
{
  "severity": "critical" | "moderate" | "minor",
  "location": "section name or quote",
  "finding": "what's wrong",
  "suggestion": "specific fix",
  "quote": "exact text from artifact (if applicable)"
}

After the JSON array, add:
LENS_SCORE: <0-100>
LENS_SUMMARY: <one sentence>
```

This structured format enables:
- Programmatic aggregation across models within a lens
- Severity counting for strength-of-need calculation
- Location tracking for the apply phase
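
The envelope above can be parsed deterministically. A sketch, assuming models sometimes wrap the JSON array in prose (hence the outermost-bracket slice); the function name is illustrative:

```python
import json
import re

def parse_lens_output(raw: str) -> dict:
    """Parse one lens response: a JSON array of findings followed by
    LENS_SCORE and LENS_SUMMARY trailer lines."""
    # Findings: slice the outermost JSON array, tolerating prose around it.
    start, end = raw.index("["), raw.rindex("]") + 1
    findings = json.loads(raw[start:end])
    score = re.search(r"LENS_SCORE:\s*(\d+)", raw)
    summary = re.search(r"LENS_SUMMARY:\s*(.+)", raw)
    return {
        "findings": findings,
        "score": int(score.group(1)) if score else None,
        "summary": summary.group(1).strip() if summary else "",
    }
```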

---

## II. Orchestration — How to Run

### Core Abstraction: Lens × Model Matrix

```
             opus    gemini    grok    gpt
language      ✓        ✓        ✓       ✓     → per-lens synthesis
logic         ✓        ✓        ✓       ✓     → per-lens synthesis
readability   ✓        ✓        ✓       ✓     → per-lens synthesis
structure     ✓        ✓        ✓       ✓     → per-lens synthesis
                                                      │
                                                      ▼
                                              cross-lens verdict
```

Each cell = one `call_llm` invocation. Within a stage, all cells run in parallel via `asyncio.gather`.
Rows reduce first (per-lens consensus across models), then a final pass synthesizes across lenses.
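
The matrix fan-out for one stage could look like the following sketch, with `call_llm` standing in for Vario's actual async client:

```python
import asyncio

async def run_matrix(lenses: list[tuple[str, str]], models: list[str], call_llm):
    """Fan out one (lens x model) call per cell and gather the whole
    stage concurrently. Lenses are (name, prompt) pairs here for brevity."""
    cells = [(name, prompt, model) for name, prompt in lenses for model in models]
    outputs = await asyncio.gather(
        *(call_llm(model=model, prompt=prompt) for _, prompt, model in cells)
    )
    # Group raw outputs by lens name so each row reduces independently.
    by_lens: dict[str, list[str]] = {}
    for (name, _, model), out in zip(cells, outputs):
        by_lens.setdefault(name, []).append(out)
    return by_lens
```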

### Staged Execution (not all-parallel)

Lenses run in **two stages**, not all at once. Later lenses benefit from earlier findings:

| Stage | Lenses (parallel within stage) | Rationale |
|-------|-------------------------------|-----------|
| **1. Macro** | Structure, Logic+Claims | May reorganize or remove sections — no point polishing what gets cut |
| **2. Micro** | Language, Readability, (Diagrams in V2) | Polish what survived stage 1; receives stage 1 findings as context |

Stage 2 prompts include stage 1 synthesis: "The following structural/logic issues have been identified. Focus your review on content that is NOT being removed or reorganized: ..."

This avoids wasted work (flagging grammar in a section that's being cut) and lets micro lenses focus on the final shape of the document.

### Phases

**Phase 1: REVIEW** (MVP)
1. Load artifact file(s)
2. **Stage 1 (macro)**: Run structure + logic lenses × models in parallel
3. Per-lens reduce for stage 1 lenses
4. **Stage 2 (micro)**: Run language + readability lenses × models in parallel, with stage 1 findings as context
5. Per-lens reduce for stage 2 lenses
6. Cross-lens synthesis: feed all per-lens findings into a final LLM call that produces:
   - Prioritized suggestions (numbered, with severity)
   - Strength-of-need score (1-5)
   - One-paragraph executive summary
7. Generate `report.html`

**Phase 2: APPLY** (V2)
1. Feed artifact + cross-lens synthesis to an LLM: "Apply these suggestions"
2. Save as `artifact_v2.md`
3. Append "Applied" section to report

**Phase 3: RE-REVIEW** (V2)
1. Run Phase 1 again on updated artifact
2. Include previous critiques as context (so models can assess improvement)
3. Append new sections to existing report
4. Updated strength-of-need score shows trajectory

### Two-Level Synthesis

#### Level 1: Per-Lens (models → one finding set)

For each lens, take N model outputs and produce consensus findings:
- Use `reduce.stream_reduce` with `debate` strategy (critique → synthesize)
- The reduce model sees all N model outputs tagged by source
- Output: deduplicated findings, severity adjusted by agreement level
- If 3/4 models flag the same issue → severity stays or bumps up
- If only 1/4 flags it → severity may drop to "minor" or gets a "disputed" tag
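
The agreement heuristic can be sketched as a deterministic pre-pass. This is illustrative only — in the design above the dedup happens inside the reduce model, which matches findings semantically rather than by exact key:

```python
from collections import Counter

def adjust_by_agreement(findings_per_model: list[list[dict]], n_models: int) -> list[dict]:
    """Merge findings across models, recording agreement level and
    demoting lone-dissenter findings to minor/disputed."""
    counts: Counter = Counter()
    merged: dict[tuple, dict] = {}
    for model_findings in findings_per_model:
        for f in model_findings:
            key = (f["location"], f["finding"])  # illustrative key
            counts[key] += 1
            merged.setdefault(key, dict(f))
    for key, f in merged.items():
        f["agreement"] = counts[key] / n_models
        if counts[key] == 1 and n_models > 1:
            # lone dissenter: severity drops, finding marked disputed
            f["severity"] = "minor"
            f["disputed"] = True
    return list(merged.values())
```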

#### Level 2: Cross-Lens (lens findings → verdict)

Feed all per-lens synthesis results to a final LLM call:

```
You are reviewing a draft document. Below are findings from 4 specialized review lenses.

<lens name="language">
[per-lens findings]
</lens>
<lens name="logic">
[per-lens findings]
</lens>
...

Produce:
1. EXECUTIVE SUMMARY: One paragraph on the artifact's overall quality
2. PRIORITIZED SUGGESTIONS: Top 10 most impactful changes, numbered, with:
   - Which lens(es) identified it
   - Severity (critical/moderate/minor)
   - Specific actionable suggestion
3. STRENGTH OF NEED: Score 1-5
   1 = Polish only, minor tweaks
   2 = Solid but could improve in places
   3 = Needs meaningful revision in some areas
   4 = Significant issues, should not publish without revision
   5 = Fundamental rework needed
4. PRAISE: What the artifact does well (important for morale and calibration)
```

### Work Directory

Each review session creates a timestamped directory:

```
vario/reviews/2026-02-27T14-30-00/
  artifact_v1.md              # copy of original input
  artifact_v2.md              # after apply (Phase 2)
  critiques/
    language/
      opus.json               # raw model output (parsed JSON findings)
      gemini.json
      grok.json
      gpt.json
      synthesis.md            # per-lens reduce output
    logic/
      ...
    readability/
      ...
    structure/
      ...
  cross_lens_synthesis.md     # final verdict
  report.html                 # self-contained HTML report
```

This serves two purposes:
1. **Debugging**: See exactly what each model said for each lens
2. **Iteration context**: Phase 3 re-review gets the whole directory as input
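
A sketch of the directory creation and per-model save helpers (function names are assumptions, not existing Vario API):

```python
import json
from datetime import datetime
from pathlib import Path

def make_work_dir(root: Path, lenses: list[str]) -> Path:
    """Create the timestamped session layout shown above, with one
    critiques/ subdirectory per lens."""
    session = root / datetime.now().strftime("%Y-%m-%dT%H-%M-%S")
    for lens in lenses:
        (session / "critiques" / lens).mkdir(parents=True, exist_ok=True)
    return session

def save_model_output(session: Path, lens: str, model: str, findings: list[dict]) -> None:
    """Persist one model's parsed findings as critiques/<lens>/<model>.json."""
    path = session / "critiques" / lens / f"{model}.json"
    path.write_text(json.dumps(findings, indent=2))
```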

### CLI Interface & Steerability

The CLI is the primary interface. MVP is batch (run → report), but with steering knobs from day one:

```bash
# Basic review (batch — run all lenses, all models, produce report)
vario review artifact.md                    # defaults: maxthink preset, all lenses
vario review artifact.md -c fast --open     # cheap models, open report after

# Steer: pick lenses
vario review artifact.md --lens logic       # single lens deep-dive
vario review artifact.md --lens structure,logic  # macro only, skip micro

# Steer: pick section (V1.5 — focus on one part of the document)
vario review artifact.md --section "Architecture"  # only reviews this section
vario review artifact.md --section 3               # by section number

# Multi-file artifact
vario review report.md diagram.d2 appendix.md

# Apply suggestions (V2)
vario review artifact.md --apply                   # apply all
vario review artifact.md --apply 3,5,7             # apply specific suggestions by number

# Re-review after edits (V2)
vario review artifact.md --continue reviews/2026-02-27T14-30-00/
```

**Not steerable mid-run in MVP** — the pipeline runs to completion. Interactive steering (pause, redirect, focus deeper) requires the Gradio tier (V2).

---

## III. Presentation — How to Show Results

Three tiers of presentation, each building on the previous:

| Tier | What | When |
|------|------|------|
| **A. Static report** | Self-contained HTML, opened in browser | MVP |
| **B. Interactive report** | Same HTML + JS for navigation, block focus, clipboard commands | V1.5 |
| **C. Gradio review tab** | Live session: focus, optimize, apply, update | V2 |

All three share the same underlying data (work directory JSON files). They differ only in interactivity.

### Tier A: Static Report (MVP)

Self-contained HTML (same pattern as `strategies/report.py`). Dark theme, inline CSS, collapsible sections.

#### Report Sections

1. **Header** — artifact name, timestamp, model preset, lens count
2. **Executive Summary** — from cross-lens synthesis
3. **Strength-of-Need** — big colored badge (green 1-2, yellow 3, red 4-5)
4. **Document Map** — visual outline of artifact sections with per-section finding counts and severity bars (see below)
5. **Prioritized Suggestions** — numbered list from cross-lens synthesis, each with lens source, severity, actionable fix
6. **Per-Lens Findings** — collapsible `<details>`, shows synthesis + individual model outputs
7. **Trace Timeline** — what ran, when, cost per step (see below)
8. **Praise** — what's working well
9. **Cost Summary** — per-lens breakdown, synthesis cost, total, cost per finding

#### Document Map

The centerpiece of the report. It shows the artifact's structure as an outline with findings overlaid:

```
┌─────────────────────────────────────────────────────┐
│ Document Map                                        │
├─────────────────────────────────────────────────────┤
│ § 1  Introduction          ░░░░░░░░░░  0 findings   │
│ § 2  Architecture          ████░░░░░░  3 findings   │ ← yellow bar
│   2.1  Data Model          ██████████  5 findings   │ ← red bar
│   2.2  API Layer           ░░░░░░░░░░  0 findings   │
│ § 3  Implementation        ████░░░░░░  2 findings   │
│ § 4  Evaluation            ██░░░░░░░░  1 finding    │
│ § 5  Conclusion            ░░░░░░░░░░  0 findings   │
└─────────────────────────────────────────────────────┘
```

- Each section = one row, indented by heading level
- Bar color: green (0 findings), yellow (1-2 minor), orange (moderate), red (critical or many)
- Bar length = total finding count for that section across all lenses
- In Tier B: clicking a section scrolls to its per-lens findings below

This gives the user an instant "heat map" of where the draft needs work. The eye goes to the red bars.
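
The bar rendering reduces to a small pure function. The ~2-cells-per-finding scale, the "many findings" threshold of 5, and the minor-count cutoff are assumptions read off the sketch above:

```python
def section_bar(findings: list[dict], width: int = 10) -> tuple[str, str]:
    """Render one document-map row: a bar scaled by finding count,
    plus a color per the severity rules above."""
    n = len(findings)
    severities = {f["severity"] for f in findings}
    if n == 0:
        color = "green"
    elif "critical" in severities or n >= 5:
        color = "red"
    elif "moderate" in severities or n > 2:
        color = "orange"
    else:
        color = "yellow"
    filled = min(2 * n, width)  # ~2 cells per finding, capped at full width
    return "█" * filled + "░" * (width - filled), color
```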

#### Trace Timeline

Shows the pipeline execution as a visual timeline:

```
Trace
├─ Stage 1 (macro)           12.3s  $0.42
│  ├─ structure × 4 models    5.1s  $0.18
│  ├─ logic × 4 models        5.8s  $0.20
│  └─ per-lens synthesis ×2   1.4s  $0.04
├─ Stage 2 (micro)            8.7s  $0.31
│  ├─ language × 4 models     4.2s  $0.15
│  ├─ readability × 4 models  3.9s  $0.14
│  └─ per-lens synthesis ×2   0.6s  $0.02
└─ Cross-lens verdict         2.1s  $0.08
                      Total: 23.1s  $0.81
```

Each node shows wall-clock time and cost. In Tier B: nodes are expandable to show per-model details.

#### Cost Tracking

Each LLM call's cost is tracked (already in Vario's `run_prompt` return). Report shows:
- Per-lens cost breakdown
- Synthesis cost
- Total review cost
- Cost per finding (total / finding count)

### CSS Strategy

Shared static CSS, never LLM-generated:
- Extract shared base to `vario/static/report_base.css`
- Both `strategies/report.py` and `review_report.py` read it at import time
- Review-specific CSS additions as small string constant
- Self-contained HTML (CSS inlined into `<style>`)
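
A sketch of the inlining step. The review CSS constant and `render_report` name are illustrative; only the base stylesheet path comes from the plan above:

```python
from pathlib import Path

BASE_CSS_PATH = Path("vario/static/report_base.css")
REVIEW_CSS = ".severity-critical { color: #f66; }"  # illustrative addition

def render_report(body_html: str, base_css_path: Path = BASE_CSS_PATH) -> str:
    """Inline the shared base CSS plus review-specific additions into a
    single self-contained HTML document."""
    css = base_css_path.read_text() + "\n" + REVIEW_CSS
    return (
        "<!DOCTYPE html>\n<html><head><style>"
        + css
        + "</style></head><body>"
        + body_html
        + "</body></html>"
    )
```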

### Tier B: Interactive Report (V1.5)

Same HTML file as Tier A, but with embedded `<script>` for navigation and action:

1. **Clickable document map** — click a section → scrolls to its findings in the per-lens detail below. Section highlights in the map while viewing.
2. **Suggestion actions** — each prioritized suggestion has a "Copy apply command" button:
   ```
   [📋 Copy] vario review artifact.md --apply 3
   ```
   Clicking copies to clipboard. User pastes into terminal to apply just that one suggestion.
3. **Expandable trace** — trace timeline nodes expand on click to show per-model details (latency, token count, cost).
4. **Filter by severity** — toggle buttons at top: `[Critical] [Moderate] [Minor]` — dims/hides findings below threshold.
5. **Revision history** — when re-reviewing, shows side-by-side: previous findings that were addressed, new findings, score trajectory chart.

No server needed — all JS is inline, works as a static file.

### Tier C: Gradio Review Tab (V2)

New tab in `vario.localhost` — a live interactive review session:

#### Layout

```
┌──────────────────────────────────────────────────────────┐
│  Tabs: Extract / Studio / Review / Prompts               │
├─────────────────────┬───────────────────────────────────┤
│  Document Map       │  Focus Panel                       │
│                     │                                    │
│  § 1 Intro   ░░     │  § 2.1 Data Model                  │
│  § 2 Arch   ███     │  ─────────────────                 │
│  > 2.1 Data ███<    │  Findings (5):                     │
│    2.2 API   ░░     │  * Logic: Schema contradicts...    │
│  § 3 Impl   ██      │  * Structure: Section too long     │
│  § 4 Eval    █      │  * Language: Passive voice in..    │
│  § 5 Concl   ░░     │  * Readability: Dense paragraph    │
│                     │  * Language: Inconsistent terms    │
│  ──────────────     │                                    │
│  [Run Review]       │  [Optimize This Section]           │
│  [Apply All]        │  [Apply Suggestion #1]             │
│                     │  [Show Diff Preview]               │
├─────────────────────┴───────────────────────────────────┤
│  Trace: Stage 1 ████████░░ Stage 2 ████░░ Verdict █      │
└──────────────────────────────────────────────────────────┘
```

#### Interactions

| Action | What happens |
|--------|-------------|
| **Click section in map** | Focus panel shows findings for that section only. Artifact text shown with inline highlights. |
| **"Optimize This Section"** | Re-runs all lenses on just this section with deeper prompts ("analyze in detail, suggest specific rewrites"). Results replace focus panel. |
| **"Apply Suggestion #N"** | Applies one suggestion to the artifact. Shows diff preview. Accept → updates artifact, re-runs affected lenses on changed section. |
| **"Apply All"** | Applies all prioritized suggestions. Shows full diff. Accept → saves as v2, triggers re-review. |
| **"Show Diff Preview"** | Before applying, shows word-level redline of what would change. |
| **"Run Review"** | (Re-)runs the full pipeline. Progress shows in trace bar. Document map updates live as lenses complete. |

#### Update Badges

After applying suggestions and re-reviewing, the document map shows update status per section:

```
§ 2.1 Data Model  ████ → ██  [✓ 3 resolved, 1 new]
```

Sections that improved show green delta. Sections with new findings show yellow alert.

#### Progressive Disclosure

- **Default view**: document map + prioritized suggestions (the "what to fix" view)
- **Click to expand**: per-lens detail, per-model raw outputs, trace details
- **Deep dive**: "Show all model outputs for this finding" — see exactly what each model said

### Presentation Agents (V2+)

The presentation layer can itself use LLM calls to adapt output:

| Agent | Input | Output | When |
|-------|-------|--------|------|
| **Summarizer** | Full findings set | Brief/standard/full views | `--brief` flag or Gradio toggle |
| **Audience adapter** | Full findings + audience flag | Jargon-adjusted, audience-appropriate version | `--audience executive` |
| **Section narrator** | Section text + findings | Inline annotations explaining findings in context | Gradio focus panel |
| **Diff explainer** | Before/after text | Natural-language explanation of what changed and why | After apply step |

These are separate LLM calls, not part of the review pipeline. They run on demand when the user requests a specific view.

**Brevity**: `--brief` / `--standard` (default) / `--full`
**Audience**: `--audience expert` / `general` / `executive`

### Redline / Tracked Changes (V2)

For fine-grained language work, show what changed:
- **Coarse view**: section-level summary ("removed 2 paragraphs from §3")
- **Fine view**: word-level redline (deletions in red strikethrough, additions in green underline)
- Implementation: Python `difflib` or custom word-level diff
- Report shows both views: summary for scanning, redline for close review
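
A word-level redline with stdlib `difflib` is a few lines; the `<del>`/`<ins>` tags here stand in for the red-strikethrough/green-underline styling applied by the report CSS:

```python
import difflib

def word_redline(before: str, after: str) -> str:
    """Word-level redline: deletions wrapped in <del>, additions in <ins>."""
    a, b = before.split(), after.split()
    out = []
    for op, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
        if op in ("replace", "delete"):
            out.append("<del>" + " ".join(a[i1:i2]) + "</del>")
        if op in ("replace", "insert"):
            out.append("<ins>" + " ".join(b[j1:j2]) + "</ins>")
        if op == "equal":
            out.append(" ".join(a[i1:i2]))
    return " ".join(out)
```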

### V3: KB-Based Approach

The underlying output isn't a report — it's a **structured knowledge base** of the argument: points, importance weights, evidence strength, relationships between claims. Presentation formats (brief/expert/executive) are views over this KB.

```
review KB structure:
  points[]
    - claim: "..."
    - importance: 0.0-1.0
    - evidence_strength: 0.0-1.0
    - lens_source: [language, logic, ...]
    - suggestions[]
    - related_points: [id, id, ...]

  → brief view: filter importance > 0.8, top 3
  → expert view: full KB, field terminology
  → executive view: importance > 0.5, decisions framing
  → appendix: importance < 0.5, linked not inline
```

This makes the review reusable — the same KB can produce a report, a checklist, a diff, margin annotations, or feed into a subsequent review round with full context. Design deferred to V3.
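
The view filters can be expressed directly over the KB. A sketch — the thresholds mirror the pseudo-structure above, and `kb_view` is an assumed name:

```python
def kb_view(points: list[dict], view: str) -> list[dict]:
    """Filter KB points into one of the presentation views sketched above."""
    if view == "brief":
        top = [p for p in points if p["importance"] > 0.8]
        return sorted(top, key=lambda p: p["importance"], reverse=True)[:3]
    if view == "executive":
        return [p for p in points if p["importance"] > 0.5]
    if view == "appendix":
        return [p for p in points if p["importance"] < 0.5]
    return points  # expert view: full KB
```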

---

## IV. Meta — Self-Improvement & Research

### SOTA-Informed Design Choices

Based on literature review — see [considerations doc](https://static.localhost/docs/plans/2026-02-27-draft-review-considerations.md) §1 for full citations with dates, systems tested, and ELO context.

1. **Structured envelope, freeform content** — JSON for metadata (severity, location, confidence) but free text for critique and suggestion. Avoids the -18% creativity penalty of fully structured output (Tam et al. 2025 [R11], tested on GPT-4o/Claude 3.5 Sonnet, ELO ~1342-1346).
2. **Evidence citations required** — each finding must quote the artifact text it refers to. Prevents hallucinated findings — 38% hallucination rate found in CriticGPT (OpenAI, Jun 2024 [R5], GPT-4 fine-tuned, ELO ~1324). Rate likely lower with frontier models but non-zero.
3. **Top N cap** — cross-lens synthesis asks for "top 10 most impactful" not "everything." Over-criticism buries signal (LLM-as-Judge survey, Nov 2024 [R8]).
4. **Anonymize in per-lens synthesis** — don't label model outputs by name during reduce. Prevents favoritism (llm-council pattern [T2]).
5. **Cross-provider critic panel** — one flagship per provider (Opus, Gemini, Grok, GPT) maximizes training-bias diversity. See considerations doc for panel composition rationale.
6. **V2: devil's advocate variant** — one model per lens gets adversarial system prompt. Counters convergence found in DREAM (Feb 2025 [R1]) and FREE-MAD (Sep 2025 [R6]).
7. **V2: confidence scores** — each finding carries a calibrated confidence, enabling triage (CISC, 2025 [R17]).

### Prompt Evolution

Systematic prompt improvement via: reference test corpus, ablation experiments, degradation→restoration tests, user acceptance tracking. See [prompt-evolution.md](https://static.localhost/learning/docs/prompt-evolution.md).

### Anti-Convergence Mechanisms

Adversarial stances (DREAM [R1]), anti-conformity rules (FREE-MAD [R6]), devil's advocate variants, anonymized model outputs. See [considerations doc](https://static.localhost/docs/plans/2026-02-27-draft-review-considerations.md) §1 for prioritized techniques.

### Confidence Calibration

Track lens accuracy over time — did applied suggestions actually help? Feed back into prompt refinement and severity weighting.

### Self-Reviewing Spec

Run the review system on its own design doc as a meta-test of coverage and usefulness.

---

## V. Implementation

### What's MVP vs Later

| Feature | MVP | V1.5 | V2 |
|---------|-----|------|-----|
| CLI `vario review artifact.md` | ✓ | ✓ | ✓ |
| 4 lenses × N models in parallel | ✓ | ✓ | ✓ |
| Staged execution (macro → micro) | ✓ | ✓ | ✓ |
| Two-level synthesis | ✓ | ✓ | ✓ |
| Work directory with raw outputs | ✓ | ✓ | ✓ |
| Static HTML report | ✓ | ✓ | ✓ |
| Document map (section heat map) | ✓ | ✓ | ✓ |
| Trace timeline | ✓ | ✓ | ✓ |
| `--lens` flag (choose lenses) | ✓ | ✓ | ✓ |
| `--section` flag (focus on section) | | ✓ | ✓ |
| Clickable map + severity filter (JS) | | ✓ | ✓ |
| Copy-to-clipboard apply commands | | ✓ | ✓ |
| Expandable trace nodes | | ✓ | ✓ |
| `--apply` (apply suggestions) | | ✓ | ✓ |
| `--apply N` (apply specific ones) | | ✓ | ✓ |
| `--continue` (re-review after edits) | | ✓ | ✓ |
| Revision history + score trajectory | | ✓ | ✓ |
| Gradio review tab | | | ✓ |
| Block-level "Optimize This Section" | | | ✓ |
| Interactive apply with diff preview | | | ✓ |
| Update badges (resolved/new) | | | ✓ |
| Presentation agents (summarizer, adapter) | | | ✓ |
| Brevity levels + audience adaptation | | | ✓ |
| Redline / tracked changes | | | ✓ |

### MVP Implementation Plan

| Step | What | Files |
|------|------|-------|
| 1 | `ReviewLens` dataclass + 4 MVP lenses | `vario/review.py` |
| 2 | Orchestrator: staged fan-out lenses × models | `vario/review.py` |
| 3 | Per-lens reduce (reuse `reduce.py`) | `vario/review.py` |
| 4 | Cross-lens synthesis prompt | `vario/review.py` |
| 5 | Work directory creation + JSON file saving | `vario/review.py` |
| 6 | HTML report: document map + trace + findings | `vario/review_report.py` |
| 7 | CLI subcommand `vario review` | `vario/cli.py` |
| 8 | Test with a real document | manual |

### Key Reuse

- **Parallel model execution**: `vario.api.run_prompt` — already handles N models in parallel
- **Reduce/synthesis**: `vario.reduce.stream_reduce` — debate strategy for per-lens consensus
- **HTML template**: `vario/strategies/report.py` CSS/layout patterns (dark theme, `<details>`, bar charts)
- **Card strip pattern**: `vario/cards.py` — clickable pills for section navigation (Tier B)
- **Existing prompts**: `linguistic_cleanup`, `logic_critic`, `readability_critic` from `extract_prompts.yaml`
- **Cost tracking**: Already in `run_prompt` result dicts

### New Code Estimate

**MVP**:
- `vario/review.py` — ~300 lines (orchestrator, lenses, staging, work dir)
- `vario/review_report.py` — ~350 lines (HTML report with document map + trace)
- `vario/cli.py` additions — ~40 lines (review subcommand with --lens flag)
- New prompt: `structure` lens — ~40 lines
- Total: ~730 lines

**V1.5** (adds ~200 lines):
- Inline `<script>` for clickable map, severity filter, clipboard commands
- `--section` and `--apply` CLI flags
- Revision history rendering

**V2** (adds ~500 lines):
- `vario/ui_review.py` — Gradio review tab
- Presentation agents (summarizer, adapter, diff explainer)
- Block-level optimize + apply-one flows

### Future Extensions

- **Prompt directory loading** — glob-load `vario/prompts/*.yaml` when file gets big
- **Code artifact support** — review.py code, with code-specific lenses (style, security, complexity)
- **Diagram-specific lens** — D2/Mermaid syntax check, label clarity, information density
- **Foldable report optimization** — progressive disclosure, reader mode toggle
- **Self-reviewing spec** — run the review system on its own design doc (meta-review)
