Draft

Understand what a document does, not just what it says — rhetorical role mapping, claim quality rating, style evaluation

The Problem

You read a 3,000-word essay. You can tell it's "pretty good." But can you say why? Can you point to the three strongest claims? Can you tell which evidence actually supports a claim vs. which is decorative? Can you spot where the author sneaks in an undefended assertion between two well-evidenced ones?

Most document tools tell you what a document says: keywords, topics, summaries. Draft tells you what the document does — what rhetorical moves the author makes, how strong the claims are, and where the writing falls short.

Core insight: A document is not a bag of words. It's a sequence of moves — claims, evidence, concessions, appeals. Making those moves visible changes how you read, write, and revise.

How It Works

Four analysis layers, each usable on its own, with the role map as the foundation the others build on:

Layer 1: Role Map - what does each chunk do?
Layer 2: Claim Rating - how novel and insightful?
Layer 3: Style - how well written?
Layer 4: Review - multi-lens, multi-model

Layer 1: Role Map

Every sentence in a document serves a rhetorical function. An LLM reads the full text and classifies each contiguous chunk into one of 12 roles:

| Role | What it does | Example |
| --- | --- | --- |
| claim | Assertion the author wants you to believe | "Nurse scheduling is the binding bottleneck for hospital AI" |
| evidence | Facts, data, statistics supporting a claim | "Revenue grew 15% YoY, driven by a 23% price increase" |
| example | Concrete illustration or case study | "Consider Norway's universal healthcare system..." |
| explanation | Making a concept or mechanism clear | "This works because the incentive structure aligns..." |
| concession | Acknowledging a counterpoint or limitation | "Critics rightly note that correlation is not causation" |
| appeal | Call to action, emotional persuasion | "We cannot afford to wait another decade" |
| context | Background, setup, prior work | "Since the 2008 financial crisis, regulators have..." |

(Plus: definition, description, analogy, qualification, transition — 12 roles total.)
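As a minimal sketch, the taxonomy and a labeled segment might be represented like this (illustrative names, not the actual roles.py API):

```python
from dataclasses import dataclass

# The 12-role taxonomy from the table above.
ROLES = {
    "claim", "evidence", "example", "explanation", "concession", "appeal",
    "context", "definition", "description", "analogy", "qualification",
    "transition",
}

@dataclass
class Segment:
    """One contiguous chunk of the document, tagged with its rhetorical role."""
    text: str
    role: str

    def __post_init__(self):
        # Reject anything outside the fixed taxonomy.
        if self.role not in ROLES:
            raise ValueError(f"unknown role: {self.role}")
```

The LLM's job is then just to emit (text span, role) pairs; everything downstream works on this flat list.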

The Interactive Map

The output is a two-panel interactive view. Left: a sidebar of color-coded cards, one per segment. Right: the full document text with role-colored highlights. Click either side and the other scrolls to match. For example, sample segments from a document on the semiconductor supply chain:

The semiconductor supply chain has grown into one of the most complex and geographically concentrated systems in the global economy. Understanding its structure is essential for anyone assessing technology risk.

Taiwan produces 92% of the world's most advanced semiconductors (<7nm), making it the single most critical chokepoint in the global technology supply chain.

TSMC alone accounts for 56% of global foundry revenue, with its most advanced nodes serving Apple, NVIDIA, AMD, and Qualcomm. Samsung holds roughly 12% at comparable nodes.

Intel and Samsung are investing heavily in domestic fabrication capacity, with Intel's Ohio fab and Samsung's Texas expansion both targeting sub-5nm by 2027.

But new leading-edge fabs take 3-5 years to build and cost $20 billion or more, meaning no amount of investment today can change the concentration risk for the rest of this decade.

The claim about Taiwan's 92% share gets N:4 I:5 ★ — high novelty (most people don't know the exact number) and high insight (it reframes "supply chain risk" as "single-island dependency"). The TSMC market share evidence gets N:2 I:3 — well-known fact, moderate insight.

Layer 2: Claim Quality Rating

After role extraction, the system can rate every claim, evidence segment, and example on two dimensions:

| Dimension | Scale | What it captures |
| --- | --- | --- |
| Novelty | 1–5 | Is this a fresh observation or restated conventional wisdom? |
| Insight | 1–5 | Does it reveal something non-obvious — a mechanism, implication, or reframing? |

All claims are rated in a single batch prompt so the model can calibrate scores relative to the document's other claims. A claim that's insightful compared to the rest of the document will score higher than the same claim rated in isolation.

Why markdown output, not JSON? Research (Tam et al., 2025) found that forcing structured JSON output causes an 18% drop in insight scores and 17% drop in novelty scores. The rating prompt uses natural language output, then parses scores with regex. The quality tradeoff is worth the parsing complexity.

Claims scoring 4+ on either dimension get a ★ star in the sidebar. High-rated claims are the diamonds — the observations worth quoting, building on, or challenging.
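A sketch of the regex-parsing step, assuming the model emits scores in an "N:4 I:5" form like the example above (the exact line format and function name are illustrative, not rating.py's actual code):

```python
import re

# Match score pairs such as "N:4 I:5" in the model's natural-language output.
SCORE_RE = re.compile(r"N\s*:\s*([1-5])\s+I\s*:\s*([1-5])")

def parse_ratings(markdown: str) -> list[dict]:
    """Extract (novelty, insight) pairs and apply the 4+ star threshold."""
    ratings = []
    for match in SCORE_RE.finditer(markdown):
        novelty, insight = int(match.group(1)), int(match.group(2))
        ratings.append({
            "novelty": novelty,
            "insight": insight,
            # A claim scoring 4+ on either dimension earns a sidebar star.
            "star": novelty >= 4 or insight >= 4,
        })
    return ratings
```

Regex parsing is laxer than a JSON schema, but it lets the model write freely, which is the whole point of the markdown-output decision.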

Layer 3: Style Evaluation

Evaluates prose quality against ~215 curated principles extracted from canonical style guides (Strunk & White, Zinsser, Pinker, etc.). The key architectural insight: evaluating 2–3 principles per LLM call dramatically outperforms dumping all 215 at once.

Step 1: Orient - identify doc type & relevant principles
Step 2: Select - pick the 15–25 most applicable
Step 3: Batch Eval - score 2–3 principles per call
Step 4: Improve - rewrite addressing violations

Output: a per-violation report (principle, severity, specific quote, suggested fix) plus an improved version of the text with a changes summary.
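The batching behind Step 3 can be sketched in a few lines (a simplified stand-in for evaluate.py, not its actual interface):

```python
def batch_principles(principles: list[str], batch_size: int = 3) -> list[list[str]]:
    """Split the selected principles into small batches, one LLM call each.

    Small batches (2-3 principles) keep the model focused on a few criteria
    at a time, which outperforms one monolithic evaluate-everything prompt.
    """
    return [principles[i:i + batch_size] for i in range(0, len(principles), batch_size)]
```

Each batch then becomes one scoring call; the per-violation reports are concatenated afterwards.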

Layer 4: Multi-Lens Review

The most comprehensive analysis: runs 2–4 evaluation lenses (structure, logic, language, readability), each with multiple models, then synthesizes findings into a unified report. Built on the vario multi-model infrastructure.
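The fan-out across lenses and models might look roughly like this (a sketch using stdlib threading; vario.review's real API is not shown here, and `evaluate` stands in for the per-lens LLM call):

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

LENSES = ["structure", "logic", "language", "readability"]

def review(text, models, evaluate):
    """Run every (lens, model) pair in parallel, then collect findings.

    `evaluate(text, lens, model)` is a placeholder for the real LLM call;
    synthesis of the unified report would happen on the returned dict.
    """
    pairs = list(product(LENSES, models))
    with ThreadPoolExecutor(max_workers=8) as pool:
        findings = list(pool.map(lambda p: evaluate(text, *p), pairs))
    return dict(zip(pairs, findings))
```

With 4 lenses and 2 models, that is 8 independent evaluations whose findings get merged into one report.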

4 review lenses · 12 rhetorical roles · 215 style principles · 2 rating dimensions

Architecture

draft/
+-- core/               # Pure logic, no UI deps
|   +-- roles.py        # 12-role taxonomy + LLM extraction
|   +-- mapper.py       # Self-contained HTML renderer (sidebar + highlights)
|   +-- rating.py       # Batch novelty/insight scoring
+-- style/              # Prose quality evaluation
|   +-- guides/         # 215 curated principles (YAML) + full text cache
|   +-- orient.py       # Document triage
|   +-- evaluate.py     # Few-principles-at-a-time scoring engine
|   +-- improve.py      # Rewrite engine
+-- ui/
|   +-- app.py          # Gradio server on :7980 (draft.localhost)
+-- tests/
+-- tasks.py            # inv draft.server, inv draft.map

Key Design Decisions

| Decision | Why |
| --- | --- |
| Core produces self-contained HTML strings | No UI framework dependency — works from CLI, API, tests, notebooks. The Gradio layer is a thin shell. |
| 12-role taxonomy (not 5, not 50) | Covers the full range of rhetorical moves without requiring expert training. Every educated reader knows what "claim" and "evidence" mean. |
| Batch claim rating (all claims in one call) | The model calibrates relative to the document — "novel compared to what the author already said." Per-claim calls lose this context. |
| Markdown output for ratings | Tam et al.: JSON output cuts novelty/insight scores by 17–18%. Parse with regex, not schema. |
| Style eval: 2–3 principles per call | CheckEval/DeCE research: small batches massively outperform monolithic "evaluate everything at once" prompts. |
| Document loading in lib/ingest | PDF, DOCX, HTML, URL, paste — shared across all rivus projects, not duplicated in draft. |
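The self-contained-HTML decision reduces the renderer to plain string assembly. A toy version (palette and markup are invented for illustration; mapper.py's actual output differs):

```python
# Illustrative role-to-color palette, not the real one.
ROLE_COLORS = {"claim": "#e57373", "evidence": "#64b5f6"}

def render_map(segments):
    """Return a single self-contained HTML string for (text, role) pairs.

    No framework dependency: the same string works in a browser, a test,
    or a notebook, and the Gradio layer just serves it.
    """
    cards = "".join(
        f'<div class="card" style="border-left:4px solid '
        f'{ROLE_COLORS.get(role, "#bdbdbd")}">{text}</div>'
        for text, role in segments
    )
    return f"<html><body>{cards}</body></html>"
```

Because the output is just a string, the CLI, tests, and notebooks can all consume it without importing any UI code.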

Reused Infrastructure

ComponentSourceWhat Draft Reuses
LLM callslib/llmcall_llm with model aliases, prompt caching
Document loadinglib/ingestPDF/DOCX/HTML/URL parsing with smart escalation
Multi-model reviewvario.reviewParallel lens evaluation across models
UI componentslib/gradioDocInput widget, emoji favicon, footer

What It Reveals

Structure becomes visible. Many well-written documents have a hidden imbalance: lots of claims, little evidence. Or evidence that doesn't actually support the adjacent claim. The role map makes this immediately visible — you can see a red (claim) followed by another red (claim) with no blue (evidence) in between.
Not all claims are equal. A document with 15 claims might have 2 genuinely insightful observations buried among 13 restatements of conventional wisdom. The rating layer surfaces those 2 with a star, so you know what's worth your attention.
Craft and substance are independent axes. Style evaluation measures how well something is written. Claim rating measures how good the ideas are. A beautifully written platitude still scores N:1 I:1. An awkwardly phrased breakthrough still scores N:5 I:5. The system keeps these separate so you know which lever to pull: improve the writing, or improve the thinking.

What's Next


Generated 2026-03-03 · Live at draft.localhost · Design doc: claim-quality-rating