Draft Rhetorical Analysis — Demo Polish Report

2026-03-09 · Document: How to Do Great Work by Paul Graham · source: paulgraham.com/greatwork.html · 11,517 words
Contents
1. Executive Summary
2. Role Map Results
3. Style Evaluation Results
4. Issues Found & Fixes Applied
5. Before/After Comparison
6. Gym Scores
7. Demo Script

1. Executive Summary

Draft analyzes documents along three dimensions: rhetorical role mapping (what each paragraph does), style evaluation (prose quality against 215 curated principles), and claims analysis (factual accuracy + evidence linking).

This session focused on making Draft demo-ready, using Paul Graham's "How to Do Great Work" (11,517 words) as the test document. The main finding: the role extraction prompt labeled 87% of segments as "explanation" and missed the essay's many claims, analogies, and examples. After prompt changes, the distribution shifted to 44% claims / 22% explanation / 34% other roles.

Key Metrics
Metric | Before | After | Change
Role diversity (non-explanation %) | 13% | 78% | +65pp
Total segments extracted | 62 | 73 | +11
Claim segments identified | 4 | 32 | +28
Explanation segments | 54 | 16 | -38
Unique role types used | 6 | 10 | +4
Style eval score (PG excerpt) | n/a | 6.8/10 | baseline

2. Role Map Results

What the Role Map Does

The role map is an interactive HTML view showing what each paragraph does in the argument: claim, evidence, example, explanation, etc. It identifies the rhetorical function of each text chunk using a 12-role taxonomy and produces a self-contained HTML file. The left sidebar lists segments as cards and the right panel highlights the document text; clicking or scrolling either side syncs the other.

Role Distribution (After Fix)

Role | Count | % | Description
claim | 32 | 44% | Assertions PG wants you to accept
🔍 explanation | 16 | 22% | Clarifying how/why mechanisms work
🔗 analogy | 6 | 8% | Comparisons making abstract concrete
📖 definition | 5 | 7% | Introducing key terms
⚠️ qualification | 4 | 5% | Caveats and scope limits
💡 example | 3 | 4% | Concrete illustrations
🤝 concession | 3 | 4% | Acknowledging counterpoints
📢 appeal | 2 | 3% | Calls to action
transition | 1 | 1% | Section connectors
📋 context | 1 | 1% | Background/framing

The distribution matches what you'd expect from PG's writing: predominantly claims (he states positions) with explanations supporting them, sprinkled with analogies and definitions. The previous 87% explanation rate was clearly wrong — PG's essay is argumentative, not expository.

Label Quality

Labels improved from vague references ("the four steps", "intersection has shape") to self-contained phrases ("four steps to great work", "great work techniques overlap across fields"). The prompt now requires labels to be understandable without reading the document, and to use proper punctuation when combining ideas.

Header Metadata

The map header now displays author name, source URL (clickable), word count, and segment count; previously it showed only the filename and segment count.

3. Style Evaluation Results

How Style Evaluation Works

The style evaluation scores prose quality against 215 curated principles from Strunk & White, Williams' Clarity, Pinker & Zinsser, Gopen & Swan, and Orwell. The pipeline has four stages: orient (triage what kind of writing this is) → select (pick relevant principles) → batch evaluate (2-3 principles per LLM call, for accuracy) → aggregate (category scores, top issues, top strengths).
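The batch-evaluate stage can be sketched roughly as follows. This is a minimal illustration, not Draft's code: it assumes principles are plain strings, and `evaluate_batch` is a hypothetical stand-in for whatever function makes the LLM call.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], size: int = 3) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def evaluate_in_batches(
    text: str,
    principles: Sequence[str],
    evaluate_batch: Callable[[str, List[str]], List[dict]],
    size: int = 3,
) -> List[dict]:
    """Call the (hypothetical) evaluator on a few principles at a time."""
    findings: List[dict] = []
    for batch in batched(principles, size):
        # One evaluator call per small batch of principles.
        findings.extend(evaluate_batch(text, batch))
    return findings
```

Keeping each call to two or three principles trades more calls (and latency) for a narrower task per call, which is the accuracy argument the pipeline description makes.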

PG "Great Work" Excerpt — Style Evaluation (Haiku, 500 words)
Overall score: 6.8/10
111 principles evaluated, 102 skipped by orientation
128 violations · 252 exemplars · 91.8s latency

Category | Score | Violations | Exemplars
Strunk & White | 6.2/10 | 5 | 8
Williams' Clarity | 6.3/10 | 69 | 114
Other (Pinker/Zinsser, Gopen/Swan, Orwell) | 7.5/10 | 54 | 130

Top Issues

Severity | Principle | Issue
critical | No comma splice | Comma splice between independent clauses: "When you're young you don't know what you're good at or what..."
critical | Consistent topic strings | Topic shifts: 'you' → 'some kinds of work' → 'some people'/'most' within one paragraph
critical | One story per unit | Paragraph combines three stories: (1) young people lack self-knowledge, (2) some work doesn't exist yet, (3) discovery through working
critical | Eliminate zombie nouns | Nominalizations "intersection" and "shape" obscure the underlying action

Top Strengths

Principle | Example
Avoid the not-un- formation | Uses direct "difficult" rather than hedging with "not straightforward"
Topic position establishes context | Opens with clear framing ("The first step") before the main point
Direct assertions | Makes concrete claims ("too conservative") without double negatives

Assessment: The evaluation correctly identifies PG's informal conversational style — it catches genuine issues (comma splices, topic shifts in long paragraphs) while recognizing his strengths (direct assertions, clear framing). The 6.8/10 for PG seems calibrated for a conversational essay against formal style guides. The violations are specific with exact quotes, and the strengths highlight what makes PG's writing effective.

4. Issues Found & Fixes Applied

Issue | Severity | Example | Fix Applied
Role skew: 87% explanation | critical | "You should follow your interests" labeled as explanation instead of claim | Added disambiguation section to prompt differentiating claim/explanation/evidence with concrete tests
Distribution monotony | critical | Only 6 of 12 role types used | Added distribution check: "if >50% of segments share one role, reconsider"
Labels not self-contained | moderate | "intersection has shape" (intersection of what?) | Updated prompt: labels must be understandable without the document, with good vs bad examples
Labels lack punctuation | moderate | "be prolific start lots of small things" (no comma or semicolon) | Added punctuation guidance: "use commas, semicolons when combining ideas"
No author/source metadata | moderate | Header showed only filename and segment count | Added author and source_url params to render_role_map() + word count
Markdown artifacts in text | moderate | Table borders ("| --- | --- |") from markdownify in document pane | Added clean_text_for_display() that strips table borders, pipe chars, excess whitespace
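The distribution check in the report is an instruction inside the prompt. A programmatic analogue, applied to extracted segments after the fact, might look like the sketch below. The function name and the idea of running it in code are assumptions for illustration; only the 50% threshold comes from the report.

```python
from collections import Counter
from typing import List, Optional


def dominant_role(roles: List[str], threshold: float = 0.5) -> Optional[str]:
    """Return the role whose share of segments exceeds `threshold`, else None."""
    if not roles:
        return None
    role, count = Counter(roles).most_common(1)[0]
    return role if count / len(roles) > threshold else None


# Role labels matching the "before" run: 54 of 62 segments are "explanation".
before = ["explanation"] * 54 + ["claim"] * 4 + [
    "evidence", "context", "concession", "appeal"
]
print(dominant_role(before))  # explanation
```

A non-None result would be a signal to re-run or review the extraction, not proof that the labels are wrong.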

Files Modified

File | Change
draft/core/roles.py | Prompt: disambiguation section, distribution check, self-contained labels with punctuation
draft/core/mapper.py | Added author, source_url params to header; clean_text_for_display()
draft/cli.py | CLI map: added --author flag, text cleaning, source URL inference
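For readers who want a feel for the text cleaning step, here is a minimal sketch of a function that strips table border rows, pipe characters, and excess whitespace. It is an illustrative reimplementation under assumed rules, not the code in draft/core/mapper.py; the name `clean_text_for_display_sketch` is hypothetical.

```python
import re

# Matches lines made only of spaces, tabs, pipes, colons, and dashes.
_BORDER_CHARS = re.compile(r"[ \t|:-]+")


def clean_text_for_display_sketch(text: str) -> str:
    """Drop table border rows, remove pipes, and collapse whitespace."""
    cleaned = []
    for line in text.splitlines():
        if "|" in line and _BORDER_CHARS.fullmatch(line):
            continue  # a border row such as "| --- | --- |"
        line = line.replace("|", " ")                # remaining pipe characters
        line = re.sub(r"[ \t]+", " ", line).strip()  # runs of spaces and tabs
        cleaned.append(line)
    # Collapse three or more consecutive newlines into one blank line.
    return re.sub(r"\n{3,}", "\n\n", "\n".join(cleaned)).strip()
```

One design caveat: removing every pipe also removes pipes that are legitimate document text, so a real implementation may want to restrict this to lines that look like table rows.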

5. Before/After Comparison

Role Distribution — Full Essay (11,517 words)

Before (62 segments)
54 explanation  (87%)
 4 claim        (6%)
 1 evidence     (2%)
 1 context      (2%)
 1 concession   (2%)
 1 appeal       (2%)

Only 6 of 12 role types used. Nearly everything tagged "explanation" regardless of whether the text was asserting a position, giving an analogy, or defining a term.

After (73 segments)
32 claim          (44%)
16 explanation    (22%)
 6 analogy        (8%)
 5 definition     (7%)
 4 qualification  (5%)
 3 example        (4%)
 3 concession     (4%)
 2 appeal         (3%)
 1 transition     (1%)
 1 context        (1%)

10 of 12 role types used. Claims correctly dominate (PG is argumentative), with explanations supporting them. Analogies, qualifications, and concessions properly identified.
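The percentages in both listings follow directly from the segment counts. The sketch below recomputes them; the function names are illustrative and not part of Draft, while the counts are copied from the two listings above.

```python
from typing import Dict

BEFORE: Dict[str, int] = {
    "explanation": 54, "claim": 4, "evidence": 1,
    "context": 1, "concession": 1, "appeal": 1,
}
AFTER: Dict[str, int] = {
    "claim": 32, "explanation": 16, "analogy": 6, "definition": 5,
    "qualification": 4, "example": 3, "concession": 3, "appeal": 2,
    "transition": 1, "context": 1,
}


def role_shares(counts: Dict[str, int]) -> Dict[str, int]:
    """Share of each role as a whole-number percentage of all segments."""
    total = sum(counts.values())
    return {role: round(100 * n / total) for role, n in counts.items()}


def non_explanation_share(counts: Dict[str, int]) -> int:
    """Percentage of segments labeled anything other than "explanation"."""
    total = sum(counts.values())
    return round(100 * (total - counts.get("explanation", 0)) / total)


for name, counts in (("before", BEFORE), ("after", AFTER)):
    print(name, sum(counts.values()), len(counts), non_explanation_share(counts))
```

Each printed line gives the run name, total segments, number of distinct role types, and the non-explanation share.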

Label Quality

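Before

"the four steps", "intersection has shape"

After

"four steps to great work", "great work techniques overlap across fields"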

Header Metadata

Before

pg-great-work · 62 segments

After

pg-great-work · Paul Graham · paulgraham.com/greatwork.html · 11,638 words · 73 segments

6. Gym Scores

Three gyms measure extraction/evaluation quality across models. Each runs 3 corpus documents per model, comparing against reference extractions (role gym, claims gym) or meta-judge scoring (style gym).

Role Extraction Gym

Measures: role accuracy (40%), coverage (25%), boundary quality (20%), distribution realism (15%)

Model | Overall | Accuracy | Coverage | Boundary | Distribution | Avg Segments
gemini-flash | 77.0 | 67.4 | 100.0 | 64.3 | 81.0 | 12
sonnet | 73.5 | 64.2 | 100.0 | 57.0 | 75.9 | 16
grok-fast | 71.0 | 62.1 | 100.0 | 49.3 | 75.2 | 16
haiku | 67.7 | 58.4 | 100.0 | 37.3 | 78.7 | 21

Best single run: sonnet on narrative_report (82.9)
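Assuming the overall score is a plain weighted sum of the four subscores (the report gives the weights but not the exact aggregation), it can be recomputed as in this sketch. The function and variable names are illustrative.

```python
from typing import Dict

# Weights as listed for the role extraction gym.
ROLE_GYM_WEIGHTS: Dict[str, float] = {
    "accuracy": 0.40,
    "coverage": 0.25,
    "boundary": 0.20,
    "distribution": 0.15,
}


def weighted_overall(subscores: Dict[str, float],
                     weights: Dict[str, float] = ROLE_GYM_WEIGHTS) -> float:
    """Weighted sum of subscores; weights are expected to sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * subscores[name] for name in weights)


gemini_flash = {"accuracy": 67.4, "coverage": 100.0,
                "boundary": 64.3, "distribution": 81.0}
print(round(weighted_overall(gemini_flash), 2))  # 76.97, reported as 77.0
```

Small differences from the reported values are possible if the gym applies the weights per run and then averages, which the report does not specify.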

Style Evaluation Gym

Measures: specificity (30%), calibration (25%), coverage (25%), actionability (20%). Meta-judged by Opus (Claude Opus, used here as the meta-evaluator for gym judging).

Model | Overall | Avg Violations | Avg Latency
sonnet | 77.3 | 354.7 | 207.7s
grok-fast | 70.0 | 129.0 | 58.8s
haiku | 67.3 | 234.0 | 81.6s

Subscore | Average
Specificity | 72.8
Actionability | 72.2
Coverage | 71.8
Calibration | 69.4

Best single run: sonnet on poor_academic (88.0)

Claims Analysis Gym

Measures: claim detection recall, precision, evidence linking accuracy.

Model | Overall | Detection | Precision | Evidence Linking
sonnet | 77.2 | 100.0 | 67.5 | 53.5
grok-fast | 75.9 | 91.7 | 78.3 | 55.8
haiku | 63.6 | 88.9 | 51.9 | 37.8
gemini-flash | 59.7 | 66.7 | 61.1 | 44.4
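Detection (recall) and precision can be read with the standard definitions. The sketch below assumes extracted claims have already been matched to reference claims by a shared identifier; how the gym actually matches claims is not described in this report, so the matching step and all names here are hypothetical.

```python
from typing import Set, Tuple


def detection_scores(predicted: Set[str], reference: Set[str]) -> Tuple[float, float]:
    """Return (recall, precision) as percentages for matched claim IDs."""
    matched = predicted & reference
    recall = 100 * len(matched) / len(reference) if reference else 0.0
    precision = 100 * len(matched) / len(predicted) if predicted else 0.0
    return recall, precision


# Toy data: every reference claim is found, plus one spurious extraction.
reference_claims = {"c1", "c2", "c3"}
predicted_claims = {"c1", "c2", "c3", "c4"}
print(detection_scores(predicted_claims, reference_claims))  # (100.0, 75.0)
```

The toy case mirrors the pattern in the sonnet row: perfect detection can coexist with lower precision when a model extracts more claims than the reference contains.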

Cross-Gym Summary

Model | Roles | Style | Claims | Avg | Best For
sonnet | 73.5 | 77.3 | 77.2 | 76.0 | Style + Claims (quality over speed)
gemini-flash | 77.0 | n/a | 59.7 | 68.4 | Role extraction (fast + accurate)
grok-fast | 71.0 | 70.0 | 75.9 | 72.3 | Balanced (good claims, fast)
haiku | 67.7 | 67.3 | 63.6 | 66.2 | Budget option

Recommendation: Use gemini-flash for role extraction (best accuracy, fastest). Use sonnet for style evaluation and claims (highest quality).
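The Avg column appears to be the mean over the gyms each model was run on, so gemini-flash, which has no style gym score, is averaged over two gyms and the other models over three. A small sketch of that calculation, with illustrative names:

```python
from statistics import mean
from typing import Dict, Optional


def cross_gym_average(scores: Dict[str, Optional[float]]) -> float:
    """Mean of the gym scores that are present; None marks a gym not run."""
    available = [s for s in scores.values() if s is not None]
    if not available:
        raise ValueError("no gym scores available")
    return mean(available)


sonnet = {"roles": 73.5, "style": 77.3, "claims": 77.2}
gemini_flash = {"roles": 77.0, "style": None, "claims": 59.7}
print(round(cross_gym_average(sonnet), 1))  # 76.0
print(cross_gym_average(gemini_flash))
```

Because the averages cover different numbers of gyms, the gemini-flash figure is not directly comparable with the three-gym averages.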

7. Demo Script (about 3 minutes)

Suggested Walkthrough

Setup (30s)

"Draft analyzes documents along three dimensions: what each paragraph does (rhetorical role), how well it's written (style), and whether claims are supported (evidence). Let me show you with Paul Graham's essay."

Role Map (60s)

draft map https://paulgraham.com/greatwork.html --author "Paul Graham"

Show the two-panel view. Click a claim card in the sidebar → document scrolls to it with flash animation. Scroll through the document → sidebar follows. Point out the role distribution: "44% claims — PG is argumentative, and the tool catches that." Click an analogy to show it identified PG's metaphorical reasoning.

Style Evaluation (60s)

draft style /tmp/pg-excerpt.md

Show the score (6.8/10) and explain: "PG's informal style intentionally breaks some formal rules — comma splices, topic shifts — but the tool correctly identifies his strengths: direct assertions, clear framing, no hedging." Read one specific violation with its fix suggestion to show actionability.

Claims + Rating (30s)

draft claims /tmp/pg-great-work.md

Show claims with novelty/insight ratings. Point out: "N:4 I:5 means this is a genuinely novel insight — the tool filters out process descriptions and highlights the substantive claims."

Close (15s)

"This pipeline works on any document — earnings calls, pitch memos, research papers, blog posts. The gym system continuously measures extraction quality across models."

Glossary

Role Map
Interactive HTML visualization showing the rhetorical function (claim, evidence, example, etc.) of each text segment. Two-panel layout with scroll sync.
Style Evaluation
Automated prose quality assessment using 215 curated principles from 5 canonical style guides. Uses few-principles-at-a-time batch evaluation for accuracy.
Opus
Claude Opus — Anthropic's most capable model, used as the meta-evaluator for gym judging.