Claim Extraction Gym: assertions

opus leads with an average score of 88.5; gemini-pro trails at 0.0. The overall average is 64.2 across 120 candidates from 6 models.

About This Gym

Task: Extract structured claims from documents — entities, atoms (raw facts), assertions (interpretations), and relationships (influence links between claims). Models receive a document and a prompt variant, and must return structured JSON with typed, sourced claims.
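The output shape described above can be made concrete with a small structural check. This is only a sketch: the field names (`entities`, `claims`, `claim_type`, `confidence`, `source_quote`, `subject`) are taken from the extraction previews later in this report, while the function name `validate_extraction` and the specific rules are assumptions, not the gym's actual validator. Relationship claims are skipped because their JSON shape is not visible in the previews.

```python
from typing import Any


def validate_extraction(extraction: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means none were found.

    Hypothetical helper: the rules below are illustrative assumptions.
    """
    problems: list[str] = []
    entity_ids = {e.get("id") for e in extraction.get("entities", [])}

    for claim in extraction.get("claims", []):
        cid = claim.get("id", "<missing id>")
        # Only atoms and assertions are checked; other claim types
        # (e.g. relationships) do not appear in the report's previews.
        if claim.get("claim_type") not in ("atom", "assertion"):
            continue
        for field in ("text", "source_quote"):
            if not claim.get(field):
                problems.append(f"{cid}: missing {field}")
        conf = claim.get("confidence")
        if not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
            problems.append(f"{cid}: confidence must be a number in [0, 1]")
        if claim.get("subject") not in entity_ids:
            problems.append(f"{cid}: subject does not reference a known entity")
    return problems


example = {
    "entities": [{"id": "e1", "name": "Heliad Equity Partners", "type": "company"}],
    "claims": [
        {
            "id": "c1",
            "claim_type": "atom",
            "text": "Heliad started making investments in 2006",
            "confidence": 0.95,
            "source_quote": "Heliad started making investments in 2006",
            "subject": "e1",
        }
    ],
}
print(validate_extraction(example))
```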

Evaluation: An Opus judge scores each candidate (0–100) against a human-curated reference extraction. The judge compares the candidate’s output field-by-field against the reference, checking whether entities were found, facts captured, interpretations justified, and relationships mapped. Six weighted criteria:

| Criterion | Weight | What it measures |
| --- | --- | --- |
| Entity coverage | 15% | Did the model find all significant entities? |
| Atom completeness | 20% | Were key facts and data points captured? |
| Assertion quality | 20% | Are interpretations well-formed and distinct from raw facts? |
| Relationship accuracy | 20% | Are influence links correctly identified with appropriate weights? |
| Precision | 15% | No hallucinated claims, spurious entities, or invented relationships |
| Source grounding | 10% | Every claim backed by an accurate quote from the document |
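For orientation, the six weights can be combined into a single number as a plain weighted sum. Whether the judge computes its overall score exactly this way is an assumption: for the opus candidate on document 3417, the weighted sum of the reported subscores comes to 94.05, while the reported overall score is 93, so the judge appears to assign the total with some latitude. The names `WEIGHTS` and `weighted_score` are illustrative.

```python
# Weights from the criteria table; keys follow the subscore names
# used in the report's category breakdown.
WEIGHTS = {
    "entity_coverage": 0.15,
    "atom_completeness": 0.20,
    "assertion_quality": 0.20,
    "relationship_accuracy": 0.20,
    "precision": 0.15,
    "source_grounding": 0.10,
}


def weighted_score(subscores: dict[str, float]) -> float:
    """Weighted sum of the six subscores (each on a 0-100 scale)."""
    missing = WEIGHTS.keys() - subscores.keys()
    if missing:
        raise ValueError(f"missing subscores: {sorted(missing)}")
    return sum(WEIGHTS[name] * subscores[name] for name in WEIGHTS)


# Subscores reported for opus:default on document 3417.
doc_3417_opus = {
    "entity_coverage": 100.0,
    "atom_completeness": 95.0,
    "assertion_quality": 93.0,
    "relationship_accuracy": 90.0,
    "precision": 93.0,
    "source_grounding": 95.0,
}
print(round(weighted_score(doc_3417_opus), 2))
```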

What a high score means: The extraction is thorough (catches what the reference caught), precise (doesn’t invent things), and well-structured (assertions are interpretive, atoms are factual, relationships have reasonable influence weights). A score of 80+ means the extraction is near-reference quality.
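The "precise" and source-grounding parts of this definition lend themselves to a cheap deterministic pre-check. The sketch below is not what the judge does (the judge is an LLM comparing against a reference); it only flags claims whose `source_quote` does not appear in the document after whitespace and case normalization. The function names are hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Collapse whitespace runs and lowercase, so line wraps do not cause misses."""
    return re.sub(r"\s+", " ", text).strip().lower()


def ungrounded_claims(document: str, claims: list[dict]) -> list[str]:
    """Return ids of claims whose quote is empty or not found in the document."""
    doc = normalize(document)
    flagged = []
    for claim in claims:
        quote = normalize(claim.get("source_quote", ""))
        if not quote or quote not in doc:
            flagged.append(claim.get("id", "<missing id>"))
    return flagged


document = (
    "Heliad Equity Partners is a publicly traded investment firm in Germany\n"
    "with a market cap of 37M Euro."
)
claims = [
    {"id": "c1", "source_quote": "publicly traded investment firm in Germany with a market cap of 37M Euro"},
    {"id": "c2", "source_quote": "a quote that is not in the document"},
]
print(ungrounded_claims(document, claims))
```

An exact-substring test is strict: a quote with a corrected typo or trimmed punctuation would be flagged, so results are best treated as candidates for review rather than as errors.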

Matrix: 6 models × 1 prompt variant × 20 documents = 120 candidates total.

Score Matrix (model × prompt)

| Model \ Prompt | default |
| --- | --- |
| gemini-pro | 0.0 |
| gpt | 81.7 |
| grok-fast | 56.5 |
| haiku | 77.3 |
| opus | 88.5 |
| sonnet | 81.2 |
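Because every model has the same number of candidates (20), the overall average is the unweighted mean of the per-model averages. The short sketch below reproduces it from the matrix and also shows the mean with gemini-pro left out, which is a useful comparison given that its zero comes from API errors rather than extraction quality. Variable names are illustrative.

```python
from statistics import mean

# Per-model average scores from the score matrix (default prompt).
model_avgs = {
    "gemini-pro": 0.0,
    "gpt": 81.7,
    "grok-fast": 56.5,
    "haiku": 77.3,
    "opus": 88.5,
    "sonnet": 81.2,
}

overall = mean(model_avgs.values())
without_gemini = mean(v for m, v in model_avgs.items() if m != "gemini-pro")

print(round(overall, 1))         # mean over all six models
print(round(without_gemini, 1))  # mean over the five models that produced output
```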

Extraction Volume

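| Model | Avg Claims | Avg Entities | Avg Relationships | Avg Atoms | Avg Assertions | Avg Score |
| --- | --- | --- | --- | --- | --- | --- |
| gemini-pro | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| gpt | 96.8 | 21.6 | 7.4 | 57.0 | 32.1 | 81.7 |
| grok-fast | 30.9 | 9.6 | 2.5 | 18.2 | 10.2 | 56.5 |
| haiku | 58.4 | 9.4 | 1.4 | 34.0 | 22.8 | 77.3 |
| opus | 55.8 | 10.9 | 6.2 | 34.8 | 14.8 | 88.5 |
| sonnet | 41.8 | 9.9 | 4.2 | 26.6 | 11.1 | 81.2 |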
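These volume figures are presumably averages of per-candidate counts. In the detail cards, the claim count equals atoms plus assertions plus relationships (for example 39 = 20 + 14 + 5 on document 3417), which suggests relationships are stored as claims as well. A minimal tally under that assumption might look like the following; `volume_stats` is a hypothetical helper, and the `claim_type` value used for relationships is not shown in the previews, so the code counts whatever types it finds.

```python
from collections import Counter


def volume_stats(extraction: dict) -> dict:
    """Count entities and claims, with claims broken down by claim_type."""
    claims = extraction.get("claims", [])
    by_type = Counter(c.get("claim_type", "unknown") for c in claims)
    return {
        "claims": len(claims),
        "entities": len(extraction.get("entities", [])),
        "by_type": dict(by_type),
    }


sample = {
    "entities": [{"id": "e1"}, {"id": "e2"}],
    "claims": [
        {"id": "c1", "claim_type": "atom"},
        {"id": "c2", "claim_type": "atom"},
        {"id": "c3", "claim_type": "assertion"},
    ],
}
print(volume_stats(sample))
```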

Category Breakdown (avg subscores per model)

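| Model | assertion_quality | atom_completeness | entity_coverage | precision | relationship_accuracy | source_grounding |
| --- | --- | --- | --- | --- | --- | --- |
| gemini-pro | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| gpt | 79.9 | 89.0 | 84.1 | 78.8 | 74.0 | 90.2 |
| grok-fast | 48.9 | 57.1 | 69.5 | 76.2 | 38.2 | 77.0 |
| haiku | 77.9 | 82.4 | 76.3 | 79.7 | 71.5 | 84.3 |
| opus | 89.2 | 90.7 | 93.6 | 89.7 | 84.3 | 92.0 |
| sonnet | 82.8 | 83.3 | 82.4 | 86.7 | 73.7 | 90.0 |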

Failure Taxonomy

Candidates scoring <20 are auto-classified by failure type.

| Failure Type | Count | % of Scored |
| --- | --- | --- |
| Task failure (wrong extraction) | 20 | 17% |
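The report's classifier is not shown. As a rough sketch of threshold-based classification, the code below applies the stated cutoff of 20 and, as an assumption beyond what the report does, separates candidates whose preview is an `ERROR:` message from genuine extraction failures. The `api_error` label and the `classify_failure` function are hypothetical; the report itself lists only "Task failure (wrong extraction)".

```python
from typing import Optional

FAILURE_THRESHOLD = 20  # candidates scoring below this are classified


def classify_failure(score: float, preview: str) -> Optional[str]:
    """Return a failure label for low-scoring candidates, else None."""
    if score >= FAILURE_THRESHOLD:
        return None
    # Assumption: provider errors surface as a preview starting with "ERROR:".
    if preview.lstrip().startswith("ERROR:"):
        return "api_error"
    return "task_failure"


# Share of scored candidates that failed, as in the table: 20 of 120.
failed, scored = 20, 120
share_pct = round(100 * failed / scored)

print(classify_failure(0, "ERROR: [NotFoundError] litellm.NotFoundError"))
print(classify_failure(93, '{"entities": []}'))
print(share_pct)
```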

Scored Examples

Best vs worst extraction on document 3417 (score spread: 93 points). Shows what the judge rewards and penalizes.

Best: opus:default — 93
Judge says: The candidate extraction is highly comprehensive and closely mirrors the reference, with all key entities, atoms, assertions, and relationships captured. It includes a few additional useful claims (c17 on non-recourse leverage, c28 on receivable mechanics, c33 on long-term thesis) that add value. Relationships are well-structured with appropriate influence weights. Source quotes are accurate throughout.
entity_coverage: 100.0 · atom_completeness: 95.0 · assertion_quality: 93.0 · relationship_accuracy: 90.0 · precision: 93.0 · source_grounding: 95.0
Extraction preview (15333 chars)

```json
{
  "entities": [
    {"id": "e1", "name": "Heliad Equity Partners", "type": "company"},
    {"id": "e2", "name": "Andreas Lange", "type": "person"},
    {"id": "e3", "name": "GreenPark Investments", "type": "company"},
    {"id": "e4", "name": "German small-cap market", "type": "market"}
  ],
  "claims": [
    {
      "id": "c1",
      "claim_type": "atom",
      "text": "Heliad Equity Partners is a publicly traded investment firm in Germany with a market cap of 37M Euro",
      "confidence": 0.95,
      "source_quote": "Heliad Equity Partners is a publicly traded investment firm in Germany with a market cap of 37M Euro.",
      "subject": "e1"
    },
    {
      "id": "c2",
      "claim_type": "atom",
      "text": "Heliad is run by Andreas Lange, a mid-30s executive",
      "confidence": 0.95,
      "source_quote": "Heliad is run by Andreas Lange, a mid 30s executive who is considered to be a rising star in the German private equity community",
      "subject": "e2"
    },
    {
      "id": "c3",
      "claim_type": "assertion",
      "text": "Andreas Lange is considered a rising star in the German private equity community",
      "confidence": 0.75,
      "direction": "bullish",
      "source_quote": "Andreas Lange, a mid 30s executive who is considered to be a rising star in the German private equity community",
      "subject": "e2"
    },
    {
      "id": "c4",
      "claim_type": "atom",
      "text": "Heliad started making investments in 2006",
      "confidence": 0.95,
      "source_quote": "Heliad started making investments in 2006, so given the typical private equity 3-5 year time frame there have been few exits to date.",
      "subject": "e1"
    },
    {
      "id": "c5",
      "claim_type": "atom",
      "text": "Management plans to exit almost all current investments in 2009 and 2010",
      "confidence": 0.90,
      "source_quote": "management plans to exit almost all of their current investments in 2009 and 2010",
      "subject": "e1"
```

Worst: gemini-pro:default — 0
Judge says: The candidate extraction failed entirely due to an API error. No extraction was produced, so all criteria score 0.
entity_coverage: 0.0 · atom_completeness: 0.0 · assertion_quality: 0.0 · relationship_accuracy: 0.0 · precision: 0.0 · source_grounding: 0.0
Extraction preview (323 chars)
ERROR: [NotFoundError] litellm.NotFoundError: GeminiException - {
  "error": {
    "code": 404,
    "message": "models/gemini-pro is not found for API version v1beta, or is not supported for generateContent. Call ListModels to see the list of available models and their supported methods.",
    "status": "NOT_FOUND"
  }
}

Detail Cards (top 10 + bottom 5)

opus:default · document 3417 · score 93
The candidate extraction is highly comprehensive and closely mirrors the reference, with all key entities, atoms, assertions, and relationships captured. It includes a few additional useful claims (c17 on non-recourse leverage, c28 on receivable mechanics, c33 on long-term thesis) that add value. Relationships are well-structured with appropriate influence weights. Source quotes are accurate throughout.
Claims: 39 · Relationships: 5 · Entities: 4 · Atoms: 20 · Assertions: 14
entity_coverage: 100.0 · atom_completeness: 95.0 · assertion_quality: 93.0 · relationship_accuracy: 90.0 · precision: 93.0 · source_grounding: 95.0
Content preview (15333 chars)

```json
{
  "entities": [
    {"id": "e1", "name": "Heliad Equity Partners", "type": "company"},
    {"id": "e2", "name": "Andreas Lange", "type": "person"},
    {"id": "e3", "name": "GreenPark Investments", "type": "company"},
    {"id": "e4", "name": "German small-cap market", "type": "market"}
  ],
  "claims": [
    {
      "id": "c1",
      "claim_type": "atom",
      "text": "Heliad Equity Partners is a publicly traded investment firm in Germany with a market cap of 37M Euro",
      "confidence": 0.95,
      "source_quote": "Heliad Equity Partners is a publicly traded investment firm in Germany with a market cap of 37M Euro.",
      "subject": "e1"
    },
    {
      "id": "c2",
      "claim_type": "atom",
      "text": "Heliad is run by Andreas Lange, a mid-30s executive",
      "confidence": 0.95,
      "source_quote": "Heliad is run by Andreas Lange, a mid 30s executive who is considered to be a rising star in the German private equity community",
      "subject": "e2"
    },
    {
      "id": "c3",
      "claim_type": "assertion",
      "text": "Andreas Lange is considered a rising star in the German private equity community",
      "confidence": 0.75,
      "direction": "bullish",
      "source_quote": "Andreas Lange, a mid 30s executive who is considered to be a rising star in the German private equity community",
      "subject": "e2"
    },
    {
      "id": "c4",
      "claim_type": "atom",
      "text": "Heliad started making investments in 2006",
      "confidence": 0.95,
      "source_quote": "Heliad started making investments in 2006, so given the typical private equity 3-5 year time frame there have been few exits to date.",
      "subject": "e1"
    },
    {
      "id": "c5",
      "claim_type": "atom",
      "text": "Management plans to exit almost all current investments in 2009 and 2010",
      "confidence": 0.90,
      "source_quote": "management plans to exit almost all of their current investments in 2009 and 2010",
      "subject": "e1"
    },
    {
      "id": "c6",
      "claim_type": "atom",
      "text": "The three exits to date have generated IRRs in excess of 25%",
      "confidence": 0.90,
      "source_quote": "the three exits to date have generated IRRs in excess of 25%",
      "subject": "e1"
    },
    {
      "id": "c7",
      "claim_type": "atom",
      "text": "German accounting regulations preclude Heliad from marking much of its portfolio up to fair market value given the recent vintage of investments",
      "confidence": 0.85,
      "source_quote": "German accounting regulations preclude Heliad from marking much of its portfolio up to fair market value given the recent vintage of the investments.",
      "subject": "e1"
    },
    {
      "id": "c8",
      "claim_type": "atom",
      "text": "Heliad's stated net asset value is 1.20 Euro per share",
      "confidence": 0.95,
      "source_quote": "the company's stated net asset value is 1.20 Euro (current share price 0.53 Euro)",
      "subject": "e1"
```

opus:default · document 2792469495 · score 91
The candidate extraction is very closely aligned with the reference. It identifies all 11 entities, captures nearly all key atoms and assertions, has well-formed relationships, and maintains strong source grounding. Minor differences include slightly different evidence linkages and a few atoms being split differently, but overall quality is excellent.
Claims: 60 · Relationships: 8 · Entities: 11 · Atoms: 39 · Assertions: 13
entity_coverage: 100.0 · atom_completeness: 93.0 · assertion_quality: 92.0 · relationship_accuracy: 88.0 · precision: 92.0 · source_grounding: 90.0
Content preview (21111 chars)

```json
{
  "entities": [
    {"id": "e1", "name": "Trailer Bridge (TRBR)", "type": "company"},
    {"id": "e2", "name": "John McCown", "type": "person"},
    {"id": "e3", "name": "Jones Act", "type": "regulation"},
    {"id": "e4", "name": "Puerto Rico market", "type": "market"},
    {"id": "e5", "name": "Crowley Lines", "type": "company"},
    {"id": "e6", "name": "SeaStar", "type": "company"},
    {"id": "e7", "name": "Horizon Lines (HRZ)", "type": "company"},
    {"id": "e8", "name": "Navieras", "type": "company"},
    {"id": "e9", "name": "Malcolm McLean", "type": "person"},
    {"id": "e10", "name": "Bill Gottimer", "type": "person"},
    {"id": "e11", "name": "McLean family/estate", "type": "entity"}
  ],
  "claims": [
    {
      "id": "c1",
      "claim_type": "atom",
      "text": "TRBR stock price is approximately $8.73-$9.00",
      "confidence": 0.95,
      "source_quote": "With the stock at $9, I believe that there is 70%+ upside based on current operations",
      "subject": "e1"
    },
    {
      "id": "c2",
      "claim_type": "atom",
      "text": "TRBR has 12.3 million fully diluted shares outstanding",
      "confidence": 0.95,
      "source_quote": "TRBR's market cap is 12.3 million FD shares at $8.73 = $108 million",
      "subject": "e1"
    },
    {
      "id": "c3",
      "claim_type": "atom",
      "text": "TRBR market cap is approximately $108 million",
      "confidence": 0.95,
      "source_quote": "TRBR's market cap is 12.3 million FD shares at $8.73 = $108 million",
      "subject": "e1"
    },
    {
      "id": "c4",
      "claim_type": "atom",
      "text": "TRBR total debt is $109 million",
      "confidence": 0.95,
      "source_quote": "Total Debt is $109 million",
      "subject": "e1"
    },
    {
      "id": "c5",
      "claim_type": "atom",
      "text": "TRBR has $12 million in cash",
      "confidence": 0.95,
      "source_quote": "Cash is $12 million",
      "subject": "e1"
    },
    {
      "id": "c6",
      "claim_type": "atom",
      "text": "TRBR has $20 million in tax assets, with PV estimated at $15 million at 10%",
      "confidence": 0.90,
      "source_quote": "Tax assets $20 million, my PV = 15 million at 10%",
      "subject": "e1"
    },
    {
      "id": "c7",
      "claim_type": "atom",
      "text": "TRBR enterprise value is approximately $190 million",
      "confidence": 0.90,
      "source_quote": "So EV = 108+109-12-15 = $190 million",
      "subject": "e1"
    },
    {
      "id": "c8",
      "claim_type": "atom",
      "text": "TRBR 2005E EBITDA is $22.3 million, 2006E EBITDA is $25.4 million",
      "confidence": 0.85,
      "source_quote": "Year 2005 2006 EBITDA 22.3 25.4",
      "subject": "e1"
    },
    {
      "id": "c9",
      "claim_type": "atom",
      "text": "TRBR 2005E recurring FCF is $8.5 million, 2006E is $10.5 million",
      "confidence": 0.85,
      "source_quote": "Recurring FCF 8.5 10.5",
      "subject": "e1"
    },
    {
      "id": "c10",
      "claim_type"
```

opus:default · document 7513866327 · score 91
The candidate extraction is very closely aligned with the reference. All entities are identified, atoms are comprehensive and accurate, assertions are well-formed and interpretive, relationships are logically sound with appropriate influence weights, and all claims are grounded in actual source quotes. Minor differences include the candidate missing the 'no catalyst' as a standalone atom (instead only as an assertion) and slightly different relationship structuring, but overall quality is excellent.
Claims: 20 · Relationships: 4 · Entities: 4 · Atoms: 12 · Assertions: 4
entity_coverage: 100.0 · atom_completeness: 90.0 · assertion_quality: 92.0 · relationship_accuracy: 88.0 · precision: 95.0 · source_grounding: 95.0
Content preview (6785 chars)

```json
{
  "entities": [
    {"id": "e1", "name": "Mueller Water Products", "type": "company"},
    {"id": "e2", "name": "Series A shares", "type": "security"},
    {"id": "e3", "name": "Series B shares", "type": "security"},
    {"id": "e4", "name": "Walter Industries", "type": "company"}
  ],
  "claims": [
    {
      "id": "c1",
      "claim_type": "atom",
      "text": "Mueller has 30 million Series A shares outstanding",
      "confidence": 0.95,
      "source_quote": "30 million Series A shares (the \"A shares\")",
      "subject": "e2"
    },
    {
      "id": "c2",
      "claim_type": "atom",
      "text": "Mueller Series A shares closed at $8.98/share",
      "confidence": 0.95,
      "source_quote": "30 million Series A shares (the \"A shares\"), closed at $8.98/share",
      "subject": "e2"
    },
    {
      "id": "c3",
      "claim_type": "atom",
      "text": "Mueller has 86 million Series B shares outstanding",
      "confidence": 0.95,
      "source_quote": "86 million Series B shares (the \"B shares\")",
      "subject": "e3"
    },
    {
      "id": "c4",
      "claim_type": "atom",
      "text": "Mueller Series B shares closed at $6.50/share",
      "confidence": 0.95,
      "source_quote": "86 million Series B shares (the \"B shares\"), closed at $6.50/share",
      "subject": "e3"
    },
    {
      "id": "c5",
      "claim_type": "atom",
      "text": "A shares and B shares have economically equal rights",
      "confidence": 0.95,
      "source_quote": "Mueller's A shares and B shares have economically equal rights",
      "subject": "e1"
    },
    {
      "id": "c6",
      "claim_type": "atom",
      "text": "B shares have 8 votes per share while A shares have 1 vote per share",
      "confidence": 0.95,
      "source_quote": "B shares have 8 votes per share while A shares have 1 vote per share",
      "subject": "e3"
    },
    {
      "id": "c7",
      "claim_type": "atom",
      "text": "Mueller's A/B share structure resulted from a two-step, tax-free spin-off in 2006 by Walter Industries",
      "confidence": 0.95,
      "source_quote": "Mueller's A/B share structure resulted from a two-step, tax-free spin-off in 2006 by Walter Industries",
      "subject": "e1"
    },
    {
      "id": "c8",
      "claim_type": "atom",
      "text": "B shares are trading at 72% of the value of A shares",
      "confidence": 0.95,
      "source_quote": "it is the B shares that are currently trading at only 72% of the value of the A shares",
      "subject": "e3"
    },
    {
      "id": "c9",
      "claim_type": "atom",
      "text": "The spread between A and B shares is 36.7%",
      "confidence": 0.95,
      "source_quote": "I'm recommending a relative arb position in Mueller Water Products (\"Mueller\") to capture a spread of 36.7%",
      "subject": "e1"
    },
    {
      "id": "c10",
      "claim_type": "atom",
      "text": "This is the deepest discount on B shares since March 2007",
      "confidence": 0.90,
      "source_q
```

opus:default · document 2658326550 · score 91
The candidate extraction is very closely aligned with the reference, capturing all significant entities, nearly all atoms and assertions, and well-formed relationships. Minor differences include combining some reference atoms (c13/c14 into c13/c14 with slightly different splits), adding a useful assertion (c33 about 'found religion'), and slight variations in relationship structure. The candidate is comprehensive, precise, and well-grounded.
Claims: 39 · Relationships: 6 · Entities: 8 · Atoms: 20 · Assertions: 13
entity_coverage: 98.0 · atom_completeness: 95.0 · assertion_quality: 92.0 · relationship_accuracy: 85.0 · precision: 93.0 · source_grounding: 95.0
Content preview (14676 chars)

```json
{
  "entities": [
    {"id": "e1", "name": "Campus Crest Communities Inc.", "type": "company"},
    {"id": "e2", "name": "CCG", "type": "ticker"},
    {"id": "e3", "name": "Copper Beech/Cooper Beach", "type": "asset_portfolio"},
    {"id": "e4", "name": "EVO", "type": "product_concept"},
    {"id": "e5", "name": "Aaron Halfacre", "type": "person"},
    {"id": "e6", "name": "Cole Real Estate Investments", "type": "company"},
    {"id": "e7", "name": "REIT investors", "type": "concept"},
    {"id": "e8", "name": "Student housing sector", "type": "sector"}
  ],
  "claims": [
    {
      "id": "c1",
      "claim_type": "atom",
      "text": "CCG stock price is $6.25 per share",
      "confidence": 0.95,
      "source_quote": "At 6.25 a share, the stock trades at a 22% discount to tangible book value of $7.98 and a dividend yield of 10.6%.",
      "subject": "e1"
    },
    {
      "id": "c2",
      "claim_type": "atom",
      "text": "CCG trades at a 22% discount to tangible book value of $7.98",
      "confidence": 0.95,
      "source_quote": "At 6.25 a share, the stock trades at a 22% discount to tangible book value of $7.98 and a dividend yield of 10.6%.",
      "subject": "e1"
    },
    {
      "id": "c3",
      "claim_type": "atom",
      "text": "CCG has a dividend yield of 10.6%",
      "confidence": 0.95,
      "source_quote": "At 6.25 a share, the stock trades at a 22% discount to tangible book value of $7.98 and a dividend yield of 10.6%.",
      "subject": "e1"
    },
    {
      "id": "c4",
      "claim_type": "atom",
      "text": "CCG operates 35,801 beds currently (14,920 wholly owned, 5,148 JV and 13,177 Copper Beach)",
      "confidence": 0.95,
      "source_quote": "While CCG operates 35,801 beds currently (14,920 wholly owned, 5,148 JV and 13,177 Copper Beach)",
      "subject": "e1"
    },
    {
      "id": "c5",
      "claim_type": "atom",
      "text": "During 2014, CCG is expected to deliver 7,455 additional beds, a 21% increase",
      "confidence": 0.95,
      "source_quote": "during 2014, they are expected to deliver 7,455 additional beds (a 21% increase)—which will drive rents and cash flow.",
      "subject": "e1"
    },
    {
      "id": "c6",
      "claim_type": "atom",
      "text": "CCG properties have an average distance to campus of 0.6 miles",
      "confidence": 0.95,
      "source_quote": "CCG's properties haven an average distance to the campus that they are built for of .6 miles.",
      "subject": "e1"
    },
    {
      "id": "c7",
      "claim_type": "atom",
      "text": "CCG has $1.273 billion of assets, mostly investments in property",
      "confidence": 0.95,
      "source_quote": "There is $1.273b of assets, mostly made up of investments in property assets.",
      "subject": "e1"
    },
    {
      "id": "c8",
      "claim_type": "atom",
      "text": "CCG has $603 million of liabilities, mostly mortgages and debt",
      "confidence": 0.95,
      "source_quote": "This is offset by $603m of 
```

opus:default · document 128923 · score 90
The candidate extraction is highly comprehensive and closely mirrors the reference. It captures nearly all entities, atoms, assertions, and relationships with accurate source quotes. Minor differences include some additional useful claims (e.g., VIP loyalty program as a relationship driver, KKR motivation relationship) and slight variations in how certain claims are categorized, but overall quality is excellent.
Claims: 65 · Relationships: 10 · Entities: 11 · Atoms: 40 · Assertions: 15
entity_coverage: 95.0 · atom_completeness: 92.0 · assertion_quality: 90.0 · relationship_accuracy: 88.0 · precision: 92.0 · source_grounding: 93.0
Content preview (25915 chars)

```json
{
  "entities": [
    {"id": "e1", "name": "Pets at Home", "type": "company"},
    {"id": "e2", "name": "PetSmart (PETM)", "type": "company"},
    {"id": "e3", "name": "KKR", "type": "company"},
    {"id": "e4", "name": "Bridgepoint", "type": "company"},
    {"id": "e5", "name": "Petco", "type": "company"},
    {"id": "e6", "name": "Wainwright", "type": "brand"},
    {"id": "e7", "name": "WH Smith", "type": "company"},
    {"id": "e8", "name": "Home Retail Group", "type": "company"},
    {"id": "e9", "name": "Ocado", "type": "company"},
    {"id": "e10", "name": "UK pet care market", "type": "market"},
    {"id": "e11", "name": "London Stock Exchange", "type": "exchange"}
  ],
  "claims": [
    {
      "id": "c1",
      "claim_type": "atom",
      "text": "Pets at Home has traded off ~30% since going public in March 2014",
      "confidence": 0.95,
      "source_quote": "Pets at Home (\"PETS LN\" or \"PETS\") has traded off ~30% since going public in March 2014",
      "subject": "e1"
    },
    {
      "id": "c2",
      "claim_type": "atom",
      "text": "PETS was priced at 245p per share at IPO and has since traded to ~170p per share",
      "confidence": 0.95,
      "source_quote": "At the time of its IPO, PETS was priced at 245p per share but has since traded to ~170p per share despite posting strong maiden results",
      "subject": "e1"
    },
    {
      "id": "c3",
      "claim_type": "atom",
      "text": "PETS trades at ~11x CY15 earnings estimate",
      "confidence": 0.90,
      "source_quote": "PETS now trades at ~11x our CY15 earnings estimate, making it one of the cheapest UK specialty retailers",
      "subject": "e1"
    },
    {
      "id": "c4",
      "claim_type": "atom",
      "text": "PETS has over 10 years of consistent positive same-store sales history",
      "confidence": 0.95,
      "source_quote": "Despite being an exceptionally high quality retailer with over 10 years of consistent positive SSS history",
      "subject": "e1"
    },
    {
      "id": "c5",
      "claim_type": "atom",
      "text": "PETS has 386 stores as of September 2014 with average retail space of ~6,600 sq ft per store",
      "confidence": 0.95,
      "source_quote": "As of September 2014, Pets at Home was the clear market leader in terms of store numbers with 386 stores and an average retail space of ~6,600 square feet per store",
      "subject": "e1"
    },
    {
      "id": "c6",
      "claim_type": "atom",
      "text": "The UK pet care market was worth ~£5.4 billion in 2012",
      "confidence": 0.90,
      "source_quote": "The company estimates that the entire market was worth ~£5.4 billion in 2012",
      "subject": "e10"
    },
    {
      "id": "c7",
      "claim_type": "atom",
      "text": "PETS had 12% overall market share, 12.6% pet food share, 32.3% non-food products share, and 5.9% veterinary services share as of 2012",
      "confidence": 0.90,
      "source_quote": "The company's estimated shares in the UK pet food, n
```

opus:default · document 5159562350 · score 90
The candidate extraction is highly comprehensive and closely mirrors the reference. It captures all significant entities, nearly all atoms and assertions, and includes well-structured relationships. Minor differences include slightly different evidence linkages and a few additional atoms (e.g., c24 on ROE averaging >9.5% for four years) that add value. The candidate also includes a few extra assertions (e.g., c46 on onerous agreement, c58 on Allen Harim masking) that are well-grounded. Relationships are slightly less detailed than the reference (e.g., c61 vs reference c59 missing the self-supply feasibility slot) but are accurate.
Claims: 67 · Relationships: 7 · Entities: 13 · Atoms: 41 · Assertions: 19
entity_coverage: 92.0 · atom_completeness: 93.0 · assertion_quality: 90.0 · relationship_accuracy: 85.0 · precision: 92.0 · source_grounding: 93.0
Content preview (26316 chars)

```json
{
  "entities": [
    {"id": "e1", "name": "ARTNA (Artesian Resources)", "type": "company"},
    {"id": "e2", "name": "MSEX (Middlesex Water Company)", "type": "company"},
    {"id": "e3", "name": "YORW (York Water Company)", "type": "company"},
    {"id": "e4", "name": "Chester Water Authority", "type": "company"},
    {"id": "e5", "name": "Allen Harim", "type": "company"},
    {"id": "e6", "name": "Taylor family", "type": "entity"},
    {"id": "e7", "name": "Dian Taylor", "type": "person"},
    {"id": "e8", "name": "Nicki Taylor", "type": "person"},
    {"id": "e9", "name": "DEPSC (Delaware Public Service Commission)", "type": "entity"},
    {"id": "e10", "name": "New Castle County", "type": "geography"},
    {"id": "e11", "name": "Sussex County", "type": "geography"},
    {"id": "e12", "name": "Tidewater Utilities", "type": "company"},
    {"id": "e13", "name": "ARTNB", "type": "security"}
  ],
  "claims": [
    {
      "id": "c1",
      "claim_type": "atom",
      "text": "ARTNA is a small cap water & wastewater utility company in Delaware servicing New Castle and Sussex counties",
      "confidence": 0.98,
      "source_quote": "ARTNA is a small cap water & wastewater utility company in Delaware servicing New Castle and Sussex counties",
      "subject": "e1"
    },
    {
      "id": "c2",
      "claim_type": "atom",
      "text": "Since 2014, ARTNA has increased corporate ROE from 7.7% to 9.5%",
      "confidence": 0.95,
      "source_quote": "Since 2014, ARTNA has maintained a conservative balance sheet, increased corporate ROE from 7.7% to 9.5%",
      "subject": "e1"
    },
    {
      "id": "c3",
      "claim_type": "atom",
      "text": "ARTNA has increased TBV/share at a 4% CAGR since 2014",
      "confidence": 0.95,
      "source_quote": "increased TBV/share at a 4% CAGR",
      "subject": "e1"
    },
    {
      "id": "c4",
      "claim_type": "atom",
      "text": "ARTNA has grown EPS at an 8% CAGR since 2014",
      "confidence": 0.95,
      "source_quote": "grown EPS at an 8% CAGR while paying a dividend equal to ~60% of earnings",
      "subject": "e1"
    },
    {
      "id": "c5",
      "claim_type": "atom",
      "text": "ARTNA pays a dividend equal to ~60% of earnings",
      "confidence": 0.95,
      "source_quote": "paying a dividend equal to ~60% of earnings",
      "subject": "e1"
    },
    {
      "id": "c6",
      "claim_type": "atom",
      "text": "ARTNA trades at ~20x LTM P/E with a 3% dividend yield and ~2.0x TBV",
      "confidence": 0.95,
      "source_quote": "ARTNA remains reasonably priced given the stability and growth of the business (~20x LTM P/E; 3% dividend yield; ~2.0x TBV)",
      "subject": "e1"
    },
    {
      "id": "c7",
      "claim_type": "atom",
      "text": "YORW trades at 35x TTM P/E and MSEX trades at 29x TTM P/E",
      "confidence": 0.95,
      "source_quote": "ARTNA currently trades at ~20x TTM P/E, while YORW and MSEX trade at 35x and 29x respectively",
      "subject": "e3"
```

opus:default · document 9697848786 · score 90
The candidate extraction is highly comprehensive and closely matches the reference, with additional atoms and assertions that add value. It captures more timeline details (auditor resignation, OTC trading, SEC settlements), includes extra entities like EC Equipment Business, and adds useful assertions like tax equity investor incentives and cash flow return to shareholders. Relationships are well-structured and accurate. Minor differences from reference are negligible.
Claims: 70 · Relationships: 7 · Entities: 16 · Atoms: 41 · Assertions: 22
entity_coverage: 90.0 · atom_completeness: 93.0 · assertion_quality: 91.0 · relationship_accuracy: 88.0 · precision: 90.0 · source_grounding: 92.0
Content preview (24945 chars)

```json
{
  "entities": [
    {"id": "e1", "name": "ADES (Advanced Emissions Solutions)", "type": "company"},
    {"id": "e2", "name": "Tinuum Group", "type": "company"},
    {"id": "e3", "name": "M-Prove", "type": "business_segment"},
    {"id": "e4", "name": "Arthur J. Gallagher", "type": "company"},
    {"id": "e5", "name": "Goldman Sachs", "type": "company"},
    {"id": "e6", "name": "Heath Sampson", "type": "person"},
    {"id": "e7", "name": "Rodman & Renshaw", "type": "company"},
    {"id": "e8", "name": "Refined Coal (RC) Tax Credits", "type": "concept"},
    {"id": "e9", "name": "Section 45 Tax Credits", "type": "concept"},
    {"id": "e10", "name": "IRS", "type": "government_agency"},
    {"id": "e11", "name": "WW Grainger (GWW)", "type": "company"},
    {"id": "e12", "name": "DTE Energy", "type": "company"},
    {"id": "e13", "name": "Capital One", "type": "company"},
    {"id": "e14", "name": "Fidelity Investments", "type": "company"},
    {"id": "e15", "name": "Patent Portfolio", "type": "asset"},
    {"id": "e16", "name": "EC Equipment Business", "type": "business_segment"}
  ],
  "claims": [
    {
      "id": "c1",
      "claim_type": "atom",
      "text": "ADES trades at approximately $7.50 per share",
      "confidence": 0.95,
      "source_quote": "a run-off of ADES (@ $7.50) has signficant upside",
      "subject": "e1"
    },
    {
      "id": "c2",
      "claim_type": "atom",
      "text": "ADES has a market capitalization of $165MM with 22MM shares outstanding",
      "confidence": 0.95,
      "source_quote": "$165MM market capitalization (22MM shares at $7.50)",
      "subject": "e1"
    },
    {
      "id": "c3",
      "claim_type": "atom",
      "text": "ADES has $11MM cash including restricted cash",
      "confidence": 0.95,
      "source_quote": "$11MM cash (including restricted cash)",
      "subject": "e1"
    },
    {
      "id": "c4",
      "claim_type": "atom",
      "text": "ADES has $23MM debt related to royalty litigation liabilities",
      "confidence": 0.95,
      "source_quote": "$23MM debt (liabilities related to royalty related litigation)",
      "subject": "e1"
    },
    {
      "id": "c5",
      "claim_type": "atom",
      "text": "ADES owns a 42.5% stake in Tinuum Group",
      "confidence": 0.95,
      "source_quote": "ADES owns its RC facilities through a 42.5% stake in Tinuum",
      "subject": "e1"
    },
    {
      "id": "c6",
      "claim_type": "atom",
      "text": "Tinuum is a joint venture with a private party and Goldman Sachs",
      "confidence": 0.95,
      "source_quote": "Tinuum (a joint venture with a private party and Goldman Sachs)",
      "subject": "e2"
    },
    {
      "id": "c7",
      "claim_type": "atom",
      "text": "ADES built 28 RC facilities",
      "confidence": 0.95,
      "source_quote": "ADES built 28 of these RC facilities",
      "subject": "e1"
    },
    {
      "id": "c8",
      "claim_type": "atom",
      "text": "Tinuum has monetized 13 of its 28 RC faci
```

opus:default · document 57542 · score 89
The candidate extraction is very comprehensive and closely mirrors the reference. It captures all key entities, nearly all atoms and assertions, and builds meaningful relationships. Minor differences include slightly different organization of some claims and the omission of the atom about Stronach using super-voting Class B shares to control MIM (reference c6), though this is partially captured in c6's settlement description. The candidate also adds a useful assertion about the strategic review timeline (c40) not explicitly in the reference.
Claims: 48 · Relationships: 7 · Entities: 8 · Atoms: 30 · Assertions: 11
entity_coverage: 100.0 · atom_completeness: 90.0 · assertion_quality: 90.0 · relationship_accuracy: 85.0 · precision: 90.0 · source_grounding: 92.0
Content preview (20144 chars)

```json
{
  "entities": [
    {"id": "e1", "name": "MIM", "type": "company"},
    {"id": "e2", "name": "Magna International (MGA)", "type": "company"},
    {"id": "e3", "name": "Frank Stronach", "type": "person"},
    {"id": "e4", "name": "William Lenehan", "type": "person"},
    {"id": "e5", "name": "Greenlight", "type": "company"},
    {"id": "e6", "name": "Farallon", "type": "company"},
    {"id": "e7", "name": "Horse racing business", "type": "business_segment"},
    {"id": "e8", "name": "Industrial REIT peers", "type": "sector"}
  ],
  "claims": [
    {
      "id": "c1",
      "claim_type": "atom",
      "text": "MIM is a Canadian-based real estate company that primarily holds an industrial portfolio of operating assets leased to Magna International",
      "confidence": 0.95,
      "source_quote": "MIM is a Canadian-based real estate company that primarily holds an industrial portfolio of operating assets leased to Magna International (NYSE: MGA) and its automotive operating units.",
      "subject": "e1"
    },
    {
      "id": "c2",
      "claim_type": "atom",
      "text": "MIM was spun out of MGA in 2003 as a holding company for MGA's automotive real estate assets",
      "confidence": 0.95,
      "source_quote": "MIM was originally spun-out of MGA in 2003 as a holding company for MGA's automotive real estate assets.",
      "subject": "e1"
    },
    {
      "id": "c3",
      "claim_type": "atom",
      "text": "Between 2003 and 2010, MIM supported a horse racing business that was bleeding nearly $100mm per year in cash flow",
      "confidence": 0.90,
      "source_quote": "Between 2003 and 2010, MIM, of which Stronach had a minimal economic ownership in, supported a horse racing business that was bleeding nearly $100mm per year in cash flow.",
      "subject": "e7"
    },
    {
      "id": "c4",
      "claim_type": "atom",
      "text": "The race track subsidiary filed for bankruptcy in 2009",
      "confidence": 0.95,
      "source_quote": "Eventually, the race track subsidiary finally ran out of money and filed for bankruptcy in 2009.",
      "subject": "e7"
    },
    {
      "id": "c5",
      "claim_type": "atom",
      "text": "MIM had a significant equity stake and almost $400mm+ of secured loans in the racing subsidiary",
      "confidence": 0.90,
      "source_quote": "In lieu of MIM's significant equity stake and almost $400mm+ of secured loans, MIM assumed the majority of the racing assets in 2010.",
      "subject": "e1"
    },
    {
      "id": "c6",
      "claim_type": "atom",
      "text": "Stronach agreed to leave the company, give up super-voting shares, and take all horse racing assets with him",
      "confidence": 0.95,
      "source_quote": "Late last year, however, the shareholders finally came to an agreement with Stronach whereby he leaves the company, gives up his super-voting shares, and takes all the horse racing assets with him.",
      "subject": "e3"
    },
    {
      "id": "c7",
      "claim_type": "
```
opus default 5994304072 89
The candidate extraction is very comprehensive and closely mirrors the reference. It captures nearly all key entities, atoms, assertions, and relationships with accurate source quotes. Minor differences include the addition of 'You Invest' as a separate entity (reasonable), slightly different organization of some assertions, and the e-broker industry not being called out as a separate entity (instead attributed to e1). Relationships are well-structured with appropriate influence weights.
Claims: 48 · Relationships: 6 · Entities: 13 · Atoms: 28 · Assertions: 14
entity_coverage: 88.0 · atom_completeness: 92.0 · assertion_quality: 90.0 · relationship_accuracy: 87.0 · precision: 90.0 · source_grounding: 92.0
Content preview (19623 chars)

```json
{
  "entities": [
    {"id": "e1", "name": "Charles Schwab (SCHW)", "type": "company"},
    {"id": "e2", "name": "Robinhood", "type": "company"},
    {"id": "e3", "name": "JPMorgan (JPM)", "type": "company"},
    {"id": "e4", "name": "Interactive Brokers (IBKR)", "type": "company"},
    {"id": "e5", "name": "Fidelity", "type": "company"},
    {"id": "e6", "name": "Raymond James Financial (RJF)", "type": "company"},
    {"id": "e7", "name": "Mutual Fund OneSource", "type": "product"},
    {"id": "e8", "name": "ETF OneSource", "type": "product"},
    {"id": "e9", "name": "Advisor Services", "type": "business_segment"},
    {"id": "e10", "name": "Investor Services", "type": "business_segment"},
    {"id": "e11", "name": "Advisor Solutions", "type": "business_segment"},
    {"id": "e12", "name": "IBKR Lite", "type": "product"},
    {"id": "e13", "name": "You Invest", "type": "product"}
  ],
  "claims": [
    {
      "id": "c1",
      "claim_type": "atom",
      "text": "Schwab has approximately $3.7 trillion of client assets under custody",
      "confidence": 0.95,
      "source_quote": "Schwab is one of the largest wealth management firms globally with ~$3.7tn of client assets under custody.",
      "subject": "e1"
    },
    {
      "id": "c2",
      "claim_type": "atom",
      "text": "Client assets break down roughly 50%/50% between retail (Investor Services) and institutional (Advisor Services)",
      "confidence": 0.90,
      "source_quote": "The assets break down roughly 50%/50% between (i) the retail business (\"Investor Services\") and (ii) the institutional business (\"Advisor Services\")",
      "subject": "e1"
    },
    {
      "id": "c3",
      "claim_type": "atom",
      "text": "Revenue and pre-tax contribution split is 70%/30% retail vs institutional",
      "confidence": 0.90,
      "source_quote": "the revenue and pre-tax contribution from the retail side is substantially higher than the institutional side (70%/30%)",
      "subject": "e1"
    },
    {
      "id": "c4",
      "claim_type": "atom",
      "text": "Interest income on balance sheet is ~60% of total revenue",
      "confidence": 0.90,
      "source_quote": "Interest income on its balance sheet (funded by client cash): ~60% of total",
      "subject": "e1"
    },
    {
      "id": "c5",
      "claim_type": "atom",
      "text": "Asset management revenues are ~30% of total revenue",
      "confidence": 0.90,
      "source_quote": "Asset management revenues: ~30% of total",
      "subject": "e1"
    },
    {
      "id": "c6",
      "claim_type": "atom",
      "text": "Trading commissions/payment for order flow are ~10% of total revenue",
      "confidence": 0.90,
      "source_quote": "Trading commissions/payment for order flow: ~10% of total",
      "subject": "e1"
    },
    {
      "id": "c7",
      "claim_type": "atom",
      "text": "Schwab cut commission rates from $9 to $7 per trade in early 2017, then to $5 four weeks later after Fidelity responded",
```
opus default 8168947743 89
The candidate extraction is very comprehensive and closely mirrors the reference. It captures nearly all entities, atoms, assertions, and relationships with accurate source quotes. Minor differences include combining some inventory data points into a single atom and a few slightly different entity ID assignments for subjects, but overall quality is high.
Claims: 49 · Relationships: 8 · Entities: 16 · Atoms: 30 · Assertions: 11
entity_coverage: 95.0 · atom_completeness: 88.0 · assertion_quality: 92.0 · relationship_accuracy: 85.0 · precision: 92.0 · source_grounding: 93.0
Content preview (19390 chars)

```json
{
  "entities": [
    {"id": "e1", "name": "NL Industries", "type": "company"},
    {"id": "e2", "name": "Harold C. Simmons", "type": "person"},
    {"id": "e3", "name": "Kronos Worldwide (KRO)", "type": "company"},
    {"id": "e4", "name": "Valhi, Inc (VHI)", "type": "company"},
    {"id": "e5", "name": "CompX International (CIX)", "type": "company"},
    {"id": "e6", "name": "Tronox (TROX)", "type": "company"},
    {"id": "e7", "name": "Huntsman (HUN)", "type": "company"},
    {"id": "e8", "name": "Rockwood", "type": "company"},
    {"id": "e9", "name": "Titanium Metals (TIE)", "type": "company"},
    {"id": "e10", "name": "Precision Castparts", "type": "company"},
    {"id": "e11", "name": "TiO2 (titanium dioxide)", "type": "commodity"},
    {"id": "e12", "name": "Sherwin-Williams (SHW)", "type": "company"},
    {"id": "e13", "name": "DuPont", "type": "company"},
    {"id": "e14", "name": "Atlantic-Richfield (BP)", "type": "company"},
    {"id": "e15", "name": "ConAgra Grocery Products", "type": "company"},
    {"id": "e16", "name": "Santa Clara lead pigment litigation", "type": "legal_case"}
  ],
  "claims": [
    {
      "id": "c1",
      "claim_type": "atom",
      "text": "NL Industries is primarily a holding company controlled by Harold C. Simmons",
      "confidence": 0.95,
      "source_quote": "NL is primarily a holding company controlled by Texas billionaire and former corporate raider Harold C. Simmons.",
      "subject": "e1"
    },
    {
      "id": "c2",
      "claim_type": "atom",
      "text": "NL holds a 30% stake (35.2 million shares) in Kronos Worldwide worth $547.7 million at market",
      "confidence": 0.95,
      "source_quote": "30% stake (35.2 million shares) in TiO2 producer Kronos Worldwide (ticker KRO) worth $547.7 million at KRO's closing price",
      "subject": "e1"
    },
    {
      "id": "c3",
      "claim_type": "atom",
      "text": "NL holds a 4% stake (14.4 million shares) in Valhi worth $273.8 million at market",
      "confidence": 0.95,
      "source_quote": "4% stake (14.4 million shares) in Valhi, Inc (ticker VHI) worth $273.8 million at VHI's closing price",
      "subject": "e1"
    },
    {
      "id": "c4",
      "claim_type": "atom",
      "text": "NL holds an 87% stake (10.8 million shares) in CompX International worth $142.0 million at market",
      "confidence": 0.95,
      "source_quote": "87% stake (10.8 million shares) in CompX International (ticker CIX) worth $142.0 million at CIX's closing price",
      "subject": "e1"
    },
    {
      "id": "c5",
      "claim_type": "atom",
      "text": "NL has $25.1 million of cash at the parent company level",
      "confidence": 0.95,
      "source_quote": "$25.1 million of cash at the NL parent company level and wholly owned subsidiaries (i.e. excluding $52.7 million of cash held at CompX)",
      "subject": "e1"
    },
    {
      "id": "c6",
      "claim_type": "atom",
      "text": "CompX holds $52.7 million of cash separately",
```
gemini-pro default 17096 0
The candidate extraction failed entirely due to an API error. No extraction was produced, so all criteria score 0.
entity_coverage: 0.0 · atom_completeness: 0.0 · assertion_quality: 0.0 · relationship_accuracy: 0.0 · precision: 0.0 · source_grounding: 0.0
Content preview (323 chars)
ERROR: [NotFoundError] litellm.NotFoundError: GeminiException - {
  "error": {
    "code": 404,
    "message": "models/gemini-pro is not found for API version v1beta, or is not supported for generateContent. Call ListModels to see the list of available models and their supported methods.",
    "status": "NOT_FOUND"
  }
}
gemini-pro default 5159562350 0
The candidate extraction failed entirely due to an API error. No extraction was produced, so all criteria score 0.
entity_coverage: 0.0 · atom_completeness: 0.0 · assertion_quality: 0.0 · relationship_accuracy: 0.0 · precision: 0.0 · source_grounding: 0.0
Content preview (323 chars)
ERROR: [NotFoundError] litellm.NotFoundError: GeminiException - {
  "error": {
    "code": 404,
    "message": "models/gemini-pro is not found for API version v1beta, or is not supported for generateContent. Call ListModels to see the list of available models and their supported methods.",
    "status": "NOT_FOUND"
  }
}
gemini-pro default 2658326550 0
The candidate extraction failed entirely due to an API error. No extraction was produced, so all criteria score 0.
entity_coverage: 0.0 · atom_completeness: 0.0 · assertion_quality: 0.0 · relationship_accuracy: 0.0 · precision: 0.0 · source_grounding: 0.0
Content preview (323 chars)
ERROR: [NotFoundError] litellm.NotFoundError: GeminiException - {
  "error": {
    "code": 404,
    "message": "models/gemini-pro is not found for API version v1beta, or is not supported for generateContent. Call ListModels to see the list of available models and their supported methods.",
    "status": "NOT_FOUND"
  }
}
gemini-pro default 0124119574 0
The candidate extraction failed entirely due to an API error, producing no output whatsoever. There are no entities, claims, atoms, assertions, or relationships to evaluate.
entity_coverage: 0.0 · atom_completeness: 0.0 · assertion_quality: 0.0 · relationship_accuracy: 0.0 · precision: 0.0 · source_grounding: 0.0
Content preview (323 chars)
ERROR: [NotFoundError] litellm.NotFoundError: GeminiException - {
  "error": {
    "code": 404,
    "message": "models/gemini-pro is not found for API version v1beta, or is not supported for generateContent. Call ListModels to see the list of available models and their supported methods.",
    "status": "NOT_FOUND"
  }
}
gemini-pro default 7122165936 0
The candidate extraction failed entirely due to an API error. No extraction was produced, so all criteria score 0.
entity_coverage: 0.0 · atom_completeness: 0.0 · assertion_quality: 0.0 · relationship_accuracy: 0.0 · precision: 0.0 · source_grounding: 0.0
Content preview (323 chars)
ERROR: [NotFoundError] litellm.NotFoundError: GeminiException - {
  "error": {
    "code": 404,
    "message": "models/gemini-pro is not found for API version v1beta, or is not supported for generateContent. Call ListModels to see the list of available models and their supported methods.",
    "status": "NOT_FOUND"
  }
}
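
Two pieces of arithmetic behind the numbers in this report can be checked with a short sketch. It is illustrative only and is not the gym's own scoring code. The helper names (`weighted_score`, `is_failed`) are hypothetical, and two assumptions are made: that the six criterion weights are meant to combine as a plain weighted sum, and that a failed run can be recognised by a content preview starting with `ERROR:`, as in the gemini-pro records above.

```python
from statistics import mean

# Criterion weights in percent, as listed in the report's criteria table.
WEIGHTS = {
    "entity_coverage": 15,
    "atom_completeness": 20,
    "assertion_quality": 20,
    "relationship_accuracy": 20,
    "precision": 15,
    "source_grounding": 10,
}


def weighted_score(criteria):
    """Plain weighted sum of the six criterion scores (hypothetical helper)."""
    return sum(WEIGHTS[name] * criteria[name] for name in WEIGHTS) / 100


def is_failed(preview):
    """Assumption: a run failed if its content preview starts with 'ERROR:'."""
    return preview.lstrip().startswith("ERROR:")


# Criterion scores of the first opus record in this section (reported overall: 89).
opus_record = {
    "entity_coverage": 100.0,
    "atom_completeness": 90.0,
    "assertion_quality": 90.0,
    "relationship_accuracy": 85.0,
    "precision": 90.0,
    "source_grounding": 92.0,
}
print(weighted_score(opus_record))  # 90.7

# Per-model average scores from the score matrix.
model_avg = {
    "gemini-pro": 0.0,
    "gpt": 81.7,
    "grok-fast": 56.5,
    "haiku": 77.3,
    "opus": 88.5,
    "sonnet": 81.2,
}

# One sample preview prefix per model; only gemini-pro previews are errors here.
previews = {model: "{" for model in model_avg}
previews["gemini-pro"] = "ERROR: [NotFoundError] litellm.NotFoundError: GeminiException - {"

failed_models = {m for m, p in previews.items() if is_failed(p)}

all_avg = mean(model_avg.values())
ok_avg = mean(v for m, v in model_avg.items() if m not in failed_models)
print(round(all_avg, 1), round(ok_avg, 1))  # 64.2 77.0
```

The weighted sum for the first opus record comes to 90.7, while the judge reported 89, so the overall score is apparently not computed mechanically from the criterion scores. The second figure shows how much the API failures pull down the headline average: 64.2 across all six models, against 77.0 when gemini-pro is left out.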

Generated 20260313_172504 — 120 candidates