# Semantic Net — Design & Rationale

## What This Is

A knowledge graph where everything is a **claim** — facts, entities, assertions, and relationships between them. Claims reference other claims through typed **slots** with **influence** weights, forming a graph that feeds probabilistic inference (`pgmpy` Bayesian networks).

Three tables do all the work:

```
claims          — nodes (atoms, entities, assertions, relationships)
claim_refs      — edges (typed slots with influence weights)
claim_provenance — who said it, how, when (audit trail)
```
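For concreteness, the three tables can be sketched as DDL. This is a minimal sketch inferred from the `INSERT` statements later in this document, not the real schema — any column not exercised by those examples is an assumption:

```python
# Minimal schema sketch. Column names are inferred from the INSERT examples
# in this document; anything beyond those is an assumption.
import sqlite3

DDL = """
CREATE TABLE claims (
    claim_id   INTEGER PRIMARY KEY,
    claim_type TEXT NOT NULL,          -- atom | entity | assertion | relationship
    claim_text TEXT NOT NULL,
    confidence REAL,
    state      TEXT DEFAULT 'active',  -- active | deprecated | superseded | validated
    category   TEXT,                   -- e.g. valuation (taxonomy)
    direction  TEXT,                   -- bullish | bearish
    doc_id     TEXT,
    model      TEXT
);
CREATE TABLE claim_refs (
    claim_id  INTEGER REFERENCES claims(claim_id),
    slot      TEXT NOT NULL,           -- subject | evidence | context | input | output | moderator
    ref_id    INTEGER REFERENCES claims(claim_id),
    influence REAL DEFAULT 0.0,        -- -1.0 .. +1.0
    ordinal   INTEGER DEFAULT 0
);
CREATE TABLE claim_provenance (
    claim_id    INTEGER REFERENCES claims(claim_id),
    source_type TEXT,                  -- extraction | human | inference
    source_ref  TEXT,
    method      TEXT,
    confidence  REAL
);
"""

con = sqlite3.connect(":memory:")
con.executescript(DDL)
```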

## Why Not RDF?

RDF (Resource Description Framework) represents everything as `(subject, predicate, object)` triples. To say "Axon's Q4 revenue was $560M and I'm 90% confident" you need **reification** — turning the triple into a node so you can attach metadata:

```turtle
# RDF reification: 4+ triples for one annotated fact
_:stmt1 rdf:type rdf:Statement .
_:stmt1 rdf:subject :Axon .
_:stmt1 rdf:predicate :hasRevenue .
_:stmt1 rdf:object "560M" .
_:stmt1 :confidence 0.90 .
_:stmt1 :extractedFrom :vic_thesis_42 .
_:stmt1 :extractedBy "claude-sonnet" .
```

RDF-star (since folded into RDF 1.2 as triple terms) improves this with quoted triples, but it still requires a triplestore (Jena, Blazegraph, Ontotext), SPARQL queries, and ontology management.

### Why not just use a triplestore?

Two questions worth taking seriously:

**For querying** — SPARQL is powerful for graph traversal ("find all entities 3 hops from X where property Y > Z"). Our query patterns *do* include multi-hop traversal — building a `pgmpy` network means extracting the full transitive subgraph ("all claims reachable from this entity through any chain of input/output refs"). That's not a simple JOIN; it's a recursive graph walk.

SQLite handles this via recursive CTEs (`WITH RECURSIVE`), which work but aren't as ergonomic as SPARQL's path expressions or Cypher's variable-length patterns. At a few thousand claims, recursive CTEs perform fine. At 100K+ claims with deep chains, a graph database (or at minimum, an in-memory NetworkX index) would likely outperform SQLite's recursive queries.
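For illustration, the recursive-CTE version of that walk, on a tiny graph whose schema and data follow the examples later in this document:

```python
# Sketch: "all claims reachable from claim 3 through input edges" as a
# recursive CTE. Schema follows the INSERT examples in this document.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE claims (claim_id INTEGER PRIMARY KEY, claim_text TEXT, state TEXT DEFAULT 'active');
CREATE TABLE claim_refs (claim_id INT, slot TEXT, ref_id INT, influence REAL);
INSERT INTO claims (claim_id, claim_text) VALUES
  (3, 'FE is undervalued'), (7, 'FE will re-rate'), (9, 'Position is attractive');
INSERT INTO claim_refs VALUES (7, 'input', 3, 0.8), (9, 'input', 7, 0.7);
""")

# Walk forward through input edges: everything claim 3 ultimately influences.
rows = con.execute("""
WITH RECURSIVE reachable(claim_id) AS (
    SELECT 3
    UNION
    SELECT cr.claim_id FROM claim_refs cr
    JOIN reachable r ON cr.ref_id = r.claim_id
    WHERE cr.slot = 'input'
)
SELECT c.claim_id, c.claim_text FROM claims c
JOIN reachable r USING (claim_id) ORDER BY c.claim_id;
""").fetchall()
# rows → [(3, 'FE is undervalued'), (7, 'FE will re-rate'), (9, 'Position is attractive')]
```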

The honest trade-off: we're accepting slightly clunkier graph traversal queries now to avoid the infrastructure cost of a triplestore. This is the right call at current scale but should be revisited as the graph grows. The RDF-star export path (below) means migration is always available.

**For `pgmpy`** — `pgmpy` doesn't consume RDF. It takes Python lists of edge tuples and `TabularCPD` objects. Whether claims live in SQLite or a triplestore, you still write a conversion layer. The SQLite→`pgmpy` path is a single SQL query (see "From Claims to `pgmpy`" below). The triplestore→`pgmpy` path would be: SPARQL query → parse results → same Python objects. Strictly more steps, no benefit.

### RDF-star export (escape hatch)

Our schema maps cleanly to RDF-star if we ever need interoperability. Each claim becomes a quoted triple, refs become annotation triples:

```turtle
# Our claim + ref → RDF-star
<< :Axon :hasRevenue "560M" >> :confidence 0.90 ;
                                :extractedFrom :vic_thesis_42 ;
                                :extractedBy "claude-sonnet" ;
                                :claimType "atom" ;
                                :state "active" .

<< :FE_undervalued :hasEvidence :FE_trades_11x >> :influence 0.9 ;
                                                   :slot "evidence" .
```

A `claim_to_rdf()` export function is straightforward — iterate claims and refs, emit Turtle. This is a one-afternoon task if the need arises (sharing with a research group, importing into a collaborative KB, etc.). We keep it as an export option, not the storage format.
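A minimal sketch of what that export could look like, assuming simple row tuples as input. The prefixes and property names mirror the examples in this document; `claims_to_turtle` and the row shapes are hypothetical:

```python
# Hedged sketch of the export: iterate claims and refs, emit RDF-star-style
# Turtle. Row shapes and the function name are assumptions for illustration.
def claims_to_turtle(claims, refs):
    """claims: [(id, type, text, confidence)]; refs: [(claim_id, slot, ref_id, influence)]"""
    lines = ["@prefix : <https://semnet.rivus.dev/> .",
             "@prefix sn: <https://semnet.rivus.dev/schema/> .", ""]
    for cid, ctype, text, conf in claims:
        lines.append(f":claim_{cid} a sn:{ctype.capitalize()} ;")
        lines.append(f'    sn:claimText "{text}" ;')
        lines.append(f'    sn:confidence "{conf}" .')
    for cid, slot, rid, infl in refs:
        # refs become annotated quoted triples
        lines.append(f'<< :claim_{cid} sn:{slot} :claim_{rid} >> sn:influence "{infl}" .')
    return "\n".join(lines)

print(claims_to_turtle(
    [(1, "entity", "First Energy (FE)", 0.99)],
    [(3, "evidence", 2, 0.9)],
))
```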

### Full example: FE thesis in RDF-star vs our SQL

The same First Energy thesis (see Complex Example below) in RDF-star Turtle:

```turtle
@prefix : <https://semnet.rivus.dev/> .
@prefix sn: <https://semnet.rivus.dev/schema/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Entity
:claim_1 a sn:Entity ;
    sn:claimText "First Energy (FE)" ;
    sn:confidence "0.99"^^xsd:float ;
    sn:state "active" .

# Atom — raw fact
:claim_2 a sn:Atom ;
    sn:claimText "FE trades at 11x P/E, utility peers at 19x" ;
    sn:confidence "0.95"^^xsd:float ;
    sn:docId "fe_thesis" ;
    sn:state "active" .

# Assertion — references entity as subject, atom as evidence
:claim_3 a sn:Assertion ;
    sn:claimText "FE is undervalued relative to peers" ;
    sn:confidence "0.85"^^xsd:float ;
    sn:category "valuation" ;
    sn:subType "relative_discount" ;
    sn:state "active" .

# Refs as annotated quoted triples
<< :claim_3 sn:subject :claim_1 >> sn:influence "0.0"^^xsd:float .
<< :claim_3 sn:evidence :claim_2 >> sn:influence "0.9"^^xsd:float .

# Atom — penalty bound
:claim_4 a sn:Atom ;
    sn:claimText "Penalties bounded at $500M vs $15B mkt cap" ;
    sn:confidence "0.75"^^xsd:float ;
    sn:state "active" .

# Assertion — scandal contained, backed by the penalty atom
:claim_5 a sn:Assertion ;
    sn:claimText "FBI scandal is contained, doesn't impair core earnings" ;
    sn:confidence "0.70"^^xsd:float ;
    sn:state "active" .

<< :claim_5 sn:evidence :claim_4 >> sn:influence "0.8"^^xsd:float .

# Assertion — regulatory risk (bearish)
:claim_6 a sn:Assertion ;
    sn:claimText "Regulatory risk could impair Ohio rate base permanently" ;
    sn:confidence "0.30"^^xsd:float ;
    sn:direction "bearish" ;
    sn:state "active" .

# Assertion — the conclusion (stock re-rates)
:claim_7 a sn:Assertion ;
    sn:claimText "FE stock will re-rate toward peer multiples" ;
    sn:confidence "0.65"^^xsd:float ;
    sn:category "valuation" ;
    sn:subType "perception_error" ;
    sn:state "active" .

# Influence edges — the causal structure
<< :claim_7 sn:input :claim_3 >> sn:influence "0.8"^^xsd:float ;
                                   sn:ordinal 0 .
<< :claim_7 sn:input :claim_5 >> sn:influence "0.7"^^xsd:float ;
                                   sn:ordinal 1 .
<< :claim_7 sn:input :claim_6 >> sn:influence "-0.6"^^xsd:float ;
                                   sn:ordinal 2 .

# Provenance
<< :claim_3 sn:extractedFrom "fe_thesis" >> sn:method "llm_extraction" ;
                                              sn:extractedBy "claude-sonnet" ;
                                              sn:extractedAt "2026-02-25T19:00:00Z"^^xsd:dateTime .
```

Compare to our SQL (same information):

```sql
-- 7 rows in claims
INSERT INTO claims (claim_type, claim_text, confidence, state) VALUES
  ('entity',    'First Energy (FE)',                               0.99, 'active'),
  ('atom',      'FE trades at 11x P/E, utility peers at 19x',     0.95, 'active'),
  ('assertion', 'FE is undervalued relative to peers',             0.85, 'active'),
  ('atom',      'Penalties bounded at $500M vs $15B mkt cap',      0.75, 'active'),
  ('assertion', 'FBI scandal contained, no core earnings impact',  0.70, 'active'),
  ('assertion', 'Regulatory risk could impair Ohio rate base',     0.30, 'active'),
  ('assertion', 'FE stock will re-rate toward peer multiples',     0.65, 'active');

-- 6 rows in claim_refs
INSERT INTO claim_refs (claim_id, slot, ref_id, influence) VALUES
  (3, 'subject',  1,  0.0),   -- assertion about FE
  (3, 'evidence', 2,  0.9),   -- backed by valuation atom
  (5, 'evidence', 4,  0.8),   -- scandal claim backed by penalty atom
  (7, 'input',    3, +0.8),   -- undervalued → re-rates
  (7, 'input',    5, +0.7),   -- scandal contained → re-rates
  (7, 'input',    6, -0.6);   -- regulatory risk → re-rates (negative)

-- 1 row in provenance
INSERT INTO claim_provenance (claim_id, source_type, source_ref, method, confidence)
VALUES (3, 'extraction', 'fe_thesis', 'llm_extraction', 0.85);
```

The RDF-star version runs to dozens of lines of Turtle. The SQL is 14 rows across 3 tables. Both represent identical structure. The SQL is what we query; the Turtle is available as an export format.

### Dual-store architecture: SQLite + Oxigraph

Rather than choosing between SQL and SPARQL, we use both. **SQLite** is the source of truth (relational, familiar, great for metadata/provenance). **Oxigraph** (`pyoxigraph`) is the graph index for SPARQL queries — multi-hop traversal, property paths, reachability.

Oxigraph runs **fully in-process** via `pip install pyoxigraph` (Rust compiled to Python wheels). No JVM, no server, no Docker. It supports RDF 1.2 triple terms (the successor to RDF-star) and full SPARQL 1.1 with property paths. It's roughly 38x faster than rdflib for SPARQL queries.

```
SQLite (source of truth)          pyoxigraph Store (graph index)
  claims, claim_refs, provenance     RDF triples + triple terms
  familiar SQL for CRUD              SPARQL for graph traversal
         |                                    |
         +-------- sync on write -------------+
         |                                    |
    relational queries               property path queries
    "all claims from doc X"          "all claims reachable from entity Y"
    "provenance for claim Z"         "causal chain from A to B"
```

On every claim/ref write to SQLite, we materialize the corresponding triples into the Oxigraph store. SPARQL handles the queries that are painful in SQL:

```sparql
# Multi-hop: all claims reachable from an entity through causal chains
PREFIX sn: <https://semnet.rivus.dev/schema/>
SELECT ?claim ?text ?conf WHERE {
    <.../claim/1> ^sn:subject/^sn:input* ?claim .
    ?claim sn:text ?text ; sn:confidence ?conf .
}
```

This one-liner replaces a recursive CTE. Property paths (`^sn:subject/^sn:input*`) express "start at entity, walk backwards through subject refs, then follow input edges zero or more times."
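The sync-on-write step can be sketched in plain Python: each `claim_refs` row materializes as a quoted-triple annotation in the shape of the RDF-star examples above. The helper below is illustrative, and the pyoxigraph load call in the comment is assumed usage rather than verified against a specific version:

```python
# Sketch: materialize a claim_refs row into an RDF-star-style annotation
# string. Naming and triple shapes mirror the examples in this document.
SN = "https://semnet.rivus.dev/schema/"
BASE = "https://semnet.rivus.dev/"

def ref_to_triples(claim_id, slot, ref_id, influence):
    quoted = f"<< <{BASE}claim/{claim_id}> <{SN}{slot}> <{BASE}claim/{ref_id}> >>"
    return [f'{quoted} <{SN}influence> "{influence}" .']

triples = ref_to_triples(7, "input", 3, 0.8)
# A pyoxigraph Store would ingest these via its Turtle parser, e.g. (assumed usage):
#   store.load(io.BytesIO("\n".join(triples).encode()), "text/turtle")
```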

**Why Oxigraph over alternatives:**
- **rdflib** — no RDF-star support (disqualified)
- **Jena Fuseki / GraphDB** — JVM servers, massive overhead for ~1000s of claims
- **Blazegraph** — dead project (Wikidata is migrating away)
- **NetworkX** — no SPARQL, no RDF, no triple terms

Back to the Axon revenue fact from the reification example above — our equivalent in SQL:

```sql
-- One row in claims
INSERT INTO claims (claim_type, claim_text, confidence, model)
VALUES ('atom', 'Axon Q4 2025 revenue = $560M', 0.90, 'claude-sonnet');

-- One row in claim_refs to link it to the entity
INSERT INTO claim_refs (claim_id, slot, ref_id, influence)
VALUES (2, 'subject', 1, 0.0);  -- claim 2 is about entity 1 (Axon)
```

Same information, native SQL, no infrastructure.

## How We Differ from Knowledge Vault

Google's Knowledge Vault (Dong et al., KDD 2014) is the closest prior art to what we're building. Both systems:
- Extract claims from text using ML/LLM
- Assign calibrated probabilities to every fact
- Fuse evidence from multiple sources
- Use graph structure to validate extracted facts

Key differences:

| | Knowledge Vault | Semantic Net |
|---|---|---|
| **Scale** | 1.6B candidate triples, 271M high-confidence | Thousands of claims (investment theses) |
| **Extraction** | 4 parallel extractors (text, DOM, tables, annotations) | 1 LLM extractor with domain-specific prompts |
| **Fusion** | Boosted decision stumps on 8-dim feature vectors (sqrt(sources) × mean_score per extractor) | Provenance table + manual/LLM weight estimation |
| **Priors** | Path Ranking Algorithm + neural embeddings over existing graph | `pgmpy` Bayesian inference over claim graph |
| **Calibration** | Platt scaling on held-out set (predicted 0.9 ≈ actual 90%) | Backtesting against outcomes (VIC tracks winners) |
| **Storage** | Bigtable + custom infrastructure | SQLite |
| **Claim types** | Entity triples only (subject, predicate, object) | Atoms, entities, assertions, relationships (richer) |
| **Relationships** | Flat predicates (spouse, birthplace, etc.) | Typed influence edges (supports, undermines, moderates) |

**What we take from KV**: The "extract broadly, fuse carefully" philosophy. Multiple noisy extractions → probabilistic fusion → calibrated output. Their `sqrt(sources)` feature is clever — it dampens the effect of commonly-expressed facts so popularity doesn't overwhelm evidence.

**What we add**: KV stored **entity-attribute-value triples** where subjects and objects were always Freebase entities, and predicates came from a fixed schema of ~4,469 relations (spouse, birthplace, CEO, etc.). Three things it couldn't do: (1) make a claim the subject of another claim (no reification), (2) express causal/influence relationships (no `causes` or `makes_more_likely` predicates in its schema), (3) attach influence direction/weight to edges. We store *claims about claims* — "revenue growth makes margin expansion more likely" is a relationship-type claim that references two other claims through influence-weighted slots. This entire layer of reasoning structure is outside KV's design.

## Claim Types

```
atom         "AXON Q4 2025 revenue = $560M"
             Raw fact. No interpretation. Has provenance (source doc, extraction model).

entity       "Axon Enterprise (AXON)"
             A thing that exists. Other claims reference it via slot='subject'.

assertion    "AXON revenue is growing"
             Interpretive claim. References atoms as evidence, entities as subjects.
             Has confidence, direction (bullish/bearish), category from taxonomy.

relationship "Revenue growth + high fixed costs → operating leverage"
             Claim about how other claims influence each other.
             References input claims with influence weights.
             This IS the edge structure for `pgmpy`.
```

## Slots and Influence

`claim_refs` connects claims through named slots:

| Slot | Meaning | Influence? |
|------|---------|-----------|
| `subject` | Entity this claim is about | No (0.0) — just structural |
| `evidence` | Supporting fact/atom | Mild positive — having evidence makes claim more credible |
| `context` | Temporal, geographic, situational frame | No — just metadata |
| `input` | Causal input that influences this claim's truth | **Yes** — the core inference edge |
| `output` | What this claim influences (reverse pointer) | **Yes** — mirror of input |
| `moderator` | Amplifies or dampens the effect of inputs | **Yes** — conditional influence |

**Influence values** range from -1.0 to +1.0:
- `+0.9` — strongly makes more likely ("revenue growing" → "margin expansion")
- `+0.3` — weakly makes more likely ("good management" → "margin expansion")
- `-0.7` — makes less likely ("competitor entered market" → "margin expansion")
- `0.0` — structural reference only (subject, context)

## Complex Example: A VIC Thesis

Thesis: "First Energy (FE) trades at 11x P/E vs 19x peer average. The FBI bribery scandal is contained. Regulatory penalties are bounded at $500M. Core regulated earnings are ~$2.50/share."

```
┌─────────────────────────────────┐
│ entity: "First Energy (FE)"     │ claim_id=1
└─────────────┬───────────────────┘
              │ subject
              ▼
┌─────────────────────────────────┐
│ atom: "FE trades at 11x P/E"    │ claim_id=2, confidence=0.95
│       "Utility peers at 19x"    │
└─────────────┬───────────────────┘
              │ evidence (+0.9)
              ▼
┌─────────────────────────────────┐
│ assertion: "FE is undervalued   │ claim_id=3, confidence=0.85
│  relative to peers"             │ category=valuation.relative_discount
└─────────────┬───────────────────┘
              │ input (+0.8)
              ▼
┌─────────────────────────────────┐
│ assertion: "FE stock will       │ claim_id=7, confidence=0.65
│  re-rate toward peer multiples" │ category=valuation.perception_error
└─────────────▲───────────────────┘
              │ input (+0.7)           input (-0.6)
              │                            │
┌─────────────┴──────────────┐  ┌──────────┴──────────────────┐
│ assertion: "FBI scandal    │  │ assertion: "Regulatory risk │
│  is contained, doesn't     │  │  could impair Ohio rate base│
│  impair core earnings"     │  │  permanently"               │
│ claim_id=5, conf=0.70      │  │ claim_id=6, conf=0.30       │
└─────────────▲──────────────┘  └─────────────────────────────┘
              │ evidence (+0.8)
┌─────────────┴──────────────┐
│ atom: "Penalties bounded   │ claim_id=4, confidence=0.75
│  at $500M vs $15B mkt cap" │
└────────────────────────────┘
```

In SQL this is 7 claims + ~10 claim_refs. In `pgmpy` this becomes a 5-node Bayesian network (entities and atoms are evidence, assertions are inference nodes).

## Conjunctive Influences (Lists)

"Revenue growth AND high fixed costs together cause operating leverage" isn't two separate edges — it's one conjunctive relationship:

```sql
-- The relationship claim itself
INSERT INTO claims (claim_type, claim_text, confidence)
VALUES ('relationship', 'Revenue growth + fixed costs → operating leverage', 0.90);
-- claim_id = 10

-- Its inputs (the "AND" part)
INSERT INTO claim_refs (claim_id, slot, ref_id, influence, ordinal)
VALUES (10, 'input', 8, +0.8, 0),   -- revenue growth
       (10, 'input', 9, +0.7, 1);   -- fixed costs

-- Its output
INSERT INTO claim_refs (claim_id, slot, ref_id, influence)
VALUES (10, 'output', 11, +0.85);   -- operating leverage claim
```

Multiple inputs on the same relationship claim = conjunctive cause. `pgmpy` sees claims 8 and 9 as joint parents of claim 11, with the relationship claim encoding their combined influence.
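One way the influence weights could seed a joint-parent CPT is a noisy-OR-style combination. The rule below is an illustrative assumption, not the system's actual estimator:

```python
# Sketch: seed P(child=True | parents) from influence weights with a
# noisy-OR-style rule. The combination function is an illustrative
# assumption, not the system's actual CPT estimator.
from itertools import product

def cpt_row(influences, parent_states, base=0.1):
    """Positive active parents lift the probability; negative ones suppress it."""
    p = base
    for w, on in zip(influences, parent_states):
        if on and w > 0:
            p = p + (1 - p) * w        # noisy-OR lift
        elif on and w < 0:
            p = p * (1 + w)            # proportional suppression
    return round(p, 3)

# Claims 8 (+0.8) and 9 (+0.7) as joint parents of claim 11:
for states in product([False, True], repeat=2):
    print(states, cpt_row([0.8, 0.7], states))
```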

## Moderators

"Good management amplifies the operating leverage effect":

```sql
-- Management quality claim
INSERT INTO claims (claim_type, claim_text, confidence)
VALUES ('assertion', 'FE has competent utility management', 0.60);
-- claim_id = 12

-- It moderates the operating leverage relationship
INSERT INTO claim_refs (claim_id, slot, ref_id, influence)
VALUES (10, 'moderator', 12, +0.4);  -- amplifies the relationship
```

In `pgmpy`: claim 12 becomes an additional parent node of claim 11, but its CPT contribution is modulated — it doesn't directly cause operating leverage, it makes the revenue+costs→leverage link stronger when true.
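A sketch of that semantics, with the scaling rule stated as an explicit assumption:

```python
# Sketch of moderator semantics: the moderator scales the strength of an
# existing link rather than acting as a direct cause. The scaling rule
# (multiply by 1 + weight, clamp to [-1, 1]) is an assumption.
def moderated_influence(base, moderator_weight, moderator_true):
    """Effective edge strength once the moderator is observed."""
    if not moderator_true:
        return base
    return round(max(-1.0, min(1.0, base * (1 + moderator_weight))), 3)

# The +0.4 management moderator amplifying the 0.85 leverage link:
print(moderated_influence(0.85, 0.4, True))   # → 1.0 (clamped from 1.19)
print(moderated_influence(0.85, 0.4, False))  # → 0.85
```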

## Where Weights Come From

### Initial extraction (LLM-assigned)
The LLM extracts claims and estimates initial confidence and influence. These are rough but useful starting points. The extraction prompt asks explicitly: "On a scale of 0-1, how confident are you? Does this make the target more likely (+) or less likely (-)?"

### Calibration against outcomes
VIC tracks which theses were "winners." Over time, we compare predicted confidence to actual outcomes:
- Claims at 0.8 confidence that were right 60% of the time → systematic overconfidence → adjust down
- This is Platt scaling (same technique Knowledge Vault used)
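A toy version of that calibration step: fit a sigmoid `σ(a·s + b)` mapping raw confidence to calibrated probability, by gradient descent on log-loss over synthetic outcomes (stdlib only):

```python
# Tiny Platt-scaling sketch on synthetic data: claims scored 0.8 that were
# right only ~60% of the time get pulled down toward the empirical rate.
import math, random

def platt_fit(scores, outcomes, lr=0.5, steps=2000):
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        ga = gb = 0.0
        for s, y in zip(scores, outcomes):
            p = 1 / (1 + math.exp(-(a * s + b)))
            ga += (p - y) * s / n      # gradient of mean log-loss w.r.t. a
            gb += (p - y) / n          # ... and w.r.t. b
        a, b = a - lr * ga, b - lr * gb
    return a, b

random.seed(0)
scores = [0.8] * 500
outcomes = [1 if random.random() < 0.6 else 0 for _ in scores]  # ~60% hit rate
a, b = platt_fit(scores, outcomes)
calibrated = 1 / (1 + math.exp(-(a * 0.8 + b)))
# calibrated ≈ 0.6 — the systematic overconfidence at 0.8 is corrected downward
```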

### Multi-source fusion
When the same claim is extracted from multiple sources (two VIC theses about the same company, an earnings call, a news article), the provenance table records each extraction. Fusion logic (inspired by KV):
- More independent sources → higher confidence
- Disagreeing sources → flag for review, don't blindly average
- `sqrt(n_sources)` dampening prevents popularity from overwhelming evidence
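The fusion rules above can be sketched as follows; the exact combining formula is an illustrative assumption, keeping only the KV-inspired ingredients (sqrt dampening, mean score, a disagreement flag instead of blind averaging):

```python
# Sketch of the fusion rules: sqrt(n) dampening, mean score, and a
# disagreement flag. The combining formula is an illustrative assumption.
import math, statistics

def fuse(extractions, spread_limit=0.3):
    """extractions: per-source confidences for the same claim."""
    if max(extractions) - min(extractions) > spread_limit:
        return None, "flag_for_review"          # disagreeing sources: don't average
    mean = statistics.fmean(extractions)
    # sqrt(n) dampening: extra copies of the same fact add less and less
    fused = min(1.0, math.sqrt(len(extractions)) * mean / (1 + mean))
    return round(fused, 3), "fused"

print(fuse([0.8, 0.85, 0.9]))   # three agreeing sources lift confidence
print(fuse([0.9, 0.3]))         # → (None, 'flag_for_review')
```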

### Human correction
Provenance table tracks `source_type='human'` entries. A human can override any weight. The override is recorded alongside the original, preserving history.

### Inference-derived
When `pgmpy` computes a posterior that differs from the LLM-assigned confidence, the delta is recorded as `source_type='inference'` provenance. Over time, this reveals where LLMs are systematically miscalibrated.

## Claim States

| State | Meaning |
|-------|---------|
| `active` | Current, believed to hold |
| `deprecated` | Was wrong or outdated (soft delete, preserves history) |
| `superseded` | Replaced by newer claim (ref the replacement via claim_refs) |
| `validated` | Confirmed against real-world outcome |

## From Claims to `pgmpy`

The conversion is a SQL query:

```sql
-- Extract the DAG edges for pgmpy (inside code block)
SELECT
    parent.claim_id AS parent_id,
    parent.claim_text AS parent_text,
    child.claim_id AS child_id,
    child.claim_text AS child_text,
    cr.influence
FROM claim_refs cr
JOIN claims parent ON parent.claim_id = cr.ref_id
JOIN claims child ON child.claim_id = cr.claim_id
WHERE cr.slot = 'input'
  AND parent.state = 'active'
  AND child.state = 'active'
ORDER BY child.claim_id, cr.ordinal;
```

Each row becomes an edge in the `pgmpy` network. The `influence` values seed the CPT estimation. Multiple parents per child = conjunctive causes = multi-parent CPT.
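The Python side of that conversion might look like this; the pgmpy hand-off is left as a comment, since the model-building code depends on the installed pgmpy version:

```python
# Run the edge-extraction query, collect (parent, child, influence) tuples.
# Everything below is stdlib; the pgmpy hand-off is sketched in comments.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE claims (claim_id INTEGER PRIMARY KEY, claim_text TEXT, state TEXT DEFAULT 'active');
CREATE TABLE claim_refs (claim_id INT, slot TEXT, ref_id INT, influence REAL, ordinal INT DEFAULT 0);
INSERT INTO claims (claim_id, claim_text) VALUES
  (3, 'undervalued'), (5, 'scandal contained'), (6, 'regulatory risk'), (7, 're-rates');
INSERT INTO claim_refs (claim_id, slot, ref_id, influence, ordinal) VALUES
  (7, 'input', 3, 0.8, 0), (7, 'input', 5, 0.7, 1), (7, 'input', 6, -0.6, 2);
""")

edges = [(f"c{p}", f"c{c}", w) for p, c, w in con.execute("""
    SELECT cr.ref_id, cr.claim_id, cr.influence
    FROM claim_refs cr
    JOIN claims parent ON parent.claim_id = cr.ref_id
    JOIN claims child  ON child.claim_id  = cr.claim_id
    WHERE cr.slot = 'input' AND parent.state = 'active' AND child.state = 'active'
    ORDER BY cr.claim_id, cr.ordinal
""")]
# edges → [('c3', 'c7', 0.8), ('c5', 'c7', 0.7), ('c6', 'c7', -0.6)]
# pgmpy hand-off (assuming a current pgmpy install):
#   model = DiscreteBayesianNetwork([(p, c) for p, c, _ in edges])
#   ...then attach TabularCPD objects seeded from the influence weights.
```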

## Lineage & Related Work

The space of "extract knowledge from text, represent it as a graph, reason over it" has an arc stretching back well over a decade. Here's where we sit in it.

### Knowledge Vault lineage (2014→)

**Knowledge Vault** (Dong et al., KDD 2014) — Google's probabilistic KG. 1.6B candidate triples fused from 4 extractors, calibrated via Platt scaling. Never released publicly. Key papers:
- *Knowledge Vault: A Web-Scale Approach to Probabilistic Knowledge Fusion* (KDD 2014)
- *Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources* (VLDB 2015) — follow-on that used KV-style extraction to score source credibility
- Nickel, Murphy, Tresp, Gabrilovich — *A Review of Relational Machine Learning for Knowledge Graphs* (IEEE 2016) — comprehensive survey including KV's approach

**Xin Luna Dong** (KV co-author, then Amazon, now Meta):
- **AutoKnow** (KDD 2020) — automatic KG construction for e-commerce at Amazon. Same extract→fuse→clean philosophy as KV, applied to product graphs. Handles noisy multi-source extraction, ontology bootstrapping
- Continues publishing on knowledge graph quality and completeness at Meta

**DeepDive → Snorkel → Snorkel AI**:
- **DeepDive** (Stanford, 2012-2015) — statistical extraction system that inspired KV's multi-extractor fusion. Factor graphs over candidate extractions
- **Snorkel** (Stanford, 2016-2019) — replaced hand-written extractors with labeling functions and weak supervision. Same insight: noisy labels from multiple sources can be fused to high quality
- **Snorkel AI** (commercial) — enterprise data labeling. The lineage: DeepDive's statistical extraction → Snorkel's weak supervision → programmatic labeling at scale

### KG embedding explosion (2013→)

The KV era spawned a decade of work on learning vector representations of KG structure:

- **TransE** (Bordes et al., 2013) — `head + relation ≈ tail` in embedding space. Simple, hugely influential
- **TransR, RotatE, QuatE** — geometric variants (hyperplanes, rotations, quaternions)
- **UKGE** (Chen et al., 2019) — *Embedding Uncertain Knowledge Graphs*. First to handle probabilistic/uncertain triples in embeddings. Directly relevant to our confidence-weighted claims
- **BEUrRE** (Chen et al., 2021) — box embeddings for uncertain relational data

### Current LLM + KG tools (2023→)

The LLM wave created a new generation of "extract graph from text" tools:

| Tool | Approach | Uncertainty? | Our take |
|------|----------|-------------|----------|
| **Microsoft GraphRAG** (2024) | LLM extracts entities+relations, builds community summaries, uses graph for RAG | No — triples are binary | Good extraction patterns, but the graph is a retrieval index, not an inference substrate |
| **LightRAG** (2024) | Lightweight GraphRAG — dual-level retrieval (entity + relation) | No | Faster, simpler, same limitation |
| **Graphiti / Zep** (2024) | Temporal KG from conversational memory. Episodes → entities + edges with timestamps | No | Smart temporal handling, but no probabilistic reasoning |
| **Neo4j LLM Graph Builder** | LLM → Cypher → property graph in Neo4j | No | Good Cypher generation, but infrastructure-heavy for our scale |
| **LlamaIndex KG Index** | LLM extracts triplets, stores in graph, queries via traversal + LLM | No | Simple integration, no confidence or fusion |

**The gap we fill**: None of these handle uncertainty. They extract triples — they don't score confidence, track provenance across sources, or fuse contradictory evidence. They build graphs for *retrieval* (find relevant context for LLM generation). We build graphs for *inference* (compute posterior probabilities given evidence). That's the fundamental difference.

### Argument mining & structured reasoning

- **Toulmin model** (1958) — our slot system maps to it: evidence=grounds, input+=warrant, input-=rebuttal, confidence=qualifier, moderator=backing
- **IBM Project Debater** (2018) — argument mining from text, stance detection. Commercial, not released
- **Kialo** — collaborative argument mapping platform. Visual, no probabilistic reasoning
- **Argdown** — markdown-like syntax for argument maps. Good visualization, no computation

### Probabilistic programming + KGs

- **Probabilistic Soft Logic (PSL)** (Bach et al., 2017) — Hinge-loss Markov random fields over first-order logic rules. Efficient collective inference over uncertain KGs
- **TuPAQ** (Sparks et al., 2015) — automated planning/model search for large-scale predictive analytic queries; adjacent systems work rather than probabilistic KG inference
- **pgmpy** (Ankan & Panda, 2015→) — pure Python probabilistic graphical models. What we use for Bayesian network inference. Active, well-documented, handles exact and approximate inference

### Where we are

```
2012  DeepDive (statistical extraction)
2013  TransE (KG embeddings)
2014  Knowledge Vault (probabilistic KG at scale)
2015  Snorkel (weak supervision)
      pgmpy (probabilistic graphical models)
2016  Nickel et al. review paper
2019  UKGE (uncertain KG embeddings)
2020  AutoKnow (Amazon product KG)
2024  GraphRAG, LightRAG, Graphiti (LLM + KG for retrieval)
2025  RDF 1.2 (triple terms — claims about claims)
──────────────────────────────────────────────────
2026  Semantic Net ← we are here
      Claims + influence slots + pgmpy inference
      LLM extraction + probabilistic fusion + calibration
      SQLite + Oxigraph dual store
```

We're combining three threads that haven't been woven together before: (1) KV-style probabilistic extraction and fusion, (2) `pgmpy` Bayesian inference over the resulting graph, (3) modern LLMs as the extraction engine (replacing KV's 4 specialized extractors with one prompted LLM). The gap — uncertainty-aware KGs with causal inference — remains open in the current LLM+KG ecosystem.
