> **Note (2026-03-24):** intel/learning/ideas/semnet UIs consolidated into `kb.localhost` (port 7840). Old standalone URLs (intel.localhost, learning.localhost, ideas.localhost, semnet.localhost) are retired.

# Idea Evaluation System — Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Build an `ideas/` module that extracts ideas from writing, builds a landscape corpus via interactive search, and scores ideas on 9 dimensions with floor-per-group aggregation.

**Architecture:** 4-stage pipeline (Extract → Landscape → Corpus Extract → Score). SQLite for structured data, sqlite-vec for embeddings. Batched LLM evaluation (2-3 dims per call) with multi-model consensus for hard dimensions. Gradio UI on port 7990 (`kb.localhost/ideas`).
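Floor-per-group aggregation means a group is only as strong as its weakest dimension, unlike averaging, which lets a strong score mask a weak one. A quick illustration (group names mirror Task 1's `GROUPS`; the scores are arbitrary):

```python
# Floor-per-group: a group's score is its weakest dimension, not its mean.
GROUPS = {
    "substance": ["claim_precision", "internal_coherence", "evidential_grounding"],
    "novelty": ["semantic_novelty", "framing_novelty"],
}

scores = {
    "claim_precision": 7.0, "internal_coherence": 6.0, "evidential_grounding": 5.0,
    "semantic_novelty": 8.0, "framing_novelty": 7.0,
}

floors = {g: min(scores[d] for d in dims) for g, dims in GROUPS.items()}
means = {g: sum(scores[d] for d in dims) / len(dims) for g, dims in GROUPS.items()}

print(floors)  # {'substance': 5.0, 'novelty': 7.0}
print(means)   # {'substance': 6.0, 'novelty': 7.5}
```

Note how `substance` averages to 6.0 but floors to 5.0 — the weak `evidential_grounding` score is not hidden.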

**Tech Stack:** Python, Gradio 6, SQLite + sqlite-vec (`lib/vectors`), `lib/llm` (call_llm), `lib/ingest` (fetch), `lib/discovery_ops` (serper_search), `lib/semnet` (embedding pattern).

**Design doc:** `docs/plans/2026-03-02-idea-evaluation-design.md`
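For orientation, the four stages compose roughly as below. Function names come from Tasks 3-6; the bodies here are stand-in stubs so the flow is runnable on its own, not the real implementations.

```python
import asyncio

# Stage stubs — real versions are built in Tasks 3-6.
async def extract_ideas(text):              # Stage 0/2: LLM extraction
    return [{"text": "X causes Y", "type": "claim"}]

async def generate_search_terms(ideas):     # Stage 1: term generation
    return ["X causes Y prior work"]

async def search_landscape(terms):          # Stage 1: web search (user triages results)
    return [{"url": "https://example.com", "status": "accepted"}]

async def fetch_and_extract_sources(srcs):  # Stage 2: corpus extraction
    return [{"text": "Y follows X", "type": "claim"}]

async def evaluate_idea(idea):              # Stage 3: 9-dimension scoring
    return {"floors": {"substance": 5.0}}

async def run_pipeline(draft: str):
    ideas = await extract_ideas(draft)
    terms = await generate_search_terms(ideas)
    sources = await search_landscape(terms)
    corpus = await fetch_and_extract_sources(sources)   # feeds novelty scoring
    return [await evaluate_idea(i) for i in ideas]

profiles = asyncio.run(run_pipeline("draft text"))
print(len(profiles))  # 1
```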

---

## Task 1: Data Models

**Files:**
- Create: `ideas/__init__.py`
- Create: `ideas/models.py`
- Create: `ideas/tests/__init__.py`
- Create: `ideas/tests/test_models.py`

**Step 1: Write the failing test**

```python
# ideas/tests/test_models.py
"""Tests for idea evaluation data models."""
from ideas.models import Idea, IdeaScore, LandscapeSource, EvaluationProfile, DIMENSIONS, GROUPS


def test_idea_creation():
    idea = Idea(text="X causes Y because Z", idea_type="claim", source="user")
    assert idea.idea_type == "claim"
    assert idea.source == "user"
    assert idea.id  # auto-generated UUID
    assert idea.extracted_at > 0


def test_idea_score_creation():
    score = IdeaScore(idea_id="abc", dimension="claim_precision", score=7.5, rationale="Specific mechanism", model="opus")
    assert score.dimension == "claim_precision"
    assert score.score == 7.5
    assert score.scored_at > 0


def test_landscape_source_defaults():
    src = LandscapeSource(url="https://example.com", title="Example", snippet="...", search_term="test")
    assert src.status == "pending"
    assert src.user_notes is None


def test_dimensions_complete():
    assert len(DIMENSIONS) == 9
    assert "claim_precision" in DIMENSIONS
    assert "semantic_novelty" in DIMENSIONS
    assert "composability" in DIMENSIONS


def test_groups_structure():
    assert set(GROUPS.keys()) == {"substance", "novelty", "expression", "fertility"}
    # Each group maps to a list of dimension names
    all_dims = [d for dims in GROUPS.values() for d in dims]
    assert set(all_dims) == set(DIMENSIONS.keys())


def test_evaluation_profile_floor_scores():
    """Floor scores = weakest link per group, not average."""
    scores = {
        "claim_precision": 7.0, "internal_coherence": 6.0, "evidential_grounding": 5.0,
        "semantic_novelty": 8.0, "framing_novelty": 7.0,
        "clarity": 6.0, "rhetorical_force": 8.0,
        "generativity": 9.0, "composability": 6.0,
    }
    profile = EvaluationProfile.from_scores(scores)
    assert profile.floors == {"substance": 5.0, "novelty": 7.0, "expression": 6.0, "fertility": 6.0}
```

**Step 2: Run test to verify it fails**

Run: `cd /Users/tchklovski/all-code/rivus && python -m pytest ideas/tests/test_models.py -v`
Expected: FAIL — ModuleNotFoundError

**Step 3: Write minimal implementation**

```python
# ideas/__init__.py
"""Idea evaluation system — extract, landscape, score."""

# ideas/models.py
"""Data models for idea evaluation."""
import time
import uuid
from dataclasses import dataclass, field


DIMENSIONS: dict[str, str] = {
    "claim_precision": "How specific and falsifiable is the core claim?",
    "internal_coherence": "Does the argument's structure hold together?",
    "evidential_grounding": "Is the idea anchored in evidence?",
    "semantic_novelty": "How distant from the existing landscape corpus?",
    "framing_novelty": "Does the idea reframe the problem in new ways?",
    "clarity": "Can a reader extract the core claim without ambiguity?",
    "rhetorical_force": "How effectively does the expression serve the idea?",
    "generativity": "Does the idea suggest new questions or directions?",
    "composability": "Can this idea be productively combined with others?",
}

GROUPS: dict[str, list[str]] = {
    "substance": ["claim_precision", "internal_coherence", "evidential_grounding"],
    "novelty": ["semantic_novelty", "framing_novelty"],
    "expression": ["clarity", "rhetorical_force"],
    "fertility": ["generativity", "composability"],
}


@dataclass
class Idea:
    text: str
    idea_type: str  # "claim" | "thesis"
    source: str     # "user" | URL
    id: str = field(default_factory=lambda: uuid.uuid4().hex[:12])
    source_span: tuple[int, int] | None = None
    extraction_model: str = ""
    extracted_at: float = field(default_factory=time.time)
    confidence: float = 0.0


@dataclass
class IdeaScore:
    idea_id: str
    dimension: str
    score: float        # 1-10
    rationale: str
    model: str
    scored_at: float = field(default_factory=time.time)


@dataclass
class LandscapeSource:
    url: str
    title: str
    snippet: str
    search_term: str
    status: str = "pending"          # "pending" | "accepted" | "rejected"
    user_notes: str | None = None
    fetched_content: str | None = None


@dataclass
class EvaluationProfile:
    """Per-group floor scores — weakest link, not average."""
    scores: dict[str, float]         # dimension → score
    floors: dict[str, float]         # group → floor score

    @classmethod
    def from_scores(cls, scores: dict[str, float]) -> "EvaluationProfile":
        floors = {}
        for group, dims in GROUPS.items():
            group_scores = [scores[d] for d in dims if d in scores]
            floors[group] = min(group_scores) if group_scores else 0.0
        return cls(scores=scores, floors=floors)
```

**Step 4: Run test to verify it passes**

Run: `cd /Users/tchklovski/all-code/rivus && python -m pytest ideas/tests/test_models.py -v`
Expected: all 6 tests PASS

**Step 5: Commit**

```bash
git add ideas/__init__.py ideas/models.py ideas/tests/__init__.py ideas/tests/test_models.py
git commit -m "feat(ideas): add data models — Idea, IdeaScore, LandscapeSource, EvaluationProfile"
```

---

## Task 2: SQLite Store — Schema + CRUD

**Files:**
- Create: `ideas/store.py`
- Create: `ideas/tests/test_store.py`

**Step 1: Write the failing test**

```python
# ideas/tests/test_store.py
"""Tests for idea evaluation store."""

import pytest

from ideas.models import Idea, IdeaScore, LandscapeSource
from ideas.store import IdeaStore


@pytest.fixture
def store(tmp_path):
    return IdeaStore(db_path=tmp_path / "test.db", vector_path=tmp_path / "vectors")


def test_store_and_get_idea(store):
    idea = Idea(text="X causes Y", idea_type="claim", source="user")
    store.save_idea(idea, project="test_proj")
    got = store.get_ideas(project="test_proj")
    assert len(got) == 1
    assert got[0]["text"] == "X causes Y"
    assert got[0]["idea_type"] == "claim"


def test_store_and_get_score(store):
    idea = Idea(text="X causes Y", idea_type="claim", source="user")
    store.save_idea(idea, project="test_proj")
    score = IdeaScore(idea_id=idea.id, dimension="claim_precision", score=7.0, rationale="Good", model="opus")
    store.save_score(score, project="test_proj")
    scores = store.get_scores(idea_id=idea.id, project="test_proj")
    assert len(scores) == 1
    assert scores[0]["score"] == 7.0


def test_store_landscape_source(store):
    src = LandscapeSource(url="https://example.com", title="Ex", snippet="...", search_term="test")
    store.save_source(src, project="test_proj")
    sources = store.get_sources(project="test_proj")
    assert len(sources) == 1
    assert sources[0]["status"] == "pending"


def test_update_source_status(store):
    src = LandscapeSource(url="https://example.com", title="Ex", snippet="...", search_term="test")
    store.save_source(src, project="test_proj")
    store.update_source_status("https://example.com", "accepted", project="test_proj")
    sources = store.get_sources(project="test_proj", status="accepted")
    assert len(sources) == 1


def test_projects_isolation(store):
    idea_a = Idea(text="A", idea_type="claim", source="user")
    idea_b = Idea(text="B", idea_type="claim", source="user")
    store.save_idea(idea_a, project="proj_a")
    store.save_idea(idea_b, project="proj_b")
    assert len(store.get_ideas(project="proj_a")) == 1
    assert len(store.get_ideas(project="proj_b")) == 1
```

**Step 2: Run test to verify it fails**

Run: `python -m pytest ideas/tests/test_store.py -v`
Expected: FAIL — ImportError

**Step 3: Write minimal implementation**

```python
# ideas/store.py
"""SQLite + sqlite-vec storage for idea evaluation."""
import sqlite3
from pathlib import Path

from lib.vectors import VectorStore

from ideas.models import Idea, IdeaScore, LandscapeSource

_SCHEMA = """
CREATE TABLE IF NOT EXISTS ideas (
    id TEXT PRIMARY KEY,
    project TEXT NOT NULL,
    text TEXT NOT NULL,
    idea_type TEXT NOT NULL,
    source TEXT NOT NULL,
    source_span_start INTEGER,
    source_span_end INTEGER,
    extraction_model TEXT DEFAULT '',
    extracted_at REAL NOT NULL,
    confidence REAL DEFAULT 0.0
);
CREATE INDEX IF NOT EXISTS idx_ideas_project ON ideas(project);

CREATE TABLE IF NOT EXISTS scores (
    idea_id TEXT NOT NULL,
    project TEXT NOT NULL,
    dimension TEXT NOT NULL,
    score REAL NOT NULL,
    rationale TEXT NOT NULL,
    model TEXT NOT NULL,
    scored_at REAL NOT NULL,
    PRIMARY KEY (idea_id, dimension, model)
);
CREATE INDEX IF NOT EXISTS idx_scores_idea ON scores(idea_id);

CREATE TABLE IF NOT EXISTS sources (
    url TEXT NOT NULL,
    project TEXT NOT NULL,
    title TEXT NOT NULL,
    snippet TEXT NOT NULL,
    search_term TEXT NOT NULL,
    status TEXT DEFAULT 'pending',
    user_notes TEXT,
    fetched_content TEXT,
    PRIMARY KEY (url, project)
);
CREATE INDEX IF NOT EXISTS idx_sources_project ON sources(project);
"""

COL_IDEAS = "idea_embeddings"


class IdeaStore:
    def __init__(self, *, db_path: Path, vector_path: Path):
        self._db_path = Path(db_path)
        self._db_path.parent.mkdir(parents=True, exist_ok=True)
        self._conn = sqlite3.connect(str(self._db_path))
        self._conn.row_factory = sqlite3.Row
        self._conn.executescript(_SCHEMA)
        self._vs = VectorStore(vector_path)
        self._vs.ensure_collection(COL_IDEAS, dim=1536, model="text-embedding-3-small")

    def save_idea(self, idea: Idea, project: str) -> None:
        span_start = idea.source_span[0] if idea.source_span else None
        span_end = idea.source_span[1] if idea.source_span else None
        self._conn.execute(
            "INSERT OR REPLACE INTO ideas VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
            (idea.id, project, idea.text, idea.idea_type, idea.source,
             span_start, span_end, idea.extraction_model, idea.extracted_at, idea.confidence),
        )
        self._conn.commit()

    def get_ideas(self, project: str, *, source: str | None = None) -> list[dict]:
        sql = "SELECT * FROM ideas WHERE project = ?"
        params: list = [project]
        if source:
            sql += " AND source = ?"
            params.append(source)
        return [dict(r) for r in self._conn.execute(sql, params).fetchall()]

    def save_score(self, score: IdeaScore, project: str) -> None:
        self._conn.execute(
            "INSERT OR REPLACE INTO scores VALUES (?, ?, ?, ?, ?, ?, ?)",
            (score.idea_id, project, score.dimension, score.score,
             score.rationale, score.model, score.scored_at),
        )
        self._conn.commit()

    def save_scores(self, scores: list[IdeaScore], project: str) -> None:
        for s in scores:
            self.save_score(s, project)

    def get_scores(self, idea_id: str, project: str) -> list[dict]:
        rows = self._conn.execute(
            "SELECT * FROM scores WHERE idea_id = ? AND project = ?", (idea_id, project)
        ).fetchall()
        return [dict(r) for r in rows]

    def save_source(self, source: LandscapeSource, project: str) -> None:
        self._conn.execute(
            "INSERT OR REPLACE INTO sources VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            (source.url, project, source.title, source.snippet,
             source.search_term, source.status, source.user_notes, source.fetched_content),
        )
        self._conn.commit()

    def update_source_status(self, url: str, status: str, project: str) -> None:
        self._conn.execute(
            "UPDATE sources SET status = ? WHERE url = ? AND project = ?",
            (status, url, project),
        )
        self._conn.commit()

    def get_sources(self, project: str, *, status: str | None = None) -> list[dict]:
        sql = "SELECT * FROM sources WHERE project = ?"
        params: list = [project]
        if status:
            sql += " AND status = ?"
            params.append(status)
        return [dict(r) for r in self._conn.execute(sql, params).fetchall()]

    def save_idea_vector(self, idea_id: str, vector: list[float], payload: dict | None = None) -> None:
        self._vs.upsert(COL_IDEAS, id=idea_id, vector=vector, payload=payload or {})

    def search_similar(self, query_vector: list[float], *, limit: int = 10) -> list[dict]:
        return self._vs.search(COL_IDEAS, query_vector, limit=limit)

    def close(self) -> None:
        self._conn.close()
        self._vs.close()
```

**Step 4: Run test to verify it passes**

Run: `python -m pytest ideas/tests/test_store.py -v`
Expected: all 5 tests PASS

**Step 5: Commit**

```bash
git add ideas/store.py ideas/tests/test_store.py
git commit -m "feat(ideas): add SQLite + sqlite-vec store with project isolation"
```

---

## Task 3: Idea Extraction (Stage 0 + Stage 2)

**Files:**
- Create: `ideas/extract.py`
- Create: `ideas/tests/test_extract.py`

**Step 1: Write the failing test**

```python
# ideas/tests/test_extract.py
"""Tests for idea extraction."""
import json
from unittest.mock import AsyncMock, patch

import pytest

from ideas.extract import extract_ideas, _build_extraction_prompt, _parse_extraction_response
from ideas.models import Idea


def test_build_extraction_prompt():
    prompt = _build_extraction_prompt("Some text about AI safety")
    assert "claim" in prompt.lower()
    assert "thesis" in prompt.lower()
    assert "AI safety" in prompt


def test_parse_extraction_response_valid():
    raw = json.dumps({"ideas": [
        {"text": "X causes Y", "type": "claim", "confidence": 0.9, "source_span": [10, 25]},
        {"text": "Framing Z is key", "type": "thesis", "confidence": 0.8, "source_span": [30, 50]},
    ]})
    ideas = _parse_extraction_response(raw, source="user", model="opus")
    assert len(ideas) == 2
    assert ideas[0].idea_type == "claim"
    assert ideas[1].idea_type == "thesis"
    assert ideas[0].confidence == 0.9


def test_parse_extraction_response_bare_list():
    raw = json.dumps([
        {"text": "X causes Y", "type": "claim", "confidence": 0.9},
    ])
    ideas = _parse_extraction_response(raw, source="user", model="opus")
    assert len(ideas) == 1


def test_parse_extraction_response_markdown_fenced():
    raw = "```json\n" + json.dumps({"ideas": [
        {"text": "A", "type": "claim", "confidence": 0.5},
    ]}) + "\n```"
    ideas = _parse_extraction_response(raw, source="user", model="opus")
    assert len(ideas) == 1


@pytest.mark.asyncio
async def test_extract_ideas_calls_llm():
    mock_response = json.dumps({"ideas": [
        {"text": "Test claim", "type": "claim", "confidence": 0.8},
    ]})
    with patch("ideas.extract.call_llm", new_callable=AsyncMock, return_value=mock_response):
        ideas = await extract_ideas("Some document text", source="user", model="sonnet")
    assert len(ideas) == 1
    assert ideas[0].extraction_model == "sonnet"
```

**Step 2: Run test to verify it fails**

Run: `python -m pytest ideas/tests/test_extract.py -v`
Expected: FAIL — ImportError

**Step 3: Write minimal implementation**

```python
# ideas/extract.py
"""Idea extraction from text — Stage 0 (user writing) and Stage 2 (corpus sources)."""
import json
import re

from lib.llm import call_llm

from ideas.models import Idea

_SYSTEM = """You are an expert at identifying ideas in text. Extract two types:

1. **Claims**: Atomic factual assertions, causal arguments, predictions. Specific and potentially falsifiable.
2. **Theses**: Higher-level insights, framings, mental models, surprising connections between domains.

For each idea, provide:
- text: The idea expressed as a clear standalone statement
- type: "claim" or "thesis"
- confidence: 0.0-1.0 how confident you are this is a real idea (not filler)
- source_span: [start_char, end_char] approximate location in the source text

Be thorough — extract ALL substantive ideas. Skip filler, transitions, and obvious statements.
Respond with JSON only: {"ideas": [...]}"""


def _build_extraction_prompt(text: str) -> str:
    return f"Extract all claims and theses from this text:\n\n{text}"


def _parse_extraction_response(raw: str, *, source: str, model: str) -> list[Idea]:
    """Parse LLM extraction response into Idea objects."""
    # Strip markdown fencing
    cleaned = re.sub(r"^```(?:json)?\s*\n?", "", raw.strip())
    cleaned = re.sub(r"\n?```\s*$", "", cleaned)

    data = json.loads(cleaned)
    items = data.get("ideas", data) if isinstance(data, dict) else data

    ideas = []
    for item in items:
        span = item.get("source_span")
        ideas.append(Idea(
            text=item["text"],
            idea_type=item["type"],
            source=source,
            source_span=tuple(span) if span else None,
            extraction_model=model,
            confidence=item.get("confidence", 0.0),
        ))
    return ideas


async def extract_ideas(
    text: str,
    *,
    source: str = "user",
    model: str = "sonnet",
    temperature: float = 0.1,
) -> list[Idea]:
    """Extract ideas from text using LLM."""
    prompt = _build_extraction_prompt(text)
    raw = await call_llm(model=model, prompt=prompt, system=_SYSTEM, temperature=temperature, stream=False)
    return _parse_extraction_response(str(raw), source=source, model=model)
```

**Step 4: Run test to verify it passes**

Run: `python -m pytest ideas/tests/test_extract.py -v`
Expected: all 5 tests PASS

**Step 5: Commit**

```bash
git add ideas/extract.py ideas/tests/test_extract.py
git commit -m "feat(ideas): add idea extraction — claims + theses from text"
```

---

## Task 4: Landscape Search (Stage 1)

**Files:**
- Create: `ideas/landscape.py`
- Create: `ideas/tests/test_landscape.py`

**Step 1: Write the failing test**

```python
# ideas/tests/test_landscape.py
"""Tests for landscape search and source gathering."""
import json
from unittest.mock import AsyncMock, patch

import pytest

from ideas.landscape import generate_search_terms, search_landscape, _parse_search_terms
from ideas.models import Idea, LandscapeSource


def test_parse_search_terms():
    raw = json.dumps({"terms": ["AI safety alignment", "reward hacking", "RLHF failure modes"]})
    terms = _parse_search_terms(raw)
    assert len(terms) == 3
    assert "AI safety alignment" in terms


@pytest.mark.asyncio
async def test_generate_search_terms():
    ideas = [
        Idea(text="RLHF leads to reward hacking", idea_type="claim", source="user"),
        Idea(text="Alignment requires new paradigm", idea_type="thesis", source="user"),
    ]
    mock_resp = json.dumps({"terms": ["RLHF reward hacking", "AI alignment paradigm"]})
    with patch("ideas.landscape.call_llm", new_callable=AsyncMock, return_value=mock_resp):
        terms = await generate_search_terms(ideas)
    assert len(terms) >= 2


@pytest.mark.asyncio
async def test_search_landscape():
    mock_serper = AsyncMock(return_value={
        "organic": [
            {"title": "Paper A", "link": "https://a.com", "snippet": "About X"},
            {"title": "Paper B", "link": "https://b.com", "snippet": "About Y"},
        ]
    })
    with patch("ideas.landscape.serper_search", mock_serper):
        sources = await search_landscape(["test query"], num_per_term=2)
    assert len(sources) == 2
    assert all(isinstance(s, LandscapeSource) for s in sources)
    assert sources[0].status == "pending"


@pytest.mark.asyncio
async def test_search_landscape_deduplicates():
    """Same URL from different searches should appear only once."""
    mock_serper = AsyncMock(return_value={
        "organic": [
            {"title": "Paper A", "link": "https://a.com", "snippet": "About X"},
        ]
    })
    with patch("ideas.landscape.serper_search", mock_serper):
        sources = await search_landscape(["query1", "query2"], num_per_term=5)
    assert len(sources) == 1  # deduplicated
```

**Step 2: Run test to verify it fails**

Run: `python -m pytest ideas/tests/test_landscape.py -v`
Expected: FAIL — ImportError

**Step 3: Write minimal implementation**

```python
# ideas/landscape.py
"""Landscape search and source gathering — Stage 1."""
import json
import re

from lib.discovery_ops import serper_search
from lib.llm import call_llm

from ideas.models import Idea, LandscapeSource

_SEARCH_TERM_SYSTEM = """Generate search terms to find existing work related to these ideas.
For each idea, generate 2-3 search terms that would find:
- Prior art (who has said something similar?)
- Adjacent work (related but different framing)
- Domain-specific terminology variants

Return JSON: {"terms": ["term1", "term2", ...]}
Aim for 5-10 diverse terms total. Include academic, industry, and popular framings."""


def _parse_search_terms(raw: str) -> list[str]:
    cleaned = re.sub(r"^```(?:json)?\s*\n?", "", raw.strip())
    cleaned = re.sub(r"\n?```\s*$", "", cleaned)
    data = json.loads(cleaned)
    return data.get("terms", data) if isinstance(data, dict) else data


async def generate_search_terms(ideas: list[Idea], *, model: str = "flash") -> list[str]:
    """Generate search terms from extracted ideas."""
    ideas_text = "\n".join(f"- [{i.idea_type}] {i.text}" for i in ideas)
    prompt = f"Generate search terms for these ideas:\n\n{ideas_text}"
    raw = await call_llm(model=model, prompt=prompt, system=_SEARCH_TERM_SYSTEM, temperature=0.3, stream=False)
    return _parse_search_terms(str(raw))


async def search_landscape(
    terms: list[str],
    *,
    num_per_term: int = 10,
    search_type: str = "search",
) -> list[LandscapeSource]:
    """Search web for each term, return deduplicated LandscapeSources."""
    seen_urls: set[str] = set()
    sources: list[LandscapeSource] = []

    for term in terms:
        result = await serper_search(term, search_type=search_type, num=num_per_term)
        for item in result.get("organic", []):
            url = item.get("link", "")
            if url and url not in seen_urls:
                seen_urls.add(url)
                sources.append(LandscapeSource(
                    url=url,
                    title=item.get("title", ""),
                    snippet=item.get("snippet", ""),
                    search_term=term,
                ))
    return sources
```

**Step 4: Run test to verify it passes**

Run: `python -m pytest ideas/tests/test_landscape.py -v`
Expected: all 4 tests PASS

**Step 5: Commit**

```bash
git add ideas/landscape.py ideas/tests/test_landscape.py
git commit -m "feat(ideas): add landscape search — term generation + Serper search"
```

---

## Task 5: Corpus Extraction (Stage 2)

**Files:**
- Modify: `ideas/landscape.py` — add `fetch_and_extract_sources`
- Modify: `ideas/tests/test_landscape.py` — add fetch+extract tests

**Step 1: Write the failing test**

```python
# Add to ideas/tests/test_landscape.py

from ideas.landscape import fetch_and_extract_sources


@pytest.mark.asyncio
async def test_fetch_and_extract_sources():
    """Fetch accepted sources, extract ideas from each."""
    sources = [
        LandscapeSource(url="https://a.com", title="A", snippet="...", search_term="q", status="accepted"),
    ]
    mock_fetch = AsyncMock(return_value=("<html><body>Content about X causing Y</body></html>", None, "A"))
    mock_extract = AsyncMock(return_value=[
        Idea(text="X causes Y", idea_type="claim", source="https://a.com"),
    ])
    with patch("ideas.landscape.fetch", mock_fetch), \
         patch("ideas.landscape.extract_ideas", mock_extract):
        ideas = await fetch_and_extract_sources(sources)
    assert len(ideas) == 1
    assert ideas[0].source == "https://a.com"
```

**Step 2: Run test to verify it fails**

Run: `python -m pytest ideas/tests/test_landscape.py::test_fetch_and_extract_sources -v`
Expected: FAIL — ImportError

**Step 3: Add implementation to landscape.py**

```python
# Add to ideas/landscape.py — imports at top:
import asyncio
from lib.ingest import fetch
from ideas.extract import extract_ideas

# Add function:
async def fetch_and_extract_sources(
    sources: list[LandscapeSource],
    *,
    model: str = "flash",
    concurrency: int = 5,
) -> list[Idea]:
    """Fetch accepted sources and extract ideas from each (Stage 2)."""
    accepted = [s for s in sources if s.status == "accepted"]
    sem = asyncio.Semaphore(concurrency)
    all_ideas: list[Idea] = []

    async def _process(source: LandscapeSource) -> list[Idea]:
        async with sem:
            content, _, _ = await fetch(source.url)
            if content.startswith("Error:"):
                return []
            # Truncate long content
            text = content[:50_000]
            return await extract_ideas(text, source=source.url, model=model)

    tasks = [_process(s) for s in accepted]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    for result in results:
        if isinstance(result, list):
            all_ideas.extend(result)
    return all_ideas
```

**Step 4: Run test to verify it passes**

Run: `python -m pytest ideas/tests/test_landscape.py -v`
Expected: all 5 tests PASS

**Step 5: Commit**

```bash
git add ideas/landscape.py ideas/tests/test_landscape.py
git commit -m "feat(ideas): add corpus extraction — fetch sources + extract ideas"
```

---

## Task 6: Evaluation Engine — Batched Scoring (Stage 3)

**Files:**
- Create: `ideas/eval.py`
- Create: `ideas/tests/test_eval.py`

This is the core of the system — 9-dimension evaluation with batched LLM calls.

**Step 1: Write the failing test**

```python
# ideas/tests/test_eval.py
"""Tests for idea evaluation engine."""
import json
from unittest.mock import AsyncMock, patch

import pytest

from ideas.eval import (
    evaluate_idea,
    _build_eval_prompt,
    _parse_eval_response,
    EVAL_BATCHES,
)
from ideas.models import Idea, IdeaScore, EvaluationProfile, DIMENSIONS


def test_eval_batches_cover_all_dimensions():
    """Every dimension appears in exactly one batch."""
    all_dims = [d for batch in EVAL_BATCHES for d in batch]
    assert set(all_dims) == set(DIMENSIONS.keys())
    assert len(all_dims) == len(set(all_dims))  # no duplicates


def test_build_eval_prompt():
    idea = Idea(text="X causes Y because Z", idea_type="claim", source="user")
    prompt = _build_eval_prompt(idea, ["claim_precision", "internal_coherence"], corpus_summary="Prior work on X")
    assert "X causes Y" in prompt
    assert "claim_precision" in prompt
    assert "Prior work on X" in prompt


def test_parse_eval_response():
    raw = json.dumps({
        "claim_precision": {"score": 7, "rationale": "Specific mechanism identified"},
        "internal_coherence": {"score": 6, "rationale": "Chain mostly complete"},
    })
    scores = _parse_eval_response(raw, idea_id="abc", model="opus")
    assert len(scores) == 2
    assert scores[0].dimension == "claim_precision"
    assert scores[0].score == 7.0
    assert scores[1].dimension == "internal_coherence"


@pytest.mark.asyncio
async def test_evaluate_idea_calls_all_batches():
    """Should make one LLM call per batch."""
    call_count = 0

    async def mock_llm(**kwargs):
        nonlocal call_count
        call_count += 1
        # Return scores for whatever dimensions are in the prompt
        scores = {}
        for dim in DIMENSIONS:
            if dim in kwargs.get("prompt", ""):
                scores[dim] = {"score": 7, "rationale": "Good"}
        return json.dumps(scores)

    idea = Idea(text="Test idea", idea_type="claim", source="user")
    with patch("ideas.eval.call_llm", side_effect=mock_llm):
        profile = await evaluate_idea(idea, model="opus")

    assert call_count == len(EVAL_BATCHES)
    assert isinstance(profile, EvaluationProfile)
    assert len(profile.scores) == 9  # all dimensions scored
```

**Step 2: Run test to verify it fails**

Run: `python -m pytest ideas/tests/test_eval.py -v`
Expected: FAIL — ImportError

**Step 3: Write minimal implementation**

```python
# ideas/eval.py
"""Idea evaluation engine — 8-dimension batched scoring (Stage 3)."""
import asyncio
import json
import re

from lib.llm import call_llm

from ideas.models import Idea, IdeaScore, EvaluationProfile, DIMENSIONS

# Batches of 2-3 dimensions per LLM call (research-backed sweet spot).
# Grouped by evaluation similarity for prompt coherence.
EVAL_BATCHES: list[list[str]] = [
    ["claim_precision", "internal_coherence", "evidential_grounding"],  # substance
    ["semantic_novelty", "framing_novelty"],                            # novelty
    ["clarity", "rhetorical_force"],                                    # expression
    ["generativity", "composability"],                                  # fertility
]

_SYSTEM = """You are an expert evaluator of ideas and arguments. Score each dimension on a 1-10 scale.
Be precise and calibrated — use the full range. A score of 5 means average, not "I don't know."
Provide a 1-2 sentence rationale for each score.
Respond with JSON only: {"dimension_name": {"score": N, "rationale": "..."}, ...}"""


def _build_rubric_section(dimensions: list[str]) -> str:
    lines = []
    for dim in dimensions:
        lines.append(f"**{dim}**: {DIMENSIONS[dim]}")
    return "\n".join(lines)


def _build_eval_prompt(
    idea: Idea,
    dimensions: list[str],
    *,
    corpus_summary: str = "",
) -> str:
    rubric = _build_rubric_section(dimensions)
    parts = [
        f"Evaluate this {idea.idea_type}:\n\n\"{idea.text}\"",
        f"\nScore on these dimensions (1-10 each):\n{rubric}",
    ]
    if corpus_summary:
        parts.append(f"\nLandscape corpus context (for novelty comparison):\n{corpus_summary}")
    return "\n".join(parts)


def _parse_eval_response(raw: str, *, idea_id: str, model: str) -> list[IdeaScore]:
    cleaned = re.sub(r"^```(?:json)?\s*\n?", "", raw.strip())
    cleaned = re.sub(r"\n?```\s*$", "", cleaned)
    data = json.loads(cleaned)
    scores = []
    for dim, val in data.items():
        if dim in DIMENSIONS:
            scores.append(IdeaScore(
                idea_id=idea_id,
                dimension=dim,
                score=float(val["score"]),
                rationale=val.get("rationale", ""),
                model=model,
            ))
    return scores


async def _evaluate_batch(
    idea: Idea,
    dimensions: list[str],
    *,
    model: str,
    corpus_summary: str = "",
    temperature: float = 0.2,
) -> list[IdeaScore]:
    prompt = _build_eval_prompt(idea, dimensions, corpus_summary=corpus_summary)
    raw = await call_llm(model=model, prompt=prompt, system=_SYSTEM, temperature=temperature, stream=False)
    return _parse_eval_response(str(raw), idea_id=idea.id, model=model)


async def evaluate_idea(
    idea: Idea,
    *,
    model: str = "opus",
    corpus_summary: str = "",
    max_concurrency: int = 4,
) -> EvaluationProfile:
    """Evaluate an idea on all 8 dimensions using batched LLM calls."""
    sem = asyncio.Semaphore(max_concurrency)

    async def _run_batch(dims: list[str]) -> list[IdeaScore]:
        async with sem:
            return await _evaluate_batch(idea, dims, model=model, corpus_summary=corpus_summary)

    batch_results = await asyncio.gather(*[_run_batch(b) for b in EVAL_BATCHES])
    all_scores: list[IdeaScore] = []
    for batch in batch_results:
        all_scores.extend(batch)

    scores_dict = {s.dimension: s.score for s in all_scores}
    return EvaluationProfile.from_scores(scores_dict)
```
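The fence-stripping in `_parse_eval_response` can be sanity-checked standalone. This is the same two-regex cleanup applied to a typical fenced model reply; the JSON payload here is made up for illustration:

```python
import json
import re

# A typical model reply: JSON wrapped in a markdown code fence.
raw = '```json\n{"clarity": {"score": 8, "rationale": "ok"}}\n```'

# Same regexes as _parse_eval_response: strip opening and closing fences.
cleaned = re.sub(r"^```(?:json)?\s*\n?", "", raw.strip())
cleaned = re.sub(r"\n?```\s*$", "", cleaned)

data = json.loads(cleaned)
print(data["clarity"]["score"])  # → 8
```

The same code also handles replies with no fence at all, since both substitutions simply match nothing.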

**Step 4: Run test to verify it passes**

Run: `python -m pytest ideas/tests/test_eval.py -v`
Expected: all 4 tests PASS

**Step 5: Commit**

```bash
git add ideas/eval.py ideas/tests/test_eval.py
git commit -m "feat(ideas): add 8-dimension batched evaluation engine"
```

---

## Task 7: Multi-Model Consensus for Hard Dimensions

**Files:**
- Modify: `ideas/eval.py` — add `evaluate_idea_consensus`
- Modify: `ideas/tests/test_eval.py` — add consensus tests

**Step 1: Write the failing test**

```python
# Add to ideas/tests/test_eval.py
from ideas.eval import evaluate_idea_consensus


@pytest.mark.asyncio
async def test_evaluate_idea_consensus_uses_multiple_models():
    """Hard dimensions (fertility, novelty, coherence) get multi-model scoring."""
    calls = []

    async def mock_llm(**kwargs):
        model = kwargs.get("model", "")
        calls.append(model)
        scores = {}
        for dim in DIMENSIONS:
            if dim in kwargs.get("prompt", ""):
                scores[dim] = {"score": 7, "rationale": "Good"}
        return json.dumps(scores)

    idea = Idea(text="Test idea", idea_type="claim", source="user")
    with patch("ideas.eval.call_llm", side_effect=mock_llm):
        profile = await evaluate_idea_consensus(idea, models=["opus", "sonnet", "gemini"])

    assert isinstance(profile, EvaluationProfile)
    # Should have called multiple models
    assert len(set(calls)) >= 2
```

**Step 2: Run test to verify it fails**

Run: `python -m pytest ideas/tests/test_eval.py::test_evaluate_idea_consensus_uses_multiple_models -v`
Expected: FAIL — ImportError

**Step 3: Add implementation**

```python
# Add to ideas/eval.py
from statistics import median as _median

# Batches that get multi-model consensus (hardest to score)
CONSENSUS_BATCHES = [
    ["semantic_novelty", "framing_novelty"],
    ["generativity", "composability"],
]
# Batches that single-model is fine for
SINGLE_BATCHES = [
    ["claim_precision", "internal_coherence", "evidential_grounding"],
    ["clarity", "rhetorical_force"],
]


async def evaluate_idea_consensus(
    idea: Idea,
    *,
    models: tuple[str, ...] = ("opus", "sonnet", "gemini"),
    corpus_summary: str = "",
) -> EvaluationProfile:
    """Evaluate with multi-model consensus for hard dimensions, single model for easy ones."""
    all_scores: dict[str, float] = {}

    # Single-model batches (first model in list)
    primary = models[0]
    single_tasks = [
        _evaluate_batch(idea, dims, model=primary, corpus_summary=corpus_summary)
        for dims in SINGLE_BATCHES
    ]

    # Multi-model consensus batches
    consensus_tasks = []
    for dims in CONSENSUS_BATCHES:
        for model in models:
            consensus_tasks.append(
                _evaluate_batch(idea, dims, model=model, corpus_summary=corpus_summary)
            )

    all_results = await asyncio.gather(*(single_tasks + consensus_tasks), return_exceptions=True)

    # Process single-model results
    for result in all_results[:len(single_tasks)]:
        if isinstance(result, list):
            for s in result:
                all_scores[s.dimension] = s.score

    # Process consensus results: take the median per dimension
    consensus_dim_scores: dict[str, list[float]] = {}
    for result in all_results[len(single_tasks):]:
        if isinstance(result, list):
            for s in result:
                consensus_dim_scores.setdefault(s.dimension, []).append(s.score)

    for dim, scores in consensus_dim_scores.items():
        all_scores[dim] = _median(scores)

    return EvaluationProfile.from_scores(all_scores)
```
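Median, rather than mean, is what makes the consensus robust: a single model returning an outlier score barely moves the result. A quick illustration with stdlib `statistics`:

```python
from statistics import median

# Three models score the same dimension; one returns a low outlier.
scores = [7.0, 8.0, 2.0]

print(median(scores))                        # → 7.0 (outlier ignored)
print(round(sum(scores) / len(scores), 2))   # → 5.67 (mean is dragged down)
```

With three models, the median also degrades gracefully when one call fails and is filtered out: the median of the remaining two is their midpoint.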

**Step 4: Run test to verify it passes**

Run: `python -m pytest ideas/tests/test_eval.py -v`
Expected: all 5 tests PASS

**Step 5: Commit**

```bash
git add ideas/eval.py ideas/tests/test_eval.py
git commit -m "feat(ideas): add multi-model consensus scoring for hard dimensions"
```

---

## Task 8: Gradio UI — Interactive Pipeline

**Files:**
- Create: `ideas/ui/__init__.py`
- Create: `ideas/ui/app.py`

This task is UI-heavy. Write the app, then verify visually.

**Step 1: Write the UI app**

```python
# ideas/ui/__init__.py
# (empty)

# ideas/ui/app.py
#!/usr/bin/env python
"""Idea Evaluation System — interactive 4-stage pipeline UI."""
import asyncio
import os
from pathlib import Path

import gradio as gr

from lib.gradio.utils import TAB_JS, emoji_favicon, install_hup_handler, project_emoji

install_hup_handler()

DATA_DIR = Path(__file__).parent.parent / "data"

# Lazy store initialization
_store = None


def _get_store():
    global _store
    if _store is None:
        from ideas.store import IdeaStore
        DATA_DIR.mkdir(parents=True, exist_ok=True)
        _store = IdeaStore(db_path=DATA_DIR / "ideas.db", vector_path=DATA_DIR / "vectors")
    return _store


async def _run_extraction(text: str, project: str):
    """Stage 0: Extract ideas from user's writing."""
    from ideas.extract import extract_ideas
    if not text.strip():
        return "Please paste your writing above."
    ideas = await extract_ideas(text, source="user", model="sonnet")
    store = _get_store()
    for idea in ideas:
        store.save_idea(idea, project=project)
    claims = [i for i in ideas if i.idea_type == "claim"]
    theses = [i for i in ideas if i.idea_type == "thesis"]
    lines = [f"## Extracted {len(ideas)} ideas\n"]
    if claims:
        lines.append(f"### Claims ({len(claims)})")
        for c in claims:
            lines.append(f"- {c.text} *(conf: {c.confidence:.1f})*")
    if theses:
        lines.append(f"\n### Theses ({len(theses)})")
        for t in theses:
            lines.append(f"- {t.text} *(conf: {t.confidence:.1f})*")
    return "\n".join(lines)


def run_extraction(text, project):
    return asyncio.run(_run_extraction(text, project))


async def _run_landscape_search(project: str):
    """Stage 1: Generate search terms and find sources."""
    from ideas.landscape import generate_search_terms, search_landscape
    from ideas.models import Idea
    store = _get_store()
    idea_rows = store.get_ideas(project=project, source="user")
    if not idea_rows:
        return "No ideas extracted yet. Run Stage 0 first.", []
    ideas = [Idea(text=r["text"], idea_type=r["idea_type"], source="user") for r in idea_rows]
    terms = await generate_search_terms(ideas)
    sources = await search_landscape(terms)
    for src in sources:
        store.save_source(src, project=project)
    rows = [[s.title, s.url, s.snippet, s.search_term, "pending"] for s in sources]
    summary = f"Found {len(sources)} sources from {len(terms)} search terms:\n" + "\n".join(f"- {t}" for t in terms)
    return summary, rows


def run_landscape_search(project):
    summary, rows = asyncio.run(_run_landscape_search(project))
    return summary, rows


def update_source_status(url, status, project):
    store = _get_store()
    store.update_source_status(url, status, project=project)
    return f"Updated {url} → {status}"


async def _run_evaluation(project: str, use_consensus: bool):
    """Stage 3: Evaluate user's ideas."""
    from ideas.eval import evaluate_idea, evaluate_idea_consensus
    from ideas.models import GROUPS, Idea, IdeaScore
    store = _get_store()
    idea_rows = store.get_ideas(project=project, source="user")
    if not idea_rows:
        return "No ideas to evaluate. Run Stage 0 first."
    results = []
    for row in idea_rows:
        idea = Idea(text=row["text"], idea_type=row["idea_type"], source="user", id=row["id"])
        if use_consensus:
            profile = await evaluate_idea_consensus(idea)
        else:
            profile = await evaluate_idea(idea)
        for dim, score in profile.scores.items():
            store.save_score(IdeaScore(idea_id=idea.id, dimension=dim, score=score, rationale="", model="eval"), project=project)
        results.append((idea, profile))
        results.append((idea, profile))

    lines = []
    for idea, profile in results:
        lines.append(f"### {idea.idea_type.title()}: {idea.text[:80]}...")
        lines.append("")
        for group, dims in GROUPS.items():
            dim_scores = " | ".join(f"{d}: {profile.scores.get(d, 0):.0f}" for d in dims)
            lines.append(f"**{group.title()}** (floor: {profile.floors[group]:.0f}): {dim_scores}")
        lines.append("")
    return "\n".join(lines)


def run_evaluation(project, use_consensus):
    return asyncio.run(_run_evaluation(project, use_consensus))


def build_ideas_tab():
    tabs = gr.Tabs(elem_id="main-tabs")
    with tabs:
        with gr.Tab("Extract", id="extract"):
            gr.Markdown("## Stage 0: Extract Ideas from Your Writing")
            project = gr.Textbox(label="Project name", value="default")
            text_input = gr.Textbox(label="Paste your writing", lines=15, placeholder="Paste your essay, paper, or article here...")
            extract_btn = gr.Button("Extract Ideas", variant="primary")
            extract_output = gr.Markdown()
            extract_btn.click(run_extraction, [text_input, project], [extract_output])

        with gr.Tab("Landscape", id="landscape"):
            gr.Markdown("## Stage 1: Build Landscape Corpus")
            project_l = gr.Textbox(label="Project name", value="default")
            search_btn = gr.Button("Search for Related Work", variant="primary")
            search_summary = gr.Markdown()
            source_table = gr.Dataframe(
                headers=["Title", "URL", "Snippet", "Search Term", "Status"],
                interactive=False,
            )
            search_btn.click(run_landscape_search, [project_l], [search_summary, source_table])

        with gr.Tab("Evaluate", id="evaluate"):
            gr.Markdown("## Stage 3: Score Your Ideas")
            project_e = gr.Textbox(label="Project name", value="default")
            consensus_cb = gr.Checkbox(label="Multi-model consensus (slower, more accurate)", value=False)
            eval_btn = gr.Button("Evaluate Ideas", variant="primary")
            eval_output = gr.Markdown()
            eval_btn.click(run_evaluation, [project_e, consensus_cb], [eval_output])

    return tabs


if __name__ == "__main__":
    port = int(os.environ.get("GRADIO_SERVER_PORT", 7990))
    favicon = emoji_favicon(project_emoji("ideas", "\U0001f4a1"))
    with gr.Blocks(title="Ideas", fill_width=True) as app:
        gr.Markdown("# \U0001f4a1 Idea Evaluation")
        tabs = build_ideas_tab()

        def _on_load(request: gr.Request):
            tab_id = request.query_params.get("tab")
            return gr.Tabs(selected=tab_id) if tab_id else gr.skip()

        app.load(_on_load, outputs=[tabs])
    app.launch(server_port=port, js=TAB_JS, head=favicon)
```

**Step 2: Test it launches**

Run: `cd /Users/tchklovski/all-code/rivus && GRADIO_SERVER_PORT=7990 timeout 5 python ideas/ui/app.py || true`
Expected: Should start without import errors (may timeout — that's OK)

**Step 3: Commit**

```bash
git add ideas/ui/__init__.py ideas/ui/app.py
git commit -m "feat(ideas): add Gradio UI — extract, landscape, evaluate tabs"
```

---

## Task 9: Server Registration

**Files:**
- Modify: `infra/Caddyfile` — add kb.localhost/ideas reverse proxy block
- Modify: `index/registry.py` — add ideas to SERVICE_METADATA
- Add emoji to `~/.config/rivus/project_emojis.json`

**Step 1: Read current files**

Read `infra/Caddyfile` and `index/registry.py` to find exact insertion points.

**Step 2: Add Caddy block**

Add after the `draft.localhost` block:
```
# 💡 ideas — idea evaluation
kb.localhost/ideas {
    reverse_proxy localhost:7990
    import tls90d
    # Start: gradio ideas/ui/app.py
}
```

Update comment: `Next available UI port: 8000`

**Step 3: Add to registry**

Add to `SERVICE_METADATA`:
```python
"ideas": {
    "emoji": "💡",
    "desc": "Idea evaluation — extract, landscape, score",
    "cmd": "gradio ideas/ui/app.py",
    "subs": ["Extract", "Landscape", "Evaluate"],
},
```

**Step 4: Add project emoji**

```bash
python - <<'EOF'
import json
from pathlib import Path

p = Path("~/.config/rivus/project_emojis.json").expanduser()
d = json.loads(p.read_text()) if p.exists() else {}
d["ideas"] = "💡"
p.parent.mkdir(parents=True, exist_ok=True)
p.write_text(json.dumps(d, indent=2, ensure_ascii=False) + "\n")
EOF
```

**Step 5: Reload Caddy and commit**

```bash
caddy reload --config /opt/homebrew/etc/Caddyfile
git add infra/Caddyfile index/registry.py
git commit -m "feat(ideas): register server — kb.localhost/ideas:7990"
```

---

## Task 10: Integration Smoke Test

**Files:**
- Create: `ideas/tests/test_integration.py`

**Step 1: Write integration test**

```python
# ideas/tests/test_integration.py
"""Integration smoke test — full pipeline with mocks."""
import json
from unittest.mock import AsyncMock, patch

import pytest

from ideas.models import Idea, EvaluationProfile
from ideas.extract import extract_ideas
from ideas.landscape import generate_search_terms, search_landscape
from ideas.eval import evaluate_idea


SAMPLE_TEXT = """
Reinforcement learning from human feedback (RLHF) systematically incentivizes
models to produce outputs that appear helpful rather than outputs that are genuinely
helpful. This is because human raters cannot reliably distinguish between the two
in the time they have to evaluate. The fundamental problem is not the reward model
but the evaluation bottleneck — humans are too slow to verify complex reasoning.
"""


@pytest.mark.asyncio
async def test_full_pipeline_with_mocks():
    """End-to-end: extract → search terms → search → evaluate."""
    # Stage 0: Extract
    extract_resp = json.dumps({"ideas": [
        {"text": "RLHF incentivizes appearing helpful over being helpful", "type": "thesis", "confidence": 0.9},
        {"text": "Human raters cannot distinguish genuine from apparent helpfulness", "type": "claim", "confidence": 0.85},
    ]})
    with patch("ideas.extract.call_llm", new_callable=AsyncMock, return_value=extract_resp):
        ideas = await extract_ideas(SAMPLE_TEXT)
    assert len(ideas) == 2

    # Stage 1: Search terms
    terms_resp = json.dumps({"terms": ["RLHF reward hacking", "evaluation bottleneck AI"]})
    with patch("ideas.landscape.call_llm", new_callable=AsyncMock, return_value=terms_resp):
        terms = await generate_search_terms(ideas)
    assert len(terms) == 2

    # Stage 1: Search
    serper_resp = {"organic": [{"title": "Paper", "link": "https://ex.com", "snippet": "About RLHF"}]}
    with patch("ideas.landscape.serper_search", new_callable=AsyncMock, return_value=serper_resp):
        sources = await search_landscape(terms)
    assert len(sources) >= 1

    # Stage 3: Evaluate
    eval_resp = json.dumps({d: {"score": 7, "rationale": "Good"} for d in [
        "claim_precision", "internal_coherence", "evidential_grounding",
        "semantic_novelty", "framing_novelty", "clarity", "rhetorical_force",
        "generativity", "composability",
    ]})
    with patch("ideas.eval.call_llm", new_callable=AsyncMock, return_value=eval_resp):
        profile = await evaluate_idea(ideas[0])
    assert isinstance(profile, EvaluationProfile)
    assert all(v == 7.0 for v in profile.floors.values())
```

**Step 2: Run integration test**

Run: `python -m pytest ideas/tests/test_integration.py -v`
Expected: PASS

**Step 3: Run all tests**

Run: `python -m pytest ideas/tests/ -v`
Expected: all tests PASS

**Step 4: Commit**

```bash
git add ideas/tests/test_integration.py
git commit -m "test(ideas): add integration smoke test — full pipeline with mocks"
```

---

## Summary

| Task | Component | Tests |
|------|-----------|-------|
| 1    | Data models (models.py) | 6 tests |
| 2    | SQLite + sqlite-vec store | 5 tests |
| 3    | Idea extraction (Stage 0/2) | 5 tests |
| 4    | Landscape search (Stage 1) | 4 tests |
| 5    | Corpus extraction (Stage 2) | 1 test |
| 6    | Batched evaluation (Stage 3) | 4 tests |
| 7    | Multi-model consensus | 1 test |
| 8    | Gradio UI | manual verify |
| 9    | Server registration | manual verify |
| 10   | Integration smoke test | 1 test |

**Total: 27+ tests, 10 commits**

### Key reuse points
- `lib/vectors.VectorStore` for sqlite-vec (NOT Qdrant — semanticnet migrated)
- `lib/llm.call_llm` with model aliases (`opus`, `sonnet`, `flash`, `gemini`)
- `lib/discovery_ops.serper_search` for web search
- `lib/ingest.fetch` for URL content fetching
- `lib/gradio.utils` for Gradio app boilerplate
- Batched eval pattern from `draft/style/evaluate.py` (2-3 dims per call)
- Floor-per-group aggregation (not weighted average)
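
The floor-per-group rule from that last bullet can be sketched in a few lines. Group and dimension names here are illustrative; the real definitions live in `ideas/models.py`:

```python
# Hypothetical groups and scores, for illustration only.
GROUPS = {
    "rigor": ["claim_precision", "internal_coherence", "evidential_grounding"],
    "novelty": ["semantic_novelty", "framing_novelty"],
}
scores = {
    "claim_precision": 8, "internal_coherence": 4, "evidential_grounding": 7,
    "semantic_novelty": 9, "framing_novelty": 7,
}

# A group is only as strong as its weakest dimension (min, not average),
# so one weak dimension cannot be papered over by strong siblings.
floors = {group: min(scores[d] for d in dims) for group, dims in GROUPS.items()}
print(floors)  # → {'rigor': 4, 'novelty': 7}
```

A weighted average would let the 9 on `semantic_novelty` mask the 4 on `internal_coherence`; the floor keeps that weakness visible.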
