# VC Intel Restructure — Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Move VC firm data from `people.db` into the unified companies registry (`companies.db`) with dedicated `vc_firm_profile` and `vc_firm_observations` tables.

**Architecture:** VC firms become regular companies tagged `vc-firm`, with typed canonical data in `vc_firm_profile` and per-source provenance in `vc_firm_observations`. The job handler writes to companies.db for firms and people.db for VC people. Scoring logic moves to `intel/companies/vc_scoring.py`.

**Tech Stack:** SQLite, Python, existing `intel.companies.registry` API

**Design doc:** `docs/plans/2026-03-06-vc-intel-restructure-design.md`

---

### Task 1: Fix `ensure_company()` to support private companies

**Files:**
- Modify: `intel/companies/registry.py:23-79`
- Test: `intel/companies/tests/test_registry.py` (create if needed)

**Step 1: Write the failing test**

```python
# intel/companies/tests/test_registry.py
import sqlite3
import sys
from pathlib import Path

import pytest

_RIVUS_ROOT = Path(__file__).resolve().parent.parent.parent.parent
if str(_RIVUS_ROOT) not in sys.path:
    sys.path.insert(0, str(_RIVUS_ROOT))

from intel.companies.registry import ensure_company, tag_company, get_companies_by_tag, get_tags


@pytest.fixture
def db():
    """In-memory DB with companies schema."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.executescript("""
        CREATE TABLE companies (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            name TEXT NOT NULL UNIQUE,
            ticker TEXT,
            symbol TEXT,
            exchange TEXT,
            country TEXT,
            market_cap_m REAL,
            is_public INTEGER DEFAULT 1,
            website TEXT,
            description TEXT,
            canonical_name TEXT,
            sector TEXT,
            industry TEXT,
            created_at TEXT DEFAULT (datetime('now')),
            updated_at TEXT DEFAULT (datetime('now'))
        );
        CREATE TABLE company_tags (
            company_id INTEGER NOT NULL REFERENCES companies(id),
            tag TEXT NOT NULL,
            added_at TEXT DEFAULT (datetime('now')),
            metadata TEXT,
            PRIMARY KEY (company_id, tag)
        );
        CREATE INDEX idx_company_tags_tag ON company_tags(tag);
    """)
    return conn


def test_ensure_company_private(db):
    """Private company gets is_public=0."""
    cid = ensure_company(db, name="Sequoia Capital", is_public=0, country="US")
    row = db.execute("SELECT is_public, country FROM companies WHERE id = ?", (cid,)).fetchone()
    assert row["is_public"] == 0
    assert row["country"] == "US"


def test_ensure_company_with_website(db):
    """Website is stored on insert."""
    cid = ensure_company(db, name="Benchmark", website="benchmark.com", is_public=0)
    row = db.execute("SELECT website FROM companies WHERE id = ?", (cid,)).fetchone()
    assert row["website"] == "benchmark.com"


def test_ensure_company_default_public(db):
    """Default is_public=1 for backward compat."""
    cid = ensure_company(db, name="NVIDIA", ticker="NVDA")
    row = db.execute("SELECT is_public FROM companies WHERE id = ?", (cid,)).fetchone()
    assert row["is_public"] == 1


def test_ensure_company_update_preserves_is_public(db):
    """Updating existing company doesn't flip is_public."""
    cid = ensure_company(db, name="Sequoia Capital", is_public=0)
    ensure_company(db, name="Sequoia Capital", sector="Venture Capital")
    row = db.execute("SELECT is_public, sector FROM companies WHERE id = ?", (cid,)).fetchone()
    assert row["is_public"] == 0
    assert row["sector"] == "Venture Capital"
```

**Step 2: Run test to verify it fails**

Run: `python -m pytest intel/companies/tests/test_registry.py -v`
Expected: FAIL — `ensure_company()` doesn't accept `is_public` or `website`

**Step 3: Update `ensure_company()` in `registry.py`**

Add `is_public: int = 1` and `website: str | None = None` parameters. Update INSERT to use the parameter. Update the UPDATE branch to handle `website` and avoid overwriting `is_public`.

```python
def ensure_company(conn: sqlite3.Connection, *, name: str, ticker: str | None = None,
                   sector: str | None = None, industry: str | None = None,
                   market_cap_m: float | None = None, country: str | None = None,
                   description: str | None = None, website: str | None = None,
                   is_public: int = 1) -> int:
```

INSERT line changes from hardcoded `1` to the `is_public` parameter. Add `website` to both INSERT and UPDATE.

**Step 4: Run tests to verify they pass**

Run: `python -m pytest intel/companies/tests/test_registry.py -v`
Expected: PASS

**Step 5: Commit**

```bash
git add intel/companies/registry.py intel/companies/tests/test_registry.py
git commit -m "feat(registry): add is_public and website params to ensure_company"
```

---

### Task 2: Create VC schema (vc_firm_profile + vc_firm_observations)

**Files:**
- Modify: `intel/companies/common.py` — add DDL to `init_db()`
- Test: `intel/companies/tests/test_registry.py` — add schema test

**Step 1: Write the failing test**

```python
def test_vc_schema_exists(db_with_vc_schema):
    """vc_firm_profile and vc_firm_observations tables exist."""
    tables = [r[0] for r in db_with_vc_schema.execute(
        "SELECT name FROM sqlite_master WHERE type='table'").fetchall()]
    assert "vc_firm_profile" in tables
    assert "vc_firm_observations" in tables
```

(Assumes a `db_with_vc_schema` fixture that applies the `init_db()` DDL to an in-memory connection.)

**Step 2: Run test to verify it fails**

Run: `python -m pytest intel/companies/tests/test_registry.py::test_vc_schema_exists -v`
Expected: FAIL

**Step 3: Add DDL to `init_db()` in `common.py`**

Add the `vc_firm_profile` and `vc_firm_observations` CREATE TABLE statements from the design doc to `init_db()`.
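The authoritative DDL lives in the design doc; as a rough shape, the columns this plan references elsewhere (profile fields, score columns, per-source observations) suggest something like:

```python
# Guessed shape of the VC tables. The real DDL is in the design doc;
# every column here is inferred from fields referenced in this plan.
import sqlite3

VC_DDL = """
CREATE TABLE IF NOT EXISTS vc_firm_profile (
    company_id INTEGER PRIMARY KEY REFERENCES companies(id),
    city TEXT, state TEXT, country TEXT,
    focus_tags TEXT, stages TEXT,
    israel_ops INTEGER DEFAULT 0,
    fund_size_usd INTEGER DEFAULT 0,
    deal_count INTEGER DEFAULT 0,
    has_midas INTEGER DEFAULT 0,
    score INTEGER DEFAULT 0,
    score_breakdown TEXT,
    score_version TEXT,
    scored_at TEXT
);
CREATE TABLE IF NOT EXISTS vc_firm_observations (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    company_id INTEGER NOT NULL REFERENCES companies(id),
    source TEXT NOT NULL,
    observed_at TEXT DEFAULT (datetime('now')),
    payload TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(VC_DDL)
```

Cross-check column names against the design doc before pasting into `init_db()`.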

**Step 4: Run test to verify it passes**

Run: `python -m pytest intel/companies/tests/test_registry.py -v`
Expected: PASS

**Step 5: Run DDL against real DB to create tables**

```bash
python -c "from intel.companies.common import init_db; init_db()"
```

**Step 6: Commit**

```bash
git add intel/companies/common.py intel/companies/tests/test_registry.py
git commit -m "feat(registry): add vc_firm_profile and vc_firm_observations tables"
```

---

### Task 3: Create VC scoring module

**Files:**
- Create: `intel/companies/vc_scoring.py`
- Test: `intel/companies/tests/test_vc_scoring.py`

**Step 1: Write the failing test**

```python
# intel/companies/tests/test_vc_scoring.py
from intel.companies.vc_scoring import score_vc_firm


def test_score_sequoia():
    """Top-tier SV firm with Midas presence scores high."""
    firm = {
        "city": "Menlo Park", "state": "CA", "country": "US",
        "focus_tags": "ai,software", "stages": "Seed,A,B,Growth",
        "israel_ops": 0, "fund_size_usd": 2_000_000_000,
        "deal_count": 300, "has_midas": 1,
    }
    score, breakdown = score_vc_firm(firm)
    assert score >= 80  # US + SV + AI + SW + Series A + fund>1B + prominence + Midas
    assert "Silicon Valley" in breakdown
    assert "AI focused" in breakdown


def test_score_unknown_firm():
    """Unknown firm with no data scores 0."""
    firm = {"city": "", "state": "", "country": "", "focus_tags": "",
            "stages": "", "israel_ops": 0, "fund_size_usd": 0,
            "deal_count": 0, "has_midas": 0}
    score, breakdown = score_vc_firm(firm)
    assert score == 0
    assert breakdown == {}


def test_score_israel_firm():
    """Israel VC gets Israel bonus."""
    firm = {"city": "Tel Aviv", "state": "", "country": "IL",
            "focus_tags": "cyber,software", "stages": "A,B",
            "israel_ops": 1, "fund_size_usd": 0, "deal_count": 30, "has_midas": 0}
    score, breakdown = score_vc_firm(firm)
    assert "Israel" in breakdown
    assert "Software focused" in breakdown
```

**Step 2: Run tests to verify they fail**

Run: `python -m pytest intel/companies/tests/test_vc_scoring.py -v`
Expected: FAIL — module doesn't exist

**Step 3: Create `vc_scoring.py`**

Move `score_firm()` logic from `build_vc_list.py` into `score_vc_firm(firm: dict) -> tuple[int, dict]`. Include the city sets (BAY_AREA_CITIES, LA_CITIES) and keyword sets (AI_KEYWORDS, SOFTWARE_KEYWORDS). Input is a dict with `vc_firm_profile` column names. Add `score_version` computation (hash of function source).

Key difference from old code: `has_midas` is a field on the input dict (denormalized), not a separate lookup.
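A minimal sketch of the shape (the real weights and keyword sets live in `build_vc_list.py`; the point values below are placeholders chosen only to satisfy the tests above):

```python
# Hedged sketch of score_vc_firm(); weights are illustrative placeholders.
BAY_AREA_CITIES = {"menlo park", "palo alto", "san francisco", "mountain view"}
AI_KEYWORDS = {"ai", "ml", "machine learning"}
SOFTWARE_KEYWORDS = {"software", "saas", "cyber"}

def score_vc_firm(firm: dict) -> tuple[int, dict]:
    breakdown: dict[str, int] = {}
    tags = {t.strip().lower() for t in (firm.get("focus_tags") or "").split(",") if t.strip()}
    stages = {s.strip().lower() for s in (firm.get("stages") or "").split(",") if s.strip()}
    if (firm.get("country") or "").upper() == "US":
        breakdown["US"] = 10
    if (firm.get("city") or "").lower() in BAY_AREA_CITIES:
        breakdown["Silicon Valley"] = 15
    if firm.get("israel_ops") or (firm.get("country") or "").upper() == "IL":
        breakdown["Israel"] = 15
    if tags & AI_KEYWORDS:
        breakdown["AI focused"] = 15
    if tags & SOFTWARE_KEYWORDS:
        breakdown["Software focused"] = 10
    if "a" in stages:
        breakdown["Series A"] = 10
    if (firm.get("fund_size_usd") or 0) > 1_000_000_000:
        breakdown["Fund > $1B"] = 10
    if (firm.get("deal_count") or 0) >= 100:
        breakdown["Prominence"] = 10
    if firm.get("has_midas"):
        breakdown["Midas List"] = 10  # denormalized field, not a lookup
    return sum(breakdown.values()), breakdown
```

Returning the breakdown dict alongside the total keeps scoring explainable, and hashing the function source for `score_version` invalidates stale scores when weights change.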

**Step 4: Run tests to verify they pass**

Run: `python -m pytest intel/companies/tests/test_vc_scoring.py -v`
Expected: PASS

**Step 5: Commit**

```bash
git add intel/companies/vc_scoring.py intel/companies/tests/test_vc_scoring.py
git commit -m "feat(companies): VC firm scoring module (from build_vc_list.py)"
```

---

### Task 4: Extract seed firms to JSON

**Files:**
- Create: `intel/companies/data/vc_seed_firms.json`
- Source: `intel/people/build_vc_list.py:121-192` (SEED_FIRMS list)

**Step 1: Extract seed firms**

Write a script (or do inline) that converts the `SEED_FIRMS` tuple list from `build_vc_list.py` into a clean JSON array. Each entry: `{"name", "website", "city", "state", "country", "focus_tags", "stages", "israel_ops", "notes"}`. Skip entries with empty website or "skip" in notes.
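The conversion could be as small as the following (assuming each `SEED_FIRMS` entry is a tuple in the field order listed above; check `build_vc_list.py` for the real shape before running):

```python
# Hypothetical converter; field order is an assumption about SEED_FIRMS.
import json

FIELDS = ["name", "website", "city", "state", "country",
          "focus_tags", "stages", "israel_ops", "notes"]

def seed_firms_to_json(seed_firms: list[tuple]) -> list[dict]:
    out = []
    for row in seed_firms:
        entry = dict(zip(FIELDS, row))
        # Drop entries that are unusable: no website, or flagged "skip".
        if not entry.get("website") or "skip" in (entry.get("notes") or "").lower():
            continue
        out.append(entry)
    return out
```

Write the result with `json.dump(..., indent=2)` so the file diffs cleanly in review.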

**Step 2: Verify JSON is valid and has ~60 entries**

```bash
python -c "import json; d=json.load(open('intel/companies/data/vc_seed_firms.json')); print(f'{len(d)} seed firms')"
```
Expected: ~60 seed firms

**Step 3: Commit**

```bash
git add intel/companies/data/vc_seed_firms.json
git commit -m "data(companies): extract curated VC seed firms to JSON"
```

---

### Task 5: Write migration script

**Files:**
- Create: `intel/companies/migrate_vc_firms.py`
- Read: `intel/people/data/people.db` (source)
- Write: `intel/companies/data/companies.db` (target)

**Step 1: Write migration script**

The script should:

1. **Dry-run collision check**: For each unique VC firm name, check whether `ensure_company()` would match an existing public company. Print all collisions for review; stop after the report when the `--dry-run` flag is set.

2. **Migrate seed firms first** (highest quality):
   - Load `intel/companies/data/vc_seed_firms.json`
   - `ensure_company(name=..., country=..., website=..., is_public=0)`
   - `tag_company(company_id, "vc-firm")`
   - INSERT into `vc_firm_profile` with curated fields
   - INSERT into `vc_firm_observations` with `source="manual_seed"`

3. **Migrate DB firms** (grouped by `name_normalized`):
   - For each unique firm, merge fields using precedence rules from design
   - `ensure_company(name=..., country=..., website=..., is_public=0)`
   - `tag_company(company_id, "vc-firm")`
   - UPSERT into `vc_firm_profile` — skip fields already set by seed (check `source != 'manual_seed'`)
   - INSERT each source record into `vc_firm_observations`

4. **Migrate vc_people links**:
   - For each `vc_people` row, match `firm_name` to migrated companies
   - Add `company_id` column if missing, write resolved ID
   - Log non-exact matches

5. **Archive**: RENAME `vc_firms` to `vc_firms_archived`

6. **Validation**: Print counts before/after per source, flag any data loss
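The collision check in step 1 can be sketched as below (this assumes a simple exact-name match against public companies; the real `ensure_company()` matching may normalize names, so mirror its logic rather than this one):

```python
# Sketch of the dry-run collision check; exact-name match is an assumption.
import sqlite3

def find_collisions(conn: sqlite3.Connection, firm_names: list[str]) -> list[str]:
    collisions = []
    for name in firm_names:
        # A VC firm name that already exists as a public company is a
        # collision the operator must resolve before migrating.
        row = conn.execute(
            "SELECT id FROM companies WHERE name = ? AND is_public = 1",
            (name,)).fetchone()
        if row:
            collisions.append(name)
    return collisions
```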

**Step 2: Run dry-run**

```bash
python intel/companies/migrate_vc_firms.py --dry-run
```
Expected: Collision report (review manually), migration plan summary

**Step 3: Run actual migration**

```bash
python intel/companies/migrate_vc_firms.py
```
Expected: Summary with counts per source, unique firms migrated, vc_people linked

**Step 4: Verify**

```bash
python -c "
import sqlite3
conn = sqlite3.connect('intel/companies/data/companies.db')
c = conn.cursor()
c.execute(\"SELECT COUNT(*) FROM companies c JOIN company_tags ct ON c.id=ct.company_id WHERE ct.tag='vc-firm'\")
print(f'VC firms in registry: {c.fetchone()[0]}')
c.execute('SELECT COUNT(*) FROM vc_firm_profile')
print(f'VC firm profiles: {c.fetchone()[0]}')
c.execute('SELECT source, COUNT(*) FROM vc_firm_observations GROUP BY source')
for r in c.fetchall(): print(f'  {r[0]}: {r[1]}')
c.execute('SELECT COUNT(*) FROM vc_firm_profile WHERE score > 0')
print(f'Scored firms: {c.fetchone()[0]}')
conn.close()
"
```

**Step 5: Commit**

```bash
git add intel/companies/migrate_vc_firms.py
git commit -m "feat(companies): migrate VC firms from people.db to companies registry"
```

---

### Task 6: Update job handler to write to companies registry

**Files:**
- Modify: `jobs/handlers/vc_data.py`

**Step 1: Fix DATA_DIR and imports**

- Change `DATA_DIR` to point to `intel/people/data` (for people only)
- Add `COMPANIES_DB_PATH` pointing to `intel/companies/data/companies.db`
- Import `ensure_company`, `tag_company` from `intel.companies.registry`
- Import `score_vc_firm` from `intel.companies.vc_scoring`

**Step 2: Update `_init_schema()`**

- Remove `CREATE TABLE vc_firms` DDL
- Add `CREATE TABLE IF NOT EXISTS vc_firm_profile` and `vc_firm_observations` DDL (or import from `init_db()`)
- Keep `vc_people` DDL but add `company_id INTEGER` column

**Step 3: Update each `_fetch_*` function**

Each fetch function currently does `INSERT INTO vc_firms`. Change to:
- `ensure_company(conn_companies, name=..., country=..., is_public=0, website=...)`
- `tag_company(conn_companies, company_id, "vc-firm")`
- INSERT/UPSERT into `vc_firm_profile`
- INSERT into `vc_firm_observations`

For Forbes Midas: also resolve and store `company_id` on `vc_people` rows.
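The profile-upsert plus observation-append part of that write path might look like this (`record_vc_firm` and the trimmed column set are illustrative, not the handler's real code; the COALESCE direction implements "seed data wins" from Task 5):

```python
# Hypothetical helper: upsert canonical fields, always append provenance.
import json
import sqlite3

def record_vc_firm(conn: sqlite3.Connection, company_id: int,
                   profile: dict, source: str, raw: dict) -> None:
    # Keep any value already set on the profile; only fill gaps.
    conn.execute("""
        INSERT INTO vc_firm_profile (company_id, city, country, focus_tags)
        VALUES (:company_id, :city, :country, :focus_tags)
        ON CONFLICT(company_id) DO UPDATE SET
            city = COALESCE(vc_firm_profile.city, excluded.city),
            country = COALESCE(vc_firm_profile.country, excluded.country),
            focus_tags = COALESCE(vc_firm_profile.focus_tags, excluded.focus_tags)
    """, {"company_id": company_id, **profile})
    # Every source write also lands in the observations table verbatim.
    conn.execute(
        "INSERT INTO vc_firm_observations (company_id, source, payload) VALUES (?, ?, ?)",
        (company_id, source, json.dumps(raw)))
```

Note `ON CONFLICT(company_id)` requires `company_id` to be the primary key (or have a unique index), and needs SQLite 3.24+.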

**Step 4: Wire up `_stage_enrich`**

Replace the stub with actual scoring:
```python
async def _stage_enrich(*, item_key: str, data: dict, job) -> dict | None:
    from intel.companies.vc_scoring import score_vc_firm
    # Load profile, score, update vc_firm_profile with score + breakdown
```

**Step 5: Fix `process_stage` dispatcher**

Register stage functions with the dispatcher (currently broken — functions defined but not wired).
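One straightforward wiring is a stage-name table; the `_stage_*` stand-ins below are illustrative, not the handler's real functions:

```python
# Sketch of dispatcher wiring via a stage -> coroutine table.
import asyncio

async def _stage_fetch(**kwargs):
    # Stand-in for the handler's real fetch stage.
    return {"stage": "fetch", **kwargs}

async def _stage_enrich(**kwargs):
    # Stand-in for the scoring stage.
    return {"stage": "enrich", **kwargs}

STAGES = {"fetch": _stage_fetch, "enrich": _stage_enrich}

async def process_stage(stage: str, **kwargs):
    fn = STAGES.get(stage)
    if fn is None:
        raise ValueError(f"unknown stage: {stage!r}")
    return await fn(**kwargs)
```

A table beats an if/elif chain here because adding a stage is a one-line change and unknown stages fail loudly.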

**Step 6: Test with inv**

```bash
inv jobs.test -j vc_data_bulk --stage fetch --dry-run
```

**Step 7: Commit**

```bash
git add jobs/handlers/vc_data.py
git commit -m "feat(jobs): vc_data handler writes to companies registry"
```

---

### Task 7: Update people scoring to read from companies registry

**Files:**
- Modify: `intel/people/scoring.py:32,42-56,214-244`
- Modify: `intel/people/network.py:30,33-45`
- Modify: `intel/people/tests/test_scoring.py:13-16,64-71,81-89`

**Step 1: Update `scoring.py`**

Replace `_load_vc_firms()` (CSV reader) with a function that queries `vc_firm_profile` from `companies.db`:

```python
COMPANIES_DB = Path(__file__).resolve().parent.parent / "companies" / "data" / "companies.db"

def _load_vc_firms() -> list[dict]:
    """Load VC firms from companies registry."""
    if not COMPANIES_DB.is_file():
        return []
    import sqlite3
    conn = sqlite3.connect(str(COMPANIES_DB))
    conn.row_factory = sqlite3.Row
    rows = conn.execute("""
        SELECT c.name, p.score, p.focus_tags as focus, p.stages
        FROM companies c
        JOIN company_tags ct ON c.id = ct.company_id
        JOIN vc_firm_profile p ON c.id = p.company_id
        WHERE ct.tag = 'vc-firm'
    """).fetchall()
    conn.close()
    return [dict(r) for r in rows]
```

Remove `VC_FIRMS_CSV` constant and `csv` import.

**Step 2: Update `network.py`**

Same pattern — replace CSV reader `_load_vc_firm_names()` with companies.db query.

**Step 3: Update tests**

- `test_find_vc_connections` — no change needed (uses passed-in data)
- `test_load_vc_firms` — update to check new source (companies.db, not CSV)
- Remove `csv` import dependency in test

**Step 4: Run tests**

```bash
python -m pytest intel/people/tests/test_scoring.py -v
```
Expected: PASS

**Step 5: Commit**

```bash
git add intel/people/scoring.py intel/people/network.py intel/people/tests/test_scoring.py
git commit -m "refactor(people): read VC firms from companies registry instead of CSV"
```

---

### Task 8: Delete old files and update docs

**Files:**
- Delete: `intel/people/build_vc_list.py`
- Delete: `intel/people/data/vc_firms_scored.csv`
- Delete: `intel/people/data/vc_firms_scored.json`
- Modify: `intel/companies/CLAUDE.md` — add vc-firm tag docs
- Modify: `intel/README.md` — absorb strategic vision from `projects/vc_intel/`
- Modify: `intel/people/README.md` — update to reference companies registry
- Delete: `projects/vc_intel/` (after moving content)

**Step 1: Delete old files**

```bash
trash intel/people/build_vc_list.py
trash intel/people/data/vc_firms_scored.csv
trash intel/people/data/vc_firms_scored.json
```

**Step 2: Update `intel/companies/CLAUDE.md`**

Add section documenting:
- `vc-firm` tag and related tables (`vc_firm_profile`, `vc_firm_observations`)
- `vc_scoring.py` module
- Seed firms JSON file
- Migration script

**Step 3: Update `intel/README.md`**

Move strategic VC intelligence vision from `projects/vc_intel/README.md` into `intel/README.md`. Reference `intel/companies/` for VC firm data and `intel/people/` for founder evaluation.

**Step 4: Update `intel/people/README.md`**

Remove references to `build_vc_list.py` and `vc_firms_scored.csv`. Add note that VC firm data lives in `intel/companies/data/companies.db` with `vc-firm` tag.

**Step 5: Delete `projects/vc_intel/`**

```bash
trash projects/vc_intel/
```

**Step 6: Verify no broken references**

```bash
# Check for any remaining references to deleted files
rg -i "build_vc_list|vc_firms_scored" --type py
```
Expected: No matches (or only in migration script/design docs)

**Step 7: Commit**

```bash
git add -A
git commit -m "cleanup(intel): remove old VC files, update docs, dissolve projects/vc_intel"
```

---

### Task 9: Score all VC firms

**Files:**
- Modify: `intel/companies/migrate_vc_firms.py` (or standalone script)

**Step 1: Run scoring across all migrated firms**

```bash
# Can be added to the migration script or run standalone
python -c "
import sqlite3, json
from intel.companies.vc_scoring import score_vc_firm

conn = sqlite3.connect('intel/companies/data/companies.db')
conn.row_factory = sqlite3.Row
rows = conn.execute('SELECT * FROM vc_firm_profile').fetchall()
scored = 0
for row in rows:
    firm = dict(row)
    score, breakdown = score_vc_firm(firm)
    conn.execute('''UPDATE vc_firm_profile
        SET score=?, score_breakdown=?, score_version=?, scored_at=datetime('now')
        WHERE company_id=?''',
        (score, json.dumps(breakdown), '1.0', firm['company_id']))
    scored += 1
conn.commit()
print(f'Scored {scored} firms')
for r in conn.execute('SELECT score, COUNT(*) FROM vc_firm_profile GROUP BY score ORDER BY score DESC LIMIT 10'):
    print(f'  score={r[0]}: {r[1]} firms')
conn.close()
"
```

**Step 2: Verify top firms look right**

```bash
python -c "
import sqlite3
conn = sqlite3.connect('intel/companies/data/companies.db')
rows = conn.execute('''
    SELECT c.name, p.score, p.city, p.state, p.focus_tags
    FROM companies c JOIN vc_firm_profile p ON c.id=p.company_id
    ORDER BY p.score DESC LIMIT 20
''').fetchall()
for r in rows: print(f'  {r[1]:>3}  {r[0]:<35} {r[2] or \"\":<20} {r[4] or \"\"}')
conn.close()
"
```
Expected: Sequoia, a16z, Benchmark near the top

**Step 3: Commit**

```bash
git add -A
git commit -m "data(companies): score all VC firms"
```

---

### Task 10: Final verification

**Step 1: Run all affected tests**

```bash
python -m pytest intel/companies/tests/ intel/people/tests/test_scoring.py -v
```

**Step 2: Verify no broken imports**

```bash
python -c "from intel.people.scoring import score_person; print('scoring OK')"
python -c "from intel.people.network import build_network; print('network OK')"
python -c "from intel.companies.vc_scoring import score_vc_firm; print('vc_scoring OK')"
python -c "from intel.companies.registry import ensure_company, tag_company, get_companies_by_tag; print('registry OK')"
```

**Step 3: Verify data integrity**

```bash
python -c "
import sqlite3

# Companies registry
conn = sqlite3.connect('intel/companies/data/companies.db')
c = conn.cursor()
c.execute('SELECT COUNT(*) FROM companies WHERE is_public=0')
print(f'Private companies (VC firms): {c.fetchone()[0]}')
c.execute('SELECT COUNT(*) FROM companies WHERE is_public=1')
print(f'Public companies: {c.fetchone()[0]}')
c.execute('SELECT COUNT(*) FROM vc_firm_profile WHERE score > 0')
print(f'Scored VC firms: {c.fetchone()[0]}')
c.execute('SELECT COUNT(*) FROM vc_firm_observations')
print(f'Source observations: {c.fetchone()[0]}')
conn.close()

# People DB
conn2 = sqlite3.connect('intel/people/data/people.db')
c2 = conn2.cursor()
c2.execute('SELECT COUNT(*) FROM vc_people WHERE company_id IS NOT NULL')
print(f'VC people with company_id: {c2.fetchone()[0]}')
tables = [r[0] for r in c2.execute(\"SELECT name FROM sqlite_master WHERE type='table'\").fetchall()]
print(f'vc_firms table exists: {\"vc_firms\" in tables}')
print(f'vc_firms_archived exists: {\"vc_firms_archived\" in tables}')
conn2.close()
"
```

**Step 4: Commit final state**

```bash
git add -A && git status
git commit -m "verify(intel): VC intel restructure complete — all tests pass"
```
