# Cache Consolidation: Flat Files -> ContentStore

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Eliminate the flat file cache (`lib/cache/`) by making all fetch code read/write through `lib/ingest/content_store.py` (backed by `lib/content_store/ContentStore`), then remove the flat file layer.

**Architecture:** The content_store SQLite DB (`lib/ingest/data/content.db`) already receives parallel writes from `fetcher_cache.save_cache_metadata()`. We promote it to the sole cache backend. `fetcher_cache.py` keeps its RFC 9111 TTL logic but loses all flat-file I/O. New `cache_*` functions in `content_store.py` provide the read/write/staleness API that `fetcher.py` needs.

**Tech Stack:** SQLite (WAL), zstandard compression, existing `lib/content_store/ContentStore`

**Key simplifications (from design review):**
- **Keep both hashes** — SHA-256 `content_hash` for exact match (cache validity, dedup), simhash for fuzzy similarity (change detection, near-duplicate scoring). Both are now computed at cache-store time and carried on FetchResult.
- **Disable FTS for URL cache** — `fts_fields=[]`. Nobody calls `content_store.search()`. Re-enable if needed later.
- **No auto-trafilatura** — `store()`/`cache_store()` won't auto-extract text. Callers pass `plain_text` explicitly or leave None.
- **VIC already migrated** — VIC's `html_store.py` uses `lib/content_store/ContentStore` (SQLite + zstd). Separate DB instance with domain-specific schema, but same backend.

---

## Consumer Inventory

| Consumer | File | What it imports from `fetcher_cache` |
|----------|------|--------------------------------------|
| **fetcher.py** | `lib/ingest/fetcher.py` | `url_to_cache_path`, `save_cache_metadata`, `load_cache_metadata`, `is_cache_stale`, `DEFAULT_CACHE_DIR`, `STALE_THRESHOLD`, `compute_simhash`, `compute_simhash_distance`, `get_cached_redirect`, `is_cacheable_status`, `is_permanent_redirect`, `save_redirect`, `update_cache_metadata` |
| **cli.py** | `lib/ingest/cli.py` | `url_to_cache_path`, `load_cache_metadata`, `DEFAULT_CACHE_DIR` |
| **http_cache.py** | `lib/ingest/http_cache.py` | `url_to_cache_path`, `load_cache_metadata`, `save_cache_metadata`, `is_cache_stale`, `DEFAULT_CACHE_DIR` |
| **handlers.py** | `lib/ingest/browser/handlers.py` | `get_cached`, `is_fresh`, `save_response` (via `http_cache.py`) |
| **ui_extract.py** | `vario/ui_extract.py` | `DEFAULT_CACHE_DIR`, `is_cache_stale`, `load_cache_metadata`, `url_to_cache_path` |
| **history.py** | `vario/history.py` | `DEFAULT_CACHE_DIR` (for `CACHE_DIR / "history.yaml"`) |
| **fetchability_tool.py** | `learning/gyms/fetchability/` | Uses `fetch_httpx_full` + own simhash computation for cross-method comparison |
| **fetcher_media.py** | `lib/ingest/fetcher_media.py` | `DEFAULT_CACHE_DIR`, `url_to_cache_path` |

## What Stays in `fetcher_cache.py` (renamed to `cache_policy.py`)

These are pure logic functions with no I/O — they stay:
- `is_cacheable_status`, `get_ttl_for_status` — RFC 9111 status logic
- `is_permanent_redirect`, `is_temporary_redirect` — redirect classification
- `CACHEABLE_*` constants, `TTL_*` constants

## What Gets Removed Entirely

- `compute_simhash`, `compute_simhash_distance` from `fetcher_cache.py` — replaced by `_compute_hashes()` in `fetcher.py` (computes both SHA-256 + simhash)
- Auto-trafilatura extraction in `store()` — callers pass `plain_text` or leave None

## What Moves to `content_store.py`

New functions in `lib/ingest/content_store.py`:
- `cache_lookup(url, respect_ttl=True)` — replaces `cache_path.exists()` + `read_text()` + `load_cache_metadata()` + `is_cache_stale()`
- `cache_store(url, html, ...)` — replaces `cache_path.write_text()` + `save_cache_metadata()`
- `cache_update_meta(url, updates)` — replaces `update_cache_metadata()`
- `cache_store_redirect(url, redirect_to, status_code)` — replaces `save_redirect()`
- `cache_get_redirect(url)` — replaces `get_cached_redirect()`
- `cache_list()` — for CLI `--list` command

## What Gets Deleted

- All flat file read/write code from `fetcher_cache.py` (`url_to_cache_path`, `save_cache_metadata`, `load_cache_metadata`, `update_cache_metadata`, `is_cache_stale`, `save_redirect`, `get_cached_redirect`, `DEFAULT_CACHE_DIR`)
- `lib/ingest/http_cache.py` — replaced entirely by content_store functions
- `lib/ingest/migrate_cache.py` — one-time migration script, no longer needed once the flat files are verified in content.db
- `lib/cache/` directory — all flat files

---

### Task 1: Extend `content_store.py` with cache operations

**Files:**
- Modify: `lib/ingest/content_store.py`
- Test: `lib/ingest/tests/test_content_store.py` (create)

**Step 1: Write tests for the new cache operations**

```python
# lib/ingest/tests/test_content_store.py
"""Tests for content_store cache operations."""
import tempfile
from pathlib import Path
from unittest.mock import patch

import pytest


@pytest.fixture
def tmp_store():
    """Create a content store with a temp DB."""
    with tempfile.TemporaryDirectory() as d:
        db_path = Path(d) / "test_content.db"
        with patch("lib.ingest.content_store.DB_PATH", db_path):
            # Reset singleton
            import lib.ingest.content_store as cs
            cs._store = None
            yield cs
            cs._store = None


def test_cache_store_and_lookup(tmp_store):
    cs = tmp_store
    cs.cache_store(
        url="https://example.com/page",
        html="<html><body>Hello</body></html>",
        plain_text="Hello",
        title="Example",
        status_code=200,
        headers={"content-type": "text/html"},
    )
    result = cs.cache_lookup("https://example.com/page")
    assert result is not None
    assert result["raw_html"] == "<html><body>Hello</body></html>"
    assert result["title"] == "Example"
    assert result["status_code"] == 200


def test_cache_lookup_respects_ttl(tmp_store):
    cs = tmp_store
    cs.cache_store(
        url="https://example.com/stale",
        html="<html>old</html>",
        ttl=0,  # immediately stale
    )
    result = cs.cache_lookup("https://example.com/stale")
    assert result is None  # TTL expired


def test_cache_lookup_miss(tmp_store):
    cs = tmp_store
    result = cs.cache_lookup("https://nonexistent.com")
    assert result is None


def test_cache_update_meta(tmp_store):
    cs = tmp_store
    cs.cache_store(url="https://example.com/u", html="<html>x</html>")
    cs.cache_update_meta("https://example.com/u", fetch_mode="httpx+proxy")
    result = cs.cache_lookup("https://example.com/u", respect_ttl=False)
    assert result["fetch_mode"] == "httpx+proxy"


def test_cache_store_redirect(tmp_store):
    cs = tmp_store
    cs.cache_store_redirect(
        "https://old.com/page",
        "https://new.com/page",
        301,
    )
    target = cs.cache_get_redirect("https://old.com/page")
    assert target == "https://new.com/page"


def test_cache_get_redirect_miss(tmp_store):
    cs = tmp_store
    assert cs.cache_get_redirect("https://no-redirect.com") is None


def test_cache_store_error(tmp_store):
    cs = tmp_store
    cs.cache_store(
        url="https://example.com/404",
        html="Not Found",
        status_code=404,
        ttl=300,
    )
    result = cs.cache_lookup("https://example.com/404")
    assert result is not None
    assert result["status_code"] == 404


def test_cache_list(tmp_store):
    cs = tmp_store
    cs.cache_store(url="https://a.com/1", html="<html>a</html>")
    cs.cache_store(url="https://b.com/2", html="<html>b</html>")
    entries = cs.cache_list()
    assert len(entries) == 2
```

**Step 2: Run tests — verify they fail**

Run: `cd /Users/tchklovski/all-code/rivus && mamba run -n rivu python -m pytest lib/ingest/tests/test_content_store.py -v`
Expected: FAIL — `cache_lookup`, `cache_store`, etc. don't exist yet.

**Step 3: Implement the cache operations in `content_store.py`**

Add these functions to `lib/ingest/content_store.py` after the existing `stats()` function (they assume `json`, `urlparse` from `urllib.parse`, and `datetime`/`timezone` are imported at module top; add any that are missing):

```python
def cache_lookup(url: str, respect_ttl: bool = True) -> dict | None:
    """Look up cached content for a URL.

    Returns dict with keys: url, raw_html, plain_text, title, domain,
    content_type, fetch_mode, status_code, fetched_at, ttl_seconds, simhash,
    content_hash, wayback_url, wayback_ts, headers_json.

    Returns None if not found or TTL expired (when respect_ttl=True).
    """
    return _get_store().get(url, respect_ttl=respect_ttl)


def cache_store(
    url: str,
    html: str,
    plain_text: str | None = None,
    title: str | None = None,
    content_type: str | None = None,
    fetch_mode: str | None = None,
    status_code: int | None = None,
    headers: dict | None = None,
    ttl: int = 600,
    simhash: int | None = None,
    content_hash: str | None = None,
    wayback_url: str | None = None,
    wayback_ts: str | None = None,
) -> None:
    """Store fetched content for a URL. Upserts (replaces if exists)."""
    domain = urlparse(url).netloc
    now = datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
    headers_json = json.dumps(headers, default=str) if headers else None

    _get_store().put(
        url=url,
        domain=domain,
        content_type=content_type,
        raw_html=html,
        plain_text=plain_text,
        title=title,
        fetched_at=now,
        fetch_mode=fetch_mode,
        status_code=status_code,
        headers_json=headers_json,
        ttl_seconds=ttl,
        simhash=simhash,
        content_hash=content_hash,
        wayback_url=wayback_url,
        wayback_ts=wayback_ts,
    )


def cache_update_meta(url: str, **kwargs) -> None:
    """Update metadata fields for an existing cached URL.

    Pass any field as keyword arg: fetch_mode="httpx+proxy", status_code=200, etc.
    """
    existing = _get_store().get(url, respect_ttl=False)
    if not existing:
        return
    # Merge updates into existing row and re-put
    existing.update(kwargs)
    # Re-store — put() does upsert
    _get_store().put(**{k: v for k, v in existing.items() if k in _get_store()._fields})


def cache_store_redirect(url: str, redirect_to: str, status_code: int) -> None:
    """Cache a permanent redirect (301/308).

    Repurposes the wayback_url column to hold the redirect target.
    """
    from lib.ingest.fetcher_cache import get_ttl_for_status

    cache_store(
        url=url,
        html="",
        status_code=status_code,
        ttl=get_ttl_for_status(status_code),
        content_type="redirect",
        wayback_url=redirect_to,
    )


def cache_get_redirect(url: str) -> str | None:
    """Check if URL has a cached redirect. Returns target URL or None."""
    row = cache_lookup(url)
    if row and row.get("content_type") == "redirect" and row.get("wayback_url"):
        return row["wayback_url"]
    return None


def cache_list(limit: int = 1000) -> list[dict]:
    """List cached entries (for CLI). Returns list of dicts with url, domain, fetched_at, status_code, fetch_mode."""
    db = _get_store()._conn
    rows = db.execute(
        "SELECT url, domain, fetched_at, status_code, fetch_mode, ttl_seconds "
        "FROM url_content ORDER BY fetched_at DESC LIMIT ?",
        (limit,),
    ).fetchall()
    return [dict(r) for r in rows]
```

Note: `cache_store_redirect` repurposes `wayback_url` field for redirect targets. This avoids schema changes. If we want a dedicated `redirect_to` column later, that's a simple schema evolution.

**Step 4: Run tests — verify they pass**

Run: `cd /Users/tchklovski/all-code/rivus && mamba run -n rivu python -m pytest lib/ingest/tests/test_content_store.py -v`
Expected: All PASS.

**Step 5: Commit**

```bash
git add lib/ingest/content_store.py lib/ingest/tests/test_content_store.py
git commit -m "feat(ingest): add cache_lookup/cache_store operations to content_store"
```

---

### Task 2: Rewire `fetcher.py` to use content_store

**Files:**
- Modify: `lib/ingest/fetcher.py`

This is the biggest change. Three fetch functions have the same pattern:

```python
# OLD (flat file):
cache_path = url_to_cache_path(url, cache_dir)
if not refresh and cache_path.exists():
    meta = load_cache_metadata(cache_path)
    if meta and not is_cache_stale(meta):
        content = cache_path.read_text()
        ...
# write:
cache_path.write_text(content)
meta = save_cache_metadata(cache_path, content, headers, extracted_text, url=url)
```

Replace with:

```python
# NEW (content_store):
from lib.ingest.content_store import cache_lookup, cache_store, cache_update_meta, cache_store_redirect, cache_get_redirect

if not refresh:
    cached = cache_lookup(url)
    if cached:
        content = cached["raw_html"]
        ...
# write:
cache_store(url=final_url, html=content, plain_text=extracted_text,
            title=title, status_code=status_code, headers=headers,
            ttl=get_ttl_for_status(status_code, headers))
```

**Step 1: Update imports in `fetcher.py`**

Remove from the `fetcher_cache` import block:
- `DEFAULT_CACHE_DIR`
- `STALE_THRESHOLD` (unused after this change)
- `url_to_cache_path`
- `load_cache_metadata`
- `save_cache_metadata`
- `update_cache_metadata`
- `is_cache_stale`
- `save_redirect`
- `get_cached_redirect`

Keep these (pure logic, no I/O):
- `is_cacheable_status`
- `is_permanent_redirect`

Add:
```python
from .content_store import cache_lookup, cache_store, cache_update_meta, cache_store_redirect, cache_get_redirect
```

**Step 2: Rewrite `fetch()` (line ~760-889)**

The function signature changes: remove `cache_dir` param, remove `cache_path` from return tuple.

Old signature: `async def fetch(url, refresh=False, cache_dir=DEFAULT_CACHE_DIR, ...) -> tuple[str, Path | None, str]`
New signature: `async def fetch(url, refresh=False, ...) -> tuple[str, str]`

Returns `(content, title)` instead of `(content, cache_path, title)`.

Key changes:
- Cache check: `cache_lookup(url)` instead of `cache_path.exists()` + `read_text()`
- Cache write: `cache_store(url=..., html=content, ...)` instead of `cache_path.write_text()` + `save_cache_metadata()`
- Redirect check: `cache_get_redirect(url)` instead of `get_cached_redirect(url, cache_dir)`
- Redirect save: `cache_store_redirect(...)` instead of `save_redirect(...)` (in `fetch_httpx_full`)
- Remove all `cache_path` references
- Remove `cache_dir` parameter

**Step 3: Rewrite `fetch_escalate()` (line ~900-1100)**

Same pattern:
- Remove `cache_path` variable
- Cache check via `cache_lookup(url)`
- Cache write via `cache_store(url=..., html=content, fetch_mode=mode_name, ...)`
- Remove `_save_to_cache` inner function

**Step 4: Rewrite `fetch_playwright()` (line ~1200-1380)**

Same pattern:
- Remove `cache_dir` parameter
- Remove `cache_path` from `FetchResult`
- Cache check via `cache_lookup(url)`
- Cache write via `cache_store(...)` + `cache_update_meta(...)` for refusal fields

**Step 5: Update `fetch_httpx_full()` (line ~428)**

Replace `save_redirect(original_url, redirect_to, status_code, cache_dir)` with `cache_store_redirect(original_url, redirect_to, status_code)`.

**Step 6: Update all callers of `fetch()` that use `cache_path`**

Search for `cache_path` in return value destructuring:
- `cli.py:236`: `html, cache_path, _ = await fetch_httpx_cached(url, refresh=refresh)` -> `html, _ = await fetch(url, refresh=refresh)`
- `cli.py:912`: `html, _, _ = await fetch_httpx_cached(url)` -> `html, _ = await fetch(url)`
- Any other callers that unpack 3 values

**Step 7: Run existing tests**

Run: `cd /Users/tchklovski/all-code/rivus && mamba run -n rivu python -m pytest lib/ingest/tests/ -v`

**Step 8: Commit**

```bash
git add lib/ingest/fetcher.py lib/ingest/cli.py
git commit -m "refactor(ingest): fetcher reads/writes via content_store instead of flat files"
```

---

### Task 3: Rewire remaining consumers

**Files:**
- Modify: `lib/ingest/http_cache.py`
- Modify: `lib/ingest/browser/handlers.py`
- Modify: `vario/ui_extract.py`
- Modify: `vario/history.py`
- Modify: `lib/ingest/fetcher_media.py`

**Step 1: Replace `http_cache.py` with content_store wrappers**

Rewrite `lib/ingest/http_cache.py`:
```python
"""Simple HTTP cache for browser module.

Thin wrapper around content_store for the browser's get/check/save pattern.
"""
from lib.ingest.content_store import cache_lookup, cache_store


def get_cached(url: str) -> tuple[str | None, dict]:
    """Get cached HTML and metadata for URL."""
    row = cache_lookup(url)
    if not row:
        return None, {}
    return row.get("raw_html"), row


def is_fresh(metadata: dict) -> bool:
    """Always True — cache_lookup already checks TTL."""
    return True


def save_response(url: str, html: str, metadata: dict) -> None:
    """Save HTML to cache."""
    cache_store(url=url, html=html, title=metadata.get("title"))
```

Note: `is_fresh()` always returns True because `cache_lookup()` already respects TTL. The function exists only for API compat with `handlers.py`.

**Step 2: Update `vario/ui_extract.py` imports**

Replace:
```python
from lib.ingest.fetcher_cache import DEFAULT_CACHE_DIR as CACHE_DIR
from lib.ingest.fetcher_cache import is_cache_stale, load_cache_metadata, url_to_cache_path
```
With:
```python
from lib.ingest.content_store import cache_lookup
```

Then update usage — the file uses these for checking if a URL is cached and showing cache status. Replace `url_to_cache_path` + `load_cache_metadata` + `is_cache_stale` calls with `cache_lookup(url)`.

**Step 3: Update `vario/history.py`**

This only uses `DEFAULT_CACHE_DIR` for `CACHE_DIR / "history.yaml"`. Move history file to `lib/ingest/data/history.yaml`:
```python
from pathlib import Path
HISTORY_FILE = Path(__file__).parent.parent / "lib" / "ingest" / "data" / "history.yaml"
```
Or simpler — just hardcode the path since this is the only user.

**Step 4: Update `lib/ingest/fetcher_media.py`**

Remove `from .fetcher_cache import DEFAULT_CACHE_DIR, url_to_cache_path` — these are used for PDF cache paths. Replace with content_store calls if PDFs are cached, or remove caching for media (check if it's actually used).

**Step 5: Verify `handlers.py` still works**

`handlers.py` imports from `http_cache.py` which we rewrote. The API is preserved (`get_cached`, `is_fresh`, `save_response`), so no changes needed in `handlers.py`.

**Step 6: Run tests**

Run: `cd /Users/tchklovski/all-code/rivus && mamba run -n rivu python -m pytest lib/ingest/tests/ vario/tests/ -v`

**Step 7: Commit**

```bash
git add lib/ingest/http_cache.py vario/ui_extract.py vario/history.py lib/ingest/fetcher_media.py
git commit -m "refactor(ingest): rewire all consumers from flat files to content_store"
```

---

### Task 4: Update CLI cache commands

**Files:**
- Modify: `lib/ingest/cli.py`

**Step 1: Rewrite the `cache` CLI command**

The current `cache` command reads flat files. Replace with content_store queries:

```python
from lib.ingest.content_store import cache_lookup, cache_list

@cli.command()
@click.argument("url", required=False)
@click.option("--get", "get_content", is_flag=True, help="Output cached content")
@click.option("--list", "list_cache", is_flag=True, help="List all cached URLs")
@click.option("--json", "output_json", is_flag=True, help="Output as JSON")
def cache(url, get_content, list_cache, output_json):
    if list_cache:
        entries = cache_list()
        if output_json:
            click.echo(json_lib.dumps(entries, indent=2))
        else:
            click.echo(f"Entries: {len(entries)}\n")
            for e in entries[:20]:  # cache_list() is newest-first
                mode = f" [{e.get('fetch_mode', '')}]" if e.get('fetch_mode') else ""
                click.echo(f"  {e['url'][:80]}  {e.get('fetched_at', '?')[:16]}{mode}")
        return

    if not url:
        raise click.UsageError("URL required (or use --list)")

    row = cache_lookup(url, respect_ttl=False)
    if not row:
        click.echo(f"Not cached: {url}", err=True)
        raise SystemExit(1)

    if get_content:
        if row.get("content_type") == "redirect":
            click.echo(f"Redirect to: {row.get('wayback_url', '?')}", err=True)
        else:
            click.echo(row.get("raw_html", ""))
        return

    # Show cache info
    if output_json:
        click.echo(json_lib.dumps(row, indent=2, default=str))
    else:
        click.echo(f"url: {url}")
        click.echo(f"domain: {row.get('domain', '?')}")
        status = row.get("status_code", 200)
        if status and status != 200:
            click.echo(f"status_code: {status}")
        click.echo(f"fetched_at: {row.get('fetched_at', '?')}")
        if row.get("fetch_mode"):
            click.echo(f"fetch_mode: {row['fetch_mode']}")
        if row.get("content_type"):
            click.echo(f"content_type: {row['content_type']}")
```

**Step 2: Remove `fetcher_cache` imports from cli.py**

Remove: `from lib.ingest.fetcher_cache import url_to_cache_path, load_cache_metadata, DEFAULT_CACHE_DIR`

**Step 3: Commit**

```bash
git add lib/ingest/cli.py
git commit -m "refactor(ingest): CLI cache command uses content_store"
```

---

### Task 5: Strip flat-file I/O from `fetcher_cache.py`

**Files:**
- Modify: `lib/ingest/fetcher_cache.py`

**Step 1: Remove all flat-file functions**

Delete these functions (no longer have callers):
- `url_to_cache_path`
- `save_cache_metadata`
- `load_cache_metadata`
- `update_cache_metadata`
- `is_cache_stale`
- `save_redirect`
- `get_cached_redirect`
- `save_response` (if still present)
- `DEFAULT_CACHE_DIR` constant

Keep:
- `is_cacheable_status`, `get_ttl_for_status`
- `is_permanent_redirect`, `is_temporary_redirect`
- `CACHEABLE_*` constants, `TTL_*` constants
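
For orientation, the surviving module has roughly this shape. The specific status sets and TTL values below are illustrative assumptions (the real constants live in the current `fetcher_cache.py`, and `get_ttl_for_status` may also consult response headers per RFC 9111):

```python
# Sketch of cache_policy.py after the rename: pure policy, no I/O.
# Status sets and TTL values are assumptions, not the real constants.
CACHEABLE_STATUSES = {200, 203, 204, 301, 308, 404, 405, 410, 414, 501}

TTL_OK = 600            # 2xx responses
TTL_PERMANENT = 86400   # 301/308 permanent redirects
TTL_ERROR = 300         # cacheable error statuses


def is_cacheable_status(status: int) -> bool:
    return status in CACHEABLE_STATUSES


def is_permanent_redirect(status: int) -> bool:
    return status in (301, 308)


def get_ttl_for_status(status: int) -> int:
    if is_permanent_redirect(status):
        return TTL_PERMANENT
    if 200 <= status < 300:
        return TTL_OK
    return TTL_ERROR
```

Because every function here is a pure mapping from status code to policy, the module needs no test fixtures or temp directories, which is the point of the split.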

**Step 2: Rename file to reflect new scope**

Rename `fetcher_cache.py` -> `cache_policy.py` since it now only contains cache *policy* (TTL and status classification) with no I/O.

Update all imports:
- `lib/ingest/fetcher.py`: `from .cache_policy import ...`
- `lib/ingest/content_store.py`: `from lib.ingest.cache_policy import get_ttl_for_status`
- `learning/gyms/fetchability/fetchability_tool.py`: `from lib.ingest.cache_policy import ...`

**Step 3: Verify no remaining imports of `fetcher_cache`**

Run: `grep -r "fetcher_cache" --include="*.py" .`
Expected: zero results.

**Step 4: Commit**

```bash
git add lib/ingest/cache_policy.py lib/ingest/fetcher.py lib/ingest/content_store.py learning/gyms/fetchability/fetchability_tool.py
git rm lib/ingest/fetcher_cache.py
git commit -m "refactor(ingest): rename fetcher_cache -> cache_policy, remove flat-file I/O"
```

---

### Task 6: Delete flat file cache and migration script

**Files:**
- Delete: `lib/cache/` (entire directory)
- Delete: `lib/ingest/migrate_cache.py`
- Delete: `lib/ingest/http_cache.py` (if browser handlers were updated to import content_store directly; otherwise keep the thin wrapper)

**Step 1: Verify flat files are in content.db**

Run: `cd /Users/tchklovski/all-code/rivus && mamba run -n rivu python -c "from lib.ingest.content_store import stats; print(stats())"`
Expected: count >= 422 (the number of flat files).

If count is lower, run the migration first:
`mamba run -n rivu python -m lib.ingest.migrate_cache --commit`

**Step 2: Delete flat file cache**

```bash
trash lib/cache/
trash lib/ingest/migrate_cache.py
```

**Step 3: Remove `lib/cache` from `.gitignore` if listed**

Check and clean up any references.

**Step 4: Update CLAUDE.md**

In `lib/ingest/CLAUDE.md`, update the "CACHE LOCATION" reference from `lib/cache/` to `lib/ingest/data/content.db`.

**Step 5: Commit**

```bash
git add -A
git commit -m "chore(ingest): remove flat file cache and migration script"
```

---

### Task 7: Install `zstandard` in rivu env

**Files:** None (environment fix)

The content_store requires `zstandard` which is missing from the rivu env (discovered during investigation).

**Step 1: Install**

```bash
mamba install -n rivu zstandard
```

**Step 2: Verify import**

```bash
mamba run -n rivu python -c "import zstandard; print(zstandard.__version__)"
```

**Step 3: Add to requirements if there's a requirements file**

Check `lib/content_store/requirements.txt` or `requirements.txt` and add `zstandard`.

Note: This task should be done FIRST (before Task 1) since tests depend on it.

---

## Execution Order

1. **Task 7** — Install zstandard (prerequisite)
2. **Task 1** — Extend content_store with cache ops + tests
3. **Task 2** — Rewire fetcher.py (biggest change)
4. **Task 3** — Rewire remaining consumers
5. **Task 4** — Update CLI
6. **Task 5** — Strip fetcher_cache.py -> cache_policy.py
7. **Task 6** — Delete flat files

## Risks

- **`fetch()` return signature change** — `cache_path` is removed from the return tuple. All callers must be updated. Grep for `fetch_httpx_cached` and `fetch(` to find them.
- **`vario/history.py`** uses `CACHE_DIR / "history.yaml"` — needs a new home for the history file.
- **Redirect storage** — We repurpose `wayback_url` field for redirect targets rather than adding a new column. This is pragmatic but slightly overloaded. Fine for now.
- **`zstandard` not installed** — Must install before any content_store code runs.
