# Finance Shared Returns Library — Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Create `finance/lib/` with shared return computation (multiple alpha methods) and statistical testing, then migrate `ceo_quality/returns.py` and `vic_analysis/returns.py` to use it.

**Architecture:** `finance/lib/returns.py` provides a single async `compute_returns()` that takes (symbol, as_of, horizons, benchmark_type) and returns a rich dict with absolute, excess, beta, and alpha returns. `finance/lib/stats.py` provides multiple testing correction (Bonferroni, BH) and bootstrap CIs. Both `vic_analysis/returns.py` and `ceo_quality/returns.py` become thin wrappers calling the shared core. Price infrastructure stays in `vic_analysis/prices/` for now (Phase 2 would move it to `finance/lib/prices/`).

**Tech Stack:** pandas, numpy, scikit-learn (`roc_auc_score`), asyncio. Reuses `finance/vic_analysis/prices/` (daily.py, calendar.py, cache.py, beta.py).
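The per-horizon arithmetic the shared core standardizes reduces to three lines. A self-contained sketch with made-up prices (illustrative numbers, not from any real series):

```python
# Illustrative arithmetic for the three return measures computed per horizon.
# Prices and beta are invented for demonstration; all returns in percent.
abs_ret = (110.0 / 100.0 - 1) * 100    # stock: 100 -> 110  => +10.0%
bench_ret = (420.0 / 400.0 - 1) * 100  # SPY:   400 -> 420  => +5.0%
beta = 1.5                             # trailing 252d regression beta

excess = abs_ret - bench_ret           # benchmark-adjusted:      +5.0%
alpha = abs_ret - beta * bench_ret     # Jensen's alpha: 10-7.5 = +2.5%

print(round(excess, 2), round(alpha, 2))
```

Excess implicitly treats beta as 1; Jensen's alpha scales the benchmark by the measured beta, so a high-beta stock must beat the market by more before it shows positive alpha.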

---

## Task 1: Scaffold `finance/lib/`

**Files:**
- Create: `finance/lib/__init__.py`
- Create: `finance/lib/tests/__init__.py`

**Step 1: Create package files**

```python
# finance/lib/__init__.py
"""Finance shared library — returns, alpha computation, statistical testing."""
```

```python
# finance/lib/tests/__init__.py
"""Tests for finance.lib."""
```

**Step 2: Verify imports**

Run: `python -c "import finance.lib"`
Expected: No error

**Step 3: Commit**

```bash
git add finance/lib/__init__.py finance/lib/tests/__init__.py
git commit -m "scaffold: finance/lib/ shared library"
```

---

## Task 2: Create `finance/lib/returns.py` — core return computation

**Files:**
- Create: `finance/lib/returns.py`
- Create: `finance/lib/tests/test_returns.py`

**Context:**
- `vic_analysis/returns.py:get_returns()` (lines 118-221) computes: absolute, benchmark, excess, beta, alpha per horizon
- `ceo_quality/returns.py:add_returns()` (lines 32-116) computes: only excess, no beta/alpha
- Both use `fetch_daily_candles()` from `finance/vic_analysis/prices/daily.py`
- Both use `next_trading_day()`, `trading_day_offset()` from `finance/vic_analysis/prices/calendar.py`
- Beta from `finance/vic_analysis/prices/beta.py:compute_trailing_beta()`

**Step 1: Write tests**

```python
# finance/lib/tests/test_returns.py
"""Tests for finance.lib.returns — core return computation."""

import numpy as np
import pandas as pd
import pytest
from unittest.mock import AsyncMock, patch

from finance.lib.returns import (
    BenchmarkType,
    compute_returns,
    compute_returns_batch,
    ReturnMetrics,
)


def _make_candles(dates, prices):
    """Helper: build candle DataFrame from date/price lists."""
    return pd.DataFrame({
        "date": dates,
        "open": prices,
        "high": [p * 1.01 for p in prices],
        "low": [p * 0.99 for p in prices],
        "close": prices,
        "volume": [1000] * len(dates),
    })


@pytest.fixture
def mock_candles():
    """Stock goes from 100 to 110 (+10%), SPY goes from 400 to 420 (+5%)."""
    stock = _make_candles(
        ["2024-01-16", "2024-01-17", "2024-01-18", "2024-06-28"],
        [100.0, 101.0, 102.0, 110.0],
    )
    spy = _make_candles(
        ["2024-01-16", "2024-01-17", "2024-01-18", "2024-06-28"],
        [400.0, 402.0, 404.0, 420.0],
    )
    return stock, spy


class TestBenchmarkType:
    def test_market_is_default(self):
        assert BenchmarkType.MARKET.value == "market"

    def test_all_types_exist(self):
        assert hasattr(BenchmarkType, "MARKET")
        assert hasattr(BenchmarkType, "SECTOR")
        assert hasattr(BenchmarkType, "FACTOR")


class TestComputeReturns:
    @pytest.mark.asyncio
    async def test_basic_excess_return(self, mock_candles):
        stock, spy = mock_candles

        with patch("finance.lib.returns.fetch_daily_candles", new_callable=AsyncMock) as mock_fetch, \
             patch("finance.lib.returns.next_trading_day", return_value="2024-01-16"), \
             patch("finance.lib.returns.trading_day_offset", return_value="2024-06-28"), \
             patch("finance.lib.returns.compute_trailing_beta", new_callable=AsyncMock, return_value=1.0):
            mock_fetch.side_effect = lambda sym, *a, **kw: stock if sym == "AAPL" else spy

            result = await compute_returns("AAPL", "2024-01-15", horizons=[120])

            assert result["symbol"] == "AAPL"
            data = result["horizons"][120]
            assert data is not None
            # Stock: 110/100 - 1 = 10%, SPY: 420/400 - 1 = 5%
            assert abs(data.absolute - 10.0) < 0.01
            assert abs(data.benchmark - 5.0) < 0.01
            assert abs(data.excess - 5.0) < 0.01

    @pytest.mark.asyncio
    async def test_no_data_returns_error(self):
        empty_df = pd.DataFrame(columns=["date", "open", "high", "low", "close", "volume"])

        with patch("finance.lib.returns.fetch_daily_candles", new_callable=AsyncMock, return_value=empty_df), \
             patch("finance.lib.returns.next_trading_day", return_value="2024-01-16"), \
             patch("finance.lib.returns.trading_day_offset", return_value="2024-06-28"):

            result = await compute_returns("FAKE", "2024-01-15", horizons=[120])
            assert "error" in result

    @pytest.mark.asyncio
    async def test_alpha_with_beta(self, mock_candles):
        stock, spy = mock_candles

        with patch("finance.lib.returns.fetch_daily_candles", new_callable=AsyncMock) as mock_fetch, \
             patch("finance.lib.returns.next_trading_day", return_value="2024-01-16"), \
             patch("finance.lib.returns.trading_day_offset", return_value="2024-06-28"), \
             patch("finance.lib.returns.compute_trailing_beta", new_callable=AsyncMock, return_value=1.5):
            mock_fetch.side_effect = lambda sym, *a, **kw: stock if sym == "AAPL" else spy

            result = await compute_returns("AAPL", "2024-01-15", horizons=[120])
            data = result["horizons"][120]
            # alpha = abs - beta * bench = 10 - 1.5 * 5 = 2.5
            assert data.alpha is not None
            assert abs(data.alpha - 2.5) < 0.01


class TestComputeReturnsBatch:
    @pytest.mark.asyncio
    async def test_adds_columns_to_df(self, mock_candles):
        stock, spy = mock_candles
        df = pd.DataFrame({
            "symbol": ["AAPL"],
            "as_of_date": ["2024-01-15"],
        })

        with patch("finance.lib.returns.fetch_daily_candles", new_callable=AsyncMock) as mock_fetch, \
             patch("finance.lib.returns.next_trading_day", return_value="2024-01-16"), \
             patch("finance.lib.returns.trading_day_offset", return_value="2024-06-28"), \
             patch("finance.lib.returns.compute_trailing_beta", new_callable=AsyncMock, return_value=1.0):
            mock_fetch.side_effect = lambda sym, *a, **kw: stock if sym == "AAPL" else spy

            result = await compute_returns_batch(df, horizons=[120])
            assert "ret_120d_excess" in result.columns
            assert "ret_120d_abs" in result.columns
            assert "ret_120d_alpha" in result.columns
```

**Step 2: Run tests — verify they fail**

Run: `pytest finance/lib/tests/test_returns.py -v`
Expected: ImportError (finance.lib.returns doesn't exist yet)

**Step 3: Implement `finance/lib/returns.py`**

```python
#!/usr/bin/env python
"""Core return computation — shared across all finance strategies.

Computes absolute, benchmark, excess, beta, and alpha returns for any
(symbol, as_of_date, horizons) tuple. Pluggable benchmark types.

Consumers:
    - finance/ceo_quality/returns.py (thin wrapper)
    - finance/vic_analysis/returns.py (thin wrapper)
    - Any future strategy module

Price infrastructure lives in finance/vic_analysis/prices/ for now.
"""

import asyncio
from dataclasses import dataclass
from enum import Enum

import pandas as pd
from loguru import logger

from finance.vic_analysis.prices.daily import fetch_daily_candles
from finance.vic_analysis.prices.calendar import next_trading_day, trading_day_offset
from finance.vic_analysis.prices.beta import compute_trailing_beta


class BenchmarkType(Enum):
    """Benchmark method for computing abnormal returns."""
    MARKET = "market"       # SPY (default)
    SECTOR = "sector"       # Sector ETF (XLK, XLF, etc.)
    FACTOR = "factor"       # Fama-French 3-factor (future)


# Sector → ETF mapping (GICS sectors from Finnhub gsector field)
SECTOR_ETFS = {
    "Technology": "XLK",
    "Financial Services": "XLF",
    "Healthcare": "XLV",
    "Consumer Cyclical": "XLY",
    "Consumer Defensive": "XLP",
    "Industrials": "XLI",
    "Energy": "XLE",
    "Utilities": "XLU",
    "Real Estate": "XLRE",
    "Basic Materials": "XLB",
    "Communication Services": "XLC",
}


@dataclass
class ReturnMetrics:
    """Return metrics for a single horizon."""
    absolute: float             # Raw stock return (%)
    benchmark: float | None     # Benchmark return (%)
    excess: float | None        # absolute - benchmark
    beta: float | None          # Trailing 252d beta
    alpha: float | None         # absolute - beta * benchmark (Jensen's alpha)


def _close_on_date(df: pd.DataFrame, date: str) -> float | None:
    """Get close price on exact date, or None if missing."""
    row = df[df["date"] == date]
    if row.empty:
        return None
    return float(row.iloc[0]["close"])


async def compute_returns(
    symbol: str,
    as_of: str,
    horizons: list[int] | None = None,
    benchmark: str = "SPY",
    benchmark_type: BenchmarkType = BenchmarkType.MARKET,
    sector: str | None = None,
    cache=None,
    compute_beta: bool = True,
) -> dict:
    """Compute returns at specified horizons from as_of date.

    Args:
        symbol: Stock ticker.
        as_of: Date string (YYYY-MM-DD). Snapped to next trading day.
        horizons: List of trading day horizons (default [90, 180, 365]).
        benchmark: Benchmark ticker (default SPY). Overridden by benchmark_type.
        benchmark_type: How to compute abnormal returns.
        sector: GICS sector name (required for SECTOR benchmark_type).
        cache: Optional diskcache instance (shared Finnhub cache).
        compute_beta: Whether to compute trailing beta (default True).

    Returns:
        Dict with keys:
            symbol: str
            as_of: str (snapped trading day)
            horizons: dict[int, ReturnMetrics | None]
            error: str (only if failed)
    """
    if horizons is None:
        horizons = [90, 180, 365]

    # Resolve benchmark ticker from type
    bench_ticker = benchmark
    if benchmark_type == BenchmarkType.SECTOR and sector:
        bench_ticker = SECTOR_ETFS.get(sector, benchmark)
        if sector not in SECTOR_ETFS:
            logger.debug(f"Unknown sector '{sector}' — falling back to {benchmark}")

    # Snap as_of to next trading day
    as_of_trading = next_trading_day(as_of)

    # Determine date range needed
    max_horizon = max(horizons)
    try:
        end_date = trading_day_offset(as_of_trading, max_horizon)
    except ValueError:
        end_date = str((pd.Timestamp(as_of) + pd.Timedelta(days=int(max_horizon * 1.5) + 10)).date())

    # Beta lookback
    if compute_beta:
        try:
            beta_start = trading_day_offset(as_of_trading, -252)
        except ValueError:
            beta_start = str((pd.Timestamp(as_of) - pd.Timedelta(days=400)).date())
        fetch_start = f"{pd.Timestamp(beta_start).year}-01-01"
    else:
        fetch_start = str((pd.Timestamp(as_of) - pd.Timedelta(days=10)).date())

    # Year-align end for cache efficiency
    fetch_end_ts = pd.Timestamp(end_date)
    fetch_end = f"{fetch_end_ts.year + 1}-01-01" if fetch_end_ts.month > 1 else end_date

    # Fetch price data
    stock_df = await fetch_daily_candles(symbol, fetch_start, fetch_end, cache=cache)
    bench_df = await fetch_daily_candles(bench_ticker, fetch_start, fetch_end, cache=cache)

    if stock_df.empty:
        logger.warning(f"No price data for {symbol}")
        return {"symbol": symbol, "as_of": as_of_trading, "error": "no_data"}

    # Starting prices
    close_start = _close_on_date(stock_df, as_of_trading)
    bench_start = _close_on_date(bench_df, as_of_trading)

    if close_start is None:
        logger.warning(f"No close price for {symbol} on {as_of_trading}")
        return {"symbol": symbol, "as_of": as_of_trading, "error": "no_start_price"}

    # Trailing beta
    beta = None
    if compute_beta:
        beta = await compute_trailing_beta(symbol, as_of_trading, benchmark=bench_ticker, cache=cache)

    result = {"symbol": symbol, "as_of": as_of_trading, "horizons": {}}

    for h in horizons:
        try:
            target_date = trading_day_offset(as_of_trading, h)
        except ValueError:
            result["horizons"][h] = None
            continue

        close_end = _close_on_date(stock_df, target_date)
        bench_end = _close_on_date(bench_df, target_date) if bench_start else None

        if close_end is None:
            result["horizons"][h] = None
            continue

        abs_ret = (close_end / close_start - 1) * 100
        bench_ret = ((bench_end / bench_start - 1) * 100) if bench_start and bench_end else None
        excess = (abs_ret - bench_ret) if bench_ret is not None else None
        alpha = (abs_ret - beta * bench_ret) if (beta is not None and bench_ret is not None) else None

        result["horizons"][h] = ReturnMetrics(
            absolute=round(abs_ret, 2),
            benchmark=round(bench_ret, 2) if bench_ret is not None else None,
            excess=round(excess, 2) if excess is not None else None,
            beta=beta,
            alpha=round(alpha, 2) if alpha is not None else None,
        )

    return result


async def compute_returns_batch(
    df: pd.DataFrame,
    horizons: list[int] | None = None,
    benchmark: str = "SPY",
    benchmark_type: BenchmarkType = BenchmarkType.MARKET,
    cache=None,
    compute_beta: bool = True,
) -> pd.DataFrame:
    """Add return columns to a DataFrame with (symbol, as_of_date) columns.

    Adds columns: ret_{h}d_abs, ret_{h}d_excess, ret_{h}d_alpha for each horizon.
    Also adds ret_{h}d_beta (same across horizons, from trailing regression).

    This is the batch API used by ceo_quality and other strategy modules.
    """
    if horizons is None:
        horizons = [90, 180, 365]

    today = pd.Timestamp.now().normalize()
    df = df.copy()

    for h in horizons:
        abs_col = f"ret_{h}d_abs"
        excess_col = f"ret_{h}d_excess"
        alpha_col = f"ret_{h}d_alpha"
        beta_col = f"ret_{h}d_beta"

        abs_vals, excess_vals, alpha_vals, beta_vals = [], [], [], []
        skipped = []

        for _, row in df.iterrows():
            symbol = row["symbol"]
            as_of = row["as_of_date"]
            sector = row.get("sector")

            # Check horizon doesn't extend past today
            try:
                start = next_trading_day(as_of)
                end = trading_day_offset(start, h)
                if pd.Timestamp(end) > today:
                    abs_vals.append(None)
                    excess_vals.append(None)
                    alpha_vals.append(None)
                    beta_vals.append(None)
                    skipped.append((symbol, "horizon extends past today"))
                    continue
            except ValueError:
                pass

            ret = await compute_returns(
                symbol, as_of, horizons=[h],
                benchmark=benchmark,
                benchmark_type=benchmark_type,
                sector=sector,
                cache=cache,
                compute_beta=compute_beta,
            )

            if "error" in ret:
                abs_vals.append(None)
                excess_vals.append(None)
                alpha_vals.append(None)
                beta_vals.append(None)
                skipped.append((symbol, ret["error"]))
                continue

            metrics = ret["horizons"].get(h)
            if metrics is None:
                abs_vals.append(None)
                excess_vals.append(None)
                alpha_vals.append(None)
                beta_vals.append(None)
                skipped.append((symbol, "no data for horizon"))
            else:
                abs_vals.append(metrics.absolute)
                excess_vals.append(metrics.excess)
                alpha_vals.append(metrics.alpha)
                beta_vals.append(metrics.beta)

        df[abs_col] = abs_vals
        df[excess_col] = excess_vals
        df[alpha_col] = alpha_vals
        df[beta_col] = beta_vals

        n_valid = sum(1 for v in excess_vals if v is not None)
        n_total = len(df)
        n_skipped = n_total - n_valid

        logger.info(f"Returns {h}d: {n_valid}/{n_total} valid, {n_skipped} skipped")

        if skipped:
            logger.warning(f"Returns {h}d — {len(skipped)} skipped:")
            for sym, reason in skipped:
                logger.warning(f"  {sym}: {reason}")

        if n_skipped > n_total * 0.3:
            logger.warning(
                f"⚠️  {h}d: {n_skipped}/{n_total} ({n_skipped/n_total:.0%}) companies "
                f"have no returns — check horizon vs as_of_date, symbol validity"
            )

    return df
```

**Step 4: Run tests — verify they pass**

Run: `pytest finance/lib/tests/test_returns.py -v`
Expected: All 6 tests pass

**Step 5: Commit**

```bash
git add finance/lib/returns.py finance/lib/tests/test_returns.py
git commit -m "feat(finance/lib): core return computation with pluggable benchmarks"
```

---

## Task 3: Create `finance/lib/stats.py` — statistical testing

**Files:**
- Create: `finance/lib/stats.py`
- Create: `finance/lib/tests/test_stats.py`

**Step 1: Write tests**

```python
# finance/lib/tests/test_stats.py
"""Tests for finance.lib.stats — multiple testing correction and bootstrap CIs."""

import numpy as np
import pytest

from finance.lib.stats import (
    bonferroni_correction,
    benjamini_hochberg,
    bootstrap_ci,
    permutation_test_auc,
)


class TestBonferroni:
    def test_single_pvalue(self):
        result = bonferroni_correction([0.03], alpha=0.05)
        assert result[0]["reject"]  # 0.03 < 0.05/1

    def test_corrects_threshold(self):
        pvals = [0.01, 0.03, 0.04]
        result = bonferroni_correction(pvals, alpha=0.05)
        # threshold = 0.05/3 ≈ 0.0167
        assert result[0]["reject"]     # 0.01 < 0.0167
        assert not result[1]["reject"]  # 0.03 > 0.0167
        assert not result[2]["reject"]  # 0.04 > 0.0167

    def test_preserves_labels(self):
        result = bonferroni_correction(
            [0.01, 0.10],
            labels=["feat_A", "feat_B"],
        )
        assert result[0]["label"] == "feat_A"
        assert result[1]["label"] == "feat_B"


class TestBenjaminiHochberg:
    def test_controls_fdr(self):
        pvals = [0.001, 0.01, 0.04, 0.05, 0.10]
        result = benjamini_hochberg(pvals, alpha=0.05)
        # BH is less conservative than Bonferroni
        assert result[0]["reject"]  # smallest p-value
        assert result[1]["reject"]  # 0.01 < 0.05 * 2/5 = 0.02
        rejected = sum(1 for r in result if r["reject"])
        assert rejected >= 2

    def test_all_significant(self):
        pvals = [0.001, 0.002, 0.003]
        result = benjamini_hochberg(pvals, alpha=0.05)
        assert all(r["reject"] for r in result)

    def test_none_significant(self):
        pvals = [0.5, 0.6, 0.7]
        result = benjamini_hochberg(pvals, alpha=0.05)
        assert not any(r["reject"] for r in result)


class TestBootstrapCI:
    def test_returns_tuple(self):
        rng = np.random.default_rng(42)
        data = rng.normal(10, 2, size=100)
        lo, hi = bootstrap_ci(data, stat_fn=np.mean, n_boot=500, ci=0.95, seed=42)
        assert lo < 10 < hi

    def test_median(self):
        data = np.arange(100, dtype=float)
        lo, hi = bootstrap_ci(data, stat_fn=np.median, n_boot=500, ci=0.95, seed=42)
        assert lo < 50 < hi


class TestPermutationTestAUC:
    def test_random_labels_not_significant(self):
        rng = np.random.default_rng(42)
        y_true = rng.integers(0, 2, size=100)
        y_score = rng.random(100)
        p_value = permutation_test_auc(y_true, y_score, n_permutations=200, seed=42)
        assert p_value > 0.05  # random scores shouldn't be significant

    def test_perfect_prediction_significant(self):
        y_true = np.array([0]*50 + [1]*50)
        y_score = np.array([0.1]*50 + [0.9]*50)
        p_value = permutation_test_auc(y_true, y_score, n_permutations=200, seed=42)
        assert p_value < 0.05
```

**Step 2: Run tests — verify they fail**

Run: `pytest finance/lib/tests/test_stats.py -v`
Expected: ImportError

**Step 3: Implement `finance/lib/stats.py`**

```python
#!/usr/bin/env python
"""Statistical testing utilities for finance backtests.

Multiple testing correction, bootstrap confidence intervals, and
permutation tests. Used across all strategy modules.
"""

import numpy as np
from sklearn.metrics import roc_auc_score


def bonferroni_correction(
    pvalues: list[float],
    alpha: float = 0.05,
    labels: list[str] | None = None,
) -> list[dict]:
    """Bonferroni correction for multiple hypothesis tests.

    Most conservative — controls family-wise error rate (FWER).
    Rejects if p < alpha / n_tests.

    Returns list of dicts with: label, pvalue, threshold, reject.
    """
    n = len(pvalues)
    threshold = alpha / n
    labels = labels or [f"test_{i}" for i in range(n)]

    return [
        {
            "label": labels[i],
            "pvalue": pvalues[i],
            "threshold": threshold,
            "reject": pvalues[i] < threshold,
        }
        for i in range(n)
    ]


def benjamini_hochberg(
    pvalues: list[float],
    alpha: float = 0.05,
    labels: list[str] | None = None,
) -> list[dict]:
    """Benjamini-Hochberg procedure — controls false discovery rate (FDR).

    Less conservative than Bonferroni. Better when testing many hypotheses
    and some true positives are expected.

    Returns list of dicts with: label, pvalue, rank, bh_threshold, reject.
    """
    n = len(pvalues)
    labels = labels or [f"test_{i}" for i in range(n)]

    # Sort by p-value
    indexed = sorted(enumerate(pvalues), key=lambda x: x[1])

    # Find largest k where p_(k) <= (k/n) * alpha
    reject_up_to = -1
    for rank, (orig_idx, pval) in enumerate(indexed, 1):
        bh_threshold = (rank / n) * alpha
        if pval <= bh_threshold:
            reject_up_to = rank

    # Build results in original order
    results = [None] * n
    for rank, (orig_idx, pval) in enumerate(indexed, 1):
        bh_threshold = (rank / n) * alpha
        results[orig_idx] = {
            "label": labels[orig_idx],
            "pvalue": pval,
            "rank": rank,
            "bh_threshold": bh_threshold,
            "reject": rank <= reject_up_to,
        }
    return results


def bootstrap_ci(
    data: np.ndarray,
    stat_fn=np.mean,
    n_boot: int = 1000,
    ci: float = 0.95,
    seed: int | None = None,
) -> tuple[float, float]:
    """Bootstrap confidence interval for a statistic.

    Args:
        data: 1D array of observations.
        stat_fn: Function to compute statistic (default: np.mean).
        n_boot: Number of bootstrap resamples.
        ci: Confidence level (default 0.95).
        seed: Random seed for reproducibility.

    Returns:
        (lower, upper) bounds of the confidence interval.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    boot_stats = np.empty(n_boot)

    for i in range(n_boot):
        sample = rng.choice(data, size=n, replace=True)
        boot_stats[i] = stat_fn(sample)

    alpha = 1 - ci
    lo = float(np.percentile(boot_stats, 100 * alpha / 2))
    hi = float(np.percentile(boot_stats, 100 * (1 - alpha / 2)))
    return lo, hi


def permutation_test_auc(
    y_true: np.ndarray,
    y_score: np.ndarray,
    n_permutations: int = 1000,
    seed: int | None = None,
) -> float:
    """Permutation test for AUC significance.

    Shuffles labels n_permutations times and computes AUC each time.
    Returns the p-value: fraction of permuted AUCs >= observed AUC,
    with the standard +1 correction so a finite number of permutations
    never yields p = 0.

    Use this instead of parametric tests when sample size is small
    or distributional assumptions are questionable.
    """
    rng = np.random.default_rng(seed)
    observed_auc = roc_auc_score(y_true, y_score)

    count_ge = 0
    for _ in range(n_permutations):
        perm_y = rng.permutation(y_true)
        perm_auc = roc_auc_score(perm_y, y_score)
        if perm_auc >= observed_auc:
            count_ge += 1

    # +1 correction: the observed statistic counts as one permutation.
    return (count_ge + 1) / (n_permutations + 1)
```
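
As a quick sanity check of the two procedures (plain numpy, independent of the module, illustrative p-values): Bonferroni applies one fixed cutoff `alpha/n`, while the BH cutoff grows with rank, so BH rejects at least as many hypotheses.

```python
import numpy as np

pvals = np.array([0.001, 0.01, 0.04, 0.05, 0.10])  # already sorted ascending
alpha = 0.05
n = len(pvals)

bonf_reject = pvals < alpha / n                     # single cutoff 0.05/5 = 0.01
bh_thresholds = (np.arange(1, n + 1) / n) * alpha   # [0.01, 0.02, 0.03, 0.04, 0.05]
passing = np.nonzero(pvals <= bh_thresholds)[0]
k = passing.max() + 1 if passing.size else 0        # largest rank passing its cutoff
bh_reject = np.arange(1, n + 1) <= k                # step-up: reject all ranks <= k

print(int(bonf_reject.sum()), int(bh_reject.sum()))
```

With these p-values Bonferroni rejects only the smallest one, while BH also rejects 0.01 (its rank-2 cutoff is 0.02) — the behavior `TestBenjaminiHochberg.test_controls_fdr` pins down.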

**Step 4: Run tests — verify they pass**

Run: `pytest finance/lib/tests/test_stats.py -v`
Expected: All 10 tests pass

**Step 5: Commit**

```bash
git add finance/lib/stats.py finance/lib/tests/test_stats.py
git commit -m "feat(finance/lib): statistical testing — Bonferroni, BH, bootstrap, permutation"
```

---

## Task 4: Migrate `ceo_quality/returns.py` to use shared library

**Files:**
- Modify: `finance/ceo_quality/returns.py` (rewrite to thin wrapper)
- Modify: `finance/ceo_quality/backtest.py` (update import if needed)
- Modify: `finance/ceo_quality/predict.py` (update import if needed)
- Modify: `finance/ceo_quality/iterate.py` (update import if needed)
- Run: `finance/ceo_quality/tests/` (all existing tests must pass)

**Step 1: Rewrite `ceo_quality/returns.py` as thin wrapper**

Replace the body of `add_returns()` to call `finance.lib.returns.compute_returns_batch()`,
but keep the same external signature so consumers don't change.

Key changes:
- Remove `fetch_daily_candles`, `next_trading_day`, `trading_day_offset` imports
- Import `compute_returns_batch` from `finance.lib.returns`
- `add_returns()` becomes a sync wrapper around `asyncio.run(compute_returns_batch(...))`
- Keep the CLI entrypoint as-is

```python
#!/usr/bin/env python
"""Forward returns — thin wrapper around finance.lib.returns.

See finance/lib/returns.py for the core computation.
"""

import asyncio
import sqlite3
import sys
from pathlib import Path

import click
import pandas as pd
from loguru import logger

_RIVUS = Path(__file__).resolve().parent.parent.parent
if str(_RIVUS) not in sys.path:
    sys.path.insert(0, str(_RIVUS))

from finance.lib.returns import compute_returns_batch, BenchmarkType

DATA_DIR = Path(__file__).parent / "data"
DB_PATH = DATA_DIR / "dataset.db"


def add_returns(df: pd.DataFrame, horizons: list[int] | None = None,
                benchmark: str = "SPY") -> pd.DataFrame:
    """Add forward return columns for each horizon (trading days) from as_of_date.

    Delegates to finance.lib.returns.compute_returns_batch().
    Adds columns: ret_{h}d_abs, ret_{h}d_excess, ret_{h}d_alpha, ret_{h}d_beta.

    For backward compatibility, the excess column name matches the old format.
    """
    if horizons is None:
        horizons = [90, 180, 365]

    return asyncio.run(compute_returns_batch(
        df, horizons=horizons, benchmark=benchmark,
    ))


def load_companies_df(db_path: Path | None = None) -> pd.DataFrame:
    """Load companies from dataset.db."""
    db_path = db_path or DB_PATH
    conn = sqlite3.connect(str(db_path))
    df = pd.read_sql("SELECT * FROM companies", conn)
    conn.close()
    return df


@click.command()
@click.option("--as-of", required=True, help="Compute returns forward from this date")
@click.option("--horizons", default="90,180,365", help="Comma-separated horizons in days")
@click.option("--benchmark", default="SPY", help="Benchmark symbol")
@click.option("--symbols", help="Comma-separated symbols (default: all in dataset)")
def main(as_of: str, horizons: str, benchmark: str, symbols: str | None):
    """Compute forward excess returns for companies in dataset."""
    horizon_list = [int(h) for h in horizons.split(",")]

    if symbols:
        symbol_list = [s.strip() for s in symbols.split(",")]
        df = pd.DataFrame({"symbol": symbol_list, "as_of_date": as_of})
    else:
        companies = load_companies_df()
        if companies.empty:
            logger.warning("No companies in dataset — run dataset.py first")
            return
        df = pd.DataFrame({
            "symbol": companies["symbol"],
            "as_of_date": as_of,
        })

    logger.info(f"Computing {horizon_list} returns for {len(df)} symbols from {as_of}")
    df = add_returns(df, horizons=horizon_list, benchmark=benchmark)

    for h in horizon_list:
        col = f"ret_{h}d_excess"
        valid = df[col].dropna()
        if len(valid) > 0:
            print(f"\n{h}d excess returns (vs {benchmark}): n={len(valid)}/{len(df)}, "
                  f"mean={valid.mean():+.1f}%, median={valid.median():+.1f}%")
            sorted_df = df.dropna(subset=[col]).sort_values(col, ascending=False)
            print(f"  Top 5:    {', '.join(f'{r.symbol} {r[col]:+.1f}%' for _, r in sorted_df.head().iterrows())}")
            print(f"  Bottom 5: {', '.join(f'{r.symbol} {r[col]:+.1f}%' for _, r in sorted_df.tail().iterrows())}")


if __name__ == "__main__":
    main()
```

**Step 2: Run existing ceo_quality tests**

Run: `pytest finance/ceo_quality/tests/ -v`
Expected: All 19 tests pass (no interface changes)

**Step 3: Verify CLI still works**

Run: `python -m finance.ceo_quality.returns --as-of 2024-01-15 --symbols AAPL,MSFT --horizons 90`
Expected: Prints excess returns (same as before)

**Step 4: Commit**

```bash
git add finance/ceo_quality/returns.py
git commit -m "refactor(ceo_quality): delegate returns to finance.lib.returns"
```

---

## Task 5: Update `vic_analysis/returns.py` to use shared core

**Files:**
- Modify: `finance/vic_analysis/returns.py` (replace `get_returns()` core with `finance.lib.returns.compute_returns()`)

**Context:** `vic_analysis/returns.py` has a lot of VIC-specific logic (batch, symbol resolution, results cache, CLI) that stays. Only the core `get_returns()` function (lines 118-221) should delegate to `finance.lib.returns.compute_returns()`.

**Step 1: Refactor `get_returns()` to wrap `compute_returns()`**

Replace lines 118-221 with a wrapper that:
1. Calls `finance.lib.returns.compute_returns()`
2. Translates the `ReturnMetrics` result back to the old dict format `{abs, bench, excess, beta, alpha}`

This preserves backward compatibility for all callers (batch_returns, CLI, predict_alpha).

```python
async def get_returns(
    symbol: str,
    as_of: str,
    horizons: list[int] | None = None,
    benchmark: str = "SPY",
    cache=None,
) -> dict:
    """Compute returns at specified horizons from as_of date.

    Delegates to finance.lib.returns.compute_returns() and translates
    result to legacy dict format for backward compatibility.
    """
    from finance.lib.returns import compute_returns

    horizons = horizons or DEFAULT_HORIZONS
    sym = normalize_symbol(symbol)

    result = await compute_returns(
        sym, as_of, horizons=horizons, benchmark=benchmark, cache=cache,
    )

    if "error" in result:
        return {"symbol_clean": result["symbol"], "as_of": result["as_of"], "error": result["error"]}

    out = {"symbol_clean": result["symbol"], "as_of": result["as_of"]}

    for h in horizons:
        label = f"{h}d"
        metrics = result["horizons"].get(h)
        if metrics is None:
            out[label] = None
        else:
            out[label] = {
                "abs": metrics.absolute,
                "bench": metrics.benchmark,
                "excess": metrics.excess,
                "beta": metrics.beta,
                "alpha": metrics.alpha,
            }

    return out
```

**Step 2: Remove now-unused imports from vic_analysis/returns.py**

Remove direct imports of: `fetch_daily_candles`, `next_trading_day`, `trading_day_offset`, `compute_trailing_beta` (they're still used transitively via finance.lib).

Keep: `open_cache`, `normalize_symbol` (used by batch_returns, CLI).

**Step 3: Run vic_analysis tests**

Run: `pytest finance/vic_analysis/prices/tests/ -v`
Expected: All existing tests pass

**Step 4: Verify CLI still works**

Run: `python -m finance.vic_analysis.returns AAPL 2024-01-15 --horizons 30,90`
Expected: Prints returns table (same as before)

**Step 5: Commit**

```bash
git add finance/vic_analysis/returns.py
git commit -m "refactor(vic_analysis): delegate get_returns to finance.lib.returns"
```

---

## Task 6: Export public API from `finance/lib/__init__.py`

**Files:**
- Modify: `finance/lib/__init__.py`

**Step 1: Add public exports**

```python
"""Finance shared library — returns, alpha computation, statistical testing."""

from finance.lib.returns import (
    BenchmarkType,
    ReturnMetrics,
    SECTOR_ETFS,
    compute_returns,
    compute_returns_batch,
)
from finance.lib.stats import (
    bonferroni_correction,
    benjamini_hochberg,
    bootstrap_ci,
    permutation_test_auc,
)

__all__ = [
    "BenchmarkType",
    "ReturnMetrics",
    "SECTOR_ETFS",
    "compute_returns",
    "compute_returns_batch",
    "bonferroni_correction",
    "benjamini_hochberg",
    "bootstrap_ci",
    "permutation_test_auc",
]
```

**Step 2: Verify import**

Run: `python -c "from finance.lib import compute_returns, bonferroni_correction; print('OK')"`
Expected: OK

**Step 3: Commit**

```bash
git add finance/lib/__init__.py
git commit -m "feat(finance/lib): export public API"
```

---

## Task 7: Update docs and architecture table

**Files:**
- Modify: `finance/ceo_quality/CLAUDE.md` (update architecture table, note shared lib)
- Modify: `finance/TODO.md` (mark finance/lib tasks as done)

**Step 1: Update CLAUDE.md architecture table**

Add `finance/lib/` entry. Update `returns.py` description to note it's a thin wrapper.

**Step 2: Mark TODO items as done**

In `finance/TODO.md`, check off:
- [x] Create `finance/lib/__init__.py`
- [x] Move core return computation to `finance/lib/returns.py` with pluggable benchmark
- [x] Add `BenchmarkType` enum: `MARKET`, `SECTOR`, `FACTOR`
- [x] Sector ETF mapping
- [x] `ceo_quality/returns.py` becomes thin wrapper
- [x] Multiple testing correction: Bonferroni, Benjamini-Hochberg (FDR)
- [x] Bootstrap confidence intervals
- [x] Permutation tests for AUC significance

**Step 3: Commit**

```bash
git add finance/ceo_quality/CLAUDE.md finance/TODO.md
git commit -m "docs: update architecture for finance/lib shared library"
```

---

## Task 8: Run full test suite

**Step 1: Run all finance tests**

Run: `pytest finance/ -v --tb=short`
Expected: All tests pass (finance/lib/tests + finance/ceo_quality/tests + finance/vic_analysis/prices/tests)

**Step 2: Verify imports work end-to-end**

```bash
python -c "
from finance.lib import compute_returns, bonferroni_correction, bootstrap_ci
from finance.ceo_quality.returns import add_returns
from finance.ceo_quality.backtest import run_binary_analysis
from finance.ceo_quality.predict import walk_forward_cv
from finance.ceo_quality.iterate import run_iteration_loop
print('All imports OK')
"
```

**Step 3: Final commit if any fixes needed**

```bash
git add -A && git commit -m "fix: address test failures from finance/lib migration"
```
