VIC Alpha Prediction — W1 Evaluation Results

2026-03-01  |  Pipeline: finance/vic_analysis/predict_robust.py  |  Plan: docs/plans/2026-03-01-vic-alpha-v2.md

TL;DR: VIC thesis text predicts 30-day stock alpha (Spearman 0.17, permutation-test p < 0.01, Q5 win rate 69%). The 365-day signal we reported earlier was inflated, in part by target leakage. With a proper embargo it falls to 0.067 (p = 0.43, single fold), and 3 years of data are too short to test it honestly. Scaling from 875 tech ideas to the full 25K VIC corpus (2000–2025) is the critical next step.
Spearman r (30d, embargoed): 0.170
Permutation test (100 shuffles): p < 0.01 (0/100 shuffles ≥ observed)
Q5 win rate (30d): 69%
Spearman r (365d, embargoed): 0.067

What is the Embargo Gap?

The embargo gap prevents target leakage — when training data "sees" information from the test period through overlapping return windows.

Without embargo (LEAKY):
  Training idea posted 2023-06-01 ────── 365d return window ──────▶ 2024-06-01
  Test idea posted     2024-01-01 ────── 365d return window ──────▶ 2025-01-01
  Overlap: Jan–Jun 2024, where both windows share the same market moves.

With 365-day embargo (CLEAN):
  Training idea posted 2022-12-31 ── 365d return ──▶ 2023-12-31
  Test idea posted     2024-01-01 ────── 365d return window ──────▶ 2025-01-01
  Embargo gap: 2023-01-01 to 2023-12-31, during which no training idea may be posted.
  No overlap: training returns are fully resolved before the test period begins.

Embargo rule: training idea's posted_at + horizon_days < first test idea's posted_at

For a 30-day target, the embargo is just 30 days — barely affects training size.
For a 365-day target, the embargo pushes the training cutoff back a full year — devastating with only 3 years of data.
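The embargo rule reduces to one date comparison per training idea. A minimal sketch, assuming ideas are represented only by their posting dates; the function name `embargoed_train_mask` is hypothetical and not taken from predict_robust.py:

```python
from datetime import date, timedelta


def embargoed_train_mask(posted_at, test_start, horizon_days):
    """Hypothetical helper: True for ideas usable as training data.

    Implements the embargo rule
        posted_at + horizon_days < test_start
    i.e. an idea's return window must close strictly before the first
    test idea is posted. Equivalent to posted_at < test_start - horizon.
    """
    cutoff = test_start - timedelta(days=horizon_days)
    return [d < cutoff for d in posted_at]


posted = [date(2022, 12, 31), date(2023, 1, 1), date(2023, 6, 1), date(2023, 12, 15)]
test_start = date(2024, 1, 1)

print(embargoed_train_mask(posted, test_start, 365))  # [True, False, False, False]
print(embargoed_train_mask(posted, test_start, 30))   # [True, True, True, False]
```

With a 30-day horizon only the last month before the test start is excluded; with a 365-day horizon the entire preceding year is.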

Why Only 3 Years of Data?

The full VIC database has 25,736 ideas from 2000–2025. We only computed returns for 1,000 tech-sector ideas from 2022–2025 as a proof-of-concept. This was a design choice for the initial experiment, not a fundamental limitation.
Dataset            Ideas    Years                 Sectors      Status
Current sample     875      2022–2025 (3 yrs)     Tech only    Done
With embeddings    587      2022–2025             Tech only    Done
Full VIC DB        25,736   2000–2025 (25 yrs)    All sectors  W2: next step
With descriptions  ~18,255  2000–2025             All sectors  Needs embedding

With the full corpus, a 365-day embargo would still leave 15+ years of training data for later folds (about 23 years for a 2024 test year). The 365-day signal may well be real; we just can't test it honestly with 3 years.

Can We Use All Ideas That Have 30d Returns?

Yes, and we already are. The pipeline drops NaN targets per horizon, so --horizon 30d uses all 853 ideas with 30d data (vs 753 for 365d). The gain isn't in idea count (853 vs 753 is a small difference); it's in embargo impact:

Horizon  Embargo   Ideas with target  Embargoed training (2024 fold)  Folds available
30d      30 days   853                371                             2 (2023, 2024)
90d      90 days   841                320                             2
365d     365 days  753                122                             1 (2024 only)
730d     730 days  338                0                               0 (impossible)

The real bottleneck isn't idea count — it's date range. With 2022–2025 data, a 365d embargo eats an entire year of training. With 2000–2025, it barely matters.
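The date-range argument is simple arithmetic. This sketch uses a hypothetical helper `training_years` and assumes the training window is [data start, test start − embargo), as in the walk-forward setup; it compares how much of the training span a 365-day embargo removes for a 2024 test fold under each date range:

```python
from datetime import date, timedelta


def training_years(data_start, test_start, embargo_days):
    """Hypothetical helper: length in years of [data_start, test_start - embargo)."""
    cutoff = test_start - timedelta(days=embargo_days)
    return max((cutoff - data_start).days, 0) / 365.25


test_start = date(2024, 1, 1)
for start in (date(2022, 1, 1), date(2000, 1, 1)):
    full = training_years(start, test_start, 0)
    embargoed = training_years(start, test_start, 365)
    print(start.year, f"{1 - embargoed / full:.0%} of training span lost")
# Prints:
# 2022 50% of training span lost
# 2000 4% of training span lost
```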

W1 Results: Signal by Horizon

Horizon  Embargo  Folds  Spearman r  p-value  Q5-Q1 median  Q5 win rate  Signal?
30d      30d      2      +0.170      <0.01*   +4.0%         69%          REAL
90d      90d      2      +0.065      0.069    +13.5%        67%          Marginal
365d     365d     1      +0.067      0.429    +6.9%         41%          Untestable
365d     none     2      +0.199      0.002    +39.1%        55%          Leaked

*Permutation-test p-value (100 stratified shuffles; none matched or exceeded the observed score, so p < 0.01). Other p-values are from the Spearman rank-correlation test.

30d Quintile Detail (2024 Fold)

Quintile              n   Mean alpha  Median alpha  Win rate
Q1 (worst predicted)  36  -4.4%       -2.4%         36%
Q2                    36  -5.4%       -8.6%         33%
Q3                    35  +2.5%       +1.6%         54%
Q4                    36  +1.5%       +0.6%         53%
Q5 (best predicted)   36  +6.4%       +4.5%         69%

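Roughly monotonic quintile pattern: the model's top picks (Q5) earn +4.5% median alpha in 30 days at a 69% win rate, while its bottom picks (Q1) earn -2.4% at 36%. The ordering is not strictly monotonic: Q2 (-8.6%) underperforms Q1, and Q3 (+1.6%) edges out Q4 (+0.6%).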
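A quintile table like the one above can be produced roughly as follows. This is a sketch under stated assumptions: ideas are split into five near-equal buckets by predicted score, a "win" means alpha > 0, and `quintile_summary` / `q5_q1_median_spread` are hypothetical names. The pipeline's exact bucketing (and hence the per-quintile n) may differ.

```python
from statistics import mean, median


def quintile_summary(pred, alpha):
    """Hypothetical helper: per-quintile stats, Q1 = lowest predicted score."""
    n = len(pred)
    if n < 5:
        raise ValueError("need at least 5 ideas to form quintiles")
    # Indices sorted by predicted score, lowest first.
    order = sorted(range(n), key=lambda i: pred[i])
    rows = []
    for q in range(5):
        # Integer boundaries give five buckets whose sizes differ by at most 1.
        bucket = [alpha[i] for i in order[q * n // 5:(q + 1) * n // 5]]
        rows.append({
            "quintile": f"Q{q + 1}",
            "n": len(bucket),
            "mean_alpha": mean(bucket),
            "median_alpha": median(bucket),
            # Assumed definition of a win: positive realized alpha.
            "win_rate": sum(a > 0 for a in bucket) / len(bucket),
        })
    return rows


def q5_q1_median_spread(rows):
    """Median alpha of the top quintile minus that of the bottom quintile."""
    return rows[-1]["median_alpha"] - rows[0]["median_alpha"]
```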

Permutation Test

Stratified permutation test (30d alpha, 100 shuffles)
Shuffle y within (year × thesis_type) strata, then re-run the full pipeline.

  Observed mean Spearman:   +0.170
  Null distribution mean:   +0.024 (std: 0.043)
  95th percentile (null):   +0.094
  99th percentile (null):   +0.107
  z-score:                  3.4σ above null
  p-value:                  < 0.01 (0/100 shuffles ≥ observed)

Verdict: the signal is unlikely to be an artifact of temporal structure or of clustering within the strata.
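The permutation logic can be sketched in plain Python. Assumptions: `score_fn` is a hypothetical stand-in for "re-run the full pipeline on shuffled targets and return the mean Spearman"; in this sketch it can be any function of the target vector (for example, Spearman against fixed predictions, which is cheaper but does not retrain the model). The p-value uses the common (k + 1) / (n + 1) correction, under which 0 of 100 shuffles gives 1/101 ≈ 0.0099, consistent with reporting p < 0.01 rather than 0.000.

```python
import random
from collections import defaultdict


def average_ranks(xs):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman r = Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def stratified_permutation_test(score_fn, y, strata, n_shuffles=100, seed=0):
    """Shuffle y only within each stratum, rescore, and compare to the observed score."""
    rng = random.Random(seed)
    observed = score_fn(y)
    groups = defaultdict(list)
    for i, s in enumerate(strata):
        groups[s].append(i)
    null = []
    for _ in range(n_shuffles):
        y_perm = list(y)
        for idx in groups.values():
            vals = [y[i] for i in idx]
            rng.shuffle(vals)  # permute targets inside this stratum only
            for i, v in zip(idx, vals):
                y_perm[i] = v
        null.append(score_fn(y_perm))
    exceed = sum(s >= observed for s in null)
    p_value = (exceed + 1) / (n_shuffles + 1)
    return observed, null, p_value
```

Usage, with hypothetical `preds`, `alphas` and `strata` lists: `stratified_permutation_test(lambda y: spearman(preds, y), alphas, strata)`. Shuffling within strata keeps each year's (and thesis type's) return distribution intact, so only the link between text and outcome is broken.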

How We Got Here: V1 → V2 → W1

Stage                          365d Spearman  30d Spearman  What changed
V1 (single split)              0.357          0.242         Initial result (inflated)
V2 (walk-forward, no embargo)  0.199          0.163         Walk-forward CV, historical mcap, bootstrap
W1 (with embargo)              0.067          0.170         Embargo gap, block bootstrap, permutation test

Lesson: V1's 365d Spearman of 0.357 was inflated by three things: data snooping (testing 10+ models on the same 137-sample test set), fundamentals look-ahead bias (using 2025 market cap to predict 2022 returns), and target leakage (overlapping return windows). Each fix independently reduced the number. The honest 365d signal is untestable with current data — not disproven, just insufficient.

Current Pipeline

Input:       587 tech-sector VIC ideas with thesis text + embeddings (2022–2025)
Features:
  Embeddings:   1536-dim (text-embedding-3-small) → PCA to 100 dims (train-only fit)
  Scalar (18):  quality_score, is_long, posting_year, posting_month, desc_len, cat_len,
                has_catalysts, is_contrarian, time_horizon (short/long), thesis_type dummies
Model:       RidgeCV (α auto-selected via internal LOO; always picks α=1000)
Evaluation:
  Walk-forward CV:  train on [min_year, T-embargo), test on [T, T+1)
  Embargo gap:      auto-matched to target horizon (30d → 30-day gap)
  Bootstrap:        quarterly block bootstrap (not IID)
  Significance:     stratified permutation test (year × sector)
  Metrics:          Spearman r, Q5-Q1 median spread, quintile win rates
Key files:
  Pipeline:     finance/vic_analysis/predict_robust.py
  Experiments:  finance/vic_analysis/LOGBOOK.md
  Plan:         docs/plans/2026-03-01-vic-alpha-v2.md
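The quarterly block bootstrap resamples whole calendar quarters rather than single ideas, so ideas posted in the same quarter (and exposed to the same market moves) stay together. A minimal sketch, assuming each observation carries its posting date; `block_bootstrap`, `quarter_of` and `percentile_interval` are hypothetical names, and the percentile interval is one common choice, not necessarily what the pipeline reports:

```python
import random
from collections import defaultdict
from datetime import date
from statistics import median


def quarter_of(d):
    """Calendar quarter key, e.g. date(2024, 5, 1) -> (2024, 2)."""
    return (d.year, (d.month - 1) // 3 + 1)


def block_bootstrap(dates, values, stat_fn, n_boot=1000, seed=0):
    """Quarterly block bootstrap: resample whole quarters with replacement."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for d, v in zip(dates, values):
        blocks[quarter_of(d)].append(v)
    keys = list(blocks)
    stats = []
    for _ in range(n_boot):
        # Draw as many quarters as exist, with replacement; each quarter stays intact.
        sample = [v for k in rng.choices(keys, k=len(keys)) for v in blocks[k]]
        stats.append(stat_fn(sample))
    return stats


def percentile_interval(stats, level=0.95):
    """Simple percentile interval from a bootstrap distribution."""
    s = sorted(stats)
    lo = s[int((1 - level) / 2 * (len(s) - 1))]
    hi = s[int((1 + level) / 2 * (len(s) - 1))]
    return lo, hi


# Toy numbers for illustration only, not results from the pipeline.
toy_dates = [date(2024, 1, 5), date(2024, 2, 1), date(2024, 5, 1), date(2024, 11, 30)]
toy_q5_alpha = [0.04, 0.07, -0.01, 0.05]
print(percentile_interval(block_bootstrap(toy_dates, toy_q5_alpha, median)))
```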

Next Steps — Prioritized

  1. W2: Scale to full 25K VIC corpus (critical blocker)
    Extract all 25,736 ideas from the SanDisk VIC DB. Compute returns for all sectors (not just tech). Embed all thesis texts (~$4 at text-embedding-3-small rates). This gives 20+ years of walk-forward folds and makes an honest 365d evaluation possible.
    Effort: Medium. Requires the SanDisk drive mounted; ~30 min of Finnhub API calls for returns and ~$4 of OpenAI embeddings.
  2. Re-evaluate all horizons with 25K ideas (depends on W2)
    With 2000–2025 data, a 365d embargo still leaves 15+ years of training for later folds. Run walk-forward CV with embargo at 30d, 90d and 365d. If the 365d signal reappears under embargo, it is likely real; if not, 30d is the signal. Also test 7d (the fastest actionable horizon).
  3. W4: Author track record features (low effort, high value)
    Bayesian-shrunk hit rate per author, with a horizon embargo (only count prior ideas whose return window has fully elapsed). Implementation code is in the plan doc. Requires W2 data for meaningful per-author sample sizes.
  4. W3: Survivorship bias fix (data integrity)
    Delisted companies are missing from returns, so SHORTs on bankruptcies (+100% win) and LONGs on failures (-100% loss) are both excluded. Cross-reference VIC symbols against delisting databases.
  5. W5: Direction-aware evaluation (quick win)
    Keep a single model with the is_long feature, but report Spearman and quintiles separately for the LONG and SHORT subsets. The SHORT signal may be stronger if VIC members are better at finding overvalued stocks.
  6. W7: Portfolio-level backtest (proves tradability)
    Form a top-decile LONG + bottom-decile SHORT portfolio monthly. Compute Sharpe, drawdown and turnover under realistic constraints (slippage, borrow costs, position limits).
  7. W6: Discovery timing (live pipeline prep)
    Measure the empirical lag cost: compute alpha at T+0, T+1, T+3 and T+7 and check whether rankings change. Connect to the moneygun event log (nvme/paper/o/) for real discovery timestamps.
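W4's shrinkage plus horizon embargo might look roughly like the following. The actual implementation is in the plan doc; this is an independent sketch assuming a beta-binomial prior with hypothetical parameters `prior_rate` and `prior_strength`, a per-idea boolean "hit" flag whose definition is also an assumption, and a hypothetical function name `shrunk_hit_rate`:

```python
from datetime import date, timedelta


def shrunk_hit_rate(history, as_of, horizon_days, prior_rate=0.5, prior_strength=10.0):
    """Hypothetical W4 feature: beta-binomial shrunk hit rate for one author.

    history: list of (posted_at, hit) pairs for the author's earlier ideas;
    `hit` is an assumed boolean outcome flag (e.g. alpha had the predicted sign).
    Only ideas whose return window closed before as_of are counted
    (posted_at + horizon_days < as_of), mirroring the embargo rule.
    """
    cutoff = as_of - timedelta(days=horizon_days)
    resolved = [hit for posted_at, hit in history if posted_at < cutoff]
    # Posterior mean under a Beta(prior_strength * prior_rate,
    # prior_strength * (1 - prior_rate)) prior after len(resolved) trials.
    return (sum(resolved) + prior_strength * prior_rate) / (len(resolved) + prior_strength)


history = [(date(2023, 1, 1), True), (date(2023, 6, 1), True), (date(2024, 5, 15), True)]
print(shrunk_hit_rate(history, date(2024, 6, 1), 30))
```

Here the idea posted 2024-05-15 is ignored because its 30-day window has not closed by 2024-06-01, so the author's 2 resolved hits out of 2 shrink to 7/12 ≈ 0.58 rather than 1.0. New authors start at `prior_rate`.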

Execution Order

Now                        Then                        After
┌─────────────────┐        ┌──────────────────┐        ┌──────────────────┐
│ W2: Scale data  │───────▶│ Re-eval all      │───────▶│ W7: Portfolio    │
│ 25K ideas       │        │ horizons w/      │        │ backtest         │
│ all sectors     │        │ embargo          │        │ (proves tradable)│
│ ~$4 embed cost  │        │ (365d testable!) │        └──────────────────┘
└─────────────────┘        └──────────────────┘
         │                          │
         ▼                          ▼
┌─────────────────┐        ┌──────────────────┐
│ W4: Author      │        │ W5: Direction    │
│ track record    │        │ split eval       │
│ (needs data)    │        │ (LONG vs SHORT)  │
└─────────────────┘        └──────────────────┘
         │
         ▼
┌─────────────────┐
│ W3: Survivorship│
│ bias fix        │
│ (delisted cos)  │
└─────────────────┘

CLI Reference
# Standard run (30d with embargo — the honest evaluation)
python -m finance.vic_analysis.predict_robust --no-fundamentals --horizon 30d

# Compare with/without embargo
python -m finance.vic_analysis.predict_robust --no-fundamentals --horizon 365d           # with embargo (honest)
python -m finance.vic_analysis.predict_robust --no-fundamentals --horizon 365d --no-embargo  # without (inflated)

# Nested PCA selection (slower, auto-picks dims)
python -m finance.vic_analysis.predict_robust --no-fundamentals --horizon 30d --auto-pca

# Full permutation test (~5 min, 100 shuffles)
python -m finance.vic_analysis.predict_robust --no-fundamentals --horizon 30d --permutation-test

# Single split mode (for quick V1-style comparison)
python -m finance.vic_analysis.predict_robust --no-fundamentals --single-split

Glossary
Spearman r
Rank correlation between predicted and actual alpha. Measures whether the model correctly orders ideas from worst to best, regardless of the exact predicted values. Range: -1 to +1.
Embargo gap
Time buffer between the last training idea's return window and the first test idea. Prevents target leakage from overlapping price observation periods.
Walk-forward CV
Cross-validation that respects time order: train on past data, test on future data. Expanding window: each fold adds more past data. Never trains on future data.
Q5-Q1 spread
Difference in median actual alpha between the model's top quintile (Q5, best predicted) and bottom quintile (Q1, worst predicted). Measures economic significance.
Block bootstrap
Statistical resampling that preserves temporal structure by resampling whole time blocks (quarters) rather than individual observations.
Permutation test
Significance test that shuffles the target variable within strata (year × sector) and re-runs the full pipeline. If the real result exceeds 95% of shuffled results, the signal is significant at roughly the 5% level.
PCA
Principal Component Analysis: reduces 1536-dimensional embeddings to ~100 dimensions, the directions of highest variance in the training data. Fit on training data only to prevent leakage.
RidgeCV
Linear regression with L2 regularization. The "CV" means it automatically selects the regularization strength via internal leave-one-out cross-validation.
Alpha
Excess return vs benchmark (SPY). Alpha = stock return − beta × SPY return. Positive alpha means the stock beat the market after adjusting for systematic risk.
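The alpha definition above can be computed as follows. The report does not say how beta is estimated; this sketch assumes an OLS beta (cov/var) from a hypothetical look-back series of paired stock and SPY returns, and `estimate_beta` / `alpha` are illustrative names. For a 30d target, `stock_return` and `spy_return` would be the 30-day returns after the idea's posting date.

```python
def estimate_beta(stock_rets, spy_rets):
    """OLS beta = cov(stock, SPY) / var(SPY) over paired return observations."""
    n = len(spy_rets)
    ms = sum(stock_rets) / n
    mm = sum(spy_rets) / n
    cov = sum((s - ms) * (m - mm) for s, m in zip(stock_rets, spy_rets))
    var = sum((m - mm) ** 2 for m in spy_rets)
    return cov / var


def alpha(stock_return, spy_return, beta):
    """Alpha = stock return - beta * SPY return (the glossary definition)."""
    return stock_return - beta * spy_return


# Toy look-back series (not real data): the stock moves exactly twice as much as SPY.
spy_hist = [0.01, -0.02, 0.03, 0.0]
stock_hist = [2 * r for r in spy_hist]
beta = estimate_beta(stock_hist, spy_hist)
print(round(alpha(0.10, 0.04, beta), 4))  # 0.02
```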

Generated 2026-03-01  |  VIC Alpha V2 Project  |  finance/vic_analysis/