VIC Alpha Prediction — W1 Evaluation Results

2026-03-01 | Pipeline: finance/vic_analysis/predict_robust.py | Plan: docs/plans/2026-03-01-vic-alpha-v2.md

TL;DR: VIC thesis text predicts 30-day stock alpha (Spearman 0.17, permutation-test p < 0.01, Q5 win rate 69%). The 365-day signal we reported earlier was inflated by target leakage. With a proper embargo it falls to 0.067 (p = 0.43, not significant) on current data. Scaling from 875 tech ideas to the full 25K VIC corpus (2000–2025) is the critical next step.

Metric	Value
Spearman r (30d, embargoed)	0.170
Permutation test p-value (100 shuffles)	< 0.01 (0/100 shuffles ≥ observed)
Q5 win rate (30d)	69%
Spearman r (365d, embargoed)	0.067

What is the Embargo Gap?

The embargo gap prevents target leakage. Leakage happens when training data "sees" information from the test period through overlapping return windows.

Without embargo (LEAKY):
  Training idea posted 2023-06-01 ────── 365d return window ──────▶ 2024-06-01
  Test idea posted 2024-01-01     ────── 365d return window ──────▶ 2025-01-01
                                  ▲▲▲▲▲▲ OVERLAP ▲▲▲▲▲▲
  Jan–Jun 2024: both windows share the same market moves.

With 365-day embargo (CLEAN):
  Training idea posted 2023-01-01 ── 365d return ──▶ 2024-01-01
                                  ◄── embargo gap ──►
  Test idea posted 2024-01-01     ────── 365d return window ──────▶ 2025-01-01
  No overlap. Training returns are fully resolved before the test period begins.

Embargo rule: training idea's posted_at + horizon_days < first test idea's posted_at
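The embargo rule can be checked with a few lines of standard-library Python. This is a minimal sketch. The helper name `passes_embargo` is hypothetical and does not come from predict_robust.py. It applies the strict `<` exactly as the rule is written. The real pipeline may handle the boundary day differently.

```python
from datetime import date, timedelta


def passes_embargo(train_posted: date, first_test_posted: date, horizon_days: int) -> bool:
    """Hypothetical helper: True when a training idea's return window
    (posted_at + horizon_days) closes strictly before the first test idea is posted."""
    return train_posted + timedelta(days=horizon_days) < first_test_posted


first_test = date(2024, 1, 1)

# Leaky example from the diagram: the 365d window ends in mid-2024, inside the test period.
print(passes_embargo(date(2023, 6, 1), first_test, 365))    # False
# Window resolves a month before the test period starts.
print(passes_embargo(date(2022, 12, 1), first_test, 365))   # True
# A 30-day target only needs a 30-day buffer.
print(passes_embargo(date(2023, 11, 15), first_test, 30))   # True
# Boundary case: the window closes exactly on 2024-01-01, so the strict "<" excludes it.
print(passes_embargo(date(2023, 1, 1), first_test, 365))    # False
```

The last line shows that the diagram's clean example sits exactly on the boundary. Whether the pipeline uses `<` or `<=` decides whether such ideas are kept.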

For a 30-day target, the embargo is just 30 days, which barely affects training size.
For a 365-day target, the embargo pushes the training cutoff back a full year. With only 3 years of data, that leaves 122 training ideas for the 2024 fold and makes the 2023 fold impossible.

Why Only 3 Years of Data?

The full VIC database has 25,736 ideas from 2000–2025. As a proof of concept, we computed returns for only 1,000 tech-sector ideas from 2022–2025. This was a design choice for the initial experiment, not a fundamental limitation.

Dataset	Ideas	Years	Sectors	Status
Current sample	875	2022–2025 (3 yrs)	Tech only	Done
With embeddings	587	2022–2025	Tech only	Done
Full VIC DB	25,736	2000–2025 (25 yrs)	All sectors	W2: next step
With descriptions	~18,255	2000–2025	All sectors	Needs embedding

With the full corpus, a 365-day embargo would still leave 15+ years of training data for recent test folds. The 365-day signal may well be real. We just can't test it honestly with 3 years of data.

Can We Use All Ideas That Have 30d Returns?

Yes, and we already do. The pipeline drops NaN targets per horizon, so --horizon 30d uses all 853 ideas with 30d returns (vs 753 for 365d). The gain isn't in idea count, because 853 vs 753 is a small difference. The gain is in embargo impact:

Horizon	Embargo	Ideas with target	Embargoed training (2024 fold)	Folds available
30d	30 days	853	371	2 (2023, 2024)
90d	90 days	841	320	2
365d	365 days	753	122	1 (only 2024)
730d	730 days	338	0	0 (impossible)

The real bottleneck isn't idea count. It is date range. With 2022–2025 data, a 365d embargo eats an entire year of training. With 2000–2025 data, giving up one year of a much longer history barely matters.
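A quick back-of-the-envelope check of the date-range argument follows. For test year T, training covers [min_year, T) minus an embargo equal to the horizon. The sketch treats a year as 365 days and ignores how ideas are spread within each year, so it measures calendar coverage, not idea counts. The helper `training_years` is hypothetical.

```python
def training_years(min_year: int, test_year: int, horizon_days: int) -> float:
    """Approximate calendar years of usable training data for a walk-forward fold
    that tests on [test_year, test_year + 1) with an embargo equal to the horizon.
    Hypothetical helper for illustration; it counts time, not ideas."""
    return max(0.0, (test_year - min_year) - horizon_days / 365)


for min_year in (2022, 2000):
    for horizon in (30, 365, 730):
        print(min_year, horizon, round(training_years(min_year, 2024, horizon), 2))
# 2022 start: 1.92 / 1.0 / 0.0 years; 2000 start: 23.92 / 23.0 / 22.0 years
```

The 2022 rows line up with the table above. One year remains for the 365d 2024 fold, and zero remains for 730d. The 2023 fold also gets zero for 365d, since `training_years(2022, 2023, 365)` is 0.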

W1 Results: Signal by Horizon

Horizon	Embargo	Folds	Spearman r	p-value	Q5-Q1 median alpha	Q5 win%	Signal?
30d	30d	2	+0.170	<0.01*	+4.0%	69%	REAL
90d	90d	2	+0.065	0.069	+13.5%	67%	Marginal
365d	365d	1	+0.067	0.429	+6.9%	41%	Untestable
365d	none	2	+0.199	0.002	+39.1%	55%	Leaked

*Permutation-test p-value (100 stratified shuffles; 0/100 shuffles reached the observed value). Other p-values come from the standard Spearman rank-correlation test.

30d Quintile Detail (2024 Fold)

Quintile	n	Mean alpha	Median alpha	Win rate
Q1 (worst predicted)	36	-4.4%	-2.4%	36%
Q2	36	-5.4%	-8.6%	33%
Q3	35	+2.5%	+1.6%	54%
Q4	36	+1.5%	+0.6%	53%
Q5 (best predicted)	36	+6.4%	+4.5%	69%

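Broadly increasing quintile pattern: the model's top picks (Q5) earn +4.5% median alpha in 30 days at a 69% win rate, while its bottom picks (Q1) earn -2.4% at 36%. The ordering is not strictly monotonic, because Q2 does worse than Q1 and Q4 slightly trails Q3.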
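A table like the one above can be built from a fold's predictions roughly as follows. This numpy sketch rests on two assumptions. The first is that quintiles come from ranking predictions within the fold into five near-equal groups. The second is that "win rate" means the share of ideas with positive alpha. Neither definition is confirmed against predict_robust.py, and `quintile_table` and `q5_q1_median_spread` are hypothetical names.

```python
import numpy as np


def quintile_table(pred: np.ndarray, alpha: np.ndarray) -> list[dict]:
    """Bucket ideas into quintiles by predicted alpha (Q1 = lowest prediction)
    and report n, mean, median and win rate (share of ideas with alpha > 0)."""
    order = np.argsort(pred, kind="stable")   # indices from worst to best prediction
    buckets = np.array_split(order, 5)        # five near-equal groups
    rows = []
    for q, idx in enumerate(buckets, start=1):
        a = alpha[idx]
        rows.append({
            "quintile": f"Q{q}",
            "n": len(idx),
            "mean": float(a.mean()),
            "median": float(np.median(a)),
            "win_rate": float((a > 0).mean()),
        })
    return rows


def q5_q1_median_spread(rows: list[dict]) -> float:
    """Q5 median minus Q1 median, the economic-significance metric."""
    return rows[-1]["median"] - rows[0]["median"]
```

`np.array_split` hands any remainder to the first buckets. With 179 ideas that gives sizes 36/36/36/36/35. The pipeline's own bucketing evidently places the 35-idea bucket at Q3, so its tie and remainder handling may differ.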

Permutation Test

Stratified permutation test (30d alpha, 100 shuffles)
  Method: shuffle y within (year × thesis_type) strata, then re-run the full pipeline
  Observed mean Spearman:   +0.170
  Null distribution mean:   +0.024 (std: 0.043)
  95th percentile (null):   +0.094
  99th percentile (null):   +0.107
  z-score:                  3.4σ above null
  p-value:                  < 0.01 (0/100 shuffles ≥ observed)
  Verdict: Signal is genuine. Not an artifact of temporal structure or sector clustering.
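The stratified permutation test above can be sketched in numpy. The sketch makes three assumptions. First, `score_fn` is a hypothetical stand-in for "re-run the full pipeline and return the mean Spearman r". The toy usage just scores fixed predictions, which is far cheaper than retraining per shuffle. Second, ranks are tie-averaged. Third, the p-value uses the common add-one estimator (1 + exceedances) / (1 + shuffles), so 0/100 exceedances gives 1/101 ≈ 0.0099 rather than exactly zero.

```python
import numpy as np


def average_ranks(a):
    """0-based ranks with ties averaged, so Pearson on ranks equals Spearman r."""
    _, inverse, counts = np.unique(a, return_inverse=True, return_counts=True)
    starts = np.cumsum(counts) - counts
    return (starts + (counts - 1) / 2.0)[inverse]


def spearman(x, y):
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def stratified_permutation_test(y, strata, score_fn, n_shuffles=100, seed=0):
    """Shuffle y only within each stratum (e.g. year x thesis_type), re-score,
    and compare the observed score with the resulting null distribution."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    strata = np.asarray(strata)
    observed = score_fn(y)
    null = np.empty(n_shuffles)
    for i in range(n_shuffles):
        y_perm = y.copy()
        for s in np.unique(strata):
            idx = np.flatnonzero(strata == s)
            y_perm[idx] = rng.permutation(y[idx])   # permute inside this stratum only
        null[i] = score_fn(y_perm)
    p_value = (1 + np.sum(null >= observed)) / (1 + n_shuffles)   # add-one estimator
    z = (observed - null.mean()) / null.std(ddof=1)
    return observed, null, p_value, z


# Toy usage: 6 strata (e.g. 3 years x 2 thesis types) and a weak true signal.
rng = np.random.default_rng(1)
n = 300
strata = rng.integers(0, 6, size=n)
pred = rng.normal(size=n)                   # stand-in for fixed model predictions
y = 0.3 * pred + rng.normal(size=n)         # "alpha" correlated with pred
obs, null, p, z = stratified_permutation_test(y, strata, lambda yy: spearman(pred, yy))
print(round(obs, 3), round(p, 4), round(z, 1))
```

Shuffling within strata keeps each year's (and thesis type's) return distribution intact. The null therefore still contains any signal that comes purely from "which year was good". Only the within-stratum ordering is destroyed.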

How We Got Here: V1 → V2 → W1

Stage	365d Spearman	30d Spearman	What changed
V1 (single split)	0.357	0.242	Initial result — inflated
V2 (walk-forward, no embargo)	0.199	0.163	Walk-forward CV, historical mcap, bootstrap
W1 (with embargo)	0.067	0.170	Embargo gap, block bootstrap, permutation test

Lesson: V1's 365d Spearman of 0.357 was inflated by three things. The first was data snooping (testing 10+ models on the same 137-sample test set). The second was fundamentals look-ahead bias (using 2025 market cap to predict 2022 returns). The third was target leakage (overlapping return windows). Each fix reduced the number further. The honest 365d signal is untestable with current data. It is not disproven, but the data are insufficient to test it.

Current Pipeline

Input: 587 tech-sector VIC ideas with thesis text + embeddings (2022–2025)
Features:
  Embeddings: 1536-dim (text-embedding-3-small) → PCA to 100 dims (train-only fit)
  Scalar (18): quality_score, is_long, posting_year, posting_month, desc_len, cat_len, has_catalysts, is_contrarian, time_horizon (short/long), thesis_type dummies
Model: RidgeCV (α auto-selected via internal LOO; always picks α=1000)
Evaluation:
  Walk-forward CV: train on [min_year, T − embargo), test on [T, T+1)
  Embargo gap: auto-matched to target horizon (30d → 30-day gap)
  Bootstrap: quarterly block bootstrap (not IID)
  Significance: stratified permutation test (year × sector)
  Metrics: Spearman r, Q5-Q1 median spread, quintile win rates
Key files:
  Pipeline: finance/vic_analysis/predict_robust.py
  Experiments: finance/vic_analysis/LOGBOOK.md
  Plan: docs/plans/2026-03-01-vic-alpha-v2.md
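Here is a minimal scikit-learn sketch of one fold of the model described above. The function `fit_predict_fold` and its alpha grid are hypothetical, not taken from predict_robust.py. The sketch does not standardize features, because the pipeline description doesn't say whether it does. The point it illustrates is that PCA is fit on the training fold only and then applied unchanged to the test fold.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import RidgeCV


def fit_predict_fold(emb_train, scal_train, y_train, emb_test, scal_test,
                     n_components=100, alphas=(0.1, 1.0, 10.0, 100.0, 1000.0)):
    """One walk-forward fold: fit PCA on training embeddings only, project both
    folds with it, append scalar features, and let RidgeCV choose alpha."""
    pca = PCA(n_components=n_components).fit(emb_train)      # train-only fit
    x_train = np.hstack([pca.transform(emb_train), scal_train])
    x_test = np.hstack([pca.transform(emb_test), scal_test])  # same projection, no refit
    model = RidgeCV(alphas=alphas).fit(x_train, y_train)      # cv=None: efficient LOO
    return model.predict(x_test), model.alpha_
```

With its default `cv=None`, `RidgeCV` selects alpha by efficient leave-one-out, which matches the "internal LOO" note above. Suppose 1000 is the top of the pipeline's grid. In that case, always picking α=1000 means the choice sits on the grid edge, and extending the grid upward is a cheap check on whether the model wants even more shrinkage.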

Next Steps — Prioritized

W2: Scale to full 25K VIC corpus (critical blocker)
Extract all 25,736 ideas from SanDisk VIC DB. Compute returns for all sectors (not just tech). Embed all thesis texts (~$4 at text-embedding-3-small rates). This gives 20+ years of walk-forward folds and unlocks honest 365d evaluation.
Effort: Medium. Requires the SanDisk drive to be mounted. Roughly 30 min of Finnhub API calls for returns and ~$4 of OpenAI embedding costs.
Re-evaluate all horizons with 25K ideas (depends on W2)
With 2000–2025 data, a 365d embargo still leaves ~15 years of training for recent folds. Run walk-forward CV with embargo at 30d, 90d and 365d. If the 365d signal reappears under embargo, it is likely real. If not, 30d is the signal. Also test 7d, the fastest actionable horizon.
W4: Author track record features (low effort, high value)
Bayesian-shrunk hit rate per author, with horizon embargo (only count prior ideas whose return window has fully elapsed). Implementation code is in the plan doc. Requires W2 data for meaningful author sample sizes.
W3: Survivorship bias fix (data integrity)
Delisted companies are missing from the return data. As a result, SHORTs on bankruptcies (+100% win) and LONGs on failures (-100% loss) are both excluded. Cross-reference VIC symbols against delisting databases.
W5: Direction-aware evaluation (quick win)
Keep a single model with the is_long feature, but report Spearman and quintiles separately for LONG and SHORT subsets. The SHORT signal may be stronger if VIC members are better at finding overvalued stocks, which is a hypothesis to test.
W7: Portfolio-level backtest (proves tradability)
Monthly formation of top-decile LONG + bottom-decile SHORT portfolio. Compute Sharpe, drawdown, turnover with realistic constraints (slippage, borrow costs, position limits).
W6: Discovery timing (live pipeline prep)
Measure empirical lag cost: compute alpha at T+0, T+1, T+3, T+7 and check if rankings change. Connect to moneygun event log (nvme/paper/o/) for real discovery timestamps.
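The actual W4 implementation lives in the plan doc. The core idea behind it can be illustrated on its own. The idea is a Beta-prior shrunk hit rate that counts only ideas whose return window has already closed. In this sketch, `shrunk_hit_rate`, the 0.5 prior mean and the prior strength of 10 are illustrative assumptions, not values from the plan.

```python
from datetime import date, timedelta


def shrunk_hit_rate(prior_ideas, as_of: date, horizon_days: int,
                    prior_mean: float = 0.5, prior_strength: float = 10.0) -> float:
    """Beta-binomial shrinkage of an author's hit rate (hypothetical helper).

    prior_ideas: iterable of (posted_at, won) pairs for the author's earlier ideas.
    Only ideas whose return window closed before `as_of` are counted, so the
    feature never uses outcomes that were unknown at prediction time.
    """
    resolved = [won for posted, won in prior_ideas
                if posted + timedelta(days=horizon_days) < as_of]
    hits, n = sum(resolved), len(resolved)
    # Posterior mean of Beta(prior_mean*k + hits, (1-prior_mean)*k + misses).
    return (hits + prior_mean * prior_strength) / (n + prior_strength)


ideas = [(date(2023, 1, 1), True), (date(2023, 2, 1), True), (date(2024, 5, 1), False)]
print(shrunk_hit_rate(ideas, as_of=date(2024, 3, 1), horizon_days=365))  # 7/12 ≈ 0.583
```

An author with no resolved ideas gets exactly the prior mean. Two early wins move the estimate only partway toward 1.0. This is why the feature needs W2's larger sample to separate authors meaningfully.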

Execution Order

Now                        Then                         After
┌─────────────────┐        ┌──────────────────┐         ┌──────────────────┐
│ W2: Scale data  │───────▶│ Re-eval all      │────────▶│ W7: Portfolio    │
│ 25K ideas       │        │ horizons w/      │         │ backtest         │
│ all sectors     │        │ embargo          │         │ (proves tradable)│
│ ~$4 embed cost  │        │ (365d testable!) │         └──────────────────┘
└─────────────────┘        └──────────────────┘
         │                          │
         ▼                          ▼
┌─────────────────┐        ┌──────────────────┐
│ W4: Author      │        │ W5: Direction    │
│ track record    │        │ split eval       │
│ (needs data)    │        │ (LONG vs SHORT)  │
└─────────────────┘        └──────────────────┘
         │
         ▼
┌─────────────────┐
│ W3: Survivorship│
│ bias fix        │
│ (delisted cos)  │
└─────────────────┘

CLI Reference

# Standard run (30d with embargo — the honest evaluation)
python -m finance.vic_analysis.predict_robust --no-fundamentals --horizon 30d

# Compare with/without embargo
python -m finance.vic_analysis.predict_robust --no-fundamentals --horizon 365d           # with embargo (honest)
python -m finance.vic_analysis.predict_robust --no-fundamentals --horizon 365d --no-embargo  # without (inflated)

# Nested PCA selection (slower, auto-picks dims)
python -m finance.vic_analysis.predict_robust --no-fundamentals --horizon 30d --auto-pca

# Full permutation test (~5 min, 100 shuffles)
python -m finance.vic_analysis.predict_robust --no-fundamentals --horizon 30d --permutation-test

# Single split mode (for quick V1-style comparison)
python -m finance.vic_analysis.predict_robust --no-fundamentals --single-split

Glossary

Spearman r: Rank correlation between predicted and actual alpha. Measures whether the model correctly orders ideas from worst to best, regardless of the exact predicted values. Range: -1 to +1.
Embargo gap: Time buffer between the last training idea's return window and the first test idea. Prevents target leakage from overlapping price observation periods.
Walk-forward CV: Cross-validation that respects time order: train on past data, test on future data. Expanding window: each fold adds more past data. Never trains on future data.
Q5-Q1 spread: Difference in actual returns between the model's top quintile (Q5, best predicted) and bottom quintile (Q1, worst predicted). Measures economic significance.
Block bootstrap: Statistical resampling that preserves temporal structure by resampling whole time blocks (quarters) rather than individual observations.
Permutation test: Significance test that shuffles the target variable within strata (year × sector) and re-runs the full pipeline. If the real result exceeds 95% of the shuffled results, the signal is significant at the 5% level.
PCA: Principal Component Analysis. Reduces 1536-dimensional embeddings to ~100 components, the directions of greatest variance. Fit on training data only to prevent leakage.
RidgeCV: Linear regression with L2 regularization. The "CV" means it automatically selects the regularization strength via internal leave-one-out cross-validation.
Alpha: Excess return vs benchmark (SPY). Alpha = stock return − beta × SPY return. Positive alpha means the stock beat the market after adjusting for systematic risk.
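The block bootstrap entry above can be sketched as follows. The sketch assumes that each idea carries a calendar-quarter label and that the statistic of interest is Spearman r with a percentile confidence interval. The names `block_bootstrap_ci` and `average_ranks` and the percentile-interval choice are assumptions, not the pipeline's code. Ranks are tie-averaged because resampling whole quarters with replacement duplicates ideas.

```python
import numpy as np


def average_ranks(a):
    """0-based ranks with ties averaged (duplicated ideas after resampling tie)."""
    _, inverse, counts = np.unique(a, return_inverse=True, return_counts=True)
    starts = np.cumsum(counts) - counts
    return (starts + (counts - 1) / 2.0)[inverse]


def spearman(x, y):
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def block_bootstrap_ci(pred, actual, quarter, n_boot=1000, level=0.95, seed=0):
    """Resample whole quarters with replacement (not individual ideas),
    recompute Spearman r on each resample, and return a percentile interval."""
    rng = np.random.default_rng(seed)
    pred, actual, quarter = (np.asarray(v) for v in (pred, actual, quarter))
    blocks = [np.flatnonzero(quarter == q) for q in np.unique(quarter)]
    stats = np.empty(n_boot)
    for b in range(n_boot):
        picks = rng.integers(0, len(blocks), size=len(blocks))  # quarters, with replacement
        idx = np.concatenate([blocks[i] for i in picks])
        stats[b] = spearman(pred[idx], actual[idx])
    lo, hi = np.quantile(stats, [(1 - level) / 2, (1 + level) / 2])
    return float(lo), float(hi)
```

Resampling quarters keeps ideas from the same market regime together. IID resampling would instead treat correlated same-quarter ideas as independent, producing intervals that are too narrow.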

Generated 2026-03-01 | VIC Alpha V2 Project | finance/vic_analysis/