A Research-Backed Scoring Framework — 2026-03-09 · 36 sources surveyed (21 academic, 6 VC, 5 industry, 1 book, 3 tools) · 28 deep-analyzed · 4 frontier models consulted
Six dimensions, weighted by research-validated predictive power and constrained to what's measurable from public data:
| # | Dimension | Weight | Measurability | Research Basis |
|---|---|---|---|---|
| 1 | Founder-Market Fit | 25% | High | Kauffman KFS, Azoulay et al. 2020, Sequoia/a16z/YC consensus |
| 2 | Prior Operational Evidence | 20% | High | Gompers et al. 2010 (14K ventures), First Round 10-Year (300 cos) |
| 3 | Team Quality | 20% | Medium-High | First Round (+163% for pairs), Torssell 2022 (25K ventures) |
| 4 | Network Position | 15% | Medium | Bonaventura et al. 2020 (41K cos, Nature), Burt 2004 |
| 5 | Technical Breadth | 10% | High | Lazear 2005 ("jack of all trades"), sector-dependent |
| 6 | Leadership Magnetism | 10% | Medium | a16z CEO framework, early hire quality as proxy |
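The weights in the table can be combined into a single score with a plain weighted sum. The sketch below is illustrative rather than the framework's actual implementation: the dimension keys, the `composite_score` name, and the assumption that each dimension is already scored on a 0 to 1 scale are all hypothetical.

```python
# Weights copied from the table above; all names here are illustrative.
WEIGHTS = {
    "founder_market_fit": 0.25,
    "prior_operational_evidence": 0.20,
    "team_quality": 0.20,
    "network_position": 0.15,
    "technical_breadth": 0.10,
    "leadership_magnetism": 0.10,
}


def composite_score(dimension_scores: dict[str, float]) -> float:
    """Weighted sum of per-dimension scores, each assumed to lie in [0, 1]."""
    missing = WEIGHTS.keys() - dimension_scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = dimension_scores[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
        total += weight * value
    return total


# Hypothetical founder profile, for illustration only.
example = {
    "founder_market_fit": 0.8,
    "prior_operational_evidence": 0.5,
    "team_quality": 0.6,
    "network_position": 0.4,
    "technical_breadth": 0.7,
    "leadership_magnetism": 0.3,
}
print(round(composite_score(example), 2))
```

Because the weights sum to 1, the composite stays on the same 0 to 1 scale as its inputs.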
Common VC intuitions like grit, academic pedigree, and raw intelligence were deliberately excluded. Grit reduces to conscientiousness when properly measured (Credé 2017, 88-study meta-analysis). Academic pedigree has r = 0.05–0.15 with outcomes and introduces systematic bias (Ewens & Townsend 2020). Here r is the Pearson correlation coefficient, which ranges from -1 to +1; in social science, r = 0.10 is small, r = 0.30 is medium, and r = 0.50 is large. Because variance explained is r², an r of 0.05–0.15 means the variable explains only about 0.25–2.25% of outcome variance. We kept only dimensions with robust evidence and public-data measurability.
Each card presents what the research says matters (the ideal signal), not how we currently measure it. Data sources and measurement strategy are a separate concern; see the implementation plan.
How deeply does this person know the specific market they're attacking? The Kauffman Foundation's longitudinal study of 5,000 firms found prior industry experience to be the strongest predictor of firm survival. Azoulay et al. 2020 analyzed 2.7 million founders using US Census + IRS data and found the average age of a top-growth founder is 45 — because domain expertise compounds with time. A 50-year-old is 1.8x more likely to build a top-growth firm than a 30-year-old.
Every major VC framework now weights this as a primary signal: Sequoia calls it "clarity of thought about the market," a16z calls it "proprietary insight," YC asks "do you understand your users deeply?"
Ideal signal: Years in target industry, domain-specific roles, patents/publications in the target domain, prior companies in the same vertical, depth of mental model of their customer (Opus identified this as the single biggest gap in the literature).
Key design choice: This dimension is relational (person × market), not absolute. The same person scores differently for a fintech startup vs a biotech startup. When the target market is unknown, we fall back to scoring general domain depth.
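The relational design (person × market, with a fallback when the market is unknown) can be sketched as follows. This is a minimal illustration, not the framework's measurement strategy: it uses years of industry experience as the only signal, and the `SATURATION_YEARS` cap and the function name are hypothetical.

```python
from typing import Optional

# Hypothetical cap: experience beyond this many years adds no further score.
SATURATION_YEARS = 15.0


def founder_market_fit(
    years_by_industry: dict[str, float],
    target_market: Optional[str] = None,
) -> float:
    """Score in [0, 1] that depends on both the person and the target market."""
    if not years_by_industry:
        return 0.0
    if target_market is None:
        # Fallback: market unknown, so score general domain depth
        # using the person's deepest industry.
        years = max(years_by_industry.values())
    else:
        years = years_by_industry.get(target_market, 0.0)
    return min(years / SATURATION_YEARS, 1.0)


# Hypothetical founder: same person, different score per market.
founder = {"fintech": 12.0, "biotech": 3.0}
print(founder_market_fit(founder, "fintech"))
print(founder_market_fit(founder, "biotech"))
print(founder_market_fit(founder))
```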
Gompers et al. 2010 tracked 14,000 VC-backed entrepreneurs and found founders with a prior successful exit have a 30% chance of success in their next venture, vs 22% for first-timers. Importantly, prior failure does not predict future failure (23% vs 22%) — so this should be scored asymmetrically: reward success, don't penalize failure.
The First Round 10-Year Project (300 companies, ~600 founders) found repeat founders achieve +50% higher valuations, and Big Tech alumni outperform by +160%.
Ideal signal: Prior founding outcomes (granular: acquisition price, IPO, revenue milestones, not binary success/fail), largest operational scope managed (team size, revenue), zero-to-one building experience, and serial entrepreneurship status (18–30% of EU entrepreneurs are serial, and they outperform).
Built-in temporal decay: A 2010 exit in ad-tech carries less signal than a 2023 exit in AI infrastructure. Recency and market adjacency should be weighted.
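One way to combine the asymmetric treatment of outcomes with temporal decay is sketched below. The exponential half-life, the `adjacency` input, and all names are assumptions made for illustration; the text specifies only that success is rewarded, failure is not penalized, and recency and market adjacency should carry weight.

```python
from dataclasses import dataclass

# Hypothetical half-life: a successful exit loses half its signal every 8 years.
HALF_LIFE_YEARS = 8.0


@dataclass
class PriorVenture:
    outcome: str      # "success" or "failure"
    exit_year: int
    adjacency: float  # 0..1 similarity to the target market (hypothetical input)


def venture_signal(venture: PriorVenture, as_of_year: int) -> float:
    """Signal from one prior venture; failures contribute zero, never negative."""
    if venture.outcome != "success":
        return 0.0  # asymmetric: reward success, do not penalize failure
    age = max(as_of_year - venture.exit_year, 0)
    recency = 0.5 ** (age / HALF_LIFE_YEARS)
    return recency * venture.adjacency


def prior_evidence_score(ventures: list[PriorVenture], as_of_year: int) -> float:
    """Take the strongest single signal so extra failures cannot lower the score."""
    return max((venture_signal(v, as_of_year) for v in ventures), default=0.0)


old_exit = PriorVenture("success", 2010, 1.0)
recent_exit = PriorVenture("success", 2023, 1.0)
failure = PriorVenture("failure", 2024, 1.0)
print(prior_evidence_score([old_exit], 2026))
print(prior_evidence_score([recent_exit, failure], 2026))
```

Using the maximum rather than a sum is a design choice in this sketch: it keeps the score bounded in [0, 1] and guarantees that adding a failed venture never reduces it.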
First Round Capital found 2-person founding teams outperform solo founders by +163%. A GoingVC analysis found that 65% of high-growth startups that fail do so from management team dysfunction, not market or product failure. Torssell 2022, a study of 25,430 European ventures, confirmed team attributes as a top-tier predictor and found the optimum is a 50/50 mix of founders and hired executives, not all-founder teams.
Ideal signal: Number of co-founders, complementary skill coverage (technical + business + domain), prior co-founding or co-working history (Roure & Maidique 1986: joint experience was a primary differentiator), early hire caliber, gender diversity (+63% per First Round).
Bonaventura et al. 2020 (Nature Scientific Reports) analyzed 41,830 companies across 117 countries over 26 years. Their key finding: network centrality, a startup's position in the global talent-flow network, doubles the baseline VC success rate. (Centrality measures how well-connected a node is within a network: high centrality means being connected to many important people and bridging between clusters. It is computed via PageRank, betweenness, or eigenvector centrality.) Top-20 firms by centrality had ~30% success rates vs ~15% baseline.
This is not about name-dropping or LinkedIn connection counts. It's about structural position: does talent flow through this person's network? Are they connected to knowledge hubs?
Ideal signal: Closeness/betweenness centrality in co-employment networks, board seats, co-investor graphs, advisor network quality, employee flow patterns. The Nature study used proprietary co-employment data — public LinkedIn data is a proxy, not the real signal.
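To make "structural position" concrete, the sketch below computes closeness centrality with a breadth-first search over a toy co-employment graph. The graph and names are invented for illustration, and this is not the method or data of the Nature study; it uses the simple within-component definition (reachable nodes divided by total shortest-path distance).

```python
from collections import deque


def closeness_centrality(graph: dict[str, set[str]], node: str) -> float:
    """Reachable-node count divided by the sum of shortest-path lengths.

    Returns 0.0 for an isolated node. Edges are treated as unweighted.
    """
    distances = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbor in graph.get(current, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    total_distance = sum(distances.values())
    if total_distance == 0:
        return 0.0
    return (len(distances) - 1) / total_distance


# Hypothetical co-employment graph: an edge means two people worked together.
toy_graph = {
    "hub": {"a", "b", "c"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub", "d"},
    "d": {"c"},
}
for person in sorted(toy_graph):
    print(person, round(closeness_centrality(toy_graph, person), 3))
```

In this toy graph "hub" scores highest because it reaches everyone in at most two steps, while "d" sits at the periphery. A raw connection count would not separate "a" from "d", which is the distinction the text draws between structural position and LinkedIn connection counts.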
Lazear 2005 proposed the "jack of all trades" hypothesis: entrepreneurs with broader skill profiles outperform specialists. First Round's data confirmed that technical co-founders yield +230% in enterprise — but -31% in consumer. Pure technical depth is sector-dependent; breadth is more universally predictive.
Ideal signal: Range of functional roles held, variety of technical domains (GitHub breadth, patent diversity, publications across fields), ability to evaluate talent across functions (a16z's "functional literacy"). Sector-weighted: higher for deep-tech/infra, lower for consumer/marketplace.
Central to the a16z CEO evaluation framework: "Can the CEO get the company to do what needs to be done?" Ben Horowitz weights this "white box" evaluation (process and team-building ability) above "black box" results. Great leaders attract disproportionately strong early teams — the quality of first hires is a measurable proxy for leadership that's hard to fake.
Ideal signal: Caliber of early employees attracted (prior exits, elite backgrounds), team growth rate, public thought leadership (talks, writing, community building), speed of decision-making, "constructive confrontation" culture (Horowitz). This is the hardest dimension to quantify from public data, hence the 10% weight. GPT-Pro flagged integrity/epistemic honesty as a related missing signal.
Our framework was validated against the stated evaluation criteria of the four most successful VC firms. Every firm weights founder-market fit as a primary signal; none weight academic credentials:
| Firm | Primary Signal | Secondary Signal | Distinctive Take |
|---|---|---|---|
| Sequoia | Founder-market fit, clarity of thought | Evidence of velocity | "Fanatic dedication to product and customers" (Moritz) |
| a16z | Technical insight + courage of conviction | Storytelling / recruiting ability | Evaluate how you think (process), not just outcomes |
| Benchmark | Product intuition, intellectual honesty | Capital efficiency mindset | "Bet on the jockey" — conviction over process |
| YC | Determination (≠ stubbornness), execution speed | Deep user understanding | "What have you done since applying?" — the delta, not the snapshot |
Our 6 dimensions cover the observable signals underlying each firm's framework. Founder-market fit (25%) captures Sequoia's "clarity of thought" and a16z's "proprietary insight." Prior operational evidence (20%) captures Benchmark's "jockey" bet. Team quality (20%) captures the team dysfunction signal all four flag. Leadership magnetism (10%) captures a16z's "can they recruit?" The dimensions these firms can't articulate or measure — resilience, coachability, vision — are deliberately excluded because they require interaction, not data.
Scoring on unobservable traits would produce confident-sounding noise. The system is designed to score only what public data can reliably measure, then surface the top 20–40% for human evaluation of what it can't.
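The surfacing step amounts to ranking candidates by score and passing a top fraction to human reviewers. A minimal sketch, assuming a hypothetical `shortlist` helper and a default fraction of 0.3 chosen from within the 20–40% range stated above:

```python
import math


def shortlist(scores: dict[str, float], top_fraction: float = 0.3) -> list[str]:
    """Return the highest-scoring candidates, best first.

    top_fraction is the share of candidates to surface for human review;
    the result is rounded up so a non-empty pool always yields someone.
    """
    if not 0.0 < top_fraction <= 1.0:
        raise ValueError("top_fraction must be in (0, 1]")
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    keep = math.ceil(len(ranked) * top_fraction)
    return ranked[:keep]


# Hypothetical composite scores for eight candidates.
candidates = {
    "f1": 0.82, "f2": 0.41, "f3": 0.67, "f4": 0.30,
    "f5": 0.58, "f6": 0.74, "f7": 0.22, "f8": 0.49,
}
print(shortlist(candidates, top_fraction=0.25))
```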
Academic pedigree is excluded as a standalone dimension. Tamaseb's analysis of 30,000 startups shows Ivy League attendance doesn't correlate with billion-dollar outcomes. Ewens & Townsend 2020 documents systematic investor bias toward elite credentials. Academic background contributes signal only when it's in the target domain — and that's already captured by founder-market fit.
We gave all 28 source extractions (176 factors) to 4 frontier AI models independently via vario maxthink — Opus 4.6, GPT-5.4-Pro, Grok 4.1 Reasoning, and Gemini — and asked each: "What 6 dimensions would you recommend for an automated scoring system using only publicly observable data?"
| Dimension | Opus | GPT-Pro | Grok | Gemini | Mean |
|---|---|---|---|---|---|
| Founder Human Capital / Market Fit | 25% | 25% | 25% | 20% | 24% |
| Team Composition & Completeness | 20% | 20% | — | 25% | 22% |
| Network / Ecosystem Position | 18% | 10% | 15% | 25% | 17% |
| Funding / Financial Trajectory | 15% | 10% | — | 15% | 13% |
| Market / Product Positioning | 12% | 15% | — | — | 14% |
| Behavioral / Adaptability Signals | 10% | 20% | — | 10% | 13% |
| Education Pedigree (standalone) | — | — | 15% | 5% | 5% |
Dashes indicate the model folded that signal into another dimension rather than listing it separately. Grok kept education pedigree as standalone; all others demoted or excluded it.
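For rows where a dash means the signal was folded into another dimension, the Mean column is consistent with averaging only the weights a model actually listed. The sketch below illustrates that reading on three rows of the table; the `mean_listed` helper is hypothetical and `None` stands in for a dash.

```python
from typing import Optional


def mean_listed(weights: list[Optional[float]]) -> Optional[float]:
    """Average the weights that were listed, skipping dashes (None)."""
    listed = [w for w in weights if w is not None]
    if not listed:
        return None
    return sum(listed) / len(listed)


# Column order: Opus, GPT-Pro, Grok, Gemini (values from the table above).
rows = {
    "Founder Human Capital / Market Fit": [25, 25, 25, 20],
    "Team Composition & Completeness": [20, 20, None, 25],
    "Network / Ecosystem Position": [18, 10, 15, 25],
}
for dimension, weights in rows.items():
    print(dimension, round(mean_listed(weights)))
```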
Three channels, run concurrently using lib/ingest/literature_review.py:
Related-work survey via Serper API generated 36 candidate sources. Each was fetched with global semaphore (10 concurrent) + per-domain semaphore (2 per host). 26 fetched successfully, 9 paywalled (abstract-only), 1 failed. Each fetched source was analyzed via gemini-flash to extract structured records: title, authors, year, type, methodology, factors (with measures and strength), key findings, and limitations. The 9 paywalled sources still contributed metadata and abstract-level findings where available.
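The two-level rate limiting described above (10 concurrent requests overall, 2 per host) can be sketched with nested `asyncio.Semaphore` objects. This is not the code in lib/ingest/literature_review.py, which is not shown here; `fetch_all` and `fake_fetch` are hypothetical names, and the fetch function is a stand-in that performs no network I/O.

```python
import asyncio
from collections import defaultdict
from urllib.parse import urlparse

GLOBAL_LIMIT = 10      # concurrent requests overall (from the text)
PER_DOMAIN_LIMIT = 2   # concurrent requests per host (from the text)


async def fake_fetch(url: str) -> str:
    """Stand-in for a real HTTP request; sleeps instead of using the network."""
    await asyncio.sleep(0.01)
    return f"body of {url}"


async def fetch_all(urls, fetch=fake_fetch):
    """Fetch every URL, respecting both the global and the per-host limit."""
    global_sem = asyncio.Semaphore(GLOBAL_LIMIT)
    domain_sems = defaultdict(lambda: asyncio.Semaphore(PER_DOMAIN_LIMIT))

    async def fetch_one(url):
        host = urlparse(url).netloc
        # Take the per-host slot first so tasks queued behind a busy host
        # do not occupy one of the global slots while they wait.
        async with domain_sems[host]:
            async with global_sem:
                try:
                    return url, await fetch(url)
                except Exception as exc:  # record the failure, keep going
                    return url, exc

    results = await asyncio.gather(*(fetch_one(u) for u in urls))
    return dict(results)


if __name__ == "__main__":
    demo_urls = [f"https://site{i % 3}.example/page{i}" for i in range(12)]
    pages = asyncio.run(fetch_all(demo_urls))
    print(len(pages), "sources fetched")
```

Returning the exception alongside the URL, instead of raising, mirrors the report's accounting of paywalled and failed sources: one bad fetch does not abort the rest of the batch.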
Each factor was tagged with strength level (strong/moderate/weak/anecdotal) and quantitative measure where available. Factor-dimension mapping: Founder-Market Fit (38 factors), Leadership Magnetism (15), Team Quality (11), Technical Breadth (10), Network Position (7), Prior Operational Evidence (4), Other (91 — including personality traits, financial metrics, and hiring process factors).
All 28 source extractions + 176 factors were given to 4 frontier models via vario maxthink. Each model independently proposed a 6-dimension scoring framework with weights. Convergences (all 4 agree) were taken as high-confidence; divergences were flagged. The synthesis above reflects the cross-model consensus.
| Type | Count | Fetched | With Findings | Example sources |
|---|---|---|---|---|
| Academic papers | 21 | 14 | 14 | Gompers 2010, Bonaventura 2020, Torssell 2022, Pasayat 2023 |
| VC practitioner | 6 | 6 | 6 | First Round 10-Year, a16z CEO Framework, Sequoia podcast |
| Industry reports | 5 | 5 | 5 | GoingVC, VCII Scorecard, QuicklyHire, TalentHub |
| Books | 1 | 1 | 1 | The Resilient Founder (Ramsinghani 2021) |
| Tools | 3 | 3 | 2 | PitchLense, Startup Analyzer MVP, Launch Checklist |
All 36 sources surveyed. ✅ = full text fetched and analyzed. 🔒 = paywalled (abstract only). ❌ = failed to fetch.