Predicting the Success of Startups through Machine Learning
Authors: Ettore Santi, Sandro Brunelli, Emiliano Di Carlo, Fabrizio Rossi
Year: 2019
Methodology
- Sample: 442
- Design: cross-sectional
- Data: Crunchbase, LinkedIn
Factors Extracted (6)
- Number of funding rounds [strong]: High importance in Random Forest feature ranking
- Total funding amount [strong]: Significant predictor of 'acquired' or 'IPO' status
- Founder's education (top-tier university) [moderate]: Positive correlation with success
- Number of founders (team size) [moderate]: Optimal range identified (2-3 founders)
- Time between funding rounds [moderate]: Shorter intervals correlate with higher success probability
- Company age at first funding [weak]: Inverse relationship with long-term success
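Several of the factors above are derived quantities rather than raw fields. The following is a minimal sketch of how they might be computed from per-startup funding records. The function name, the input layout (a list of (date, amount) pairs), and the output keys are all hypothetical and are not taken from the paper. Founder education is left out because it would need profile data that this sketch does not model.

```python
from datetime import date
from statistics import mean


def funding_features(founded_on, rounds, n_founders):
    """Derive funding-related features for one startup.

    founded_on: date the company was founded
    rounds:     list of (date, amount_usd) tuples, in any order (hypothetical layout)
    n_founders: number of founders
    """
    # Order rounds chronologically so gaps are measured between consecutive rounds.
    ordered = sorted(rounds, key=lambda r: r[0])
    dates = [d for d, _ in ordered]
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    return {
        "n_rounds": len(ordered),
        "total_funding_usd": sum(amount for _, amount in ordered),
        # Undefined when there are fewer than two rounds.
        "mean_days_between_rounds": mean(gaps) if gaps else None,
        # Undefined when the startup has no recorded round.
        "age_at_first_funding_days": (dates[0] - founded_on).days if dates else None,
        # Flags the 2-3 founder range reported in the factor list.
        "team_in_2_to_3_range": 2 <= n_founders <= 3,
    }


# Made-up example record, for illustration only.
example = funding_features(
    founded_on=date(2015, 1, 1),
    rounds=[
        (date(2016, 7, 1), 2_000_000),
        (date(2015, 7, 1), 500_000),
        (date(2017, 1, 1), 8_000_000),
    ],
    n_founders=2,
)
print(example)
```

Returning None for undefined values keeps missing information explicit, so a later modelling step can decide how to impute or exclude it.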
Key Findings
- Among the models compared, Random Forest achieved the highest accuracy (approx. 85%) in classifying startup success.
- The 'number of funding rounds' and 'total funding amount' are the two strongest predictors of a startup reaching an exit (IPO or acquisition).
- The presence of founders with previous exit experience significantly increases the probability of the current startup's success.
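The findings above depend on two definitions that this summary leaves implicit: what counts as success (an exit via IPO or acquisition) and how accuracy is used to rank models. The sketch below illustrates both under stated assumptions. The status strings, the model names, and the predictions are invented for illustration and do not reproduce the paper's data or results.

```python
# Assumed status strings for an exit; the real dataset may encode these differently.
EXIT_STATUSES = {"acquired", "ipo"}


def is_success(status):
    """Binary success label: True when the startup reached an exit."""
    return status.strip().lower() in EXIT_STATUSES


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def best_model(y_true, predictions_by_model):
    """Return (name of the most accurate model, accuracy per model)."""
    scores = {
        name: accuracy(y_true, preds)
        for name, preds in predictions_by_model.items()
    }
    return max(scores, key=scores.get), scores


# Made-up statuses and predictions from two hypothetical models.
statuses = ["acquired", "operating", "IPO", "closed", "operating"]
labels = [is_success(s) for s in statuses]
winner, scores = best_model(
    labels,
    {
        "model_a": [True, False, True, False, True],
        "model_b": [True, True, False, False, True],
    },
)
print(winner, scores)
```

Accuracy alone can mislead when exits are rare in the sample, so a comparison like this is usually read alongside class-aware metrics such as precision and recall.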
Limitations
- Survivorship bias: The dataset primarily includes startups that received at least one round of seed funding.
- Data lag: Crunchbase data relies on self-reporting or news cycles, which may result in incomplete financial figures.
- Geographic bias: The sample is heavily weighted toward US-based startups, potentially limiting global generalizability.
Extracted by lib/ingest/literature_review.py via gemini-flash