Predicting the Success of Startups through Machine Learning

Authors: Ettore Santi, Sandro Brunelli, Emiliano Di Carlo, Fabrizio Rossi

Year: 2019

Methodology

Number of funding rounds [strong] — High importance in Random Forest feature ranking

Total funding amount [strong] — Significant predictor of 'acquired' or 'IPO' status

Founder's education (Top-tier university) [moderate] — Positive correlation with success

Number of founders (Team size) [moderate] — Optimal range identified (2-3 founders)

Time between funding rounds [moderate] — Shorter intervals correlate with higher success probability

Company age at first funding [weak] — Inverse relationship with long-term success
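
The funding-related predictors listed above can all be derived from a startup's founding date and its list of funding rounds. The following is a minimal sketch of that derivation, not the authors' pipeline: the `FundingRound` type, the `funding_features` helper, the field names, and the example dates and amounts are all hypothetical and chosen only for illustration.

```python
from datetime import date
from statistics import mean
from typing import NamedTuple, Optional


class FundingRound(NamedTuple):
    """Hypothetical record for one funding round (not a Crunchbase schema)."""
    announced_on: date
    amount_usd: float


def funding_features(founded_on: date, rounds: list) -> dict:
    """Derive the funding-related predictors from a list of rounds."""
    # Sort chronologically so that gaps are measured between consecutive rounds.
    ordered = sorted(rounds, key=lambda r: r.announced_on)
    gaps = [
        (later.announced_on - earlier.announced_on).days
        for earlier, later in zip(ordered, ordered[1:])
    ]
    mean_gap: Optional[float] = mean(gaps) if gaps else None
    age_at_first: Optional[int] = (
        (ordered[0].announced_on - founded_on).days if ordered else None
    )
    return {
        "num_funding_rounds": len(ordered),
        "total_funding_usd": sum(r.amount_usd for r in ordered),
        "mean_days_between_rounds": mean_gap,
        "age_at_first_funding_days": age_at_first,
    }


# Illustrative input only; these values are invented for the example.
example = funding_features(
    date(2015, 1, 1),
    [
        FundingRound(date(2016, 7, 1), 2_000_000.0),
        FundingRound(date(2015, 7, 1), 500_000.0),
        FundingRound(date(2017, 7, 1), 5_000_000.0),
    ],
)
print(example)
```

Returning `None` for the interval and age features when there are too few rounds keeps missing values explicit, so a later modelling step has to decide how to impute or drop them.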

Key findings

The Random Forest algorithm achieved the highest predictive accuracy (approximately 85%) of the models compared when classifying startup success.
The 'number of funding rounds' and 'total funding amount' are the dominant predictors of a startup reaching an exit (IPO or acquisition).
Having founders with previous exit experience significantly increases the probability that the current startup succeeds.
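
The feature ranking described in these findings can be illustrated with a Random Forest's impurity-based importances. The sketch below assumes scikit-learn and NumPy are installed and runs on synthetic data generated by a toy rule, so it does not reproduce the paper's dataset, its roughly 85% accuracy, or its exact ranking. The feature names, the `make_synthetic_startups` generator, and the `rank_features` helper are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical feature names mirroring the predictors listed in this summary.
FEATURES = [
    "num_funding_rounds",
    "total_funding_usd",
    "mean_days_between_rounds",
    "age_at_first_funding_days",
]


def make_synthetic_startups(n=500, seed=0):
    """Generate toy data; this is NOT the Crunchbase sample used in the paper."""
    rng = np.random.default_rng(seed)
    rounds = rng.integers(1, 8, size=n)            # 1 to 7 rounds
    funding = rng.lognormal(mean=14.0, sigma=1.5, size=n)
    interval = rng.uniform(90, 900, size=n)
    age = rng.uniform(0, 1500, size=n)
    X = np.column_stack([rounds, funding, interval, age])
    # Assumed toy rule: success depends on rounds and funding plus noise.
    score = 0.6 * rounds + 0.8 * (np.log(funding) - 14.0) + rng.normal(0, 1.0, size=n)
    y = (score > np.median(score)).astype(int)
    return X, y


def rank_features(X, y, seed=0):
    """Fit a Random Forest and return held-out accuracy and ranked importances."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)
    accuracy = model.score(X_test, y_test)
    ranking = sorted(
        zip(FEATURES, model.feature_importances_),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return accuracy, ranking


X, y = make_synthetic_startups()
accuracy, ranking = rank_features(X, y)
print(f"held-out accuracy: {accuracy:.3f}")
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

Impurity-based importances tend to favour continuous, high-cardinality features, so on real data it may be worth cross-checking the ranking with permutation importance on a held-out set.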

Limitations

Survivorship bias: The dataset primarily includes startups that received at least one round of seed funding, so ventures that never raised capital are underrepresented.
Data lag: Crunchbase data relies on self-reporting and news coverage, which may leave financial figures incomplete or out of date.
Geographic bias: The sample is heavily weighted toward US-based startups, which may limit how well the results generalize to other regions.
