
Streamlit | Machine Learning Project
A Machine Learning System for Predicting Startup Failure Risk
Startup failure rates are notoriously high, and investors often struggle to identify which companies are most at risk early on.
To address this, I built an end-to-end machine learning system that predicts the probability that a startup will fail (close) based on its funding, milestones, relationships, and company characteristics.
This project showcases:
The result is a VC-style analytics tool that helps investors see which startups are most likely to fail in the near future.
By commonly cited estimates, around 90% of startups fail, yet the warning signs are rarely clear until it is too late.
Investors often rely on intuition, limited signals, or incomplete data.
Key Question:
Can historical company characteristics predict the likelihood of startup failure?
If we can identify key predictors early, founders, accelerators, and VCs can take action before failure occurs.
I created a complete analytics pipeline consisting of:
I cleaned and prepared historical startup datasets, including:
I handled missing values, engineered new features, and built a modeling-ready dataset.
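The preparation step above can be illustrated with a minimal pandas sketch. The column names (`funding_total_usd`, `funding_rounds`, `milestones`, `founded_at`, `first_funding_at`, `status`), the sample rows, and the imputation choices are assumptions made for illustration; they are not taken from the project's actual dataset or code.

```python
import pandas as pd

# Hypothetical raw records; all column names and values are illustrative.
raw = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "funding_total_usd": [500_000.0, None, 2_000_000.0, 750_000.0],
    "funding_rounds": [1, 2, 3, None],
    "milestones": [0, 1, None, 2],
    "founded_at": ["2008-01-15", "2010-06-01", "2005-03-20", "2011-11-11"],
    "first_funding_at": ["2009-01-15", "2010-12-01", None, "2012-05-11"],
    "status": ["closed", "operating", "acquired", "closed"],
})


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Return a modeling-ready copy of df (assumed schema, see above)."""
    out = df.copy()

    # Parse date strings; unparseable or missing values become NaT.
    for col in ("founded_at", "first_funding_at"):
        out[col] = pd.to_datetime(out[col], errors="coerce")

    # Record that funding was missing before imputing it with the median.
    out["funding_missing"] = out["funding_total_usd"].isna().astype(int)
    median_funding = out["funding_total_usd"].median()
    out["funding_total_usd"] = out["funding_total_usd"].fillna(median_funding)

    # Treat missing counts as zero (an assumption, not a universal rule).
    count_cols = ["funding_rounds", "milestones"]
    out[count_cols] = out[count_cols].fillna(0)

    # Engineered features.
    rounds = out["funding_rounds"].clip(lower=1)  # avoid division by zero
    out["funding_per_round"] = out["funding_total_usd"] / rounds
    delta = out["first_funding_at"] - out["founded_at"]
    out["days_to_first_funding"] = delta.dt.days.fillna(-1)  # -1 = unknown

    # Binary target: 1 if the startup closed, 0 otherwise.
    out["failed"] = (out["status"] == "closed").astype(int)
    return out


model_df = prepare(raw)
```

Keeping a `funding_missing` flag alongside the imputed value lets a downstream model treat "unknown funding" as a signal in its own right instead of hiding it behind the median.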
I tested several classification models, selecting Random Forest for its strong performance and interpretability.
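A model comparison of this kind might look like the following scikit-learn sketch. It runs on synthetic data, so the scores say nothing about the real project; the feature names, the second candidate model (logistic regression), and the ROC AUC metric are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the modeling-ready dataset (illustrative only).
rng = np.random.default_rng(0)
n = 400
feature_names = ["funding_total_usd", "funding_rounds", "milestones", "relationships"]
X = rng.normal(size=(n, len(feature_names)))
signal = -1.0 * X[:, 0] - 0.8 * X[:, 2] + rng.normal(scale=0.5, size=n)
y = (signal > 0).astype(int)  # 1 = failed, 0 = survived

# Candidate classifiers, compared by 5-fold cross-validated ROC AUC.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in candidates.items()
}

# Fit the forest on a training split and score the held-out startups.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
forest = candidates["random_forest"].fit(X_train, y_train)
failure_prob = forest.predict_proba(X_test)[:, 1]  # P(failed) per startup

# Impurity-based importances give a rough view of what drives predictions.
importances = dict(zip(feature_names, forest.feature_importances_))
```

The `failure_prob` array is the kind of per-company probability a dashboard can display, and `importances` is one common way a Random Forest is inspected for interpretability.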
Key outputs generated:
I designed and developed a fully interactive dashboard that allows users to:
This turns a static ML model into a decision-support product.
High-level summary of startup risk across the entire dataset.
Visualizes predicted failure probabilities.
Shows how capital raised correlates with failure risk.
Drill into any company and see its model-generated insights.
The filtered underlying data table, included directly in the dashboard.
Interactive dashboard with portfolio risk overview, key metrics, and startup risk analysis.

Visualizes the distribution of predicted failure probabilities across all startups.

Shows the relationship between capital raised and predicted failure risk.

Filter startups by risk bucket (Low / Medium / High) for targeted analysis.

Filter startups by geographic location to analyze regional patterns.

Drill into any company to see detailed risk analysis and model-generated insights.

View detailed funding history, milestones, and company characteristics.

Full transparency with the underlying dataset, filterable and exportable.
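
The risk-bucket and location filters shown in the views above could be backed by logic like the following sketch. The thresholds (0.33 and 0.66), the `ScoredStartup` record, and the `filter_startups` helper are hypothetical; the project's actual cutoffs and data structures are not stated here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScoredStartup:
    """Hypothetical record: one startup with its predicted failure probability."""
    name: str
    state: str
    failure_prob: float


# Assumed cutoffs; the real dashboard may use different thresholds.
LOW_MAX = 0.33
MEDIUM_MAX = 0.66


def risk_bucket(prob: float) -> str:
    """Map a failure probability in [0, 1] to Low / Medium / High."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError(f"probability out of range: {prob}")
    if prob < LOW_MAX:
        return "Low"
    if prob < MEDIUM_MAX:
        return "Medium"
    return "High"


def filter_startups(
    rows: List[ScoredStartup],
    bucket: Optional[str] = None,
    state: Optional[str] = None,
) -> List[ScoredStartup]:
    """Keep rows matching the selected bucket and/or state (None = no filter)."""
    return [
        r for r in rows
        if (bucket is None or risk_bucket(r.failure_prob) == bucket)
        and (state is None or r.state == state)
    ]


portfolio = [
    ScoredStartup("Alpha", "CA", 0.12),
    ScoredStartup("Beta", "NY", 0.48),
    ScoredStartup("Gamma", "CA", 0.81),
    ScoredStartup("Delta", "CA", 0.70),
]
high_risk_ca = filter_startups(portfolio, bucket="High", state="CA")
```

In a Streamlit app, the `bucket` and `state` arguments would typically come from widgets such as `st.selectbox`, with the filtered rows passed on to the charts and data table.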

The model identified meaningful patterns in startup failure. For example:
These findings demonstrate how predictive analytics can supplement traditional due-diligence methods.