Streamlit | Machine Learning Project

AI Startup Failure Prediction Dashboard

A Machine Learning System for Predicting Startup Failure Risk

Overview

Startup failure rates are notoriously high, and investors often struggle to identify which companies are most at risk early on.

To solve this, I built an end-to-end machine learning system that predicts the probability that a startup will fail (close) based on its funding, milestones, relationships, and company characteristics.

This project showcases:

  • Full data analysis & feature engineering
  • A trained machine learning model (Random Forest)
  • Portfolio-level insights & startup-level diagnostics
  • An interactive Streamlit web dashboard
  • Clean, production-ready code and documentation

The result is a VC-style analytics tool that helps investors understand which startups are most likely to experience failure in the near future.

The Problem

By commonly cited estimates, roughly 90% of startups fail, but the warning signs are rarely clear until it's too late.

Investors often rely on intuition, limited signals, or incomplete data.

Key Question:

Can historical company characteristics predict the likelihood of startup failure?

If we can identify key predictors early, founders, accelerators, and VCs can take action before failure occurs.

My Approach

I created a complete analytics pipeline consisting of:

1. Data Exploration & Preparation

I cleaned and prepared historical startup datasets, including:

  • Funding rounds
  • Total capital raised
  • Founder & company relationships
  • Milestones achieved
  • Top-500 status
  • Final outcome (success vs failure)

I handled missing values, engineered new features, and built a modeling-ready dataset.
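The preparation step can be sketched roughly as follows. This is a minimal illustration, assuming pandas; the column names (`funding_total_usd`, `funding_rounds`, `milestones`, `relationships`, `is_top500`, `status`), the fill strategy, and the engineered features are hypothetical placeholders rather than the project's actual schema.

```python
import numpy as np
import pandas as pd


def prepare_dataset(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a modeling-ready frame with a binary `failed` target.

    All column names are hypothetical placeholders.
    """
    df = raw.copy()

    # Assumption: a missing count means "none recorded", so fill with zero.
    for col in ["funding_rounds", "milestones", "relationships"]:
        df[col] = df[col].fillna(0)

    # Assumption: missing funding totals are filled with the median.
    df["funding_total_usd"] = df["funding_total_usd"].fillna(
        df["funding_total_usd"].median()
    )

    # Engineered features: log-scaled funding and average round size.
    df["log_funding"] = np.log1p(df["funding_total_usd"])
    df["funding_per_round"] = (
        df["funding_total_usd"] / df["funding_rounds"].clip(lower=1)
    )

    # Target: 1 if the startup closed, 0 otherwise.
    df["failed"] = (df["status"] == "closed").astype(int)
    return df.drop(columns=["status"])


# Tiny made-up sample, for illustration only.
raw = pd.DataFrame(
    {
        "funding_total_usd": [500_000.0, np.nan, 12_000_000.0, 250_000.0],
        "funding_rounds": [1, 2, np.nan, 1],
        "milestones": [0, 3, 4, np.nan],
        "relationships": [2, 10, 25, 1],
        "is_top500": [0, 1, 1, 0],
        "status": ["closed", "acquired", "acquired", "closed"],
    }
)
prepared = prepare_dataset(raw)
```

In a real pipeline, the median would be computed on the training split only so that no information leaks from the held-out data.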

2. Machine Learning Modeling

I tested several classification models and selected Random Forest for its strong performance and for its built-in feature importance scores, which make its predictions easier to interpret.

Key outputs generated:

  • Predicted probability of failure
  • Feature importance rankings
  • Startup risk segmentation (Low / Medium / High)
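The three outputs above can be produced along these lines. This is a sketch under stated assumptions, using scikit-learn's `RandomForestClassifier` on synthetic stand-in data; the feature names, the hyperparameters, and the 0.33 / 0.66 risk thresholds are illustrative and not taken from the project.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the real project uses historical startup records.
rng = np.random.default_rng(42)
n = 400
X = pd.DataFrame(
    {
        "log_funding": rng.normal(14, 2, n),
        "funding_rounds": rng.integers(1, 6, n),
        "milestones": rng.integers(0, 6, n),
        "relationships": rng.integers(0, 30, n),
    }
)
# Hypothetical rule: fewer milestones and relationships raise failure odds.
logit = 1.5 - 0.5 * X["milestones"].to_numpy() - 0.08 * X["relationships"].to_numpy()
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Predicted probability of failure: the column for class label 1.
fail_col = list(model.classes_).index(1)
failure_prob = model.predict_proba(X_test)[:, fail_col]

# Feature importance ranking (impurity-based, sums to 1).
importance = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)

# Risk segmentation; the cut points are illustrative thresholds.
risk_bucket = pd.cut(
    failure_prob,
    bins=[0.0, 0.33, 0.66, 1.0],
    labels=["Low", "Medium", "High"],
    include_lowest=True,
)
```

Impurity-based importances are quick to compute but can favor high-cardinality features, so permutation importance is a common cross-check.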

3. Interactive Streamlit Dashboard

I designed and developed a fully interactive dashboard that allows users to:

  • Analyze portfolio-level failure risk
  • Compare predicted vs actual failure rates
  • View distribution charts and risk clusters
  • Filter startups by state, region, or risk bucket
  • Inspect individual startup profiles, including:
    • Funding history
    • Milestones
    • Predicted failure probability
    • Risk classification
    • Company relationships

This turns a static ML model into a decision-support product.
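A dashboard of this kind might be wired up roughly as below. This is a minimal sketch, not the project's actual app: the function names `filter_startups` and `render_dashboard` and the columns `state`, `risk_bucket`, and `failure_probability` are assumptions. The filtering logic is kept separate from the Streamlit calls so it can be tested without a running app.

```python
import pandas as pd


def filter_startups(df, states, buckets):
    """Keep rows matching the selected states and risk buckets.

    An empty selection means "no filter" for that field.
    """
    mask = pd.Series(True, index=df.index)
    if states:
        mask &= df["state"].isin(states)
    if buckets:
        mask &= df["risk_bucket"].isin(buckets)
    return df[mask]


def render_dashboard(df):
    """Draw sidebar filters, headline metrics, and the data table."""
    # Imported here so the filter logic above works without Streamlit.
    import streamlit as st

    st.title("AI Startup Failure Prediction Dashboard")
    states = st.sidebar.multiselect("State", sorted(df["state"].unique()))
    buckets = st.sidebar.multiselect("Risk bucket", ["Low", "Medium", "High"])
    view = filter_startups(df, states, buckets)

    st.metric("Startups shown", len(view))
    st.metric(
        "Mean predicted failure probability",
        f"{view['failure_probability'].mean():.1%}",
    )
    st.dataframe(view)


# Made-up scored portfolio, for illustration only.
scored = pd.DataFrame(
    {
        "name": ["Alpha", "Beta", "Gamma", "Delta"],
        "state": ["CA", "NY", "CA", "TX"],
        "failure_probability": [0.82, 0.15, 0.47, 0.71],
        "risk_bucket": ["High", "Low", "Medium", "High"],
    }
)
high_risk_ca = filter_startups(scored, ["CA"], ["High"])
```

Calling `render_dashboard(scored)` from a script launched with `streamlit run` would draw the page; the filter function alone runs anywhere pandas is installed.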

Features

Portfolio Risk Overview

High-level summary of startup risk across the entire dataset.
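One way to compute such a summary, including the predicted vs actual failure comparison, is sketched here. The function name `portfolio_summary`, the column names, and the 0.5 decision threshold are assumptions made for illustration.

```python
import pandas as pd


def portfolio_summary(df, threshold=0.5):
    """Summarize predicted vs. actual failure for a scored portfolio."""
    # A startup counts as a predicted failure at or above the threshold.
    predicted_fail = df["failure_probability"] >= threshold
    return {
        "n_startups": len(df),
        "mean_failure_probability": float(df["failure_probability"].mean()),
        "predicted_failure_rate": float(predicted_fail.mean()),
        "actual_failure_rate": float(df["failed"].mean()),
        "share_by_bucket": df["risk_bucket"].value_counts(normalize=True).to_dict(),
    }


# Made-up scored portfolio, for illustration only.
scored = pd.DataFrame(
    {
        "failure_probability": [0.82, 0.15, 0.47, 0.71],
        "failed": [1, 0, 1, 1],
        "risk_bucket": ["High", "Low", "Medium", "High"],
    }
)
summary = portfolio_summary(scored)
```

A large gap between the predicted and actual failure rates would suggest the threshold or the model's calibration needs another look.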

Risk Distribution Chart

Visualizes predicted failure probabilities.
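A chart like this could be drawn with Matplotlib as sketched below. The function name `risk_histogram`, the bin count, and the random sample probabilities are assumptions; the figure is built through Matplotlib's object-oriented `Figure` API so it does not depend on a display backend.

```python
import numpy as np
from matplotlib.figure import Figure


def risk_histogram(probabilities, bins=20):
    """Return a histogram figure of predicted failure probabilities."""
    fig = Figure(figsize=(6, 4))
    ax = fig.subplots()
    # Fix the range to [0, 1] so charts are comparable across filters.
    ax.hist(probabilities, bins=bins, range=(0.0, 1.0), edgecolor="white")
    ax.set_xlabel("Predicted failure probability")
    ax.set_ylabel("Number of startups")
    ax.set_title("Risk distribution")
    return fig


# Made-up probabilities, for illustration only.
rng = np.random.default_rng(0)
sample_probs = rng.beta(2, 3, size=200)
fig = risk_histogram(sample_probs)
```

In a Streamlit app, the returned figure can be passed to `st.pyplot(fig)`.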

Funding vs Risk Scatter Plot

Shows how capital raised correlates with failure risk.
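A scatter plot of this kind might look like the following sketch. The function name `funding_vs_risk` and the column names are assumptions; a log-scaled x-axis is used here because funding totals typically span several orders of magnitude.

```python
import pandas as pd
from matplotlib.figure import Figure


def funding_vs_risk(df):
    """Return a scatter figure of capital raised vs. predicted risk."""
    fig = Figure(figsize=(6, 4))
    ax = fig.subplots()
    ax.scatter(df["funding_total_usd"], df["failure_probability"], alpha=0.6)
    ax.set_xscale("log")  # funding spans orders of magnitude
    ax.set_ylim(0.0, 1.0)
    ax.set_xlabel("Total capital raised (USD, log scale)")
    ax.set_ylabel("Predicted failure probability")
    ax.set_title("Funding vs risk")
    return fig


# Made-up scored portfolio, for illustration only.
scored = pd.DataFrame(
    {
        "funding_total_usd": [250_000.0, 500_000.0, 12_000_000.0, 40_000_000.0],
        "failure_probability": [0.71, 0.82, 0.15, 0.64],
    }
)
fig = funding_vs_risk(scored)
```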

Startup-Level Risk Explorer

Drill into any company and see its model-generated insights.

Full Data Transparency

Filtered, underlying data table included directly in the dashboard.

Dashboard Preview

Main Dashboard View

Interactive dashboard with portfolio risk overview, key metrics, and startup risk analysis.

Risk Distribution Chart

Visualizes the distribution of predicted failure probabilities across all startups.

Funding vs Risk Scatter Plot

Shows the relationship between capital raised and predicted failure risk.

Risk Filter

Filter startups by risk bucket (Low / Medium / High) for targeted analysis.

State Filter

Filter startups by geographic location to analyze regional patterns.

Individual Startup View

Drill into any company to see detailed risk analysis and model-generated insights.

Startup Data Details

View detailed funding history, milestones, and company characteristics.

Underlying Data Table

Full transparency with the underlying dataset, filterable and exportable.

Results

The model identified meaningful patterns in startup failure. For example:

  • Startups with fewer milestones and few investor relationships showed significantly higher failure rates.
  • Higher funding did not always correlate with lower risk: several well-funded companies still exhibited a high predicted failure probability.
  • The risk distribution revealed a concentrated cluster of medium-risk companies that investors may overlook.

These findings demonstrate how predictive analytics can supplement traditional due-diligence methods.

Tech Stack

Languages and Tools

  • Python
  • Pandas, NumPy
  • Scikit-learn (Random Forest)
  • Matplotlib, Seaborn
  • Streamlit (Dashboard)