
Streamlit | Machine Learning Project

Customer Churn Prediction

Predicting Customer Attrition Using Machine Learning & Analytics

Overview

Customer churn is one of the most important metrics for subscription-based businesses. Retaining a customer is significantly cheaper than acquiring a new one, and understanding why customers leave can directly improve revenue and long-term growth.

In this project, I built a complete end-to-end churn prediction system using the Telco Customer Churn Dataset. The project includes exploratory data analysis, feature engineering, model development, and an interactive Streamlit dashboard that allows users to explore churn patterns and predict the churn probability for any customer profile.

This project demonstrates my ability to:

  • Analyze complex customer behavior data
  • Build predictive models that support business decisions
  • Design dashboards that communicate insights clearly
  • Deploy analytical applications for real-world use

The Problem

Telecom companies lose substantial revenue every year due to customer churn. Understanding what factors contribute to customer attrition—and which customers are most likely to churn—enables companies to take proactive action.

The key business questions:

  • What patterns differentiate customers who churn from those who stay?
  • Which customer characteristics are the strongest predictors of churn?
  • Can we build a model that accurately predicts whether a customer is at risk?
  • How can insights be visualized and used to support decision-making?

My Approach

1. Exploratory Data Analysis

I analyzed customer demographics, contract types, billing information, tenure, internet service types, and more to uncover patterns related to churn.

Key insights included:

  • Month-to-month customers churn at much higher rates
  • Customers with fiber-optic internet show higher churn
  • Short-tenure customers are the most vulnerable
  • Electronic check users churn more than users with automatic payment methods

Charts were used throughout to visualize churn distribution and segment-level differences.
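The segment comparisons above boil down to a grouped churn rate. The sketch below shows one way to compute it with pandas on a small hand-made sample. The column names (Contract, PaymentMethod, Churn) follow the public Telco dataset but are assumptions about this project's schema, and the toy numbers are illustrative only, not project results.

```python
import pandas as pd

# Tiny hand-made sample standing in for the Telco data (illustrative only).
# Column names are assumed to match the public dataset.
df = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "Month-to-month",
                 "One year", "One year", "Two year", "Two year", "Two year"],
    "PaymentMethod": ["Electronic check", "Electronic check", "Mailed check",
                      "Bank transfer (automatic)", "Electronic check",
                      "Credit card (automatic)", "Bank transfer (automatic)",
                      "Credit card (automatic)"],
    "Churn": ["Yes", "Yes", "No", "No", "Yes", "No", "No", "No"],
})


def churn_rate_by(frame: pd.DataFrame, column: str) -> pd.Series:
    """Return the share of churned customers within each value of `column`."""
    churned = frame["Churn"].eq("Yes")
    return churned.groupby(frame[column]).mean().sort_values(ascending=False)


overall_rate = df["Churn"].eq("Yes").mean()
by_contract = churn_rate_by(df, "Contract")
by_payment = churn_rate_by(df, "PaymentMethod")

print(f"Overall churn rate: {overall_rate:.1%}")
print(by_contract)
print(by_payment)
```

The same helper can feed a bar chart directly, for example by calling `.plot(kind="bar")` on the returned Series.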

2. Feature Engineering

To improve model performance and interpretability:

  • Created tenure groups in months ("0–12", "13–24", "25–48", "49+")
  • Encoded categorical variables
  • Cleaned and imputed missing values
  • Generated additional features related to billing behavior
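A minimal sketch of these steps, assuming column names from the public Telco dataset (tenure, MonthlyCharges, TotalCharges, Contract). The zero-fill for blank TotalCharges values and the avg_charge_per_month feature are hypothetical choices made for illustration, not necessarily what the project used.

```python
import pandas as pd

# Toy rows; column names mirror the public Telco dataset (an assumption here).
raw = pd.DataFrame({
    "tenure": [0, 5, 12, 13, 30, 48, 49, 72],
    "MonthlyCharges": [20.0, 70.5, 89.9, 55.0, 99.0, 25.0, 80.0, 110.0],
    "TotalCharges": [" ", "352.5", "1078.8", "715.0",
                     "2970.0", "1200.0", "3920.0", "7920.0"],
    "Contract": ["Month-to-month", "Month-to-month", "One year", "One year",
                 "Two year", "Two year", "Month-to-month", "Two year"],
})


def engineer_features(frame: pd.DataFrame) -> pd.DataFrame:
    """Clean charges, bin tenure, and add one illustrative billing feature."""
    out = frame.copy()
    # Blank strings become NaN, then are filled with 0 (one possible choice).
    out["TotalCharges"] = pd.to_numeric(
        out["TotalCharges"], errors="coerce"
    ).fillna(0.0)
    # Bin tenure (months) into the four groups listed above.
    out["tenure_group"] = pd.cut(
        out["tenure"],
        bins=[-1, 12, 24, 48, float("inf")],
        labels=["0–12", "13–24", "25–48", "49+"],
    )
    # Hypothetical billing feature: average charge per month of tenure.
    out["avg_charge_per_month"] = out["TotalCharges"] / out["tenure"].clip(lower=1)
    return out


engineered = engineer_features(raw)
# One-hot encode the categorical columns for modeling.
encoded = pd.get_dummies(engineered, columns=["Contract", "tenure_group"], dtype=int)
print(encoded.head())
```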

3. Model Development

I built and compared multiple machine learning models, including:

  • Logistic Regression
  • Random Forest Classifier

The models were evaluated using accuracy, recall, F1 score, and ROC-AUC. Logistic Regression performed marginally better overall (about 79% accuracy and 0.83 ROC-AUC, versus about 78% and 0.82 for Random Forest) and is easier to interpret, which matters for business stakeholders.
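The comparison can be sketched roughly as follows. This is not the project's training code: it uses synthetic data from scikit-learn's make_classification as a stand-in for the engineered churn features, and the class balance, split size, and hyperparameters are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the churn features (assumption).
X, y = make_classification(
    n_samples=2000, n_features=12, n_informative=6,
    weights=[0.75, 0.25], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
}


def evaluate(model) -> dict:
    """Fit on the training split and score on the held-out split."""
    model.fit(X_train, y_train)
    predicted = model.predict(X_test)
    churn_probability = model.predict_proba(X_test)[:, 1]
    return {
        "accuracy": accuracy_score(y_test, predicted),
        "recall": recall_score(y_test, predicted),
        "f1": f1_score(y_test, predicted),
        "roc_auc": roc_auc_score(y_test, churn_probability),
    }


results = {name: evaluate(model) for name, model in models.items()}
for name, metrics in results.items():
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

Recall and ROC-AUC are worth reporting alongside accuracy because churners are the minority class, so a model can look accurate while missing most of them.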

4. Interactive Dashboard

I developed a Streamlit application with four main sections:

Overview

  • Overall churn rate
  • Month-to-month contract churn rate
  • Average monthly charges
  • Churn distribution

Segment Analysis

  • Churn by tenure group
  • Churn by internet service
  • Churn by payment method

Model Insights

  • Performance comparison between Logistic Regression and Random Forest
  • Explanation of high-impact churn drivers

Predict Churn

  • A live prediction tool where users can input customer details and instantly receive a churn probability.

Together, these sections let non-technical users explore churn patterns and score individual customers without writing code.
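The scoring step behind a prediction tool like this can be sketched as below: a fitted scikit-learn pipeline is saved with Joblib, reloaded, and applied to a single customer profile. The toy training rows, the three feature names, the file name churn_model.joblib, and the helper predict_churn_probability are all hypothetical. In a Streamlit app, the profile dictionary would typically be filled from input widgets.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy training data (illustrative only, not the project's dataset).
train = pd.DataFrame({
    "tenure": [1, 2, 3, 5, 24, 36, 48, 60, 70, 72],
    "MonthlyCharges": [85.0, 90.0, 75.0, 95.0, 60.0, 55.0, 40.0, 30.0, 25.0, 20.0],
    "Contract": ["Month-to-month"] * 4 + ["One year"] * 2 + ["Two year"] * 4,
    "Churn": [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
})
features = ["tenure", "MonthlyCharges", "Contract"]

# Preprocessing and model live in one pipeline so the app only loads one file.
pipeline = Pipeline([
    ("prep", ColumnTransformer([
        ("num", StandardScaler(), ["tenure", "MonthlyCharges"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["Contract"]),
    ])),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(train[features], train["Churn"])

# Persist the fitted pipeline (hypothetical file name, temporary directory).
model_path = os.path.join(tempfile.mkdtemp(), "churn_model.joblib")
joblib.dump(pipeline, model_path)


def predict_churn_probability(profile: dict, path: str = model_path) -> float:
    """Load the saved pipeline and return P(churn) for one customer profile."""
    model = joblib.load(path)
    row = pd.DataFrame([profile])[features]
    return float(model.predict_proba(row)[0, 1])


new_customer = {"tenure": 2, "MonthlyCharges": 88.0, "Contract": "Month-to-month"}
loyal_customer = {"tenure": 65, "MonthlyCharges": 25.0, "Contract": "Two year"}
print(f"New customer churn probability: {predict_churn_probability(new_customer):.2f}")
print(f"Loyal customer churn probability: {predict_churn_probability(loyal_customer):.2f}")
```

Saving the preprocessing together with the classifier avoids having to repeat encoding and scaling logic inside the dashboard code.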

Results

The models revealed significant drivers of churn, including:

  • Contract type (month-to-month customers are most likely to churn)
  • Tenure (lower tenure strongly correlates with churn)
  • Internet service type
  • Monthly charges and billing preferences
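One common way to read drivers like these from a logistic regression is to standardize the features and compare coefficients or odds ratios. The sketch below does this on synthetic data whose churn pattern is built in by construction, so it only demonstrates the technique. It is not the project's analysis, and the feature names and effect sizes are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 1000
tenure = rng.integers(0, 73, size=n)
monthly = rng.uniform(20, 120, size=n)
month_to_month = rng.integers(0, 2, size=n)

# Synthetic ground truth: short tenure and month-to-month contracts
# raise the odds of churn (effect sizes chosen arbitrarily).
logit = -1.0 - 0.05 * tenure + 0.01 * monthly + 1.5 * month_to_month
churn = rng.random(n) < 1 / (1 + np.exp(-logit))

X = pd.DataFrame({
    "tenure": tenure,
    "MonthlyCharges": monthly,
    "Contract_Month-to-month": month_to_month,
})
# Standardizing puts coefficients on a comparable per-standard-deviation scale.
X_scaled = StandardScaler().fit_transform(X)
model = LogisticRegression(max_iter=1000).fit(X_scaled, churn)

drivers = pd.DataFrame({
    "feature": X.columns,
    "coefficient": model.coef_[0],
    "odds_ratio": np.exp(model.coef_[0]),
}).set_index("feature")
# Rank features by the absolute size of their standardized coefficient.
drivers = drivers.reindex(
    drivers["coefficient"].abs().sort_values(ascending=False).index
)
print(drivers)
```

An odds ratio above 1 means the feature increases the odds of churn per standard deviation, and a value below 1 means it decreases them.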

Performance Summary

  • Logistic Regression achieved ~79% accuracy and 0.83 ROC-AUC
  • Random Forest achieved ~78% accuracy and 0.82 ROC-AUC

The deployed app enables real-time churn risk scoring, dynamic exploration of customer segments, and clear insight into retention opportunities.

Dashboard Preview

Overview Dashboard

High-level view of key churn metrics including overall churn rate, month-to-month churn rate, average charges comparison, and churn distribution visualization.

[Screenshot: Customer Churn Dashboard - Overview]

Segment Analysis

Deep dive into churn patterns by customer segments including tenure groups, internet service type, and payment methods.

[Screenshot: Customer Churn Dashboard - Segment Analysis]

Model Insights

Side-by-side comparison of machine learning model performance with metrics including accuracy, recall, F1 score, and ROC-AUC.

[Screenshot: Customer Churn Dashboard - Model Insights]

Churn Risk Calculator

Interactive tool allowing users to input customer characteristics and instantly calculate churn probability based on the trained model.

[Screenshot: Customer Churn Dashboard - Churn Calculator]

Prediction Output

Real-time churn probability result displayed after submitting customer details, enabling immediate risk assessment.

[Screenshot: Customer Churn Dashboard - Prediction Output]

Tech Stack

Languages and Tools

  • Python
  • pandas, NumPy, scikit-learn
  • Streamlit
  • Matplotlib, Seaborn
  • Jupyter Notebook
  • Joblib