Agents for Data

Telco Customer Churn Dataset

Synthetic version of the IBM Telco customer churn dataset, with 5,000 customers, 21 features, and an 18.5% churn rate. The IBM original is an industry-standard benchmark for churn prediction models and customer retention analysis.

Tags: customer-churn, telecommunications, machine-learning, classification, customer-analytics, predictive-modeling, retention, ibm-dataset, binary-classification, benchmark-dataset
1 table, 5,000 rows
Last updated: December 27, 2025
Time: Point-in-time snapshot (synthetic data)
Location: Not applicable (synthetic data)
Created by Dataset Agent

Overview

The Telco Customer Churn Dataset is modeled on an industry-standard benchmark created by IBM for customer analytics demonstrations. It contains telecommunications customer data for analyzing and predicting customer attrition, that is, when customers stop doing business with a company. The IBM original has become one of the most widely used resources for learning churn prediction, with thousands of notebooks and tutorials built around it.
Why Churn Matters: Acquiring new customers costs 5-25x more than retaining existing ones. With industry-average telco churn rates of 15-25% annually, even small improvements in retention can significantly impact profitability.
This synthetic version contains 5,000 customer records (the original IBM release has 7,043) with 21 columns covering demographics, services, and billing information.
SQL
SELECT COUNT(*) AS total_customers
FROM customers;
Result
Total Customers
5,000
The overall churn rate is 18.5%: 925 of the 5,000 customers have left the service, which makes this a moderately imbalanced classification problem.
SQL
SELECT COUNT(*) AS total_customers,
       SUM(CASE WHEN Churn = TRUE THEN 1 ELSE 0 END) AS churned_customers,
       ROUND(SUM(CASE WHEN Churn = TRUE THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 2) AS churn_rate
FROM customers;
Result
Total Customers | Churned Customers | Churn Rate (%)
5,000 | 925 | 18.5
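To try the churn-rate query outside the platform, the sketch below runs the same aggregation with Python's built-in sqlite3 module. It is a minimal sketch under stated assumptions: SQLite stands in for the platform's SQL engine, the `customers` table is hypothetical and filled with placeholder rows that only reproduce the published totals (5,000 customers, 925 churned), and `Churn` is stored as 0/1 because SQLite has no separate boolean type.

```python
import sqlite3


def overall_churn_rate(conn: sqlite3.Connection) -> tuple:
    """Run the overall churn-rate aggregation against a `customers` table."""
    query = """
        SELECT COUNT(*) AS total_customers,
               SUM(CASE WHEN Churn = 1 THEN 1 ELSE 0 END) AS churned_customers,
               ROUND(SUM(CASE WHEN Churn = 1 THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 2) AS churn_rate
        FROM customers
    """
    return conn.execute(query).fetchone()


# Hypothetical placeholder table: 925 churned rows (Churn = 1), 4,075 retained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customerID TEXT PRIMARY KEY, Churn INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(f"CUST-{i:05d}", 1 if i < 925 else 0) for i in range(5000)],
)

print(overall_churn_rate(conn))  # (5000, 925, 18.5)
```

Multiplying by `100.0` before dividing forces floating-point division, which matters in engines (SQLite included) where integer division would otherwise truncate the rate to 0.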

Feature Categories

The 21 features are organized into four logical categories that capture different aspects of the customer relationship:
  • Demographics: Gender, SeniorCitizen, Partner, Dependents — customer personal characteristics
  • Services: PhoneService, MultipleLines, InternetService, OnlineSecurity, OnlineBackup, DeviceProtection, TechSupport, StreamingTV, StreamingMovies — subscription details
  • Account: CustomerID, Tenure, Contract, PaperlessBilling, PaymentMethod, MonthlyCharges, TotalCharges — billing and account information
  • Target Variable: Churn — whether the customer left within the last month (Yes/No)

Key Insights and Churn Drivers

Several factors are associated with churn. Among the breakdowns below, contract type shows the largest difference in churn rates.
SQL
SELECT Contract,
       COUNT(*) AS customer_count,
       SUM(CASE WHEN Churn = TRUE THEN 1 ELSE 0 END) AS churned,
       ROUND(SUM(CASE WHEN Churn = TRUE THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 2) AS churn_rate
FROM customers
GROUP BY Contract
ORDER BY churn_rate DESC;
Result
Contract Type | Customers | Churned | Churn Rate (%)
Month-to-month | 1,603 | 569 | 35.5
One year | 1,687 | 283 | 16.78
Two year | 1,710 | 73 | 4.27
Critical Finding: Month-to-month contracts show a 35.5% churn rate, more than 8 times the two-year rate (4.27%). This is the largest gap in any breakdown shown here, which makes contract type a key target for retention strategies.
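The "more than 8 times" comparison is a ratio of two churn rates (a risk ratio). The sketch below recomputes the per-contract rates and that ratio from the counts in the contract table; `churn_rate` is an illustrative helper, not part of any library.

```python
# Customers and churned counts per contract type, from the contract table.
contracts = {
    "Month-to-month": (1603, 569),
    "One year": (1687, 283),
    "Two year": (1710, 73),
}


def churn_rate(customers: int, churned: int) -> float:
    """Churn rate in percent, rounded to 2 decimals like SQL ROUND(..., 2)."""
    return round(churned * 100.0 / customers, 2)


rates = {name: churn_rate(n, c) for name, (n, c) in contracts.items()}
print(rates)  # {'Month-to-month': 35.5, 'One year': 16.78, 'Two year': 4.27}

# Risk ratio: month-to-month churn probability divided by two-year churn probability.
m_n, m_c = contracts["Month-to-month"]
t_n, t_c = contracts["Two year"]
risk_ratio = (m_c / m_n) / (t_c / t_n)
print(round(risk_ratio, 2))  # 8.31
```

Computing the ratio from raw counts rather than from the rounded percentages avoids compounding rounding error, although here both routes give about 8.31.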

Customer Tenure and Value Analysis

Customers have an average tenure of 36.9 months (range 1 to 72 months), average monthly charges of $69.85, and average total charges to date of $2,586.79. Total charges measure revenue collected so far, not a forward-looking customer lifetime value.
SQL
SELECT ROUND(AVG(tenure), 1) AS avg_tenure,
       MIN(tenure) AS min_tenure,
       MAX(tenure) AS max_tenure,
       ROUND(AVG(MonthlyCharges), 2) AS avg_monthly_charges,
       ROUND(AVG(TotalCharges), 2) AS avg_total_charges
FROM customers;
Result
Avg Tenure (months) | Min Tenure | Max Tenure | Avg Monthly Charges ($) | Avg Total Charges ($)
36.9 | 1 | 72 | 69.85 | 2,586.79
Churn falls steadily as tenure increases: customers in their first year churn at 27.0%, nearly three times the 9.4% rate of customers with more than four years of tenure.
SQL
SELECT CASE WHEN tenure <= 12 THEN '0-12 months'
            WHEN tenure <= 24 THEN '13-24 months'
            WHEN tenure <= 48 THEN '25-48 months'
            ELSE '49-72 months'
       END AS tenure_bucket,
       COUNT(*) AS customers,
       SUM(CASE WHEN Churn = TRUE THEN 1 ELSE 0 END) AS churned,
       ROUND(SUM(CASE WHEN Churn = TRUE THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 2) AS churn_rate
FROM customers
GROUP BY tenure_bucket
ORDER BY MIN(tenure);
Result
Tenure Range | Customers | Churned | Churn Rate (%)
0-12 months | 1,525 | 412 | 27.02
13-24 months | 823 | 178 | 21.63
25-48 months | 1,102 | 189 | 17.15
49-72 months | 1,550 | 146 | 9.42
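The tenure buckets come from a SQL CASE expression, where the first matching condition wins. A hypothetical Python equivalent, useful for labeling rows the same way outside SQL (the function name is illustrative), might look like this:

```python
def tenure_bucket(tenure: int) -> str:
    """Python mirror of the SQL CASE expression used for the tenure table."""
    # Conditions are checked in order, so each boundary value (12, 24, 48)
    # falls into the lower bucket, exactly as in the SQL CASE.
    if tenure <= 12:
        return "0-12 months"
    if tenure <= 24:
        return "13-24 months"
    if tenure <= 48:
        return "25-48 months"
    return "49-72 months"


for months in (1, 12, 13, 24, 25, 48, 49, 72):
    print(months, tenure_bucket(months))
```

The buckets are uneven (12, 12, 24, and 24 months wide), so customer counts per bucket are not directly comparable; the churn rate within each bucket is the meaningful figure.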

Demographic Patterns

Demographic differences in churn are modest. Senior citizens churn at 20.4% versus 18.0% for non-seniors, and customers with neither a partner nor dependents churn at 19.7% versus 17.6% for those with both.
SQL
SELECT CASE WHEN SeniorCitizen = 1 THEN 'Senior Citizen' ELSE 'Non-Senior' END AS segment,
       COUNT(*) AS customer_count,
       SUM(CASE WHEN Churn = TRUE THEN 1 ELSE 0 END) AS churned,
       ROUND(SUM(CASE WHEN Churn = TRUE THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 2) AS churn_rate
FROM customers
GROUP BY SeniorCitizen;
Result
Segment | Customers | Churned | Churn Rate (%)
Non-Senior | 4,016 | 724 | 18.03
Senior Citizen | 984 | 201 | 20.43
SQL
SELECT Partner,
       Dependents,
       COUNT(*) AS customer_count,
       SUM(CASE WHEN Churn = TRUE THEN 1 ELSE 0 END) AS churned,
       ROUND(SUM(CASE WHEN Churn = TRUE THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 2) AS churn_rate
FROM customers
GROUP BY Partner, Dependents
ORDER BY churn_rate DESC;
Result
Partner | Dependents | Customers | Churn Rate (%)
No | No | 1,242 | 19.65
Yes | No | 1,277 | 18.79
No | Yes | 1,249 | 17.93
Yes | Yes | 1,232 | 17.61
SQL
SELECT gender, COUNT(*) AS customer_count
FROM customers
GROUP BY gender;
Result
Gender | Customers
Female | 2,555
Male | 2,445

Service and Billing Analysis

The dataset also includes service subscription and billing details. In this synthetic version, churn varies only slightly by internet service type and payment method, staying between about 17% and 20% in every group:
SQL
SELECT InternetService,
       COUNT(*) AS customer_count,
       SUM(CASE WHEN Churn = TRUE THEN 1 ELSE 0 END) AS churned,
       ROUND(SUM(CASE WHEN Churn = TRUE THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 2) AS churn_rate
FROM customers
GROUP BY InternetService
ORDER BY churn_rate DESC;
Result
Internet Service | Customers | Churned | Churn Rate (%)
DSL | 1,646 | 327 | 19.87
Fiber optic | 1,713 | 310 | 18.1
No Internet | 1,641 | 288 | 17.55
SQL
SELECT PaymentMethod,
       COUNT(*) AS customer_count,
       SUM(CASE WHEN Churn = TRUE THEN 1 ELSE 0 END) AS churned,
       ROUND(SUM(CASE WHEN Churn = TRUE THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 2) AS churn_rate
FROM customers
GROUP BY PaymentMethod
ORDER BY churn_rate DESC;
Result
Payment Method | Customers | Churned | Churn Rate (%)
Credit card (automatic) | 1,241 | 239 | 19.26
Mailed check | 1,269 | 238 | 18.75
Electronic check | 1,229 | 230 | 18.71
Bank transfer (automatic) | 1,261 | 218 | 17.29

Model Performance Benchmarks

This dataset is well-suited for binary classification. Based on published results and common implementations, here are typical performance benchmarks achievable:
Class Imbalance Note: The 18.5% churn rate creates moderate class imbalance. Consider techniques like SMOTE, class weighting, or threshold adjustment to optimize for your business objective (e.g., prioritizing recall to catch more churners).
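Two of the imbalance techniques above can be sketched without extra dependencies. This is a minimal illustration, assuming 0/1 churn labels and scores from any classifier: `balanced_class_weights` reproduces the n_samples / (n_classes * class_count) heuristic that scikit-learn applies for `class_weight="balanced"`, and `pick_threshold` lowers the decision threshold until a target recall on churners is reached. SMOTE is not shown (it is usually taken from the imbalanced-learn package); both function names and the toy scores are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels: list) -> dict:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def pick_threshold(probs: list, labels: list, min_recall: float = 0.8):
    """Return the highest score threshold whose recall on churners (label 1)
    is at least min_recall, or None if no threshold reaches it."""
    positives = sum(labels)
    if positives == 0:
        return None
    # Recall can only grow as the threshold falls, so scan from high to low.
    for threshold in sorted(set(probs), reverse=True):
        true_pos = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
        if true_pos / positives >= min_recall:
            return threshold
    return None


# Class counts from this dataset: 925 churned (1) and 4,075 retained (0).
labels = [1] * 925 + [0] * 4075
weights = balanced_class_weights(labels)
print({cls: round(w, 4) for cls, w in sorted(weights.items())})  # {0: 0.6135, 1: 2.7027}

# Toy scores for six customers (hypothetical model output).
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2]
truth = [1, 0, 1, 1, 0, 0]
print(pick_threshold(scores, truth, min_recall=0.8))  # 0.4
```

With these weights each churner counts about 4.4 times as much as a retained customer (2.7027 / 0.6135), which offsets the 4,075 to 925 class ratio. In practice the threshold should be chosen on a validation split, not on the training data.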

Feature Engineering Suggestions

To improve model performance beyond baseline, consider these feature engineering approaches specific to this dataset:
  • Total Services Count: Number of subscribed services (PhoneService plus the InternetService add-ons); customers with more services often churn less
  • Tenure Buckets: Convert continuous tenure to categorical (new: 0-12mo, established: 13-36mo, loyal: 37+ mo)
  • Average Monthly Spend: TotalCharges / tenure — normalizes for customer lifetime
  • Contract Risk Score: Combine contract type with tenure (month-to-month + low tenure = highest risk)
  • Service Bundle Flags: Create binary flags for common service combinations
  • Charge Increase Proxy: MonthlyCharges vs (TotalCharges/tenure) — may indicate recent price changes
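The suggestions above can be sketched as a single per-row function. This is a hypothetical illustration rather than a reference implementation: the contract risk weights and the streaming bundle flag are invented for the example, the "no internet" labels are an assumption (the profiles here show "No Internet", while the original IBM release uses "No"), and the sample row copies the first example values from the data dictionary (CUST-00000).

```python
# Hypothetical feature-engineering sketch for one customer row (a dict).
NO_INTERNET_LABELS = {"No", "No Internet"}  # assumption: either label may appear

BOOLEAN_SERVICES = (
    "PhoneService", "MultipleLines", "OnlineSecurity", "OnlineBackup",
    "DeviceProtection", "TechSupport", "StreamingTV", "StreamingMovies",
)

# Illustrative weights only: higher means a riskier contract type.
CONTRACT_RISK = {"Month-to-month": 2, "One year": 1, "Two year": 0}


def engineer_features(row: dict) -> dict:
    tenure = row["tenure"]

    # Total Services Count: boolean services plus internet service itself.
    total_services = sum(bool(row[col]) for col in BOOLEAN_SERVICES)
    total_services += int(row["InternetService"] not in NO_INTERNET_LABELS)

    # Tenure Buckets: new 0-12, established 13-36, loyal 37+ months.
    if tenure <= 12:
        bucket = "new"
    elif tenure <= 36:
        bucket = "established"
    else:
        bucket = "loyal"

    # Average Monthly Spend: guard against tenure 0 (blank TotalCharges rows).
    avg_spend = row["TotalCharges"] / tenure if tenure > 0 else None

    return {
        "total_services": total_services,
        "tenure_bucket": bucket,
        "avg_monthly_spend": avg_spend,
        # Contract Risk Score: contract weight plus 1 for first-year customers.
        "contract_risk": CONTRACT_RISK[row["Contract"]] + (1 if tenure <= 12 else 0),
        # Service Bundle Flag: both streaming services subscribed.
        "streaming_bundle": bool(row["StreamingTV"] and row["StreamingMovies"]),
        # Charge Increase Proxy: current charge minus historical average.
        "charge_increase_proxy": None if avg_spend is None else row["MonthlyCharges"] - avg_spend,
    }


# First example values from the data dictionary (CUST-00000).
row = {
    "tenure": 28, "Contract": "Two year", "InternetService": "DSL",
    "MonthlyCharges": 112.17, "TotalCharges": 3140.76,
    "PhoneService": True, "MultipleLines": True, "OnlineSecurity": False,
    "OnlineBackup": True, "DeviceProtection": False, "TechSupport": False,
    "StreamingTV": False, "StreamingMovies": True,
}
features = engineer_features(row)
print(features["total_services"], features["tenure_bucket"], features["contract_risk"])  # 5 established 0
```

For this row, TotalCharges equals tenure times MonthlyCharges exactly (28 × 112.17 = 3,140.76), and the second data-dictionary example matches too (39 × 33.65 = 1,312.35). If that holds across the synthetic table, Average Monthly Spend simply reproduces MonthlyCharges and the Charge Increase Proxy stays near zero, so both may add little signal in this version.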

Data Quality Notes

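Known Issue (original IBM release): TotalCharges is stored as text and contains blank values for new customers with a tenure of 0 months. Convert it to a numeric type and treat these blanks as 0 or null during preprocessing. In this version, the data dictionary lists TotalCharges as numeric with no missing values and the minimum tenure is 1 month, so the conversion may not be needed.
  • [x] customerID is unique (5,000 distinct values), so it can serve as an index
  • [x] No duplicate records in the dataset
  • [x] Churn is encoded as a boolean (displayed as Yes/No)
  • [ ] TotalCharges requires conversion from string to numeric (original IBM release; numeric in this version)
  • [ ] Service columns in the original release contain "No internet service" values that need handling (stored as booleans in this version)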
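A minimal preprocessing sketch for the checklist above, using only the standard library. It assumes the file layout of the original IBM release (text values such as "Yes"/"No" and blank TotalCharges); the helper names and the two-row CSV excerpt are hypothetical. Blank TotalCharges become None here; filling them with 0 is the other option mentioned above. Collapsing "No internet service" and "No phone service" to False is one common choice; keeping them as a separate category is another.

```python
import csv
import io


def parse_total_charges(value: str):
    """Convert TotalCharges text to float; blank or whitespace becomes None."""
    value = value.strip()
    return float(value) if value else None


def parse_bool(value: str) -> bool:
    """Map Yes/No, true/false, and 'No ... service' labels to booleans."""
    normalized = value.strip().lower()
    if normalized in {"yes", "true", "1"}:
        return True
    if normalized in {"no", "false", "0", "no internet service", "no phone service"}:
        return False
    raise ValueError(f"unrecognized boolean value: {value!r}")


# Hypothetical two-row excerpt in the style of the original IBM file.
sample = io.StringIO(
    "customerID,tenure,OnlineSecurity,TotalCharges,Churn\n"
    "TEST-0001,0,No internet service, ,No\n"
    "TEST-0002,28,Yes,3140.76,Yes\n"
)

rows = [
    {
        "customerID": rec["customerID"],
        "tenure": int(rec["tenure"]),
        "OnlineSecurity": parse_bool(rec["OnlineSecurity"]),
        "TotalCharges": parse_total_charges(rec["TotalCharges"]),
        "Churn": parse_bool(rec["Churn"]),
    }
    for rec in csv.DictReader(sample)
]

# Checklist item: customerID must be unique before using it as an index.
assert len({r["customerID"] for r in rows}) == len(rows)
print(rows)
```

Raising on unrecognized values, instead of silently mapping them to False, surfaces unexpected labels early.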

Sample Data Preview

Sample Customer Records
# | Customer ID | Gender | Senior | Tenure (months) | Contract | Monthly Charges | Churn
1 | CUST-00000 | Female | Yes | 28 | Two year | $112.17 | No
2 | CUST-00001 | Female | Yes | 39 | Month-to-month | $33.65 | No
3 | CUST-00002 | Female | No | 32 | Two year | $112.13 | No
4 | CUST-00003 | Female | Yes | 53 | One year | $89.38 | No
5 | CUST-00004 | Male | No | 7 | One year | $38.16 | No
SQL
SELECT customerID, gender, SeniorCitizen, tenure, Contract, MonthlyCharges, Churn
FROM customers
LIMIT 5;

Dataset Versions

Multiple versions of this IBM dataset exist on Kaggle with different feature sets:

Limitations and Considerations

While excellent for learning and benchmarking, this dataset has important limitations to consider:
  • Synthetic Data: Generated in the style of IBM's educational dataset rather than drawn from real customers; patterns may not reflect real-world complexity
  • No Temporal Dimension: Point-in-time snapshot without time-series showing when customers churned
  • Limited Features: Real churn analysis often includes call detail records, customer service interactions, network quality metrics, and competitive offers
  • No Causal Information: Cannot determine why customers churned (price, service quality, competitor offers, life changes)
  • Single Geography: No regional variation that would exist in real telco data
  • Clean Data: Unrealistically clean compared to production data — good for learning, less realistic for production prep
This synthetic dataset is intended for educational and demonstration purposes. While it is modeled on patterns found in telecommunications data, it should not be used to draw conclusions about any specific real-world company or market.

Table Overview

customers

Contains 5,000 rows and 21 columns. Column types: 4 numeric, 5 text, 12 boolean.


Data Preview

First 3 rows (first 5 of 21 columns shown)
Row | customerID | gender | SeniorCitizen | Partner | Dependents
1 | CUST-00000 | Female | 1 | false | false
2 | CUST-00001 | Female | 1 | false | true
3 | CUST-00002 | Female | 0 | true | true

Data Profile

5,000 rows, 21 columns, 100% complete, 5.0 MB estimated size

Column Types

4 numeric, 5 text, 12 boolean

High-Cardinality Columns

Columns with many distinct values (identifiers or continuous measures rather than categorical features)

  • customerID (5,000 unique values)
  • TotalCharges (4,938 unique values)
  • MonthlyCharges (3,934 unique values)

Data Dictionary

customers

Column | Type | Example Values | Missing Values
customerID | string | "CUST-00000", "CUST-00001" | 0
gender | string | "Female", "Female" | 0
SeniorCitizen | numeric | 1, 1 | 0
Partner | boolean | false, false | 0
Dependents | boolean | false, true | 0
tenure | numeric | 28, 39 | 0
PhoneService | boolean | true, false | 0
MultipleLines | boolean | true, true | 0
InternetService | string | "DSL", "Fiber optic" | 0
OnlineSecurity | boolean | false, true | 0
OnlineBackup | boolean | true, true | 0
DeviceProtection | boolean | false, true | 0
TechSupport | boolean | false, false | 0
StreamingTV | boolean | false, false | 0
StreamingMovies | boolean | true, true | 0
Contract | string | "Two year", "Month-to-month" | 0
PaperlessBilling | boolean | true, true | 0
PaymentMethod | string | "Bank transfer (autom...", "Mailed check" | 0
MonthlyCharges | numeric | 112.17, 33.65 | 0
TotalCharges | numeric | 3140.76, 1312.35 | 0
Churn | boolean | false, false | 0
Last updated: December 27, 2025
Created: December 26, 2025