Last updated: December 27, 2025
Time: Point-in-time snapshot (synthetic data)
Location: Not applicable (synthetic data)
Created by Dataset Agent
Overview
The Telco Customer Churn Dataset is an industry-standard benchmark dataset created by IBM for customer analytics demonstrations. It contains telecommunications customer data designed for analyzing and predicting customer attrition, that is, customers who stop doing business with a company. This dataset has become one of the most widely used resources for learning churn prediction, with thousands of notebooks and tutorials built around it.
Why Churn Matters: Acquiring new customers costs 5-25x more than retaining existing ones. With industry-average telco churn rates of 15-25% annually, even small improvements in retention can significantly impact profitability.
The dataset contains 5,000 customer records with 21 features covering demographics, services, and billing information.
The overall churn rate is 18.5% (925 of the 5,000 customers left the service), which makes this a moderately imbalanced classification problem.
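The headline figures follow directly from the label counts. A minimal plain-Python sketch using the 925 and 5,000 counts quoted above; `churn_summary` is an illustrative name, not part of any library.

```python
def churn_summary(n_churned: int, n_total: int) -> dict:
    """Churn rate and majority:minority ratio for a binary churn label."""
    n_retained = n_total - n_churned
    return {
        "churn_rate": n_churned / n_total,          # share of customers who left
        "retained": n_retained,
        "imbalance_ratio": n_retained / n_churned,  # retained customers per churner
    }


summary = churn_summary(n_churned=925, n_total=5000)
print(f"churn rate: {summary['churn_rate']:.1%}")                 # churn rate: 18.5%
print(f"retained per churner: {summary['imbalance_ratio']:.2f}")  # retained per churner: 4.41
```

Roughly 4.4 retained customers per churner is the sense in which the problem is "moderately" imbalanced.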
Feature Categories
The 21 columns (including the customer ID and the target) fall into four logical categories that capture different aspects of the customer relationship:
- Demographics (customer personal characteristics): gender, SeniorCitizen, Partner, Dependents
- Services (subscription details): PhoneService, MultipleLines, InternetService, OnlineSecurity, OnlineBackup, DeviceProtection, TechSupport, StreamingTV, StreamingMovies
- Account (billing and account information): customerID, tenure, Contract, PaperlessBilling, PaymentMethod, MonthlyCharges, TotalCharges
- Target Variable: Churn, whether the customer left within the last month (Yes/No; stored as a boolean in this version)
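When selecting model inputs, it can help to keep these groups as data. A small sketch; `FEATURE_GROUPS` and `MODEL_INPUTS` are hypothetical names, and the column spellings follow the data dictionary at the end of this page. Only 19 of the 21 columns are candidate predictors, because customerID is an identifier and Churn is the label.

```python
# Column spellings follow this version's data dictionary (e.g. "customerID", "tenure").
FEATURE_GROUPS = {
    "demographics": ["gender", "SeniorCitizen", "Partner", "Dependents"],
    "services": [
        "PhoneService", "MultipleLines", "InternetService", "OnlineSecurity",
        "OnlineBackup", "DeviceProtection", "TechSupport", "StreamingTV", "StreamingMovies",
    ],
    "account": [
        "customerID", "tenure", "Contract", "PaperlessBilling",
        "PaymentMethod", "MonthlyCharges", "TotalCharges",
    ],
    "target": ["Churn"],
}

ALL_COLUMNS = [col for cols in FEATURE_GROUPS.values() for col in cols]
# customerID only identifies a row and Churn is the label, so neither is a model input.
MODEL_INPUTS = [col for col in ALL_COLUMNS if col not in ("customerID", "Churn")]

print(len(ALL_COLUMNS), len(MODEL_INPUTS))  # 21 19
```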
Key Insights and Churn Drivers
Analysis reveals several critical factors influencing customer churn. Contract type emerges as the single strongest predictor of customer retention.
Critical Finding: Month-to-month contracts show a 35.5% churn rate, more than 8 times the rate for two-year contracts (4.27%). This makes contract type a key target for retention strategies.
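Per-category churn rates like these come from a simple group-by. A plain-Python sketch, assuming each row is a dict with a boolean `Churn` value as in this version; `churn_rate_by` is a hypothetical helper, and the toy rows are invented for illustration and do not reproduce the dataset's figures.

```python
from collections import defaultdict


def churn_rate_by(rows, column):
    """Churn rate for each distinct value of `column`."""
    counts = defaultdict(lambda: [0, 0])  # value -> [churned, total]
    for row in rows:
        bucket = counts[row[column]]
        bucket[0] += int(row["Churn"])
        bucket[1] += 1
    return {value: churned / total for value, (churned, total) in counts.items()}


# Toy rows for illustration only; NOT the dataset's actual records.
rows = [
    {"Contract": "Month-to-month", "Churn": True},
    {"Contract": "Month-to-month", "Churn": False},
    {"Contract": "Two year", "Churn": False},
    {"Contract": "Two year", "Churn": False},
]
print(churn_rate_by(rows, "Contract"))  # {'Month-to-month': 0.5, 'Two year': 0.0}

# Relative rate from the finding above: 35.5% vs 4.27%.
print(round(0.355 / 0.0427, 1))  # 8.3
```

The same helper can be pointed at the tenure, demographic, and service/billing breakdowns below by changing the column argument (for example "SeniorCitizen", "PaymentMethod", or a derived tenure bucket).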
Customer Tenure and Value Analysis
Customers have an average tenure of 36.9 months (ranging from 1 to 72 months), with average monthly charges of $69.85 and customer lifetime value averaging $2,586.79.
Tenure shows a strong inverse relationship with churn: newer customers are significantly more likely to leave.
Demographic Patterns
Customer demographics reveal nuanced patterns in churn behavior: senior citizens and customers without a partner or dependents show elevated churn rates.
Service and Billing Analysis
The dataset includes detailed service subscription information, and internet service type and payment method both show distinct churn patterns.
Model Performance Benchmarks
This dataset is well-suited for binary classification. Based on published results and common implementations, here are typical performance benchmarks achievable:
Class Imbalance Note: The 18.5% churn rate creates moderate class imbalance. Consider techniques like SMOTE, class weighting, or threshold adjustment to optimize for your business objective (e.g., prioritizing recall to catch more churners).
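A minimal stdlib sketch of two of these options: inverse-frequency class weights (the same formula scikit-learn applies for `class_weight="balanced"`) and a decision-threshold sweep. SMOTE is not shown. The helper names are hypothetical, and the model scores are made up purely to illustrate the trade-off.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * n_in_class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def recall_precision_at(y_true, scores, threshold):
    """Recall and precision when churn is predicted for every score >= threshold."""
    preds = [s >= threshold for s in scores]
    tp = sum(p and t for p, t in zip(preds, y_true))
    fp = sum(p and not t for p, t in zip(preds, y_true))
    fn = sum(t and not p for p, t in zip(preds, y_true))
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


# Label counts from this dataset: 925 churners out of 5,000 customers.
labels = [True] * 925 + [False] * 4075
weights = balanced_class_weights(labels)
print({cls: round(w, 3) for cls, w in weights.items()})  # {True: 2.703, False: 0.613}

# Hypothetical model scores, only to show the threshold trade-off.
y_true = [True, True, True, False, False, False]
scores = [0.9, 0.45, 0.3, 0.4, 0.2, 0.1]
for threshold in (0.5, 0.35):
    recall, precision = recall_precision_at(y_true, scores, threshold)
    print(f"threshold={threshold}: recall={recall:.2f}, precision={precision:.2f}")
```

On these toy scores, lowering the threshold from 0.5 to 0.35 raises recall from 0.33 to 0.67 while precision falls from 1.00 to 0.67, which is the trade-off to tune against the cost of a missed churner.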
Feature Engineering Suggestions
To improve model performance beyond baseline, consider these feature engineering approaches specific to this dataset:
- Total Services Count: Sum of all service subscriptions (PhoneService, InternetService add-ons) — more services often correlates with lower churn
- Tenure Buckets: Convert continuous tenure to categorical (new: 0-12mo, established: 13-36mo, loyal: 37+ mo)
- Average Monthly Spend: TotalCharges / tenure — normalizes for customer lifetime
- Contract Risk Score: Combine contract type with tenure (month-to-month + low tenure = highest risk)
- Service Bundle Flags: Create binary flags for common service combinations
- Charge Increase Proxy: MonthlyCharges vs (TotalCharges/tenure) — may indicate recent price changes
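A sketch of these features for a single row in plain Python. Which columns count as services and, especially, the contract risk points are illustrative assumptions, not established values; the tenure bucket edges come from the list above. The example row is CUST-00000 from the preview table on this page.

```python
# Columns counted as subscribed services (an assumption; adjust as needed).
SERVICE_FLAGS = [
    "PhoneService", "MultipleLines", "OnlineSecurity", "OnlineBackup",
    "DeviceProtection", "TechSupport", "StreamingTV", "StreamingMovies",
]
# Hypothetical risk points per contract type, for illustration only.
CONTRACT_RISK = {"Month-to-month": 2, "One year": 1, "Two year": 0}


def tenure_bucket(tenure: int) -> str:
    """new: 0-12 months, established: 13-36 months, loyal: 37+ months."""
    if tenure <= 12:
        return "new"
    if tenure <= 36:
        return "established"
    return "loyal"


def engineer_features(row: dict) -> dict:
    tenure = row["tenure"]
    # Boolean service flags plus one for any internet subscription (DSL or fiber).
    total_services = sum(bool(row[c]) for c in SERVICE_FLAGS) + int(
        row["InternetService"] != "No"
    )
    # Fall back to the current bill when tenure is 0 to avoid dividing by zero.
    avg_spend = row["TotalCharges"] / tenure if tenure > 0 else row["MonthlyCharges"]
    return {
        "total_services": total_services,
        "tenure_bucket": tenure_bucket(tenure),
        "avg_monthly_spend": avg_spend,
        # Contract points plus one extra point for customers in their first year.
        "contract_risk": CONTRACT_RISK[row["Contract"]] + int(tenure <= 12),
        # Positive when the current bill is above the historical average.
        "charge_increase": row["MonthlyCharges"] - avg_spend,
    }


# Preview row CUST-00000 from this page.
row = {
    "tenure": 28, "Contract": "Two year", "MonthlyCharges": 112.17, "TotalCharges": 3140.76,
    "PhoneService": True, "MultipleLines": True, "InternetService": "DSL",
    "OnlineSecurity": False, "OnlineBackup": True, "DeviceProtection": False,
    "TechSupport": False, "StreamingTV": False, "StreamingMovies": True,
}
features = engineer_features(row)
print(features["total_services"], features["tenure_bucket"])  # 5 established
```

On this row the charge-increase proxy is essentially zero because TotalCharges equals tenure × MonthlyCharges exactly (28 × 112.17 = 3,140.76), and the other four preview rows follow the same pattern. If that holds across the full table, Average Monthly Spend simply reproduces MonthlyCharges and the Charge Increase Proxy carries no signal in this synthetic version.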
Data Quality Notes
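Known Issue: In the original IBM release, the TotalCharges column is stored as text and contains blank/whitespace values for new customers with a tenure of 0 months. Convert it to a numeric type and treat these values as 0 or null during preprocessing. This version's data dictionary lists TotalCharges as numeric with no missing values, and tenure starts at 1, so the issue may not appear here, but defensive conversion is harmless.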
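A defensive converter in plain Python, assuming cells may arrive either as numbers (as in this version) or as strings (as in the original release). Stripping commas assumes they are thousands separators, like the formatting in the data preview below; `parse_total_charges` and its `blank_value` parameter are hypothetical.

```python
def parse_total_charges(raw, tenure, blank_value=0.0):
    """Convert one TotalCharges cell to float.

    Blank or whitespace-only cells become `blank_value` (pass None to keep them
    as missing). Design choice: blanks are only expected for tenure-0 customers,
    so a blank with nonzero tenure raises instead of being silently filled.
    """
    if isinstance(raw, (int, float)):
        return float(raw)
    text = raw.strip().replace(",", "")  # assumption: commas are thousands separators
    if not text:
        if tenure != 0:
            raise ValueError(f"blank TotalCharges with tenure={tenure}")
        return blank_value
    return float(text)


print(parse_total_charges(" ", tenure=0))                     # 0.0
print(parse_total_charges(" ", tenure=0, blank_value=None))   # None
print(parse_total_charges("1,312.35", tenure=39))             # 1312.35
```

In pandas, `pd.to_numeric(df["TotalCharges"], errors="coerce")` does the bulk equivalent, turning blanks into NaN that can then be filled with 0 for tenure-0 rows.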
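- [x] customerID is unique and suitable as an index
- [x] No duplicate records in the dataset
- [x] Churn is cleanly encoded as a boolean (Yes/No in the original IBM release)
- [ ] TotalCharges may require type conversion from string to numeric (it is stored as text in the original IBM release)
- [ ] Service columns in the original IBM release use "No internet service" and "No phone service" values that need handling (this version stores them as booleans)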
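The checklist can be turned into assertions at load time. A plain-Python sketch; `validate` and `to_bool` are hypothetical helpers. Beyond the listed items, it adds two consistency checks: internet add-ons recorded for customers whose InternetService is "No", and MultipleLines set without PhoneService. Run on the five preview rows from this page, it flags four customers, which suggests the synthetic generator may not enforce these dependencies; confirm on the full table before treating the flags as errors.

```python
from collections import Counter

INTERNET_ADD_ONS = [
    "OnlineSecurity", "OnlineBackup", "DeviceProtection",
    "TechSupport", "StreamingTV", "StreamingMovies",
]


def to_bool(value):
    """Map this version's booleans and the original release's strings to bool."""
    if isinstance(value, bool):
        return value
    # "No", "No internet service", and "No phone service" all mean not subscribed.
    return value == "Yes"


def validate(rows):
    """Return duplicate IDs, count of duplicate rows, and (customerID, problem) issues."""
    id_counts = Counter(r["customerID"] for r in rows)
    duplicate_ids = [cid for cid, n in id_counts.items() if n > 1]
    duplicate_rows = len(rows) - len({tuple(sorted(r.items())) for r in rows})
    issues = []
    for r in rows:
        has_add_on = any(to_bool(r[c]) for c in INTERNET_ADD_ONS)
        if r["InternetService"] == "No" and has_add_on:
            issues.append((r["customerID"], "internet add-on without internet service"))
        if to_bool(r["MultipleLines"]) and not to_bool(r["PhoneService"]):
            issues.append((r["customerID"], "MultipleLines without PhoneService"))
    return duplicate_ids, duplicate_rows, issues


# The five preview rows from this page (only the columns the checks need).
COLUMNS = ["customerID", "PhoneService", "MultipleLines", "InternetService", *INTERNET_ADD_ONS]
preview = [dict(zip(COLUMNS, values)) for values in [
    ("CUST-00000", True, True, "DSL", False, True, False, False, False, True),
    ("CUST-00001", False, True, "Fiber optic", True, True, True, False, False, True),
    ("CUST-00002", False, True, "No", True, True, False, True, False, False),
    ("CUST-00003", True, False, "No", False, True, False, True, True, True),
    ("CUST-00004", True, True, "No", True, False, False, False, False, True),
]]

duplicate_ids, duplicate_rows, issues = validate(preview)
for customer_id, problem in issues:
    print(customer_id, problem)
```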
Sample Data Preview
Sample Customer Records
| # | Customer ID | Gender | Senior | Tenure | Contract | Monthly Charges | Churn |
|---|---|---|---|---|---|---|---|
| 1 | CUST-00000 | Female | Yes | 28 | Two year | $112.17 | No |
| 2 | CUST-00001 | Female | Yes | 39 | Month-to-month | $33.65 | No |
| 3 | CUST-00002 | Female | No | 32 | Two year | $112.13 | No |
| 4 | CUST-00003 | Female | Yes | 53 | One year | $89.38 | No |
| 5 | CUST-00004 | Male | No | 7 | One year | $38.16 | No |
Dataset Versions
Multiple versions of this IBM dataset exist on Kaggle with different feature sets:
Limitations and Considerations
While excellent for learning and benchmarking, this dataset has important limitations to consider:
- Synthetic Data: Created by IBM for educational purposes — patterns may not reflect real-world complexity
- No Temporal Dimension: Point-in-time snapshot without time-series showing when customers churned
- Limited Features: Real churn analysis often includes call detail records, customer service interactions, network quality metrics, and competitive offers
- No Causal Information: Cannot determine why customers churned (price, service quality, competitor offers, life changes)
- Single Geography: No regional variation that would exist in real telco data
- Clean Data: Unrealistically clean compared to production data — good for learning, less realistic for production prep
This synthetic dataset was created by IBM for educational and demonstration purposes. While it reflects realistic patterns found in telecommunications data, it should not be used to draw conclusions about any specific real-world company or market.
Table Overview
customers
Data Preview
| customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | OnlineBackup | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CUST-00000 | Female | 1 | false | false | 28 | true | true | DSL | false | true | false | false | false | true | Two year | true | Bank transfer (automatic) | 112.17 | 3,140.76 | false |
| CUST-00001 | Female | 1 | false | true | 39 | false | true | Fiber optic | true | true | true | false | false | true | Month-to-month | true | Mailed check | 33.65 | 1,312.35 | false |
| CUST-00002 | Female | 0 | true | true | 32 | false | true | No | true | true | false | true | false | false | Two year | false | Electronic check | 112.13 | 3,588.16 | false |
| CUST-00003 | Female | 1 | false | true | 53 | true | false | No | false | true | false | true | true | true | One year | false | Bank transfer (automatic) | 89.38 | 4,737.14 | false |
| CUST-00004 | Male | 0 | false | true | 7 | true | true | No | true | false | false | false | false | true | One year | true | Credit card (automatic) | 38.16 | 267.12 | false |
Showing 5 of 5,000 rows
Data Profile
- 5,000 rows
- 21 columns
- 100% complete
- 5.0 MB estimated size
Column Types
4 Numeric, 5 Text, 12 Boolean
High-Cardinality Columns
Columns with many unique values (typically identifiers or continuous numeric measures rather than categorical features)
- customerID (5,000 unique values)
- TotalCharges (4,938 unique values)
- MonthlyCharges (3,934 unique values)
Data Dictionary
customers
| Column | Type | Example | Missing Values |
|---|---|---|---|
| customerID | string | "CUST-00000", "CUST-00001" | 0 |
| gender | string | "Female", "Female" | 0 |
| SeniorCitizen | numeric | 1, 1 | 0 |
| Partner | boolean | false, false | 0 |
| Dependents | boolean | false, true | 0 |
| tenure | numeric | 28, 39 | 0 |
| PhoneService | boolean | true, false | 0 |
| MultipleLines | boolean | true, true | 0 |
| InternetService | string | "DSL", "Fiber optic" | 0 |
| OnlineSecurity | boolean | false, true | 0 |
| OnlineBackup | boolean | true, true | 0 |
| DeviceProtection | boolean | false, true | 0 |
| TechSupport | boolean | false, false | 0 |
| StreamingTV | boolean | false, false | 0 |
| StreamingMovies | boolean | true, true | 0 |
| Contract | string | "Two year", "Month-to-month" | 0 |
| PaperlessBilling | boolean | true, true | 0 |
| PaymentMethod | string | "Bank transfer (autom...", "Mailed check" | 0 |
| MonthlyCharges | numeric | 112.17, 33.65 | 0 |
| TotalCharges | numeric | 3140.76, 1312.35 | 0 |
| Churn | boolean | false, false | 0 |
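The dictionary can double as a lightweight schema check when loading the data. A plain-Python sketch with types transcribed from the table above; `SCHEMA`, `matches`, and `schema_errors` are hypothetical names. Counting the declared types reproduces the Column Types split in the profile (4 numeric, 5 text, 12 boolean).

```python
from collections import Counter

# Column types transcribed from the data dictionary above.
SCHEMA = {
    "customerID": "string", "gender": "string", "SeniorCitizen": "numeric",
    "Partner": "boolean", "Dependents": "boolean", "tenure": "numeric",
    "PhoneService": "boolean", "MultipleLines": "boolean", "InternetService": "string",
    "OnlineSecurity": "boolean", "OnlineBackup": "boolean", "DeviceProtection": "boolean",
    "TechSupport": "boolean", "StreamingTV": "boolean", "StreamingMovies": "boolean",
    "Contract": "string", "PaperlessBilling": "boolean", "PaymentMethod": "string",
    "MonthlyCharges": "numeric", "TotalCharges": "numeric", "Churn": "boolean",
}


def matches(value, kind):
    """True if `value` fits the dictionary type `kind`."""
    if kind == "boolean":
        return isinstance(value, bool)
    if kind == "numeric":
        # bool is a subclass of int in Python, so rule it out explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return isinstance(value, str)


def schema_errors(row):
    """(column, value) pairs that are missing (value None) or of the wrong type."""
    missing = [(col, None) for col in SCHEMA if col not in row]
    wrong = [(col, row[col]) for col, kind in SCHEMA.items()
             if col in row and not matches(row[col], kind)]
    return missing + wrong


print(Counter(SCHEMA.values()))  # 12 boolean, 5 string, 4 numeric

# First preview row (CUST-00000) from this page.
first_row = {
    "customerID": "CUST-00000", "gender": "Female", "SeniorCitizen": 1,
    "Partner": False, "Dependents": False, "tenure": 28, "PhoneService": True,
    "MultipleLines": True, "InternetService": "DSL", "OnlineSecurity": False,
    "OnlineBackup": True, "DeviceProtection": False, "TechSupport": False,
    "StreamingTV": False, "StreamingMovies": True, "Contract": "Two year",
    "PaperlessBilling": True, "PaymentMethod": "Bank transfer (automatic)",
    "MonthlyCharges": 112.17, "TotalCharges": 3140.76, "Churn": False,
}
print(schema_errors(first_row))  # []
```

A TotalCharges value left as a blank string, as in the original release, would be reported as `("TotalCharges", " ")`.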