Last updated: January 2, 2026
Time: May 2008 - November 2010
Location: Portugal
Created by Dataset Agent
Overview
The Bank Marketing Dataset is a comprehensive collection of direct marketing campaign data from a Portuguese banking institution. This dataset captures phone-based marketing efforts aimed at convincing clients to subscribe to term deposit products. With 41,188 client contacts and 21 distinct features, it provides rich insights into customer behavior, campaign effectiveness, and the socioeconomic factors that influence financial decision-making.
The dataset contains 41,188 client contacts from Portuguese bank marketing campaigns.
Overall subscription rate of 11.27% with 4,640 successful term deposit conversions.
This dataset has become a benchmark for binary classification problems in machine learning, particularly for predicting customer response to marketing campaigns. It's widely used in academic research, banking analytics, and as a teaching resource for data science courses.
Historical and Economic Context
This dataset was collected between May 2008 and November 2010, a period that coincided with one of the most significant financial crises in modern history. Portugal, as part of the Eurozone, experienced severe economic turbulence during this time, with rising unemployment, declining consumer confidence, and increasing pressure on the banking sector.
The Portuguese banking industry faced unique challenges during this period. Term deposits—fixed-term savings accounts offering guaranteed interest rates—became an attractive product for banks seeking to stabilize their funding base. For consumers facing economic uncertainty, these low-risk savings vehicles offered a safe haven compared to volatile investment alternatives. This context helps explain why the bank invested heavily in telemarketing campaigns despite the challenging economic environment.
The dataset's collection period (2008-2010) coincides with the global financial crisis. Economic conditions and consumer behavior patterns may differ significantly in other time periods, which should be considered when applying models trained on this data.
Dataset Composition
The dataset features a rich combination of client demographics, previous campaign interactions, and macroeconomic indicators that provide context for each marketing contact. Understanding the composition helps reveal who the bank was targeting and how those segments responded.
Administrative workers and blue-collar employees dominate the dataset, representing over 47% of all contacts combined. This reflects both the demographic composition of Portugal's workforce and the bank's targeting strategy—focusing on employed individuals with stable income sources who might have disposable income for savings.
Client ages range from 17 to 98 years with an average age of 40 years.
The education distribution shows that nearly 30% of contacts hold university degrees, while a significant portion completed only basic education (4, 6, or 9 years). The presence of 1,731 contacts with unknown education status and 18 illiterate individuals highlights data quality considerations that analysts should account for.
Key Insights: Who Subscribes?
Analysis reveals striking patterns in subscription behavior across different client segments. These insights have significant implications for marketing strategy and customer targeting.
Students and retired individuals show the highest subscription rates at 31.43% and 25.23% respectively, roughly 2.8 and 2.2 times the overall average of 11.27%.
The high subscription rates among students and retirees likely reflect different motivations. Students may be establishing savings habits with parental guidance or managing educational funds, while retirees prioritize capital preservation and guaranteed returns over riskier investments—especially relevant during economic uncertainty.
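The segment comparison above comes down to a grouped subscription rate divided by the overall rate. The following is a minimal sketch in plain Python, assuming each record is a dict carrying the `job` and `y` columns from the data dictionary. The helper names `subscription_rates` and `lift` and the six toy records are hypothetical and only illustrate the arithmetic.

```python
from collections import defaultdict

def subscription_rates(records, key):
    """Return {segment: (contacts, subscribers, rate)} grouped by `key`."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [contacts, subscribers]
    for rec in records:
        counts[rec[key]][0] += 1
        counts[rec[key]][1] += 1 if rec["y"] else 0
    return {seg: (n, k, k / n) for seg, (n, k) in counts.items()}

def lift(segment_rate, overall_rate):
    """How many times the overall rate a segment converts at."""
    return segment_rate / overall_rate

# Toy records (made up for illustration, not taken from the dataset).
toy = [
    {"job": "student", "y": True},
    {"job": "student", "y": False},
    {"job": "admin.", "y": False},
    {"job": "admin.", "y": False},
    {"job": "admin.", "y": False},
    {"job": "admin.", "y": True},
]
rates = subscription_rates(toy, "job")

# Lift implied by the percentages quoted in this section.
student_lift = round(lift(31.43, 11.27), 2)
retired_lift = round(lift(25.23, 11.27), 2)
```

Reporting the contact count alongside each rate helps flag small segments whose rates may be noisy.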
Clients aged 60 and above show the highest subscription rate at 39.56%, about five times the 7.92% rate of the 40-49 age group. This pattern suggests that retirement planning and financial security become more appealing to older demographics, while middle-aged clients, who are often managing mortgages, children's education, and other expenses, may have less disposable income for term deposits.
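Comparing age groups requires bucketing the numeric `age` column first. The sketch below shows one possible banding. Only the 40-49 and 60+ bands are named in the text; the remaining band edges and the `age_band` helper are assumptions made for illustration.

```python
def age_band(age):
    """Map an age in years to a coarse band label."""
    if age >= 60:
        return "60+"
    if age < 30:
        return "under 30"
    decade = (age // 10) * 10
    return f"{decade}-{decade + 9}"

# Ages taken from the stated range (17 to 98) and the sample rows.
bands = [age_band(a) for a in (17, 37, 40, 56, 98)]

# Ratio between the two subscription rates quoted above.
ratio = round(39.56 / 7.92, 2)
```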
Temporal Patterns in Campaign Success
Campaign timing significantly impacts success rates. The data reveals strong seasonal patterns that have important implications for marketing resource allocation.
March campaigns achieved the highest success rate at 50.55%, while May had the lowest at 6.43% despite having the most contacts (13,769).
The inverse relationship between contact volume and success rate is striking. High-volume months (May, July, August) show the lowest conversion rates, suggesting diminishing returns from aggressive outreach. The high success rates in March, December, September, and October may reflect fiscal year-end considerations, bonus season, or simply better-qualified leads during lower-volume periods.
Contact Method and Campaign History
The method of contact and prior campaign outcomes are among the strongest predictors of success in this dataset.
Cellular contacts achieve a 14.74% subscription rate, nearly 3x the 5.23% rate of telephone contacts.
The dramatic difference between cellular and landline telephone success rates likely reflects demographic factors—cellular users tend to be younger, more tech-savvy, and more accessible. Landline users may include older demographics who are harder to reach or more resistant to telemarketing.
Clients with a successful previous campaign outcome have a 65.11% subscription rate—the single strongest predictor in the dataset and nearly 6x the overall average.
The power of previous success as a predictor underscores the importance of customer relationship management. Satisfied previous customers are dramatically more likely to engage again, suggesting that retention and re-engagement strategies may yield better ROI than cold outreach.
Macroeconomic Indicators
A distinguishing feature of this dataset is the inclusion of five social and economic context attributes that capture the macroeconomic environment during each contact. These features are particularly valuable for understanding how external economic conditions influence consumer financial decisions.
Economic Indicators Summary
| Indicator | Average Value | Description |
|---|---|---|
| Employment Variation Rate | 0.08% | Quarterly employment change |
| Consumer Price Index | 93.58 | Monthly consumer price indicator |
| Consumer Confidence Index | -40.50 | Monthly consumer confidence |
| Euribor 3-Month Rate | 3.62% | Euro interbank offered rate |
| Number Employed | 5167.04 | Quarterly employment figures (thousands) |
The negative consumer confidence index (average -40.50) reflects the pessimistic economic sentiment during the financial crisis. Research has shown that these macroeconomic features significantly improve prediction accuracy, as they capture systemic factors that influence consumer behavior beyond individual characteristics.
Data Quality Considerations
When working with this dataset, analysts should be aware of several data quality considerations that may affect modeling decisions.
The 'duration' feature should be excluded when building realistic predictive models. Call duration is only known after the call ends—at which point the outcome is also known. Including it constitutes data leakage and produces misleadingly optimistic model performance.
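One way to guard against this leakage is to strip the column before any features reach a model. The sketch below is a minimal illustration in plain Python, assuming records are dicts keyed by the column names in the data dictionary. The helper `split_features_target` is a hypothetical name, not part of any library.

```python
LEAKY_FEATURES = {"duration"}  # known only after the call has ended
TARGET = "y"

def split_features_target(record, leaky=LEAKY_FEATURES, target=TARGET):
    """Return (features, label) with leaky columns and the target removed."""
    features = {k: v for k, v in record.items()
                if k not in leaky and k != target}
    return features, record[target]

# First row of the data preview, truncated to a few columns for brevity.
row = {"age": 56, "job": "housemaid", "contact": "telephone",
       "duration": 261, "campaign": 1, "y": False}
features, label = split_features_target(row)
```

Keeping the leaky column names in a single constant makes it harder for the column to slip back in during later feature engineering.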
Several categorical fields contain 'unknown' values that require handling decisions: job (330 unknown), education (1,731 unknown), marital status, credit default status, and loan status. The 'pdays' feature uses 999 as a sentinel value to indicate clients who were not previously contacted, which should be treated as a categorical indicator rather than a numeric value.
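Before choosing a strategy for 'unknown' values, it helps to count them per column. The sketch below assumes dict records and a column list drawn from the data dictionary. Both the `CATEGORICAL` tuple and the `count_unknowns` helper are illustrative assumptions.

```python
from collections import Counter

CATEGORICAL = ("job", "marital", "education", "default", "housing", "loan")

def count_unknowns(records, columns=CATEGORICAL, marker="unknown"):
    """Count how often `marker` appears in each categorical column."""
    counts = Counter()
    for rec in records:
        for col in columns:
            if rec.get(col) == marker:
                counts[col] += 1
    return counts

# Two rows from the data preview (subset of columns).
rows = [
    {"job": "housemaid", "education": "basic.4y", "default": "no"},
    {"job": "services", "education": "high.school", "default": "unknown"},
]
unknowns = count_unknowns(rows)
```

Whether to keep 'unknown' as its own category, impute it, or drop the affected rows is a modeling choice that depends on how frequent the marker is in each column.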
With only 11.27% positive cases, this dataset exhibits moderate class imbalance. Models trained without addressing this imbalance may achieve high accuracy by simply predicting the majority class, while failing to identify the valuable minority of subscribers. Appropriate evaluation metrics include F1-score, precision-recall AUC, and Matthews correlation coefficient rather than accuracy alone.
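The effect of the imbalance can be checked with the counts quoted in this document. The sketch below scores a majority-class baseline (predict "no" for every contact) using accuracy, F1, and Matthews correlation coefficient computed from confusion-matrix counts. The function name `classification_metrics` and the convention of returning 0 when a denominator is zero are assumptions of this sketch.

```python
import math

def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, F1, and Matthews correlation coefficient."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention used here: report 0 when the MCC denominator is zero.
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return accuracy, f1, mcc

# Majority-class baseline: predict "no" for all 41,188 contacts.
positives, total = 4640, 41188
acc, f1, mcc = classification_metrics(
    tp=0, fp=0, fn=positives, tn=total - positives
)
```

The baseline reaches roughly 88.7% accuracy while F1 and MCC are both 0, which is why accuracy alone is a poor yardstick here.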
Academic Impact and Research Context
The data was originally compiled by researchers at the University of Minho in Portugal and published in the journal Decision Support Systems in 2014. Since its release, it has become one of the most cited datasets in marketing analytics and machine learning research, appearing in hundreds of academic papers.
The dataset serves as a standard benchmark for classification algorithms, feature engineering techniques, and methods for handling class imbalance. Its combination of categorical and numeric features, along with the included macroeconomic indicators, makes it particularly valuable for teaching and demonstrating real-world data science challenges.
Sample Data Preview
Sample Records from the Dataset
| # | Age | Job | Marital | Education | Contact | Month | Duration | Campaign | Subscribed |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 56 | Housemaid | Married | Basic 4y | Telephone | May | 261 | 1 | No |
| 2 | 57 | Services | Married | High School | Telephone | May | 149 | 1 | No |
| 3 | 37 | Services | Married | High School | Telephone | May | 226 | 1 | No |
| 4 | 40 | Admin | Married | Basic 6y | Telephone | May | 151 | 1 | No |
| 5 | 56 | Services | Married | High School | Telephone | May | 307 | 1 | No |
Important Modeling Considerations
For realistic predictive models, exclude the 'duration' feature. Models built with duration will show inflated performance metrics that won't translate to real-world deployment where call duration is unknown at prediction time.
The 'pdays' value of 999 indicates no previous contact. Treat this as a categorical indicator (contacted vs. not contacted) rather than as a numeric distance measure.
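A simple way to apply this is to split 'pdays' into a contact flag plus an optional day count. The sketch below is one possible encoding. The output field names `previously_contacted` and `days_since_contact` are hypothetical.

```python
PDAYS_SENTINEL = 999  # per the text: client was not previously contacted

def encode_pdays(pdays, sentinel=PDAYS_SENTINEL):
    """Split pdays into a contact flag and an optional day count."""
    if pdays == sentinel:
        return {"previously_contacted": False, "days_since_contact": None}
    return {"previously_contacted": True, "days_since_contact": pdays}

never = encode_pdays(999)
recent = encode_pdays(6)  # 6 is an illustrative value, not from the preview
```

Using `None` for the day count keeps the sentinel out of any numeric scaling or distance computation.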
Table Overview
finance
Data Preview
| age | job | marital | education | default | housing | loan | contact | month | day_of_week | duration | campaign | pdays | previous | poutcome | emp.var.rate | cons.price.idx | cons.conf.idx | euribor3m | nr.employed | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 56 | housemaid | married | basic.4y | no | no | no | telephone | may | mon | 261 | 1 | 999 | 0 | nonexistent | 1.1 | 93.99 | -36.4 | 4.86 | 5,191 | false |
| 57 | services | married | high.school | unknown | no | no | telephone | may | mon | 149 | 1 | 999 | 0 | nonexistent | 1.1 | 93.99 | -36.4 | 4.86 | 5,191 | false |
| 37 | services | married | high.school | no | yes | no | telephone | may | mon | 226 | 1 | 999 | 0 | nonexistent | 1.1 | 93.99 | -36.4 | 4.86 | 5,191 | false |
| 40 | admin. | married | basic.6y | no | no | no | telephone | may | mon | 151 | 1 | 999 | 0 | nonexistent | 1.1 | 93.99 | -36.4 | 4.86 | 5,191 | false |
| 56 | services | married | high.school | no | no | yes | telephone | may | mon | 307 | 1 | 999 | 0 | nonexistent | 1.1 | 93.99 | -36.4 | 4.86 | 5,191 | false |
Showing 5 of 41,188 rows
Data Profile
Rows: 41,188
Columns: 21
Completeness: 100%
Estimated size: 41.2 MB
Column Types
10 Numeric, 10 Text, 1 Boolean
Data Dictionary
| Column | Type | Example | Missing Values |
|---|---|---|---|
| age | numeric | 56, 57 | 0 |
| job | string | "housemaid", "services" | 0 |
| marital | string | "married", "married" | 0 |
| education | string | "basic.4y", "high.school" | 0 |
| default | string | "no", "unknown" | 0 |
| housing | string | "no", "no" | 0 |
| loan | string | "no", "no" | 0 |
| contact | string | "telephone", "telephone" | 0 |
| month | string | "may", "may" | 0 |
| day_of_week | string | "mon", "mon" | 0 |
| duration | numeric | 261, 149 | 0 |
| campaign | numeric | 1, 1 | 0 |
| pdays | numeric | 999, 999 | 0 |
| previous | numeric | 0, 0 | 0 |
| poutcome | string | "nonexistent", "nonexistent" | 0 |
| emp.var.rate | numeric | 1.1, 1.1 | 0 |
| cons.price.idx | numeric | 93.994, 93.994 | 0 |
| cons.conf.idx | numeric | -36.4, -36.4 | 0 |
| euribor3m | numeric | 4.857, 4.857 | 0 |
| nr.employed | numeric | 5191, 5191 | 0 |
| y | boolean | false, false | 0 |
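The column types listed above can be applied when reading the data. The sketch below uses only the standard library and parses a tiny in-memory sample. It assumes a comma-delimited export with a header row and booleans serialized as `true`/`false`; the actual file layout may differ, and `parse_row` is a hypothetical helper.

```python
import csv
import io

# The ten numeric columns and one boolean column from the data dictionary.
NUMERIC = {"age", "duration", "campaign", "pdays", "previous", "emp.var.rate",
           "cons.price.idx", "cons.conf.idx", "euribor3m", "nr.employed"}
BOOLEAN = {"y"}

def parse_row(raw):
    """Convert one csv.DictReader row to typed values per the dictionary."""
    typed = {}
    for name, value in raw.items():
        if name in NUMERIC:
            typed[name] = float(value)
        elif name in BOOLEAN:
            typed[name] = value.strip().lower() == "true"
        else:
            typed[name] = value  # remaining columns stay as text
    return typed

# Tiny in-memory sample; a real file would be opened with open(path, newline="").
sample = io.StringIO(
    "age,job,pdays,euribor3m,y\n"
    "56,housemaid,999,4.857,false\n"
)
records = [parse_row(r) for r in csv.DictReader(sample)]
```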