Agents for Data

Bank Marketing Dataset

41,188 Portuguese bank marketing contacts described by 21 columns: 20 features (demographics, campaign data, and economic indicators) plus the subscription target. The term deposit subscription rate is 11.27%.

Tags: finance, marketing, classification, banking, machine-learning, customer-analytics, telemarketing, portugal, time-series, economic-indicators
1 table, 41,188 rows
Last updated January 2, 2026
Time: May 2008 to November 2010
Location: Portugal
Created by Dataset Agent

Overview

The Bank Marketing Dataset is a comprehensive collection of direct marketing campaign data from a Portuguese banking institution. It captures phone-based marketing efforts aimed at convincing clients to subscribe to term deposit products. With 41,188 client contacts and 21 columns (20 input features plus the binary subscription target y), it provides detailed insights into customer behavior, campaign effectiveness, and the socioeconomic factors that influence financial decision-making.
The dataset contains 41,188 client contacts from Portuguese bank marketing campaigns.
SQL
SELECT COUNT(*) AS total FROM finance.csv
Data
Total
41,188
1 row
Overall subscription rate of 11.27% with 4,640 successful term deposit conversions.
SQL
SELECT COUNT(*) AS total, SUM( CASE WHEN y = TRUE THEN 1 ELSE 0 END ) AS subscribed, ROUND( SUM( CASE WHEN y = TRUE THEN 1 ELSE 0 END ) * 100.0 / COUNT(*), 2 ) AS subscription_rate FROM finance.csv
Data
Total | Subscribed | Subscription Rate (%)
41,188 | 4,640 | 11.27
1 row
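For a quick cross-check outside the database, the subscription-rate query above can be mirrored in plain Python. This is a minimal sketch that assumes the target column y has already been parsed into Python booleans; the helper name `subscription_rate` is hypothetical.

```python
def subscription_rate(targets):
    """Return (total, subscribed, rate_pct) for an iterable of boolean targets.

    Mirrors COUNT(*), SUM(CASE WHEN y = TRUE THEN 1 ELSE 0 END) and the
    percentage rounded to two decimals in the SQL query above.
    """
    targets = list(targets)
    total = len(targets)
    # Truthy values count as subscriptions; False (or None) does not.
    subscribed = sum(1 for y in targets if y)
    rate_pct = round(subscribed * 100.0 / total, 2) if total else 0.0
    return total, subscribed, rate_pct


# Reproduce the headline figures: 4,640 positives out of 41,188 contacts.
example = [True] * 4640 + [False] * (41188 - 4640)
print(subscription_rate(example))  # (41188, 4640, 11.27)
```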
This dataset has become a benchmark for binary classification problems in machine learning, particularly for predicting customer response to marketing campaigns. It's widely used in academic research, banking analytics, and as a teaching resource for data science courses.

Historical and Economic Context

This dataset was collected between May 2008 and November 2010, a period that coincided with one of the most significant financial crises in modern history. Portugal, as part of the Eurozone, experienced severe economic turbulence during this time, with rising unemployment, declining consumer confidence, and increasing pressure on the banking sector.
The Portuguese banking industry faced unique challenges during this period. Term deposits (fixed-term savings accounts offering guaranteed interest rates) became an attractive product for banks seeking to stabilize their funding base. For consumers facing economic uncertainty, these low-risk savings vehicles offered a safe haven compared with volatile investment alternatives. This context may help explain why the bank invested heavily in telemarketing campaigns despite the challenging economic environment.
The dataset's collection period (2008-2010) coincides with the global financial crisis. Economic conditions and consumer behavior patterns may differ significantly in other time periods, which should be considered when applying models trained on this data.

Dataset Composition

The dataset features a rich combination of client demographics, previous campaign interactions, and macroeconomic indicators that provide context for each marketing contact. Understanding the composition helps reveal who the bank was targeting and how those segments responded.
SQL
SELECT job, COUNT(*) AS count FROM finance.csv GROUP BY job ORDER BY count DESC
Data
Job | Count
Admin | 10,422
Blue-collar | 9,254
Technician | 6,743
Services | 3,969
Management | 2,924
Retired | 1,720
Entrepreneur | 1,456
Self-employed | 1,421
Housemaid | 1,060
Unemployed | 1,014
Student | 875
Unknown | 330
12 rows
Administrative workers and blue-collar employees dominate the dataset, together accounting for about 47.8% of all contacts (19,676 of 41,188). This likely reflects both the composition of Portugal's workforce and the bank's targeting strategy, which appears to have focused on employed individuals with stable income sources who might have disposable income for savings.
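The combined share quoted above can be recomputed directly from the job counts in the table. The dictionary keys below are the display labels from that table, not necessarily the raw values stored in the job column, and `share_pct` is a hypothetical helper.

```python
# Contact counts per job, copied from the table above (display labels).
JOB_COUNTS = {
    "Admin": 10422, "Blue-collar": 9254, "Technician": 6743, "Services": 3969,
    "Management": 2924, "Retired": 1720, "Entrepreneur": 1456,
    "Self-employed": 1421, "Housemaid": 1060, "Unemployed": 1014,
    "Student": 875, "Unknown": 330,
}


def share_pct(counts, categories):
    """Percentage of all contacts that fall into the given categories, to 2 decimals."""
    total = sum(counts.values())
    return round(sum(counts[c] for c in categories) * 100.0 / total, 2)


print(sum(JOB_COUNTS.values()))                         # 41188
print(share_pct(JOB_COUNTS, ["Admin", "Blue-collar"]))  # 47.77
```

The counts sum to the full 41,188 rows, which is a useful sanity check that no category is missing from the table.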
Client ages range from 17 to 98 years with an average age of 40 years.
SQL
SELECT MIN(age) AS min_age, MAX(age) AS max_age, ROUND(AVG(age)) AS avg_age FROM finance.csv
Data
Min Age | Max Age | Avg Age
17 | 98 | 40
1 row
SQL
SELECT education, COUNT(*) AS count FROM finance.csv GROUP BY education ORDER BY count DESC
Data
Education | Count
University Degree | 12,168
High School | 9,515
Basic 9 Years | 6,045
Professional Course | 5,243
Basic 4 Years | 4,176
Basic 6 Years | 2,292
Unknown | 1,731
Illiterate | 18
8 rows
The education distribution shows that nearly 30% of contacts (12,168) hold university degrees, while a significant portion completed only basic education (4, 6, or 9 years). The 1,731 contacts with unknown education status are a data quality consideration, and the illiterate category (18 contacts) is too small to support reliable per-category estimates.

Key Insights: Who Subscribes?

Analysis reveals striking patterns in subscription behavior across different client segments. These insights have significant implications for marketing strategy and customer targeting.
Students and retired individuals show the highest subscription rates, at 31.43% and 25.23% respectively, roughly 2.8x and 2.2x the overall average of 11.27%.
SQL
SELECT job, ROUND( SUM( CASE WHEN y = TRUE THEN 1 ELSE 0 END ) * 100.0 / COUNT(*), 2 ) AS subscription_rate FROM finance.csv GROUP BY job ORDER BY subscription_rate DESC
Data
Job | Subscription Rate (%)
Student | 31.43
Retired | 25.23
Unemployed | 14.2
Admin | 12.97
Management | 11.22
Technician | 10.83
Self-employed | 10.49
Housemaid | 10
Entrepreneur | 8.52
Services | 8.14
Blue-collar | 6.89
11 rows
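Lift, the ratio of a segment's subscription rate to the overall rate, turns "times the average" comparisons into exact figures. A minimal sketch using rates copied from the table above; `lift` is a hypothetical helper, and because the inputs are already rounded, the ratios are approximate.

```python
OVERALL_RATE = 11.27  # overall subscription rate in percent, from the summary query


def lift(segment_rate, overall_rate=OVERALL_RATE):
    """Return how many times the overall rate a segment's rate is, to 2 decimals."""
    return round(segment_rate / overall_rate, 2)


# Segment rates (in percent) copied from the job table above.
for job, rate in [("Student", 31.43), ("Retired", 25.23), ("Blue-collar", 6.89)]:
    print(f"{job}: {lift(rate)}x")
```

With these inputs, students come out at about 2.79x, retirees at about 2.24x, and blue-collar workers at about 0.61x the overall rate.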
The high subscription rates among students and retirees likely reflect different motivations. Students may be establishing savings habits with parental guidance or managing educational funds, while retirees prioritize capital preservation and guaranteed returns over riskier investments—especially relevant during economic uncertainty.
SQL
SELECT CASE WHEN age < 30 THEN '17-29' WHEN age < 40 THEN '30-39' WHEN age < 50 THEN '40-49' WHEN age < 60 THEN '50-59' ELSE '60+' END AS age_group, ROUND( SUM( CASE WHEN y = TRUE THEN 1 ELSE 0 END ) * 100.0 / COUNT(*), 2 ) AS subscription_rate FROM finance.csv GROUP BY age_group ORDER BY subscription_rate DESC
Data
Age Group | Subscription Rate (%)
60+ | 39.56
17-29 | 16.26
50-59 | 10.16
30-39 | 10.13
40-49 | 7.92
5 rows
Clients aged 60 and above show the highest subscription rate at 39.56%, nearly 5x the 7.92% rate of the 40-49 age group. This pattern suggests that retirement planning and financial security become more appealing to older clients, while middle-aged clients, who are often managing mortgages, children's education, and other expenses, may have less disposable income for term deposits.
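The age-group query above relies on a CASE expression in which the first matching branch wins, so each branch only needs an upper bound. The sketch below mirrors that logic in Python together with a grouped rate calculation. It is illustrative only: the toy records are invented, the helper names are hypothetical, and the youngest bucket is labelled "<30" because the data includes 17-year-olds.

```python
from collections import defaultdict


def age_group(age):
    """Bucket an age the way the SQL CASE expression does (first match wins)."""
    if age < 30:
        return "<30"  # also captures the 17-year-olds in the data
    if age < 40:
        return "30-39"
    if age < 50:
        return "40-49"
    if age < 60:
        return "50-59"
    return "60+"


def rate_by_group(records, key):
    """Subscription rate (%) per group, sorted descending like ORDER BY ... DESC.

    `records` is an iterable of (age, subscribed) pairs; `key` maps an age to a group.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for age, subscribed in records:
        group = key(age)
        totals[group] += 1
        positives[group] += int(bool(subscribed))
    rates = {g: round(positives[g] * 100.0 / totals[g], 2) for g in totals}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Toy records (not real data) just to show the shape of the output.
toy = [(17, True), (25, False), (35, False), (45, False), (62, True), (70, False)]
print(rate_by_group(toy, age_group))
```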

Temporal Patterns in Campaign Success

Campaign timing is strongly associated with success rates. The data reveals pronounced seasonal patterns that have important implications for marketing resource allocation.
SQL
SELECT MONTH, ROUND( SUM( CASE WHEN y = TRUE THEN 1 ELSE 0 END ) * 100.0 / COUNT(*), 2 ) AS subscription_rate FROM finance.csv GROUP BY MONTH ORDER BY subscription_rate DESC
Data
Month | Subscription Rate (%)
March | 50.55
December | 48.9
September | 44.91
October | 43.87
April | 20.48
August | 10.6
June | 10.51
November | 10.14
July | 9.05
May | 6.43
10 rows
March campaigns achieved the highest success rate at 50.55%, while May had the lowest at 6.43% despite having the most contacts (13,769).
SQL
SELECT MONTH, COUNT(*) AS contacts, ROUND( SUM( CASE WHEN y = TRUE THEN 1 ELSE 0 END ) * 100.0 / COUNT(*), 2 ) AS subscription_rate FROM finance.csv WHERE MONTH IN ('mar', 'may') GROUP BY MONTH ORDER BY subscription_rate DESC
Data
Month | Contacts | Subscription Rate (%)
mar | 546 | 50.55
may | 13,769 | 6.43
2 rows
The inverse relationship between contact volume and success rate is striking. The high-volume months (May, July, August) all convert at 10.6% or less, with May the lowest of all, which suggests diminishing returns from aggressive outreach, although these figures alone do not establish cause and effect. The high success rates in March, December, September, and October may reflect fiscal year-end considerations, bonus season, or simply better-qualified leads during lower-volume periods.
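Rates alone hide volume. Converting the two rows in the March/May table above back into approximate subscription counts gives a fuller picture; the results are estimates because the published rates are rounded to two decimals, and `implied_subscriptions` is a hypothetical helper.

```python
def implied_subscriptions(contacts, rate_pct):
    """Approximate subscriptions from a contact count and a rounded rate in percent."""
    return round(contacts * rate_pct / 100)


# Figures copied from the March/May table above.
march = implied_subscriptions(546, 50.55)   # about 276
may = implied_subscriptions(13769, 6.43)    # about 885
print(march, may, round(may / march, 1))
```

By this estimate, May produced roughly 885 subscriptions against about 276 in March, around 3.2 times as many in absolute terms, which is worth weighing alongside the diminishing-returns interpretation.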

Contact Method and Campaign History

The method of contact and prior campaign outcomes show some of the largest differences in subscription rate in this dataset.
SQL
SELECT contact, ROUND( SUM( CASE WHEN y = TRUE THEN 1 ELSE 0 END ) * 100.0 / COUNT(*), 2 ) AS subscription_rate FROM finance.csv GROUP BY contact ORDER BY subscription_rate DESC
Data
Contact Type | Subscription Rate (%)
Cellular | 14.74
Telephone | 5.23
2 rows
Cellular contacts achieve a 14.74% subscription rate, nearly 3x the 5.23% rate for telephone contacts.
The large gap between cellular and landline success rates may reflect demographic factors: cellular users may be younger, more tech-savvy, and easier to reach, while landline users may include older clients who are harder to reach or more resistant to telemarketing. This explanation should be treated with caution, because clients aged 60 and above have the highest subscription rate of any age group in this dataset.
SQL
SELECT poutcome, ROUND( SUM( CASE WHEN y = TRUE THEN 1 ELSE 0 END ) * 100.0 / COUNT(*), 2 ) AS subscription_rate FROM finance.csv GROUP BY poutcome ORDER BY subscription_rate DESC
Data
Previous Outcome | Subscription Rate (%)
Success | 65.11
Failure | 14.23
Nonexistent | 8.83
3 rows
Clients with a successful previous campaign outcome have a 65.11% subscription rate, the highest of any segment shown on this page and nearly 6x the overall average.
The power of previous success as a predictor underscores the importance of customer relationship management. Satisfied previous customers are dramatically more likely to engage again, suggesting that retention and re-engagement strategies may yield better ROI than cold outreach.

Macroeconomic Indicators

A distinguishing feature of this dataset is the inclusion of five social and economic context attributes that capture the macroeconomic environment during each contact. These features are particularly valuable for understanding how external economic conditions influence consumer financial decisions.
Economic Indicators Summary
Indicator | Average Value | Description
Employment Variation Rate | 0.08% | Quarterly employment change
Consumer Price Index | 93.58 | Monthly consumer price indicator
Consumer Confidence Index | -40.50 | Monthly consumer confidence
Euribor 3-Month Rate | 3.62% | Euro interbank offered rate
Number Employed | 5167.04 | Quarterly employment figures (thousands)
5 rows
SQL
SELECT ROUND(AVG("emp.var.rate"), 2) AS emp_var_rate, ROUND(AVG("cons.price.idx"), 2) AS cons_price_idx, ROUND(AVG("cons.conf.idx"), 2) AS cons_conf_idx, ROUND(AVG(euribor3m), 2) AS euribor3m, ROUND(AVG("nr.employed"), 2) AS nr_employed FROM finance.csv
The negative consumer confidence index (average -40.50) reflects the pessimistic economic sentiment during the financial crisis. Research has shown that these macroeconomic features significantly improve prediction accuracy, as they capture systemic factors that influence consumer behavior beyond individual characteristics.
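Column names such as emp.var.rate contain dots, which is why the SQL query above has to quote them. In Python they are ordinary dictionary keys. The sketch below averages the five indicators from delimited text using only the standard library. It assumes a header row and a semicolon delimiter, which matches the UCI distribution of this data; adjust `delimiter` if your copy differs. The helper name is hypothetical, the first sample row uses the example values from the data dictionary, and the second row is invented for illustration.

```python
import csv
import io
from statistics import mean

MACRO_COLUMNS = ["emp.var.rate", "cons.price.idx", "cons.conf.idx",
                 "euribor3m", "nr.employed"]


def macro_means(text, delimiter=";"):
    """Return the mean of each macroeconomic column (raises if there are no rows)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    values = {name: [] for name in MACRO_COLUMNS}
    for row in reader:
        for name in MACRO_COLUMNS:
            # Dotted names are plain string keys here; no quoting is needed.
            values[name].append(float(row[name]))
    return {name: mean(column) for name, column in values.items()}


sample = (
    '"emp.var.rate";"cons.price.idx";"cons.conf.idx";"euribor3m";"nr.employed"\n'
    "1.1;93.994;-36.4;4.857;5191\n"
    "-1.8;92.893;-46.2;1.313;5099.1\n"
)
print(macro_means(sample))
```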

Data Quality Considerations

When working with this dataset, analysts should be aware of several data quality considerations that may affect modeling decisions.
The 'duration' feature should be excluded when building realistic predictive models. Call duration is only known after the call ends—at which point the outcome is also known. Including it constitutes data leakage and produces misleadingly optimistic model performance.
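A minimal sketch of excluding the leaking column before training, assuming each record is a Python dict keyed by the column names in the data dictionary. `LEAKY_COLUMNS` and `split_features` are hypothetical names; the example record is the first sample row, trimmed to a few columns.

```python
# Columns known only after the call ends; they leak the outcome.
LEAKY_COLUMNS = frozenset({"duration"})
TARGET = "y"


def split_features(record, leaky=LEAKY_COLUMNS, target=TARGET):
    """Return (features, label) with leaky columns and the target removed."""
    features = {k: v for k, v in record.items() if k not in leaky and k != target}
    return features, record[target]


row = {"age": 56, "job": "housemaid", "duration": 261, "campaign": 1, "y": False}
features, label = split_features(row)
print(features, label)  # {'age': 56, 'job': 'housemaid', 'campaign': 1} False
```

Keeping the excluded names in one set makes it easy to extend if other post-call fields are identified later.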
Several categorical fields contain 'unknown' values that require handling decisions: job (330 unknown), education (1,731 unknown), marital status, credit default, housing loan, and personal loan. Because these are stored as the string 'unknown' rather than as nulls, they do not show up as missing values in the data dictionary. The 'pdays' feature uses 999 as a sentinel value to indicate clients who were not previously contacted; it should be treated as a categorical indicator rather than as a numeric value.
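One common way to handle the pdays sentinel is to split it into a contacted flag plus a days value that exists only for previously contacted clients. For 'unknown' strings, a simple option is to keep them as their own category and add an explicit indicator; dropping rows or imputing are alternatives. The sketch below assumes dict-like records, and the helper and feature names are hypothetical.

```python
PDAYS_NOT_CONTACTED = 999  # sentinel: client was not previously contacted


def encode_pdays(pdays):
    """Split pdays into a contacted flag and a days value (None when not contacted)."""
    if pdays == PDAYS_NOT_CONTACTED:
        return {"previously_contacted": False, "days_since_contact": None}
    return {"previously_contacted": True, "days_since_contact": pdays}


def encode_unknown(value, field):
    """Keep 'unknown' as its own category and expose an explicit flag."""
    return {field: value, f"{field}_is_unknown": value == "unknown"}


print(encode_pdays(999))  # {'previously_contacted': False, 'days_since_contact': None}
print(encode_pdays(6))    # {'previously_contacted': True, 'days_since_contact': 6}
print(encode_unknown("unknown", "default"))
```

The None days value still needs to be filled or masked before it reaches a numeric model.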
With only 11.27% positive cases, this dataset exhibits moderate class imbalance. Models trained without addressing this imbalance may achieve high accuracy by simply predicting the majority class, while failing to identify the valuable minority of subscribers. Appropriate evaluation metrics include F1-score, precision-recall AUC, and Matthews correlation coefficient rather than accuracy alone.
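The accuracy trap is easy to quantify. A classifier that predicts "no" for every contact misses all 4,640 subscribers yet scores about 88.73% accuracy, while its F1-score and Matthews correlation coefficient are both zero. The sketch below computes these metrics from a confusion matrix in plain Python; `metrics` is a hypothetical helper, and reporting undefined ratios as 0.0 is a common convention rather than the only choice.

```python
from math import sqrt


def metrics(tp, fp, fn, tn):
    """Accuracy, F1 and Matthews correlation from a binary confusion matrix.

    Ratios whose denominator is zero are reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, f1, mcc


# Baseline that predicts "no" for all 41,188 contacts: every subscriber is a
# false negative and every non-subscriber is a true negative.
acc, f1, mcc = metrics(tp=0, fp=0, fn=4640, tn=41188 - 4640)
print(round(acc, 4), f1, mcc)  # 0.8873 0.0 0.0
```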

Academic Impact and Research Context

The data was originally compiled by researchers at the University of Minho in Portugal and published in the journal Decision Support Systems in 2014. Since its release, it has become one of the most cited datasets in marketing analytics and machine learning research, appearing in hundreds of academic papers.
The dataset serves as a standard benchmark for classification algorithms, feature engineering techniques, and methods for handling class imbalance. Its combination of categorical and numeric features, along with the included macroeconomic indicators, makes it particularly valuable for teaching and demonstrating real-world data science challenges.

Sample Data Preview

Sample Records from the Dataset
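# | Age | Job | Marital | Education | Contact | Month | Duration (s) | Campaign | Subscribed
1 | 56 | Housemaid | Married | Basic 4y | Telephone | May | 261 | 1 | No
2 | 57 | Services | Married | High School | Telephone | May | 149 | 1 | No
3 | 37 | Services | Married | High School | Telephone | May | 226 | 1 | No
4 | 40 | Admin | Married | Basic 6y | Telephone | May | 151 | 1 | No
5 | 56 | Services | Married | High School | Telephone | May | 307 | 1 | No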
5 rows
SQL
SELECT age, job, marital, education, contact, MONTH, duration, campaign, y FROM finance.csv LIMIT 5

Important Modeling Considerations

For realistic predictive models, exclude the 'duration' feature. Models built with duration will show inflated performance metrics that won't translate to real-world deployment where call duration is unknown at prediction time.
The 'pdays' value of 999 indicates no previous contact. Treat this as a categorical indicator (contacted vs. not contacted) rather than as a numeric distance measure.

Table Overview

finance

Contains 41,188 rows and 21 columns. Column types: 10 numeric, 10 text, 1 boolean.


Data Preview

First five of 21 columns shown.
Row 1: age 56, job housemaid, marital married, education basic.4y, default no
Row 2: age 57, job services, marital married, education high.school, default unknown
Row 3: age 37, job services, marital married, education high.school, default no

Data Profile

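41,188 rows, 21 columns, 41.2 MB estimated size
100% complete: no null values, although missing information is encoded as the string 'unknown' in several categorical columns and as the sentinel 999 in pdays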

Column Types

10 Numeric, 10 Text, 1 Boolean

Data Dictionary

finance

Column | Type | Example | Missing Values
age | numeric | 56, 57 | 0
job | string | "housemaid", "services" | 0
marital | string | "married", "married" | 0
education | string | "basic.4y", "high.school" | 0
default | string | "no", "unknown" | 0
housing | string | "no", "no" | 0
loan | string | "no", "no" | 0
contact | string | "telephone", "telephone" | 0
month | string | "may", "may" | 0
day_of_week | string | "mon", "mon" | 0
duration | numeric | 261, 149 | 0
campaign | numeric | 1, 1 | 0
pdays | numeric | 999, 999 | 0
previous | numeric | 0, 0 | 0
poutcome | string | "nonexistent", "nonexistent" | 0
emp.var.rate | numeric | 1.1, 1.1 | 0
cons.price.idx | numeric | 93.994, 93.994 | 0
cons.conf.idx | numeric | -36.4, -36.4 | 0
euribor3m | numeric | 4.857, 4.857 | 0
nr.employed | numeric | 5191, 5191 | 0
y | boolean | false, false | 0
Last updated: January 2, 2026
Created: January 2, 2026