What is the Bank Marketing Dataset?

The dataset contains 41,188 contacts from a Portuguese bank's direct marketing campaigns, described by 21 features that cover client demographics, campaign data, and economic indicators. The term deposit subscription rate is 11.27%.

How can I download the Bank Marketing Dataset?

You can download the Bank Marketing Dataset in CSV or Parquet format directly from this page. Each table has its own download buttons.

What format is the Bank Marketing Dataset available in?

This dataset is available in CSV and Parquet formats. CSV is great for spreadsheet applications, while Parquet is optimized for data analysis tools like Pandas and DuckDB.
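
As a minimal sketch of reading the CSV export with only the Python standard library: the helper below parses rows from any text stream. The function name `read_contacts` is hypothetical, and the comma delimiter is an assumption about this page's export, so it is exposed as a parameter.

```python
import csv
import io


def read_contacts(stream, delimiter=","):
    """Parse contact rows from a text stream into dicts keyed by column name."""
    return list(csv.DictReader(stream, delimiter=delimiter))


# Tiny in-memory sample that reuses column names and values shown on this page.
sample = io.StringIO(
    "age,job,contact,y\n"
    "56,housemaid,telephone,false\n"
    "57,services,telephone,false\n"
)
rows = read_contacts(sample)
print(len(rows), rows[0]["job"])
```

To read a downloaded file instead, pass an open file handle, for example `open("finance.csv", newline="")`, where the file name is assumed from the table name. Every value comes back as a string, so numeric and boolean columns still need converting.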

What is the license for the Bank Marketing Dataset?

This dataset is available under the CC BY 4.0 license. See the full license at https://creativecommons.org/licenses/by/4.0/

How many tables are in the Bank Marketing Dataset?

The Bank Marketing Dataset contains 1 table: finance.

Bank Marketing Dataset

Last updated: January 2, 2026

Source: UCI Machine Learning Repository

License: CC BY 4.0

Version: 1.0

Time: May 2008 - November 2010

Location: Portugal

Created by Dataset Agent

Overview

The Bank Marketing Dataset records direct marketing campaigns run by a Portuguese banking institution. The campaigns were phone-based and aimed at convincing clients to subscribe to term deposit products. With 41,188 client contacts and 21 features, the dataset supports analysis of customer behavior, campaign effectiveness, and the socioeconomic factors that influence financial decisions.

The dataset contains 41,188 client contacts from Portuguese bank marketing campaigns.

View Source

SQL

SELECT COUNT(*) AS total FROM finance.csv

Data

Total
41,188
1 row

Overall subscription rate of 11.27% with 4,640 successful term deposit conversions.

View Source

SQL

SELECT COUNT(*) AS total, SUM( CASE WHEN y = TRUE THEN 1 ELSE 0 END ) AS subscribed, ROUND( SUM( CASE WHEN y = TRUE THEN 1 ELSE 0 END ) * 100.0 / COUNT(*), 2 ) AS subscription_rate FROM finance.csv

Data

Total	Subscribed	Subscription Rate
41,188	4,640	11.27
1 row
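
The same rate can be reproduced outside SQL. The sketch below mirrors the `CASE WHEN y = TRUE` aggregation on a list of boolean flags. The function name `subscription_rate` is hypothetical, and the flags are rebuilt from the counts reported above rather than read from the file.

```python
def subscription_rate(flags):
    """Return the share of True flags as a percentage rounded to 2 decimals."""
    flags = list(flags)
    if not flags:
        raise ValueError("at least one contact is required")
    return round(sum(flags) * 100.0 / len(flags), 2)


# Rebuild the target column from the reported counts: 4,640 of 41,188 subscribed.
y = [True] * 4640 + [False] * (41188 - 4640)
rate = subscription_rate(y)
print(rate)
```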

This dataset has become a benchmark for binary classification problems in machine learning, particularly for predicting customer response to marketing campaigns. It's widely used in academic research, banking analytics, and as a teaching resource for data science courses.

Historical and Economic Context

This dataset was collected between May 2008 and November 2010, a period that coincided with one of the most significant financial crises in modern history. Portugal, as part of the Eurozone, experienced severe economic turbulence during this time, with rising unemployment, declining consumer confidence, and increasing pressure on the banking sector.

The Portuguese banking industry faced unique challenges during this period. Term deposits—fixed-term savings accounts offering guaranteed interest rates—became an attractive product for banks seeking to stabilize their funding base. For consumers facing economic uncertainty, these low-risk savings vehicles offered a safe haven compared to volatile investment alternatives. This context helps explain why the bank invested heavily in telemarketing campaigns despite the challenging economic environment.

The dataset's collection period (2008-2010) coincides with the global financial crisis. Economic conditions and consumer behavior patterns may differ significantly in other time periods, which should be considered when applying models trained on this data.

Dataset Composition

The dataset features a rich combination of client demographics, previous campaign interactions, and macroeconomic indicators that provide context for each marketing contact. Understanding the composition helps reveal who the bank was targeting and how those segments responded.

View Source

SQL

SELECT job, COUNT(*) AS count FROM finance.csv GROUP BY job ORDER BY count DESC

Data

Job	Count
Admin	10,422
Blue-collar	9,254
Technician	6,743
Services	3,969
Management	2,924
Retired	1,720
Entrepreneur	1,456
Self-employed	1,421
Housemaid	1,060
Unemployed	1,014
Student	875
Unknown	330
12 rows

Administrative workers and blue-collar employees dominate the dataset, representing over 47% of all contacts combined. This reflects both the demographic composition of Portugal's workforce and the bank's targeting strategy—focusing on employed individuals with stable income sources who might have disposable income for savings.

Client ages range from 17 to 98 years with an average age of 40 years.

View Source

SQL

SELECT MIN(age) AS min_age, MAX(age) AS max_age, ROUND(AVG(age)) AS avg_age FROM finance.csv

Data

Min Age	Max Age	Avg Age
17	98	40
1 row

View Source

SQL

SELECT education, COUNT(*) AS count FROM finance.csv GROUP BY education ORDER BY count DESC

Data

Education	Count
University Degree	12,168
High School	9,515
Basic 9 Years	6,045
Professional Course	5,243
Basic 4 Years	4,176
Basic 6 Years	2,292
Unknown	1,731
Illiterate	18
8 rows

The education distribution shows that nearly 30% of contacts hold university degrees, while about 30% completed only basic education (4, 6, or 9 years). The 1,731 contacts with unknown education status need an explicit handling decision, and the illiterate category contains only 18 contacts, too few for reliable per-category estimates.

Key Insights: Who Subscribes?

Analysis reveals striking patterns in subscription behavior across different client segments. These insights have significant implications for marketing strategy and customer targeting.

Students and retired individuals show the highest subscription rates at 31.43% and 25.23% respectively, roughly 2.8x and 2.2x the overall average of 11.27%.

View Source

SQL

SELECT job, ROUND( SUM( CASE WHEN y = TRUE THEN 1 ELSE 0 END ) * 100.0 / COUNT(*), 2 ) AS subscription_rate FROM finance.csv GROUP BY job ORDER BY subscription_rate DESC

Data

Job	Subscription Rate (%)
Student	31.43
Retired	25.23
Unemployed	14.2
Admin	12.97
Management	11.22
Technician	10.83
Self-employed	10.49
Housemaid	10
Entrepreneur	8.52
Services	8.14
Blue-collar	6.89
11 rows
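
To make the comparison with the overall average explicit, each segment rate can be divided by the overall rate to give a lift. The sketch below uses the per-job rates from the table above. The `lift` helper and the dictionary are illustrative names, not part of the dataset.

```python
OVERALL_RATE = 11.27  # overall subscription rate (%) reported on this page

# Per-job subscription rates (%) copied from the table above.
job_rates = {
    "Student": 31.43,
    "Retired": 25.23,
    "Unemployed": 14.2,
    "Admin": 12.97,
    "Management": 11.22,
    "Technician": 10.83,
    "Self-employed": 10.49,
    "Housemaid": 10.0,
    "Entrepreneur": 8.52,
    "Services": 8.14,
    "Blue-collar": 6.89,
}


def lift(segment_rate, overall_rate=OVERALL_RATE):
    """Ratio of a segment's rate to the overall rate, rounded to 2 decimals."""
    return round(segment_rate / overall_rate, 2)


lifts = {job: lift(rate) for job, rate in job_rates.items()}
print(lifts["Student"], lifts["Retired"], lifts["Blue-collar"])
```

A lift above 1 marks a segment that converts better than average. A lift below 1, as for blue-collar contacts, marks one that converts worse.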

The high subscription rates among students and retirees likely reflect different motivations. Students may be establishing savings habits with parental guidance or managing educational funds, while retirees prioritize capital preservation and guaranteed returns over riskier investments—especially relevant during economic uncertainty.

View Source

SQL

SELECT CASE WHEN age < 30 THEN '18-29' WHEN age < 40 THEN '30-39' WHEN age < 50 THEN '40-49' WHEN age < 60 THEN '50-59' ELSE '60+' END AS age_group, ROUND( SUM( CASE WHEN y = TRUE THEN 1 ELSE 0 END ) * 100.0 / COUNT(*), 2 ) AS subscription_rate FROM finance.csv GROUP BY age_group ORDER BY subscription_rate DESC

Data

Age Group	Subscription Rate (%)
60+	39.56
18-29	16.26
50-59	10.16
30-39	10.13
40-49	7.92
5 rows

Clients aged 60 and above show the highest subscription rate at 39.56%, nearly 5x higher than the 40-49 age group at 7.92%. This pattern suggests that retirement planning and financial security become more appealing to older demographics, while middle-aged clients—often managing mortgages, children's education, and other expenses—have less disposable income for term deposits.
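
The age buckets in the query can be reproduced in application code. This is a minimal sketch that uses the same cut points as the SQL `CASE` expression. The function name `age_group` is hypothetical.

```python
def age_group(age):
    """Bucket an age with the same cut points as the SQL CASE expression."""
    if age < 30:
        # Label as used on this page; the bucket also catches ages below 18,
        # such as the dataset minimum of 17.
        return "18-29"
    if age < 40:
        return "30-39"
    if age < 50:
        return "40-49"
    if age < 60:
        return "50-59"
    return "60+"


# Ages from the sample rows, plus the reported minimum and maximum.
sample_ages = [56, 57, 37, 40, 56, 17, 98]
groups = [age_group(a) for a in sample_ages]
print(groups)
```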

Temporal Patterns in Campaign Success

Campaign timing significantly impacts success rates. The data reveals strong seasonal patterns that have important implications for marketing resource allocation.

View Source

SQL

SELECT MONTH, ROUND( SUM( CASE WHEN y = TRUE THEN 1 ELSE 0 END ) * 100.0 / COUNT(*), 2 ) AS subscription_rate FROM finance.csv GROUP BY MONTH ORDER BY subscription_rate DESC

Data

Month	Subscription Rate (%)
March	50.55
December	48.9
September	44.91
October	43.87
April	20.48
August	10.6
June	10.51
November	10.14
July	9.05
May	6.43
10 rows

March campaigns achieved the highest success rate at 50.55%, while May had the lowest at 6.43% despite having the most contacts (13,769).

View Source

SQL

SELECT MONTH, COUNT(*) AS contacts, ROUND( SUM( CASE WHEN y = TRUE THEN 1 ELSE 0 END ) * 100.0 / COUNT(*), 2 ) AS subscription_rate FROM finance.csv WHERE MONTH IN ('mar', 'may') GROUP BY MONTH ORDER BY subscription_rate DESC

Data

Month	Contacts	Subscription Rate
mar	546	50.55
may	13,769	6.43
2 rows

Contact volume and success rate appear inversely related. High-volume months (May, July, August) show some of the lowest conversion rates, which may indicate diminishing returns from aggressive outreach. The high success rates in March, December, September, and October may reflect fiscal year-end considerations, bonus season, or simply better-qualified leads during lower-volume periods.

Contact Method and Campaign History

The method of contact and prior campaign outcomes are among the strongest predictors of success in this dataset.

View Source

SQL

SELECT contact, ROUND( SUM( CASE WHEN y = TRUE THEN 1 ELSE 0 END ) * 100.0 / COUNT(*), 2 ) AS subscription_rate FROM finance.csv GROUP BY contact ORDER BY subscription_rate DESC

Data

Contact Type	Subscription Rate (%)
Cellular	14.74
Telephone	5.23
2 rows

Cellular contacts achieve a 14.74% subscription rate, nearly 3x the 5.23% rate for telephone contacts.

View Source

SQL

SELECT contact, ROUND( SUM( CASE WHEN y = TRUE THEN 1 ELSE 0 END ) * 100.0 / COUNT(*), 2 ) AS subscription_rate FROM finance.csv GROUP BY contact

Data

Contact	Subscription Rate
cellular	14.74
telephone	5.23
2 rows

The dramatic difference between cellular and landline telephone success rates likely reflects demographic factors—cellular users tend to be younger, more tech-savvy, and more accessible. Landline users may include older demographics who are harder to reach or more resistant to telemarketing.

View Source

SQL

SELECT poutcome, ROUND( SUM( CASE WHEN y = TRUE THEN 1 ELSE 0 END ) * 100.0 / COUNT(*), 2 ) AS subscription_rate FROM finance.csv GROUP BY poutcome ORDER BY subscription_rate DESC

Data

Previous Outcome	Subscription Rate (%)
Success	65.11
Failure	14.23
Nonexistent	8.83
3 rows

Clients with a successful previous campaign outcome have a 65.11% subscription rate, nearly 6x the overall average and the highest rate of any segment reported on this page.

The power of previous success as a predictor underscores the importance of customer relationship management. Satisfied previous customers are dramatically more likely to engage again, suggesting that retention and re-engagement strategies may yield better ROI than cold outreach.

Macroeconomic Indicators

A distinguishing feature of this dataset is the inclusion of five social and economic context attributes that capture the macroeconomic environment during each contact. These features are particularly valuable for understanding how external economic conditions influence consumer financial decisions.

Economic Indicators Summary

Indicator	Average Value	Description
Employment Variation Rate	0.08%	Quarterly employment change
Consumer Price Index	93.58	Monthly consumer price indicator
Consumer Confidence Index	-40.50	Monthly consumer confidence
Euribor 3-Month Rate	3.62%	Euro interbank offered rate
Number Employed	5167.04	Quarterly employment figures (thousands)
5 rows

View Source

SQL

SELECT ROUND(AVG("emp.var.rate"), 2), ROUND(AVG("cons.price.idx"), 2), ROUND(AVG("cons.conf.idx"), 2), ROUND(AVG(euribor3m), 2), ROUND(AVG("nr.employed"), 2) FROM finance.csv

The negative consumer confidence index (average -40.50) reflects the pessimistic economic sentiment during the financial crisis. Research has shown that these macroeconomic features significantly improve prediction accuracy, as they capture systemic factors that influence consumer behavior beyond individual characteristics.

Data Quality Considerations

When working with this dataset, analysts should be aware of several data quality considerations that may affect modeling decisions.

The 'duration' feature should be excluded when building realistic predictive models. Call duration is only known after the call ends—at which point the outcome is also known. Including it constitutes data leakage and produces misleadingly optimistic model performance.
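
One way to enforce this is to strip leaky columns at the point where features and target are separated. The sketch below assumes each row is a dict keyed by column name. `LEAKY_COLUMNS` and `split_features_target` are illustrative names.

```python
LEAKY_COLUMNS = {"duration"}  # known only after the call, so unusable at prediction time
TARGET = "y"


def split_features_target(row):
    """Return (features, target), dropping leaky columns and the target itself."""
    features = {
        name: value
        for name, value in row.items()
        if name not in LEAKY_COLUMNS and name != TARGET
    }
    return features, row[TARGET]


# Abbreviated version of the first sample record on this page.
row = {"age": 56, "job": "housemaid", "duration": 261, "campaign": 1, "y": False}
features, target = split_features_target(row)
print(features, target)
```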

Several categorical fields contain 'unknown' values that require handling decisions: job (330 unknown), education (1,731 unknown), marital status, and loan status. The 'pdays' feature uses 999 as a sentinel value to indicate clients who were not previously contacted, which should be treated as a categorical indicator rather than a numeric value.
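
A minimal sketch of both handling decisions follows. The output field names `previously_contacted` and `days_since_contact` are hypothetical, and mapping 'unknown' to `None` is only one option; keeping it as its own category is equally defensible.

```python
PDAYS_SENTINEL = 999  # marks clients with no previous contact


def encode_pdays(pdays):
    """Split pdays into a contacted flag and an optional day count."""
    if pdays == PDAYS_SENTINEL:
        return {"previously_contacted": False, "days_since_contact": None}
    return {"previously_contacted": True, "days_since_contact": pdays}


def normalize_unknown(value):
    """Map the literal 'unknown' to None so missingness is explicit."""
    return None if value == "unknown" else value


print(encode_pdays(999))
print(encode_pdays(6))
print(normalize_unknown("unknown"), normalize_unknown("married"))
```

Separating the flag from the day count keeps 999 from being treated as a very long gap between contacts.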

With only 11.27% positive cases, this dataset exhibits moderate class imbalance. Models trained without addressing this imbalance may achieve high accuracy by simply predicting the majority class, while failing to identify the valuable minority of subscribers. Appropriate evaluation metrics include F1-score, precision-recall AUC, and Matthews correlation coefficient rather than accuracy alone.
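
The effect is easy to check by hand. The sketch below scores a baseline that predicts "no subscription" for every contact, using the counts reported on this page. The metric functions are written out from their standard confusion-matrix definitions, and returning 0 when a denominator is zero is a common convention that is assumed here. Precision-recall AUC is omitted because it needs ranked scores rather than hard predictions.

```python
import math


def accuracy(tp, fp, fn, tn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Majority-class baseline: every one of the 41,188 contacts is predicted "no".
tp, fp, fn, tn = 0, 0, 4640, 41188 - 4640
baseline_accuracy = round(accuracy(tp, fp, fn, tn), 4)
baseline_f1 = f1_score(tp, fp, fn)
baseline_mcc = mcc(tp, fp, fn, tn)
print(baseline_accuracy, baseline_f1, baseline_mcc)
```

The baseline reaches about 88.7% accuracy while F1 and MCC are both zero, which is why accuracy alone is a poor guide for this dataset.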

Academic Impact and Research Context

The data was originally compiled by researchers at the University of Minho in Portugal and published in the journal Decision Support Systems in 2014. Since its release, it has become one of the most cited datasets in marketing analytics and machine learning research, appearing in hundreds of academic papers.

The dataset serves as a standard benchmark for classification algorithms, feature engineering techniques, and methods for handling class imbalance. Its combination of categorical and numeric features, along with the included macroeconomic indicators, makes it particularly valuable for teaching and demonstrating real-world data science challenges.

Sample Data Preview

Sample Records from the Dataset

#	Age	Job	Marital	Education	Contact	Month	Duration	Campaign	Subscribed
1	56	Housemaid	Married	Basic 4y	Telephone	May	261	1	No
2	57	Services	Married	High School	Telephone	May	149	1	No
3	37	Services	Married	High School	Telephone	May	226	1	No
4	40	Admin	Married	Basic 6y	Telephone	May	151	1	No
5	56	Services	Married	High School	Telephone	May	307	1	No
5 rows

View Source

SQL

SELECT age, job, marital, education, contact, MONTH, duration, campaign, y FROM finance.csv LIMIT 5

Important Modeling Considerations

For realistic predictive models, exclude the 'duration' feature. Models built with duration will show inflated performance metrics that won't translate to real-world deployment where call duration is unknown at prediction time.

The 'pdays' value of 999 indicates no previous contact. Treat this as a categorical indicator (contacted vs. not contacted) rather than as a numeric distance measure.

Table Overview

finance

Contains 41,188 rows and 21 columns. Column types: 10 numeric, 10 text, 1 boolean.

Data Preview

age	job	marital	education	default	housing	loan	contact	month	day_of_week	duration	campaign	pdays	previous	poutcome	emp.var.rate	cons.price.idx	cons.conf.idx	euribor3m	nr.employed	y
56	housemaid	married	basic.4y	no	no	no	telephone	may	mon	261	1	999	0	nonexistent	1.1	93.99	-36.4	4.86	5,191	false
57	services	married	high.school	unknown	no	no	telephone	may	mon	149	1	999	0	nonexistent	1.1	93.99	-36.4	4.86	5,191	false
37	services	married	high.school	no	yes	no	telephone	may	mon	226	1	999	0	nonexistent	1.1	93.99	-36.4	4.86	5,191	false
40	admin.	married	basic.6y	no	no	no	telephone	may	mon	151	1	999	0	nonexistent	1.1	93.99	-36.4	4.86	5,191	false
56	services	married	high.school	no	no	yes	telephone	may	mon	307	1	999	0	nonexistent	1.1	93.99	-36.4	4.86	5,191	false

Showing 5 of 41,188 rows

Data Profile

41,188 rows, 21 columns

100% complete

41.2 MB estimated size

Column Types

10 Numeric, 10 Text, 1 Boolean

Data Dictionary

finance

Column	Type	Description	Example
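`age`	numeric	Client age in years (17 to 98 across the full table)	56, 57
`job`	string	Type of job	"housemaid", "services"
`marital`	string	Marital status	"married", "married"
`education`	string	Education level	"basic.4y", "high.school"
`default`	string	Whether the client has credit in default	"no", "unknown"
`housing`	string	Whether the client has a housing loan	"no", "no"
`loan`	string	Whether the client has a personal loan	"no", "no"
`contact`	string	Contact communication type ("cellular" or "telephone")	"telephone", "telephone"
`month`	string	Month of the last contact	"may", "may"
`day_of_week`	string	Day of the week of the last contact	"mon", "mon"
`duration`	numeric	Duration of the last contact in seconds; known only after the call ends	261, 149
`campaign`	numeric	Number of contacts made to this client during the current campaign, including the last contact	1, 1
`pdays`	numeric	Days since the client was last contacted in a previous campaign; 999 means no previous contact	999, 999
`previous`	numeric	Number of contacts made to this client before the current campaign	0, 0
`poutcome`	string	Outcome of the previous campaign ("success", "failure", or "nonexistent")	"nonexistent", "nonexistent"
`emp.var.rate`	numeric	Employment variation rate (quarterly indicator)	1.1, 1.1
`cons.price.idx`	numeric	Consumer price index (monthly indicator)	93.994, 93.994
`cons.conf.idx`	numeric	Consumer confidence index (monthly indicator)	-36.4, -36.4
`euribor3m`	numeric	Euribor 3-month rate	4.857, 4.857
`nr.employed`	numeric	Number employed, in thousands (quarterly indicator)	5191, 5191
`y`	boolean	Whether the client subscribed to a term deposit	false, false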
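
When rows are read as plain text, the types in the dictionary above have to be applied by hand. The sketch below converts the ten numeric columns and the boolean target. `parse_row` is a hypothetical helper, and the lowercase "true"/"false" encoding of `y` is assumed from the preview on this page.

```python
# The ten numeric columns listed in the data dictionary.
NUMERIC_COLUMNS = (
    "age", "duration", "campaign", "pdays", "previous",
    "emp.var.rate", "cons.price.idx", "cons.conf.idx",
    "euribor3m", "nr.employed",
)


def parse_row(raw):
    """Convert string values to floats and a bool, following the data dictionary."""
    parsed = dict(raw)
    for column in NUMERIC_COLUMNS:
        if column in parsed:
            parsed[column] = float(parsed[column])
    if "y" in parsed:
        # Assumes the target is stored as the text "true" or "false".
        parsed["y"] = str(parsed["y"]).strip().lower() == "true"
    return parsed


raw = {"age": "56", "job": "housemaid", "pdays": "999", "euribor3m": "4.857", "y": "false"}
parsed = parse_row(raw)
print(parsed)
```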