Agents for Data

Wine Dataset

Classic UCI Wine Dataset with 178 samples from 3 Italian cultivars, featuring 13 chemical attributes for multi-class classification, clustering, and dimensionality reduction benchmarking.

Tags: classification, machine-learning, wine-chemistry, multivariate-analysis, uci-repository, benchmark-dataset, dimensionality-reduction, clustering, feature-scaling, italian-wine, cultivar-classification, scikit-learn, beginner-friendly
1 table, 178 rows
Last updated 1 week ago (January 2, 2026)
Time: 1980s
Location: Piedmont, Italy
Created by Dataset Agent

Overview

The Wine Dataset is one of the most iconic benchmark datasets in machine learning, originating from chemical analysis of wines grown in the Piedmont region of Italy. The wines derive from three different cultivars (grape varieties), and the classification task involves determining which cultivar a wine belongs to based solely on its chemical properties. Created by M. Forina and colleagues at the Institute of Pharmaceutical and Food Analysis and Technologies in Genoa, Italy, this dataset was donated to the UCI Machine Learning Repository in 1991 by Stefan Aeberhard.
The dataset contains 178 wine samples with complete chemical analysis profiles across 13 continuous attributes and 3 cultivar classes.
<details>
<summary>View SQL Query</summary>

```sql
SELECT COUNT(*) AS total_samples, COUNT(DISTINCT class) AS num_classes
FROM wine.csv
```

</details>
View Source
SQL
SELECT COUNT(*) AS row_count FROM wine.csv
Data
Row Count
178
1 row
Zero Missing Values: Unlike many real-world datasets, the Wine Dataset is complete with no missing values across all 178 samples and 14 columns. This makes it ideal for learning ML fundamentals without requiring imputation techniques.
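As a rough illustration of how the completeness claim could be checked, the sketch below counts empty cells per column with Python's standard `csv` module. The inline text is only a stand-in for `wine.csv` (the first three rows and five columns shown in the data preview), and the helper name `count_missing` is hypothetical.

```python
import csv
import io

# Stand-in for wine.csv: first three rows and five columns from the data preview.
SAMPLE_CSV = """class,alcohol,malic_acid,ash,alcalinity_of_ash
1,14.23,1.71,2.43,15.6
1,13.2,1.78,2.14,11.2
1,13.16,2.36,2.67,18.6
"""


def count_missing(csv_text):
    """Return {column: number of empty cells} for CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            value = row.get(name)
            # Treat absent fields and blank strings as missing.
            if value is None or value.strip() == "":
                missing[name] += 1
    return missing


missing_counts = count_missing(SAMPLE_CSV)
print(missing_counts)
```

Run against the full file, every count should be 0 if the dataset is complete as described.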

The 13 Chemical Attributes Explained

Understanding what each chemical attribute measures is crucial for interpreting model results and feature importance. These attributes capture the essential chemical fingerprint that distinguishes wines from different cultivars.

Class Distribution Analysis

The three wine cultivars show moderate class imbalance, which is important to consider when training classifiers and evaluating model performance.
View Source
SQL
SELECT class, COUNT(*) AS count FROM wine.csv GROUP BY class ORDER BY class
Data
| Cultivar Class | Sample Count | Percentage |
| --- | --- | --- |
| Class 1 | 59 | 33.1% |
| Class 2 | 71 | 39.9% |
| Class 3 | 48 | 27.0% |
3 rows
Class 2 dominates with 71 samples (39.9%), while Class 3 has the fewest with 48 samples (27.0%). This ~1.5:1 imbalance ratio is mild but should be considered when using accuracy as an evaluation metric.
<details>
<summary>View SQL Query</summary>

```sql
SELECT class, COUNT(*) AS count, ROUND(COUNT(*) * 100.0 / 178, 1) AS percentage
FROM wine.csv
GROUP BY class
ORDER BY class
```

</details>
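One way to make the accuracy caveat concrete is to compute the majority-class baseline, the accuracy of a classifier that always predicts the largest class. The sketch below uses only the class counts reported above; the variable names are illustrative.

```python
# Class counts as reported in the class distribution table.
class_counts = {1: 59, 2: 71, 3: 48}

total = sum(class_counts.values())

# A classifier that always predicts the largest class sets the accuracy floor.
majority_class = max(class_counts, key=class_counts.get)
baseline_accuracy = class_counts[majority_class] / total

# Ratio between the largest and smallest class.
imbalance_ratio = max(class_counts.values()) / min(class_counts.values())

print(f"total samples: {total}")
print(f"majority class: {majority_class}, baseline accuracy: {baseline_accuracy:.3f}")
print(f"imbalance ratio: {imbalance_ratio:.2f}:1")
```

Any reported accuracy is best read against this baseline of roughly 40% rather than against zero.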

Feature Statistics Summary

The 13 features span vastly different scales, making feature scaling (standardization or normalization) essential for distance-based algorithms like k-NN and SVM.
Descriptive Statistics for All Features
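| Feature | Min | Max | Mean | Std Dev | Scale Factor |
| --- | --- | --- | --- | --- | --- |
| Alcohol | 11.03 | 14.83 | 13.00 | 0.81 | 1x |
| Malic Acid | 0.74 | 5.80 | 2.34 | 1.12 | 1x |
| Ash | 1.36 | 3.23 | 2.37 | 0.27 | 1x |
| Alcalinity of Ash | 10.60 | 30.00 | 19.49 | 3.34 | 10x |
| Magnesium | 70 | 162 | 99.74 | 14.28 | 100x |
| Total Phenols | 0.98 | 3.88 | 2.29 | 0.63 | 1x |
| Flavanoids | 0.34 | 5.08 | 2.03 | 1.00 | 1x |
| Nonflavanoid Phenols | 0.13 | 0.66 | 0.36 | 0.12 | 0.1x |
| Proanthocyanins | 0.41 | 3.58 | 1.59 | 0.57 | 1x |
| Color Intensity | 1.28 | 13.00 | 5.06 | 2.32 | 10x |
| Hue | 0.48 | 1.71 | 0.96 | 0.23 | 1x |
| OD280/OD315 | 1.27 | 4.00 | 2.61 | 0.71 | 1x |
| Proline | 278 | 1680 | 746.89 | 314.91 | 1000x |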
13 rows
View Source
SQL
SELECT ROUND(MIN(alcohol), 2), ROUND(MAX(alcohol), 2), ROUND(AVG(alcohol), 2), ROUND(STDDEV(alcohol), 2) FROM wine.csv
Proline has the largest absolute spread, with values ranging from 278 to 1,680 mg/L (about a 6x ratio between maximum and minimum). Ash is among the most tightly distributed features, spanning only 1.36 to 3.23 (a range of 1.87).
<details>
<summary>View SQL Query</summary>

```sql
SELECT ROUND(MIN(proline), 0) AS proline_min, ROUND(MAX(proline), 0) AS proline_max,
       ROUND(MIN(ash), 2) AS ash_min, ROUND(MAX(ash), 2) AS ash_max
FROM wine.csv
```

</details>
View Source
SQL
SELECT MIN(proline), MAX(proline), MIN(ash), MAX(ash) FROM wine.csv
Data
| Min Proline | Max Proline | Min Ash | Max Ash |
| --- | --- | --- | --- |
| 278 | 1,680 | 1.36 | 3.23 |
1 row
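To show why scaling matters for distance-based methods, the following sketch compares how much each feature contributes to the squared Euclidean distance between two records, before and after z-score standardization. It uses only alcohol and proline, with the means and standard deviations from the descriptive statistics and two Class 1 records from the sample data preview; the helper names are hypothetical.

```python
# Mean and standard deviation from the descriptive statistics table.
STATS = {
    "alcohol": (13.00, 0.81),
    "proline": (746.89, 314.91),
}

# Two Class 1 records from the sample data preview (alcohol and proline only).
sample_a = {"alcohol": 14.23, "proline": 1065}
sample_b = {"alcohol": 13.2, "proline": 1050}


def standardize(sample):
    """Z-score each feature: (value - mean) / standard deviation."""
    return {k: (v - STATS[k][0]) / STATS[k][1] for k, v in sample.items()}


def squared_diffs(a, b):
    """Per-feature squared difference, the terms of squared Euclidean distance."""
    return {k: (a[k] - b[k]) ** 2 for k in a}


def shares(diffs):
    """Fraction of the squared distance contributed by each feature."""
    total = sum(diffs.values())
    return {k: d / total for k, d in diffs.items()}


raw_shares = shares(squared_diffs(sample_a, sample_b))
scaled_shares = shares(squared_diffs(standardize(sample_a), standardize(sample_b)))

print("raw:   ", {k: round(v, 3) for k, v in raw_shares.items()})
print("scaled:", {k: round(v, 3) for k, v in scaled_shares.items()})
```

On the raw values, proline accounts for almost all of the distance simply because of its units. After standardization, the alcohol difference (more than one standard deviation) dominates instead.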

Distinguishing Chemical Signatures by Cultivar

Each cultivar exhibits a distinct chemical profile, which enables high classification accuracy. Because all three cultivars were grown in the same region, these differences mainly reflect the grape varieties themselves rather than differences in terroir.
Average Feature Values by Cultivar Class
| Feature | Class 1 | Class 2 | Class 3 | Best Discriminator? |
| --- | --- | --- | --- | --- |
| Alcohol (%) | 13.74 | 12.28 | 13.15 | ✓ Class 2 lowest |
| Flavanoids | 2.98 | 2.08 | 0.78 | ✓✓ Strong separator |
| Color Intensity | 5.53 | 3.09 | 7.40 | ✓✓ Strong separator |
| Proline (mg/L) | 1116 | 520 | 630 | ✓✓ Class 1 highest |
| Hue | 1.06 | 1.06 | 0.68 | ✓ Class 3 lowest |
| OD280/OD315 | 3.16 | 2.78 | 1.68 | ✓ Class 3 lowest |
6 rows
View Source
SQL
SELECT class, ROUND(AVG(alcohol), 2), ROUND(AVG(flavanoids), 2), ROUND(AVG(color_intensity), 2), ROUND(AVG(proline), 0), ROUND(AVG(hue), 2), ROUND(AVG(od280_od315), 2) FROM wine.csv GROUP BY class ORDER BY class
Flavanoids is the strongest discriminating feature: Class 1 averages 2.98, Class 2 averages 2.08, and Class 3 averages only 0.78, nearly a 4x difference between Class 1 and Class 3.
<details>
<summary>View SQL Query</summary>

```sql
SELECT class, ROUND(AVG(flavanoids), 2) AS avg_flavanoids
FROM wine.csv
GROUP BY class
ORDER BY class
```

</details>
View Source
SQL
SELECT class, ROUND(AVG(flavanoids), 2) AS avg_flavanoids FROM wine.csv GROUP BY class ORDER BY class
Data
| Class | Avg Flavanoids |
| --- | --- |
| 1 | 2.98 |
| 2 | 2.08 |
| 3 | 0.78 |
3 rows
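The per-class averages can be turned into a very simple classifier by assigning a wine to the cultivar whose average profile is closest. The sketch below is an illustrative nearest-centroid rule, not a method described by the dataset's authors. It uses four features, the class means reported above, and the overall standard deviations from the descriptive statistics as a scale; the function names are hypothetical.

```python
import math

# Class means from the "Average Feature Values by Cultivar Class" table.
CENTROIDS = {
    1: {"alcohol": 13.74, "flavanoids": 2.98, "color_intensity": 5.53, "proline": 1116},
    2: {"alcohol": 12.28, "flavanoids": 2.08, "color_intensity": 3.09, "proline": 520},
    3: {"alcohol": 13.15, "flavanoids": 0.78, "color_intensity": 7.40, "proline": 630},
}

# Overall standard deviations from the descriptive statistics table,
# used so that no single feature dominates the distance.
STD = {"alcohol": 0.81, "flavanoids": 1.00, "color_intensity": 2.32, "proline": 314.91}


def scaled_distance(sample, centroid):
    """Euclidean distance after dividing each feature difference by its std dev."""
    return math.sqrt(sum(((sample[f] - centroid[f]) / STD[f]) ** 2 for f in STD))


def nearest_class(sample):
    """Return the class whose centroid is closest to the sample."""
    return min(CENTROIDS, key=lambda c: scaled_distance(sample, CENTROIDS[c]))


# Two records taken from the sample data preview.
wine_a = {"alcohol": 14.23, "flavanoids": 3.06, "color_intensity": 5.64, "proline": 1065}
wine_b = {"alcohol": 13.11, "flavanoids": 0.75, "color_intensity": 10.8, "proline": 630}

print(nearest_class(wine_a), nearest_class(wine_b))
```

A rule this crude is only meant to show how far apart the class profiles sit; a real evaluation would use all 13 features and held-out data.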

Feature Correlations and Multicollinearity

Several features are highly correlated, which affects feature selection strategies and can cause issues with certain algorithms. Understanding these relationships guides effective dimensionality reduction.
Flavanoids and Total Phenols show the strongest positive correlation at r = 0.865, indicating these phenolic compounds are chemically related. This high correlation suggests one could be dropped without significant information loss.
<details>
<summary>View SQL Query</summary>

```sql
SELECT ROUND(CORR(flavanoids, total_phenols), 3) AS correlation
FROM wine.csv
```

</details>
View Source
SQL
SELECT ROUND(CORR(flavanoids, total_phenols), 3) AS flavanoids_phenols_corr FROM wine.csv
Data
| Flavanoids Phenols Corr |
| --- |
| 0.865 |
1 row
Notable Feature Correlations
| Feature Pair | Correlation | Interpretation |
| --- | --- | --- |
| Flavanoids ↔ Total Phenols | +0.865 | Strong positive; redundant information |
| Flavanoids ↔ OD280/OD315 | +0.787 | Both relate to phenolic content |
| Alcohol ↔ Proline | +0.644 | Moderate positive; ripeness indicators |
| Hue ↔ Malic Acid | -0.561 | Moderate negative; color relates to acid profile |
| Color Intensity ↔ Flavanoids | -0.172 | Weak negative; interesting inverse |
5 rows
View Source
SQL
SELECT ROUND(CORR(alcohol, proline), 3), ROUND(CORR(flavanoids, total_phenols), 3) FROM wine.csv
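The `CORR` aggregate in these queries computes the Pearson correlation coefficient. For readers working outside SQL, a minimal pure-Python sketch of the same formula follows. The function name `pearson_r` is hypothetical, and the six alcohol and proline values come from the sample data preview, so the result is not expected to match the full-dataset figure of +0.644.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Alcohol and proline for the six records in the sample data preview.
alcohol = [14.23, 13.2, 12.37, 12.33, 12.86, 13.11]
proline = [1065, 1050, 520, 680, 720, 630]

r_sample = pearson_r(alcohol, proline)
print(f"alcohol vs. proline on six sample rows: r = {r_sample:.3f}")
```

Applying the same function to every pair of columns gives a full correlation matrix, which is the usual starting point for spotting redundant features.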

Sample Data Preview

Here are representative samples from each cultivar class showing actual data values. Note how the chemical profiles differ systematically between classes.
Sample Wine Records (Selected Features)
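| Class | Alcohol | Malic Acid | Flavanoids | Color Intensity | Proline |
| --- | --- | --- | --- | --- | --- |
| 1 | 14.23 | 1.71 | 3.06 | 5.64 | 1,065 |
| 1 | 13.2 | 1.78 | 2.76 | 4.38 | 1,050 |
| 2 | 12.37 | 0.94 | 2.76 | 3 | 520 |
| 2 | 12.33 | 1.1 | 2.43 | 2.2 | 680 |
| 3 | 12.86 | 1.35 | 0.64 | 7.65 | 720 |
| 3 | 13.11 | 1.01 | 0.75 | 10.8 | 630 |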
6 rows
View Source
SQL
(SELECT class, alcohol, malic_acid, flavanoids, color_intensity, proline FROM wine.csv WHERE class = 1 LIMIT 2) UNION ALL (SELECT class, alcohol, malic_acid, flavanoids, color_intensity, proline FROM wine.csv WHERE class = 2 LIMIT 2) UNION ALL (SELECT class, alcohol, malic_acid, flavanoids, color_intensity, proline FROM wine.csv WHERE class = 3 LIMIT 2)

Common Confusion: Wine vs. Wine Quality Dataset

Don't confuse these two popular datasets! The UCI Wine Dataset (this one) and the Wine Quality Dataset are frequently mixed up but serve different purposes.

Expected Classification Performance

Because the class boundaries are well separated, most classifiers achieve high accuracy on this dataset.
The high achievable accuracy (95-100%) makes this dataset excellent for validating implementations and learning ML workflows, but less suitable for benchmarking state-of-the-art algorithms where performance differences would be negligible.

Historical Context and Provenance

The Wine Dataset originates from research conducted in the 1980s at the Institute of Pharmaceutical and Food Analysis and Technologies in Genoa, Italy. The wines analyzed came from the Piedmont region, one of Italy's most prestigious wine-producing areas, known for wines like Barolo, Barbaresco, and Barbera. While the specific cultivar names are not disclosed in the dataset, they represent three distinct grape varieties grown in the same geographical region.
Stefan Aeberhard donated the dataset to the UCI Machine Learning Repository in 1991, where it has since been cited in thousands of academic papers. An early study using the data is Aeberhard, S., Coomans, D., and de Vel, O. (1994), "Comparative analysis of statistical pattern recognition methods in high dimensional settings," Pattern Recognition, 27(8):1065-1077.

Limitations and Considerations

  • Small Sample Size: With only 178 samples, results may not generalize to larger wine classification challenges
  • Regional Specificity: All wines come from Piedmont, Italy — models may not transfer to wines from other regions
  • Historical Data: Chemical analysis methods from the 1980s may differ from modern analytical techniques
  • Class Imbalance: Class 3 has 32% fewer samples than Class 2 (48 vs. 71), which may bias some classifiers
  • Unknown Cultivars: The specific grape varieties are not disclosed, limiting domain interpretation
  • "Too Easy" for Benchmarking: Near-perfect accuracy is achievable, making it unsuitable for comparing advanced algorithms

Table Overview

wine

Contains 178 rows and 14 columns. Column types: 14 numeric.


Data Preview

| Row | class | alcohol | malic_acid | ash | alcalinity_of_ash |
| --- | --- | --- | --- | --- | --- |
| 1 | 1 | 14.23 | 1.71 | 2.43 | 15.6 |
| 2 | 1 | 13.2 | 1.78 | 2.14 | 11.2 |
| 3 | 1 | 13.16 | 2.36 | 2.67 | 18.6 |

+9 more columns

Data Profile

  • 178 rows
  • 14 columns
  • 100% complete
  • 121.7 KB estimated size

Column Types

14 Numeric

High-Cardinality Columns

Columns with many unique values. In this dataset they are continuous measurements, not identifiers or categorical features.

  • malic_acid (133 unique values)
  • flavanoids (132 unique values)
  • color_intensity (132 unique values)
  • alcohol (126 unique values)
  • od280_od315 (122 unique values)
  • proline (121 unique values)
  • proanthocyanins (101 unique values)
  • total_phenols (97 unique values)

Data Dictionary

wine

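| Column | Type | Example | Missing Values |
| --- | --- | --- | --- |
| class | numeric | 1, 1 | 0 |
| alcohol | numeric | 14.23, 13.2 | 0 |
| malic_acid | numeric | 1.71, 1.78 | 0 |
| ash | numeric | 2.43, 2.14 | 0 |
| alcalinity_of_ash | numeric | 15.6, 11.2 | 0 |
| magnesium | numeric | 127, 100 | 0 |
| total_phenols | numeric | 2.8, 2.65 | 0 |
| flavanoids | numeric | 3.06, 2.76 | 0 |
| nonflavanoid_phenols | numeric | 0.28, 0.26 | 0 |
| proanthocyanins | numeric | 2.29, 1.28 | 0 |
| color_intensity | numeric | 5.64, 4.38 | 0 |
| hue | numeric | 1.04, 1.05 | 0 |
| od280_od315 | numeric | 3.92, 3.4 | 0 |
| proline | numeric | 1065, 1050 | 0 |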
Last updated: January 2, 2026
Created: January 2, 2026