Last updated: January 2, 2026
Time: 1980s
Location: Piedmont, Italy
Created by Dataset Agent
Overview
The Wine Dataset is one of the most iconic benchmark datasets in machine learning, originating from chemical analysis of wines grown in the Piedmont region of Italy. The wines derive from three different cultivars (grape varieties), and the classification task involves determining which cultivar a wine belongs to based solely on its chemical properties. Created by M. Forina and colleagues at the Institute of Pharmaceutical and Food Analysis and Technologies in Genoa, Italy, this dataset was donated to the UCI Machine Learning Repository in 1991 by Stefan Aeberhard.
The dataset contains 178 wine samples with complete chemical analysis profiles across 13 continuous attributes and 3 cultivar classes.
<details>
<summary>View SQL Query</summary>
```sql
SELECT COUNT(*) as total_samples, COUNT(DISTINCT class) as num_classes FROM wine.csv
```
</details>
Zero Missing Values: Unlike many real-world datasets, the Wine Dataset is complete with no missing values across all 178 samples and 14 columns. This makes it ideal for learning ML fundamentals without requiring imputation techniques.
The 13 Chemical Attributes Explained
Understanding what each chemical attribute measures is crucial for interpreting model results and feature importance. These attributes capture the essential chemical fingerprint that distinguishes wines from different cultivars.
Class Distribution Analysis
The three wine cultivars show mild class imbalance, which matters when training classifiers and evaluating model performance.
Class 2 is the largest with 71 samples (39.9%), Class 1 has 59 samples (33.1%), and Class 3 has the fewest with 48 samples (27.0%). This roughly 1.5:1 ratio between the largest and smallest class is mild but should be kept in mind when accuracy is the evaluation metric.
<details>
<summary>View SQL Query</summary>
```sql
SELECT class, COUNT(*) as count, ROUND(COUNT(*) * 100.0 / 178, 1) as percentage FROM wine.csv GROUP BY class ORDER BY class
```
</details>
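To make the effect of the class distribution concrete, the following minimal sketch computes the class shares, the largest-to-smallest ratio, and the accuracy of a trivial classifier that always predicts the largest class. The counts for Class 2 and Class 3 come from the text; the Class 1 count is derived as 178 - 71 - 48 = 59, and the function names are illustrative rather than part of any library.

```python
# Class counts: Class 2 (71) and Class 3 (48) are quoted in the text;
# Class 1 is derived as 178 - 71 - 48 = 59.
counts = {1: 59, 2: 71, 3: 48}
total = sum(counts.values())


def class_shares(counts):
    """Return each class's share of all samples as a percentage."""
    n = sum(counts.values())
    return {label: 100.0 * c / n for label, c in counts.items()}


def majority_baseline(counts):
    """Accuracy of a classifier that always predicts the largest class."""
    return max(counts.values()) / sum(counts.values())


def imbalance_ratio(counts):
    """Ratio between the largest and the smallest class."""
    return max(counts.values()) / min(counts.values())


shares = class_shares(counts)
baseline = majority_baseline(counts)
ratio = imbalance_ratio(counts)

for label in sorted(counts):
    print(f"Class {label}: {counts[label]} samples ({shares[label]:.1f}%)")
print(f"Majority-class baseline accuracy: {baseline:.3f}")
print(f"Largest-to-smallest class ratio: {ratio:.2f}:1")
```

Any reported accuracy is best read against this baseline: a model scoring close to 40% has learned essentially nothing beyond the class frequencies.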
Feature Statistics Summary
The 13 features span very different scales (Nonflavanoid Phenols stays below 1, while Proline exceeds 1,000), so feature scaling (standardization or normalization) is essential for scale-sensitive algorithms such as k-NN and SVM.
Descriptive Statistics for All Features
| Feature | Min | Max | Mean | Std Dev | Approx. Order of Magnitude |
|---|---|---|---|---|---|
| Alcohol | 11.03 | 14.83 | 13.00 | 0.81 | 1x |
| Malic Acid | 0.74 | 5.80 | 2.34 | 1.12 | 1x |
| Ash | 1.36 | 3.23 | 2.37 | 0.27 | 1x |
| Alcalinity of Ash | 10.60 | 30.00 | 19.49 | 3.34 | 10x |
| Magnesium | 70 | 162 | 99.74 | 14.28 | 100x |
| Total Phenols | 0.98 | 3.88 | 2.29 | 0.63 | 1x |
| Flavanoids | 0.34 | 5.08 | 2.03 | 1.00 | 1x |
| Nonflavanoid Phenols | 0.13 | 0.66 | 0.36 | 0.12 | 0.1x |
| Proanthocyanins | 0.41 | 3.58 | 1.59 | 0.57 | 1x |
| Color Intensity | 1.28 | 13.00 | 5.06 | 2.32 | 10x |
| Hue | 0.48 | 1.71 | 0.96 | 0.23 | 1x |
| OD280/OD315 | 1.27 | 4.00 | 2.61 | 0.71 | 1x |
| Proline | 278 | 1680 | 746.89 | 314.91 | 1000x |
Proline has by far the widest absolute spread, with values from 278 to 1,680 mg/L (a roughly 6x ratio between maximum and minimum), whereas Ash varies little, covering a range of only 1.87 (1.36 to 3.23).
<details>
<summary>View SQL Query</summary>
```sql
SELECT
  ROUND(MIN(proline), 0) as proline_min,
  ROUND(MAX(proline), 0) as proline_max,
  ROUND(MIN(ash), 2) as ash_min,
  ROUND(MAX(ash), 2) as ash_max
FROM wine.csv
```
</details>
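The scale differences described above can be illustrated with a small z-score sketch. It uses the means and standard deviations from the descriptive-statistics table and two Class 1 rows from the data preview (rows 1 and 5), restricted to three features for brevity. Whether the table reports population or sample standard deviations is not stated, so treat the scaled values as approximate; the helper names are hypothetical.

```python
# (mean, standard deviation) copied from the descriptive-statistics table.
STATS = {
    "alcohol": (13.00, 0.81),
    "magnesium": (99.74, 14.28),
    "proline": (746.89, 314.91),
}

# Two Class 1 rows from the data preview (rows 1 and 5), three features only.
sample_a = {"alcohol": 14.23, "magnesium": 127, "proline": 1065}
sample_b = {"alcohol": 13.24, "magnesium": 118, "proline": 735}


def standardize(sample, stats=STATS):
    """Convert each value to a z-score: (value - mean) / std."""
    return {
        name: (value - stats[name][0]) / stats[name][1]
        for name, value in sample.items()
    }


def squared_gaps(a, b):
    """Per-feature squared differences, the terms of a squared Euclidean distance."""
    return {name: (a[name] - b[name]) ** 2 for name in a}


def proline_share(a, b):
    """Fraction of the squared Euclidean distance contributed by proline."""
    gaps = squared_gaps(a, b)
    return gaps["proline"] / sum(gaps.values())


raw_share = proline_share(sample_a, sample_b)
scaled_share = proline_share(standardize(sample_a), standardize(sample_b))

print(f"Proline share of squared distance, raw values: {raw_share:.4f}")
print(f"Proline share of squared distance, z-scores:   {scaled_share:.4f}")
```

On raw values, proline accounts for almost the entire distance between the two samples; after standardization, the three features contribute on a comparable footing, which is the behavior k-NN and similar methods need.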
Distinguishing Chemical Signatures by Cultivar
Each cultivar exhibits a distinct chemical profile, which is what enables high classification accuracy. Because all three cultivars were grown in the same region, these differences likely reflect grape variety rather than geography.
Average Feature Values by Cultivar Class
| Feature | Class 1 | Class 2 | Class 3 | Best Discriminator? |
|---|---|---|---|---|
| Alcohol (%) | 13.74 | 12.28 | 13.15 | ✓ Class 2 lowest |
| Flavanoids | 2.98 | 2.08 | 0.78 | ✓✓ Strong separator |
| Color Intensity | 5.53 | 3.09 | 7.40 | ✓✓ Strong separator |
| Proline (mg/L) | 1116 | 520 | 630 | ✓✓ Class 1 highest |
| Hue | 1.06 | 1.06 | 0.68 | ✓ Class 3 lowest |
| OD280/OD315 | 3.16 | 2.78 | 1.68 | ✓ Class 3 lowest |
Flavanoids is the strongest discriminating feature: Class 1 averages 2.98, Class 2 averages 2.08, and Class 3 averages only 0.78 — nearly a 4x difference between Class 1 and Class 3.
<details>
<summary>View SQL Query</summary>
```sql
SELECT class, ROUND(AVG(flavanoids), 2) as avg_flavanoids FROM wine.csv GROUP BY class ORDER BY class
```
</details>
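As a rough illustration of how far apart the class profiles are, the sketch below assigns a sample to the cultivar whose average profile is nearest. This is a simple nearest-class-mean rule chosen for illustration, not a method taken from the dataset's authors. Class means come from the table of averages by cultivar, the per-feature standard deviations come from the descriptive statistics, and the two test rows are the first two Class 1 rows of the data preview. Since those rows also contributed to the class means, the result is optimistic.

```python
import math

# Average feature values per cultivar, copied from the table above.
CLASS_MEANS = {
    1: {"alcohol": 13.74, "flavanoids": 2.98, "color_intensity": 5.53,
        "proline": 1116, "hue": 1.06, "od280_od315": 3.16},
    2: {"alcohol": 12.28, "flavanoids": 2.08, "color_intensity": 3.09,
        "proline": 520, "hue": 1.06, "od280_od315": 2.78},
    3: {"alcohol": 13.15, "flavanoids": 0.78, "color_intensity": 7.40,
        "proline": 630, "hue": 0.68, "od280_od315": 1.68},
}

# Overall standard deviations from the descriptive statistics, used so that
# proline does not dominate the distance purely because of its scale.
STD = {"alcohol": 0.81, "flavanoids": 1.00, "color_intensity": 2.32,
       "proline": 314.91, "hue": 0.23, "od280_od315": 0.71}


def scaled_distance(sample, mean):
    """Euclidean distance after dividing each feature gap by its std."""
    return math.sqrt(sum(((sample[f] - mean[f]) / STD[f]) ** 2 for f in STD))


def nearest_class(sample):
    """Return the class label whose mean profile is closest to the sample."""
    return min(CLASS_MEANS, key=lambda label: scaled_distance(sample, CLASS_MEANS[label]))


# First two rows of the data preview (both labeled Class 1).
preview_rows = [
    {"alcohol": 14.23, "flavanoids": 3.06, "color_intensity": 5.64,
     "proline": 1065, "hue": 1.04, "od280_od315": 3.92},
    {"alcohol": 13.2, "flavanoids": 2.76, "color_intensity": 4.38,
     "proline": 1050, "hue": 1.05, "od280_od315": 3.4},
]

predictions = [nearest_class(row) for row in preview_rows]
print(predictions)
```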
Feature Correlations and Multicollinearity
Several features are highly correlated, which affects feature selection strategies and can cause issues with certain algorithms. Understanding these relationships guides effective dimensionality reduction.
Flavanoids and Total Phenols show the strongest positive correlation at r = 0.865, indicating these phenolic compounds are chemically related. This high correlation suggests one could be dropped without significant information loss.
<details>
<summary>View SQL Query</summary>
```sql
SELECT ROUND(CORR(flavanoids, total_phenols), 3) as correlation FROM wine.csv
```
</details>
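For readers who want to see what the correlation figure measures, here is a minimal sketch of the Pearson coefficient, assuming that `CORR` in the query above computes Pearson's r as it does in common SQL engines. The input uses only the five rows shown in the data preview, so the result is a toy value and will differ from the full-dataset figure of 0.865. The function name `pearson_r` is illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# The five preview rows only; not the full 178-sample dataset.
total_phenols = [2.8, 2.65, 2.8, 3.85, 2.8]
flavanoids = [3.06, 2.76, 3.24, 3.49, 2.69]

r_preview = pearson_r(flavanoids, total_phenols)
print(f"Pearson r on the five preview rows: {r_preview:.3f}")
```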
Notable Feature Correlations
| Feature Pair | Correlation | Interpretation |
|---|---|---|
| Flavanoids ↔ Total Phenols | +0.865 | Strong positive; redundant information |
| Flavanoids ↔ OD280/OD315 | +0.787 | Both relate to phenolic content |
| Alcohol ↔ Proline | +0.644 | Moderate positive; ripeness indicators |
| Hue ↔ Malic Acid | -0.561 | Moderate negative; hue falls as malic acid rises |
| Color Intensity ↔ Flavanoids | -0.172 | Weak negative; interesting inverse |
Sample Data Preview
Here are representative samples from each cultivar class showing actual data values. Note how the chemical profiles differ systematically between classes.
Sample Wine Records (Selected Features)
| Class | Alcohol | Malic Acid | Flavanoids | Color Intensity | Proline |
|---|---|---|---|---|---|
| 1 | 14.23 | 1.71 | 3.06 | 5.64 | 1,065 |
| 1 | 13.2 | 1.78 | 2.76 | 4.38 | 1,050 |
| 2 | 12.37 | 0.94 | 2.76 | 3 | 520 |
| 2 | 12.33 | 1.1 | 2.43 | 2.2 | 680 |
| 3 | 12.86 | 1.35 | 0.64 | 7.65 | 720 |
| 3 | 13.11 | 1.01 | 0.75 | 10.8 | 630 |
Common Confusion: Wine vs. Wine Quality Dataset
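Don't confuse these two popular datasets! The UCI Wine Dataset (this one) and the Wine Quality Dataset are frequently mixed up but serve different purposes: this dataset assigns 178 Italian wines to one of three cultivars, whereas the Wine Quality Dataset covers several thousand Portuguese red and white wines and targets a sensory quality score.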
Expected Classification Performance
Because the classes are well separated, most classifiers achieve high accuracy on this dataset.
The high achievable accuracy (95-100%) makes this dataset excellent for validating implementations and learning ML workflows, but less suitable for benchmarking state-of-the-art algorithms where performance differences would be negligible.
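One way to check the accuracy range quoted above is a short cross-validation run. The sketch below assumes scikit-learn is installed and relies on its bundled copy of this dataset (`load_wine`), which labels the classes 0, 1, 2 rather than 1, 2, 3. The choice of standardized logistic regression, five stratified folds, and the random seed are illustrative assumptions, not the setup behind any figure in this document.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# scikit-learn's copy of the dataset: 178 samples, 13 features, labels 0-2.
X, y = load_wine(return_X_y=True)

# Scaling lives inside the pipeline so each fold is standardized using only
# its own training split, which avoids leaking test-fold statistics.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Stratified folds keep the class proportions similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
mean_accuracy = scores.mean()

print(f"Fold accuracies: {scores.round(3)}")
print(f"Mean accuracy:   {mean_accuracy:.3f}")
```

With only 178 samples, each fold holds about 35 wines, so a single misclassification moves a fold's accuracy by roughly three percentage points; fold-to-fold variation should be read with that in mind.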
Historical Context and Provenance
The Wine Dataset originates from research conducted in the 1980s at the Institute of Pharmaceutical and Food Analysis and Technologies in Genoa, Italy. The wines analyzed came from the Piedmont region, one of Italy's most prestigious wine-producing areas, known for wines such as Barolo, Barbaresco, and Barbera. While the specific cultivar names are not disclosed in the dataset, they represent three distinct grape varieties grown in the same geographical region.
Stefan Aeberhard donated the dataset to the UCI Machine Learning Repository in 1991, where it has since been cited in thousands of academic papers. A frequently cited early study of the data is Aeberhard, S., Coomans, D., and de Vel, O. (1994), "Comparative analysis of statistical pattern recognition methods in high dimensional settings", Pattern Recognition, 27(8):1065-1077.
Limitations and Considerations
- Small Sample Size: With only 178 samples, results may not generalize to larger wine classification challenges
- Regional Specificity: All wines come from Piedmont, Italy — models may not transfer to wines from other regions
- Historical Data: Chemical analysis methods from the 1980s may differ from modern analytical techniques
- Class Imbalance: Class 3 has 32% fewer samples than Class 2 (48 vs. 71), which may bias some classifiers
- Unknown Cultivars: The specific grape varieties are not disclosed, limiting domain interpretation
- "Too Easy" for Benchmarking: Near-perfect accuracy is achievable, making it unsuitable for comparing advanced algorithms
Table Overview
wine
Data Preview
| class | alcohol | malic_acid | ash | alcalinity_of_ash | magnesium | total_phenols | flavanoids | nonflavanoid_phenols | proanthocyanins | color_intensity | hue | od280_od315 | proline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 14.23 | 1.71 | 2.43 | 15.6 | 127 | 2.8 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1,065 |
| 1 | 13.2 | 1.78 | 2.14 | 11.2 | 100 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.4 | 1,050 |
| 1 | 13.16 | 2.36 | 2.67 | 18.6 | 101 | 2.8 | 3.24 | 0.3 | 2.81 | 5.68 | 1.03 | 3.17 | 1,185 |
| 1 | 14.37 | 1.95 | 2.5 | 16.8 | 113 | 3.85 | 3.49 | 0.24 | 2.18 | 7.8 | 0.86 | 3.45 | 1,480 |
| 1 | 13.24 | 2.59 | 2.87 | 21 | 118 | 2.8 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735 |
Showing 5 of 178 rows
Data Profile
- 178 rows
- 14 columns
- 100% complete
- 121.7 KB estimated size
Column Types
14 Numeric
High-Cardinality Columns
Columns with many unique values (here these are continuous measurements, not identifiers or categorical features)
- malic_acid (133 unique values)
- flavanoids (132 unique values)
- color_intensity (132 unique values)
- alcohol (126 unique values)
- od280_od315 (122 unique values)
- proline (121 unique values)
- proanthocyanins (101 unique values)
- total_phenols (97 unique values)
Data Dictionary
wine
| Column | Type | Example | Missing Values |
|---|---|---|---|
| class | numeric | 1, 1 | 0 |
| alcohol | numeric | 14.23, 13.2 | 0 |
| malic_acid | numeric | 1.71, 1.78 | 0 |
| ash | numeric | 2.43, 2.14 | 0 |
| alcalinity_of_ash | numeric | 15.6, 11.2 | 0 |
| magnesium | numeric | 127, 100 | 0 |
| total_phenols | numeric | 2.8, 2.65 | 0 |
| flavanoids | numeric | 3.06, 2.76 | 0 |
| nonflavanoid_phenols | numeric | 0.28, 0.26 | 0 |
| proanthocyanins | numeric | 2.29, 1.28 | 0 |
| color_intensity | numeric | 5.64, 4.38 | 0 |
| hue | numeric | 1.04, 1.05 | 0 |
| od280_od315 | numeric | 3.92, 3.4 | 0 |
| proline | numeric | 1065, 1050 | 0 |