
Iris Dataset

Fisher's Iris dataset: 150 flower samples across 3 species with sepal/petal measurements. The classic benchmark for classification algorithms and machine learning education since 1936.

Tags: machine-learning, classification, multivariate, botany, benchmark-dataset, beginner-friendly, scikit-learn, fisher, linear-discriminant-analysis, supervised-learning
1 table, 150 rows
Last updated: December 27, 2025
Time: 1936
Location: Gaspé Peninsula, Quebec, Canada
Created by Dataset Agent

Overview

The Iris Dataset is one of the best-known datasets in machine learning, often called the "Hello World" of data science. Introduced by British statistician Ronald A. Fisher in his 1936 paper "The Use of Multiple Measurements in Taxonomic Problems," this multivariate dataset has become a standard choice for teaching classification algorithms and for quick checks of new methods.
The dataset contains 150 samples of iris flowers, 50 from each of 3 species: Iris setosa, Iris versicolor, and Iris virginica. Such perfect class balance is rare in real-world applications.
SQL
SELECT species, COUNT(*) AS count
FROM iris.csv
GROUP BY species
ORDER BY species
Data
Species         | Count
Iris-setosa     | 50
Iris-versicolor | 50
Iris-virginica  | 50

Understanding the Iris Flower

Before diving into the data, it's essential to understand what we're measuring. Many beginners encounter terms like "sepal" and "petal" without knowing what they mean botanically:
  • Sepals are the outer protective parts of the flower that enclose the bud before it blooms. In most flowers, sepals are green and leaf-like, but in iris flowers, they're colorful and often mistaken for petals. The three outer, drooping segments of an iris are actually sepals (called "falls" by gardeners).
  • Petals are the inner, typically colorful parts that attract pollinators. In iris flowers, these are the three upright segments (called "standards"). They're usually smaller than the sepals in iris species.
  • Length is measured from base to tip along the longest axis
  • Width is measured at the widest point perpendicular to the length
Unlike typical flowers, where petals are larger than sepals, iris flowers have prominent sepals that are often more visually striking than the petals. This is why the sepal measurements in the dataset are larger than the petal measurements, and why measuring each part correctly matters for species identification.

Dataset Statistics by Species

The three species differ in their measurements, modestly in the sepals and sharply in the petals, which is what makes this dataset well suited to classification tasks:
Summary Statistics: Sepal Length (cm) by Species
Species         | Mean | Std Dev | Min | Max
Iris-setosa     | 5.01 | 0.35    | 4.3 | 5.8
Iris-versicolor | 5.94 | 0.52    | 4.9 | 7.0
Iris-virginica  | 6.59 | 0.64    | 4.9 | 7.9
SQL
SELECT species,
       ROUND(AVG(sepal_length), 2) AS mean,
       ROUND(STDDEV(sepal_length), 2) AS std_dev,
       ROUND(MIN(sepal_length), 1) AS min,
       ROUND(MAX(sepal_length), 1) AS max
FROM iris.csv
GROUP BY species
ORDER BY mean
Summary Statistics: Petal Length (cm) by Species
Species         | Mean | Std Dev | Min | Max
Iris-setosa     | 1.46 | 0.17    | 1.0 | 1.9
Iris-versicolor | 4.26 | 0.47    | 3.0 | 5.1
Iris-virginica  | 5.55 | 0.55    | 4.5 | 6.9
SQL
SELECT species,
       ROUND(AVG(petal_length), 2) AS mean,
       ROUND(STDDEV(petal_length), 2) AS std_dev,
       ROUND(MIN(petal_length), 1) AS min,
       ROUND(MAX(petal_length), 1) AS max
FROM iris.csv
GROUP BY species
ORDER BY mean
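The most striking difference: Iris setosa petal length averages just 1.46 cm, while Iris virginica averages 5.55 cm, about 3.8 times larger. The ranges do not even touch: the longest setosa petal is 1.9 cm, while the shortest petal in either of the other two species is 3.0 cm. This gap makes setosa linearly separable from the other species.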
SQL
SELECT species, ROUND(AVG(petal_length), 2) AS avg_petal_length
FROM iris.csv
GROUP BY species
ORDER BY avg_petal_length
Data
Species         | Avg Petal Length
Iris-setosa     | 1.46
Iris-versicolor | 4.26
Iris-virginica  | 5.55
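The separability claim can be checked directly from the minimum and maximum petal lengths in the summary table. The following is a minimal sketch, not part of the original page: the names `PETAL_LENGTH_RANGE`, `ranges_overlap`, and `overlapping_pairs` are hypothetical, and the numbers are copied from the table rather than computed from the raw data.

```python
# Petal length (min, max) in cm per species, copied from the summary table.
PETAL_LENGTH_RANGE = {
    "Iris-setosa": (1.0, 1.9),
    "Iris-versicolor": (3.0, 5.1),
    "Iris-virginica": (4.5, 6.9),
}


def ranges_overlap(a, b):
    """Return True if the closed intervals a and b share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


def overlapping_pairs(ranges):
    """List every pair of species whose ranges overlap on this feature."""
    names = sorted(ranges)
    return [
        (first, second)
        for i, first in enumerate(names)
        for second in names[i + 1:]
        if ranges_overlap(ranges[first], ranges[second])
    ]


pairs = overlapping_pairs(PETAL_LENGTH_RANGE)
print(pairs)
```

Only the versicolor/virginica pair is reported as overlapping. Setosa's range is disjoint from both, which is what allows a single threshold on petal length to isolate it.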

Feature Correlations and Predictive Power

Not all features are equally useful for classification. Understanding which measurements matter most helps you build better models:
Feature Correlation Matrix
Feature Pair                | Correlation | Predictive Value
Petal Length ↔ Petal Width  | 0.963       | ★★★★★ Excellent
Sepal Length ↔ Petal Length | 0.872       | ★★★★☆ Very Good
Sepal Length ↔ Petal Width  | 0.818       | ★★★★☆ Very Good
Sepal Length ↔ Sepal Width  | -0.109      | ★☆☆☆☆ Poor
Sepal Width ↔ Petal Width   | -0.357      | ★★☆☆☆ Weak
SQL
SELECT ROUND(CORR(petal_length, petal_width), 3) AS petal_corr,
       ROUND(CORR(sepal_length, petal_length), 3) AS sl_pl_corr,
       ROUND(CORR(sepal_length, petal_width), 3) AS sl_pw_corr,
       ROUND(CORR(sepal_length, sepal_width), 3) AS sepal_corr,
       ROUND(CORR(sepal_width, petal_width), 3) AS sw_pw_corr
FROM iris.csv
Petal measurements are far more predictive than sepal measurements for species classification: petal length alone cleanly isolates setosa, whereas the sepal length ranges of all three species overlap. The 0.963 correlation between petal length and petal width means these two features carry largely redundant information, so you could use just one and lose little accuracy.
SQL
SELECT ROUND(CORR(petal_length, petal_width), 3) AS petal_correlation
FROM iris.csv
Data
Petal Correlation
0.963
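For readers without a SQL engine at hand, the quantity that `CORR` reports (the Pearson correlation coefficient) can be sketched in plain Python. This is an illustrative sketch only: `pearson` and `PREVIEW_PETALS` are hypothetical names, and the input is the six preview rows shown on this page rather than the full 150-row dataset, so the result will not equal 0.963.

```python
import math

# (petal_length, petal_width) in cm for the six preview rows shown in this
# document. This is an illustrative subset, not the full 150-row dataset.
PREVIEW_PETALS = [
    (1.4, 0.2), (1.4, 0.2),   # Iris-setosa
    (4.7, 1.4), (4.5, 1.5),   # Iris-versicolor
    (6.0, 2.5), (5.1, 1.9),   # Iris-virginica
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


lengths = [length for length, _ in PREVIEW_PETALS]
widths = [width for _, width in PREVIEW_PETALS]
r = pearson(lengths, widths)
print(f"Pearson r on the preview rows: {r:.3f}")
```

Even on this tiny subset the coefficient comes out close to 1, consistent with the strong petal length/width relationship reported for the full data.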
Pro tip: When building models, try using only petal_length and petal_width first. You'll often achieve 95%+ accuracy with just 2 features instead of 4, making your model simpler and more interpretable.
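As a rough illustration of how far petal features alone can go, here is a nearest-mean classifier that uses only petal length. It is a sketch under stated assumptions, not the method used on this page: `MEAN_PETAL_LENGTH` is copied from the summary table, `predict_species` is a hypothetical helper, petal width is left out because its per-species means are not listed here, and the evaluation covers only the six preview rows.

```python
# Mean petal length (cm) per species, copied from the summary table.
MEAN_PETAL_LENGTH = {
    "Iris-setosa": 1.46,
    "Iris-versicolor": 4.26,
    "Iris-virginica": 5.55,
}

# (petal_length, species) for the six preview rows shown in this document.
PREVIEW_ROWS = [
    (1.4, "Iris-setosa"), (1.4, "Iris-setosa"),
    (4.7, "Iris-versicolor"), (4.5, "Iris-versicolor"),
    (6.0, "Iris-virginica"), (5.1, "Iris-virginica"),
]


def predict_species(petal_length):
    """Return the species whose mean petal length is closest to the input."""
    return min(
        MEAN_PETAL_LENGTH,
        key=lambda species: abs(petal_length - MEAN_PETAL_LENGTH[species]),
    )


correct = sum(predict_species(pl) == species for pl, species in PREVIEW_ROWS)
print(f"{correct}/{len(PREVIEW_ROWS)} preview rows classified correctly")
```

Six rows are far too few to estimate accuracy. For a real measurement, run the same idea on the full dataset with a held-out test split.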

Linear Separability: The Key Insight

One of the most important characteristics of this dataset, and the reason it is so often used to teach classification, is its partial linear separability:
  • Iris setosa is linearly separable from versicolor and virginica. A simple rule like "petal_length < 2.5 cm" correctly identifies 100% of setosa samples.
  • Versicolor and virginica overlap in feature space. No linear boundary can perfectly separate them, so you either need a more flexible model or must accept some misclassification.
  • This makes the dataset ideal for demonstrating why simple linear classifiers work sometimes but not always.
Using petal_length < 2.5 cm as a threshold, all 50 setosa samples are correctly isolated with zero false positives from the other 100 samples.
SQL
SELECT species, COUNT(*) AS count
FROM iris.csv
WHERE petal_length < 2.5
GROUP BY species
Data
Species     | Count
Iris-setosa | 50
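The threshold rule can be written as a one-line predicate. The sketch below applies it to the six preview rows shown on this page. The names `is_setosa` and `misclassified` are hypothetical, and a check on six rows only illustrates the rule rather than reproducing the 150-row result.

```python
SETOSA_THRESHOLD_CM = 2.5  # threshold quoted in the text above

# (petal_length, species) for the six preview rows shown in this document.
PREVIEW_ROWS = [
    (1.4, "Iris-setosa"), (1.4, "Iris-setosa"),
    (4.7, "Iris-versicolor"), (4.5, "Iris-versicolor"),
    (6.0, "Iris-virginica"), (5.1, "Iris-virginica"),
]


def is_setosa(petal_length_cm):
    """Apply the single-feature rule: short petals mean Iris setosa."""
    return petal_length_cm < SETOSA_THRESHOLD_CM


def misclassified(rows):
    """Return the rows where the rule disagrees with the recorded species."""
    return [
        (petal_length, species)
        for petal_length, species in rows
        if is_setosa(petal_length) != (species == "Iris-setosa")
    ]


errors = misclassified(PREVIEW_ROWS)
print(f"{len(errors)} of {len(PREVIEW_ROWS)} preview rows misclassified")
```

Because the rule is a single comparison on one feature, it is a linear decision boundary in its simplest form.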

Historical Context

The Iris dataset was introduced by Ronald A. Fisher in his 1936 paper "The Use of Multiple Measurements in Taxonomic Problems" published in the Annals of Eugenics. Fisher used this data to demonstrate linear discriminant analysis (LDA), a method he developed for classifying observations into predefined categories.
The actual measurements were collected by Edgar Anderson, an American botanist who gathered the data from iris flowers on the Gaspé Peninsula in Quebec, Canada. Anderson's meticulous measurements of 50 specimens from each of the three species provided Fisher with a perfectly balanced dataset for his statistical analysis. The collaboration between Anderson's fieldwork and Fisher's statistical methods exemplifies early interdisciplinary data science.

Known Data Quality Issues

The UCI Machine Learning Repository version contains two documented transcription errors that have propagated through many copies of this dataset.
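The known errors are in the 35th and 38th samples, both Iris setosa. The repository's notes count samples from 1, so these are rows 34 and 37 when counting from 0. The 35th sample has an error in its fourth feature (petal width), and the 38th has errors in its second and third features (sepal width and petal length).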
For most educational purposes, these minor errors don't significantly impact results. However, if you're publishing research or need exact reproducibility, consider using the corrected version from Fisher's original paper or noting which version you used.

Why This Dataset is Perfect for Beginners

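The Iris dataset has specific properties that make it ideal for learning machine learning fundamentals: it is small (150 rows, 4 numeric features), complete (no missing values), perfectly balanced (50 samples per species), and partially linearly separable, so both simple and more sophisticated classifiers have something to show.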

Limitations for Real-World ML

While excellent for learning, the Iris dataset has significant limitations that don't reflect real-world machine learning challenges:
  • Too small: 150 samples is trivial—modern datasets have millions of records
  • Too clean: No missing values, no outliers, no noise—unrealistic for production data
  • Too balanced: Real classification problems often have 99:1 or worse class imbalance
  • Too easy: Most algorithms achieve 95%+ accuracy, making it hard to compare methods
  • Too simple for deep learning: Neural networks need thousands of samples to show their advantages over simpler methods
If your model achieves less than 90% accuracy on Iris, there's likely a bug in your code. If it achieves 100%, you may be overfitting or evaluating on training data. Target 95-98% accuracy as a sanity check.
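The rule of thumb above can be turned into a small helper for test scripts. This is a hedged sketch: `accuracy` and `sanity_check` are hypothetical names, and the cut-offs are simply the ones quoted in the text, not established standards.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def sanity_check(acc):
    """Interpret an Iris accuracy score using the thresholds from the text."""
    if acc < 0.90:
        return "below 90%: look for a bug"
    if acc >= 1.0:
        return "100%: check for overfitting or evaluation on training data"
    if 0.95 <= acc <= 0.98:
        return "within the 95-98% target range"
    return "plausible, but outside the 95-98% target range"


# Example with made-up labels: 3 of 4 predictions are correct.
example_acc = accuracy(["a", "b", "c", "d"], ["a", "b", "c", "x"])
print(example_acc, "->", sanity_check(example_acc))
```

Treat the message as a prompt to investigate rather than a verdict, since a small test split can push a correct model outside the target range by chance.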

Sample Data Preview

First 2 Records from Each Species
# | Sepal Length | Sepal Width | Petal Length | Petal Width | Species
1 | 5.1          | 3.5         | 1.4          | 0.2         | Iris-setosa
2 | 4.9          | 3.0         | 1.4          | 0.2         | Iris-setosa
3 | 7.0          | 3.2         | 4.7          | 1.4         | Iris-versicolor
4 | 6.4          | 3.2         | 4.5          | 1.5         | Iris-versicolor
5 | 6.3          | 3.3         | 6.0          | 2.5         | Iris-virginica
6 | 5.8          | 2.7         | 5.1          | 1.9         | Iris-virginica
SQL
(SELECT * FROM iris.csv WHERE species = 'Iris-setosa' LIMIT 2)
UNION ALL
(SELECT * FROM iris.csv WHERE species = 'Iris-versicolor' LIMIT 2)
UNION ALL
(SELECT * FROM iris.csv WHERE species = 'Iris-virginica' LIMIT 2)

Expected Model Performance

Use these benchmarks to validate your implementations:

Table Overview

iris

Contains 150 rows and 5 columns. Column types: 4 numeric, 1 text.


Data Preview

Row | sepal_length | sepal_width | petal_length | petal_width | species
1   | 5.1          | 3.5         | 1.4          | 0.2         | Iris-setosa
2   | 4.9          | 3.0         | 1.4          | 0.2         | Iris-setosa
3   | 4.7          | 3.2         | 1.3          | 0.2         | Iris-setosa

Data Profile

Rows: 150
Columns: 5
Completeness: 100%
Estimated size: 36.6 KB

Column Types

4 numeric, 1 text

Data Dictionary

iris

Column       | Type    | Example                      | Missing Values
sepal_length | numeric | 5.1, 4.9                     | 0
sepal_width  | numeric | 3.5, 3                       | 0
petal_length | numeric | 1.4, 1.4                     | 0
petal_width  | numeric | 0.2, 0.2                     | 0
species      | string  | "Iris-setosa", "Iris-setosa" | 0
Last updated: December 27, 2025
Created: December 26, 2025