Last updated 2 months ago (December 27, 2025)
Time: April 1912
Location: North Atlantic Ocean (Southampton to New York route)
Created by Dataset Agent
Overview
The Titanic dataset is one of the most widely used introductory datasets for binary classification in machine learning. It contains passenger information from the RMS Titanic, which sank on April 15, 1912, after colliding with an iceberg during its maiden voyage from Southampton to New York City. Of the estimated 2,224 passengers and crew aboard, approximately 1,502 perished, making it one of the deadliest peacetime maritime disasters in history.
This dataset answers the question: What sorts of people were more likely to survive? The data reveals that survival was not random—factors like gender, passenger class, and age significantly influenced who lived and who died.
The dataset contains 887 passenger records with an overall survival rate of 38.56% (342 survivors out of 887 passengers).
Dataset at a Glance
Passenger Class Distribution
The Pclass column represents ticket class, which served as a proxy for socioeconomic status in 1912:
Passenger Class Breakdown
| Pclass Value | Class Name | Count | Percentage | Description |
|---|---|---|---|---|
| 1 | First Class | 216 | 24.4% | Upper class, luxury cabins on upper decks |
| 2 | Second Class | 184 | 20.7% | Middle class, comfortable accommodations |
| 3 | Third Class | 487 | 54.9% | Lower class, basic quarters on lower decks |
Note: The passenger classes are unevenly represented, with over half the passengers (54.9%) traveling in third class. This is a skew in a feature, not class imbalance in the Survived target. It reflects the ship's design to maximize steerage passenger capacity for immigrant transport to America.
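The percentages in the breakdown table can be re-derived from the counts. The following is a minimal sketch using only the three counts quoted above; the variable names are illustrative.

```python
import pandas as pd

# Passenger counts per class, copied from the breakdown table above.
counts = pd.Series({1: 216, 2: 184, 3: 487}, name="count")

# Share of each class as a percentage of all records, rounded to one decimal.
share = (counts / counts.sum() * 100).round(1)

print(counts.sum())      # total number of records
print(share.to_dict())   # percentage per Pclass value
```

With the full table loaded into a DataFrame, `df["Pclass"].value_counts(normalize=True)` should give the same shares directly.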
Key Survival Patterns
Gender: The Strongest Predictor
74.2% of women survived compared to only 18.9% of men—a 55 percentage point difference that reflects the "women and children first" evacuation protocol.
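A per-group survival rate like this is typically computed with a group-by on the Survived column. The sketch below uses only the five sample records shown on this page, so its rates will not match the full-dataset figures; it illustrates the computation, not the result.

```python
import pandas as pd

# Five example rows copied from the sample records on this page.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "Pclass":   [3, 1, 3, 1, 3],
    "Sex":      ["male", "female", "female", "female", "male"],
})

# The mean of a 0/1 column is the fraction of 1s, i.e. the survival rate.
rate_by_sex = df.groupby("Sex")["Survived"].mean()
print(rate_by_sex)
```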
Class: Wealth Determined Access
First-class passengers had a 62.96% survival rate versus only 24.44% for third-class—a gap of nearly 40 percentage points revealing how cabin location affected lifeboat access.
The Compounding Effect: Gender × Class
First-class women had an extraordinary 96.81% survival rate, while third-class men survived at only 13.7%—a 7:1 ratio demonstrating how gender and class compounded to determine fate.
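To look at two factors at once, a pivot table with one factor on each axis is a common choice. This is a sketch on the five sample records from this page, so some cells are empty and none of the values should be read as full-dataset rates.

```python
import pandas as pd

# Five example rows copied from the sample records on this page.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "Pclass":   [3, 1, 3, 1, 3],
    "Sex":      ["male", "female", "female", "female", "male"],
})

# Rows are Sex, columns are Pclass, each cell is the mean of Survived.
# Combinations with no passengers in the sample show up as NaN.
table = df.pivot_table(index="Sex", columns="Pclass",
                       values="Survived", aggfunc="mean")
print(table)
```

On small cells it is worth also checking the counts (for example with `aggfunc="count"`) before reading much into a rate.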
Age: Children Had Better Odds
Passengers ranged from 0.42 to 80 years old with a mean age of 29.5 years. Children under 18 had a 50% survival rate—the highest among all age groups.
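Age-group rates depend on how the groups are cut. The sketch below shows one way to bin ages with `pd.cut`; the bin edges and labels are assumptions chosen for illustration (only the "under 18" boundary comes from the text), and the five sample rows all fall into a single group.

```python
import pandas as pd

# Ages and outcomes of the five sample records on this page.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "Age":      [22, 38, 26, 35, 35],
})

# Hypothetical bin edges. right=False makes each bin closed on the left,
# so the first bin is [0, 18), i.e. "under 18".
bins = [0, 18, 40, 60, 120]
labels = ["0-17", "18-39", "40-59", "60+"]
df["AgeGroup"] = pd.cut(df["Age"], bins=bins, labels=labels, right=False)

# observed=False keeps empty age groups in the result (as NaN).
rate_by_age = df.groupby("AgeGroup", observed=False)["Survived"].mean()
print(rate_by_age)
```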
Fare: Economic Status Mattered
Ticket fares ranged from £0 to £512.33 (average £32.31). Passengers paying premium fares (£100+) had a 73.58% survival rate versus just 6.67% for those with free tickets.
Data Quality Notes
Understanding data quality is essential for accurate analysis. Completeness depends on which version of the dataset you are using.
Age Missing Values: In the widely used Kaggle version, approximately 20% of Age values are missing, whereas the data dictionary for this 887-row version reports none. Where gaps exist, common strategies include: (1) median imputation by passenger class, (2) median imputation by title extracted from Name, or (3) creating an 'Age_Missing' binary feature to capture the missingness pattern itself.
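Strategies (1) and (3) can be combined in a few lines. The following is a sketch under stated assumptions: it uses sample rows from this page with two ages deliberately blanked out (those gaps are hypothetical, added only so there is something to impute).

```python
import numpy as np
import pandas as pd

# Sample rows from this page, with two ages blanked out on purpose
# (hypothetical gaps) so the imputation has something to fill.
df = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 3],
    "Age":    [22, 38, np.nan, np.nan, 35],
})

# Strategy (3): record missingness before filling, so the signal is kept.
df["Age_Missing"] = df["Age"].isna().astype(int)

# Strategy (1): fill each gap with the median age of that passenger's class.
# transform("median") ignores NaN and returns one value per original row.
class_median = df.groupby("Pclass")["Age"].transform("median")
df["Age"] = df["Age"].fillna(class_median)

print(df)
```

Creating the flag before filling matters: once the gaps are imputed, the missingness pattern can no longer be recovered from the Age column.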
Feature Engineering Opportunities
The raw features can be transformed to improve predictive models. Here are proven feature engineering techniques used by top Kaggle competitors:
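Title Extraction Tip: The Name column contains honorific titles (Mr., Mrs., Miss., Master., Dr., Rev., etc.) that strongly correlate with survival. Extract the first word that is followed by a period using a regex:
df['Title'] = df['Name'].str.extract(r'([A-Za-z]+)\.', expand=False)
"Master" indicates male children, who had a 57.5% survival rate. In this version names begin with the title (e.g. "Mr. Owen Harris Braund"), so the pattern must not require a leading space.
Benchmark Model Performance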
When building survival prediction models, here are typical accuracy benchmarks to compare against:
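One simple reference point can be derived from the rates quoted earlier on this page: a rule that predicts "survived" for every woman and "died" for every man. The sketch below is an illustrative back-of-the-envelope calculation from those rounded figures, not a trained model, and the variable names are hypothetical.

```python
# Figures quoted on this page.
overall = 0.3856   # overall survival rate
women = 0.742      # survival rate among women
men = 0.189        # survival rate among men

# overall = w * women + (1 - w) * men, solved for w (the share of women).
w = (overall - men) / (women - men)

# Rule: predict "survived" for every woman and "died" for every man.
# The rule is correct for women who survived and for men who died.
accuracy = w * women + (1 - w) * (1 - men)

print(f"share of women: {w:.3f}, rule accuracy: {accuracy:.3f}")
```

Under these inputs the rule is right for roughly 79% of passengers, which gives a floor that any fitted model should be expected to beat.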
Historical Context
The RMS Titanic was the largest ship afloat at the time of its maiden voyage in April 1912. Operated by the White Star Line, it was marketed as "practically unsinkable" due to its advanced safety features including 16 watertight compartments. The ship struck an iceberg at 11:40 PM on April 14, 1912, and sank in under three hours.
The disaster exposed critical failures: the ship carried only 20 lifeboats (capacity for 1,178 people) despite holding 2,224 passengers and crew. Third-class passengers faced locked gates and confusing routes to the boat deck. The tragedy led to the International Convention for the Safety of Life at Sea (SOLAS), still the primary maritime safety treaty today.
Sample Records
Example Passenger Records
| Survived | Pclass | Name | Sex | Age | SibSp | Parch | Fare |
|---|---|---|---|---|---|---|---|
| 0 | 3 | Mr. Owen Harris Braund | male | 22 | 1 | 0 | £7.25 |
| 1 | 1 | Mrs. John Bradley Cumings | female | 38 | 1 | 0 | £71.28 |
| 1 | 3 | Miss. Laina Heikkinen | female | 26 | 0 | 0 | £7.93 |
| 1 | 1 | Mrs. Jacques Heath Futrelle | female | 35 | 1 | 0 | £53.10 |
| 0 | 3 | Mr. William Henry Allen | male | 35 | 0 | 0 | £8.05 |
Important Considerations
Dataset Scope: This dataset contains 887 of the ~1,309 passengers aboard (excluding crew). Records with incomplete information were excluded from this version. The 38.56% survival rate here is higher than the historical figure of roughly 32%, which covers all 2,224 passengers and crew; the gap reflects both the exclusion of crew and the subset of passengers sampled.
Column Encoding: The 'Survived' column uses binary encoding (1 = survived, 0 = did not survive). The 'Sex' column contains string values ('male', 'female') that must be encoded for most ML algorithms. SibSp (named 'Siblings/Spouses Aboard' in this version) counts siblings and spouses; Parch ('Parents/Children Aboard') counts parents and children.
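A minimal sketch of that preparation step follows, assuming the column names from the data dictionary on this page. The short names and the 0/1 assignment for Sex are conventions chosen for illustration, not something the dataset prescribes.

```python
import pandas as pd

# Three sample rows using this version's column names.
df = pd.DataFrame({
    "Survived": [0, 1, 1],
    "Sex": ["male", "female", "female"],
    "Siblings/Spouses Aboard": [1, 1, 0],
    "Parents/Children Aboard": [0, 0, 0],
})

# Shorten the long column names to the conventional SibSp / Parch.
df = df.rename(columns={
    "Siblings/Spouses Aboard": "SibSp",
    "Parents/Children Aboard": "Parch",
})

# Map the two string labels to integers (which label gets 1 is arbitrary).
df["Sex_encoded"] = df["Sex"].map({"male": 0, "female": 1})

print(df)
```

`Series.map` returns NaN for any label missing from the dictionary, so a quick `df["Sex_encoded"].isna().sum()` check can catch unexpected values.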
Dataset Variants Comparison
Multiple versions of the Titanic dataset exist across different platforms. This table helps identify which version you're working with:
Table Overview
titanic
Data Preview
| Survived | Pclass | Name | Sex | Age | Siblings/Spouses Aboard | Parents/Children Aboard | Fare |
|---|---|---|---|---|---|---|---|
| 0 | 3 | Mr. Owen Harris Braund | male | 22 | 1 | 0 | 7.25 |
| 1 | 1 | Mrs. John Bradley (Florence Briggs Th... | female | 38 | 1 | 0 | 71.28 |
| 1 | 3 | Miss. Laina Heikkinen | female | 26 | 0 | 0 | 7.93 |
| 1 | 1 | Mrs. Jacques Heath (Lily May Peel) Fu... | female | 35 | 1 | 0 | 53.1 |
| 0 | 3 | Mr. William Henry Allen | male | 35 | 0 | 0 | 8.05 |
Showing 5 of 887 rows
Data Profile
887 rows
8 columns
100% complete
346.5 KB estimated size
Column Types
6 Numeric, 2 Text
High-Cardinality Columns
Columns with many unique values (suitable for identifiers or categorical features)
- Name (887 unique values)
Data Dictionary
titanic
| Column | Type | Example | Missing Values |
|---|---|---|---|
| Survived | numeric | 0, 1 | 0 |
| Pclass | numeric | 3, 1 | 0 |
| Name | string | "Mr. Owen Harris Brau...", "Mrs. John Bradley (F..." | 0 |
| Sex | string | "male", "female" | 0 |
| Age | numeric | 22, 38 | 0 |
| Siblings/Spouses Aboard | numeric | 1, 1 | 0 |
| Parents/Children Aboard | numeric | 0, 0 | 0 |
| Fare | numeric | 7.25, 71.2833 | 0 |