Last updated: January 2, 2026
Time: January 2006
Location: United States (Domestic Flights)
Created by Dataset Agent
Overview
This dataset contains exactly 1,000,000 U.S. domestic flight records from January 2006, sourced from the Bureau of Transportation Statistics. Each row represents a single flight operation with seven fields: flight date, departure delay, arrival delay, air time, distance, departure time, and arrival time.
With zero null values across all columns, this dataset needs no missing-value handling before analysis, though a few extreme delay values may warrant filtering (see Limitations and Considerations). The strong correlation (0.91) between departure and arrival delays makes it particularly valuable for regression modeling and delay prediction applications.
Data Preview: First 5 Flight Records
| FL_DATE | DEP_DELAY | ARR_DELAY | AIR_TIME | DISTANCE | DEP_TIME | ARR_TIME |
|---|---|---|---|---|---|---|
| 2006-01-01 | 5 | 19 | 350 | 2475 | 9.08 | 12.48 |
| 2006-01-02 | 167 | 216 | 343 | 2475 | 11.78 | 15.77 |
| 2006-01-03 | -7 | -2 | 344 | 2475 | 8.88 | 12.13 |
| 2006-01-04 | -5 | -13 | 331 | 2475 | 8.92 | 11.95 |
| 2006-01-05 | -3 | -17 | 321 | 2475 | 8.95 | 11.88 |
Dataset Schema
Understanding decimal time: DEP_TIME and ARR_TIME use decimal hours. To convert 9.08 to standard time: 9 hours + (0.08 × 60) = 9:05 AM. Multiply the decimal portion by 60 to get minutes.
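The conversion described above can be sketched in a few lines of Python. The function name `decimal_hours_to_hhmm` is hypothetical, and rounding to the nearest whole minute is an assumption made here so that values such as 7.916666507720947 come out as clean clock times.

```python
def decimal_hours_to_hhmm(value: float) -> str:
    """Convert decimal hours (e.g. 9.08) to a 24-hour HH:MM string."""
    # Assumption: times are meant to be whole minutes, so round to the nearest one.
    total_minutes = round(value * 60)
    hours, minutes = divmod(total_minutes, 60)
    # Wrap 24.0 back to 00:00 in case a value lands exactly on midnight.
    return f"{hours % 24:02d}:{minutes:02d}"


print(decimal_hours_to_hhmm(9.08))               # 09:05
print(decimal_hours_to_hhmm(7.916666507720947))  # 07:55
```

Rounding rather than truncating matters here: 0.08 × 60 is 4.8 minutes, which truncation would report as 9:04 instead of 9:05.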
Data Coverage
- Time Period: January 1-31, 2006 (31 days of flight operations)
- Geographic Scope: U.S. domestic flights only
- Record Count: Exactly 1,000,000 flight records
- Data Completeness: 100% complete—zero null values in any column
- Source: Bureau of Transportation Statistics (BTS)
Key Statistics and Insights
The average departure delay is 8.65 minutes, while the average arrival delay is 6.40 minutes, so flights make up about 2.25 minutes on average between departure and arrival.
Flight distances span from 30 miles (regional hops) to 4,962 miles (the longest domestic routes), with an average of 741 miles per flight.
The correlation between departure and arrival delays is 0.91—extremely strong, making this dataset ideal for regression-based delay prediction models.
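As a rough illustration of how the correlation and the average delay recovery could be computed, the sketch below implements the Pearson coefficient by hand and applies it to the five preview rows only. The helper name `pearson` is hypothetical, and five rows are far too few to reproduce the full-dataset figures quoted in this section.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# DEP_DELAY and ARR_DELAY from the five preview rows (minutes).
dep_delay = [5, 167, -7, -5, -3]
arr_delay = [19, 216, -2, -13, -17]

r = pearson(dep_delay, arr_delay)
# Recovery = departure delay minus arrival delay; positive means time was made up.
mean_recovery = sum(d - a for d, a in zip(dep_delay, arr_delay)) / len(dep_delay)

print(f"correlation on preview rows: {r:.3f}")
print(f"mean recovery on preview rows: {mean_recovery:.1f} min")
```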
On-Time Performance Analysis
Key findings from the delay distribution:
- 63.5% of flights arrive on time or early (635,308 flights)
- 18.9% experience minor delays of 1-15 minutes
- 4.9% face severe delays exceeding 60 minutes
- The median departure delay is 0 minutes—half of all flights depart on schedule or early
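The delay categories above could be reproduced with a simple bucketing function. In this sketch the "moderate (16-60 min)" label is an assumption, since the findings name only the on-time, minor, and severe groups; the sample values are the ARR_DELAY column from the five preview rows.

```python
from collections import Counter


def delay_bucket(arr_delay_minutes):
    """Assign an arrival delay (in minutes) to a delay category."""
    if arr_delay_minutes <= 0:
        return "on time or early"
    if arr_delay_minutes <= 15:
        return "minor (1-15 min)"
    if arr_delay_minutes <= 60:
        return "moderate (16-60 min)"  # label assumed; not named in the findings
    return "severe (>60 min)"


# ARR_DELAY values from the five preview rows.
arr_delays = [19, 216, -2, -13, -17]
counts = Counter(delay_bucket(d) for d in arr_delays)
shares = {bucket: 100 * n / len(arr_delays) for bucket, n in counts.items()}

for bucket, pct in shares.items():
    print(f"{bucket}: {pct:.1f}%")
```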
Delay Patterns by Time of Day
Travel tip: Morning flights (6 AM - 12 PM) average a delay of just 2.61 minutes, roughly one-seventh of the 18.09-minute average for evening flights. Book early for the best on-time performance.
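A time-of-day comparison like the one above could be computed by grouping flights on DEP_TIME. Only the morning window (6 AM - 12 PM) comes from the text; the afternoon, evening, and night boundaries below are assumptions, as is the choice of DEP_DELAY as the delay measure. The sample rows are the five shown in the flights_1m table preview.

```python
from collections import defaultdict


def time_of_day(dep_time):
    """Map a decimal-hour departure time to a period of the day."""
    hour = dep_time % 24
    if 6 <= hour < 12:
        return "morning"    # window stated in the text
    if 12 <= hour < 18:
        return "afternoon"  # assumed boundary
    if 18 <= hour < 24:
        return "evening"    # assumed boundary
    return "night"          # assumed: midnight to 6 AM


def mean_delay_by_period(rows):
    """Average delay per period for (dep_time, delay) pairs."""
    totals = defaultdict(lambda: [0.0, 0])  # period -> [sum of delays, count]
    for dep_time, delay in rows:
        bucket = totals[time_of_day(dep_time)]
        bucket[0] += delay
        bucket[1] += 1
    return {period: total / n for period, (total, n) in totals.items()}


# (DEP_TIME, DEP_DELAY) from the five rows in the table preview.
sample = [(21.1, 26), (7.92, -5), (11.82, -1), (15.95, -3), (19.07, 9)]
print(mean_delay_by_period(sample))
```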
Flight Distance Distribution
Medium-haul flights (600-1,200 miles) dominate with 325,911 flights (32.6%), followed by short-haul routes. Ultra long-haul flights over 2,500 miles represent just 1.3% of operations; these longest routes average nearly 6 hours of air time.
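The distance bands could be assigned with a function along these lines. The text defines medium-haul (600-1,200 miles) and ultra long-haul (over 2,500 miles); treating everything under 600 miles as short-haul and the 1,200-2,500 range as "long-haul" is an assumption, as is the handling of the exact boundary values.

```python
from collections import Counter


def distance_category(miles):
    """Classify a flight distance in miles into a haul category."""
    if miles < 600:
        return "short-haul"       # assumed: anything below the medium-haul band
    if miles <= 1200:
        return "medium-haul"      # 600-1,200 miles, as stated in the text
    if miles <= 2500:
        return "long-haul"        # assumed label for the 1,200-2,500 range
    return "ultra long-haul"      # over 2,500 miles, as stated in the text


# DISTANCE values that appear in the two data previews.
distances = [2475, 337, 762, 1009, 1009, 550]
print(Counter(distance_category(d) for d in distances))
```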
Working with 1 Million Rows
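Excel limitation: At 1,000,000 rows plus a header, this dataset fits just under Excel's 1,048,576-row worksheet limit, which leaves little headroom and is likely to be slow to work with. Python (pandas), R, DuckDB, or database tools are better suited for analysis. The CSV file is approximately 45 MB.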
Recommended approaches for working with this dataset:
- Python pandas: Loads in ~2 seconds, uses ~150MB RAM
- DuckDB: Query directly from CSV without loading into memory
- Polars: Faster alternative to pandas for large datasets
- R data.table: Memory-efficient for million-row datasets
- SQL databases: Import into PostgreSQL, SQLite, or MySQL for complex queries
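Where none of the tools above is installed, a dependency-free fallback is to stream the CSV with Python's standard `csv` module, which keeps memory use flat because only one row is held at a time. This is a minimal sketch: the function name `stream_mean` is hypothetical, the file name `flights_1m.csv` in the comment is assumed from the table name, and the inline sample uses the five preview rows.

```python
import csv
import io


def stream_mean(lines, column):
    """Compute the mean of one numeric column while reading row by row."""
    total = 0.0
    count = 0
    for row in csv.DictReader(lines):
        total += float(row[column])
        count += 1
    return total / count if count else float("nan")


# Inline stand-in for the real file, built from the five preview rows.
sample_csv = io.StringIO(
    "FL_DATE,DEP_DELAY,ARR_DELAY,AIR_TIME,DISTANCE,DEP_TIME,ARR_TIME\n"
    "2006-01-01,5,19,350,2475,9.08,12.48\n"
    "2006-01-02,167,216,343,2475,11.78,15.77\n"
    "2006-01-03,-7,-2,344,2475,8.88,12.13\n"
    "2006-01-04,-5,-13,331,2475,8.92,11.95\n"
    "2006-01-05,-3,-17,321,2475,8.95,11.88\n"
)
print(stream_mean(sample_csv, "DEP_DELAY"))

# For the full dataset (file name is an assumption):
# with open("flights_1m.csv", newline="") as f:
#     print(stream_mean(f, "DEP_DELAY"))
```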
Sample Analysis Questions
This dataset supports a wide range of analytical explorations:
- What is the probability of a flight being delayed more than 30 minutes given it departs after 6 PM?
- How does flight distance correlate with delay recovery (difference between departure and arrival delay)?
- Which day of the week has the highest average delays?
- Can we predict arrival delay within ±15 minutes using only departure delay and air time?
- What percentage of severely delayed departures (60+ min) still arrive within 30 minutes of schedule?
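The first question above is a conditional probability, which could be estimated as a simple ratio of counts. The sketch below assumes "delayed" means ARR_DELAY and "departs after 6 PM" means DEP_TIME of 18.0 or later; both readings are assumptions, and the function name is hypothetical. The sample rows come from the flights_1m table preview, so the result says nothing about the full dataset.

```python
def prob_delay_given_late_departure(rows, delay_threshold=30, hour_threshold=18.0):
    """Estimate P(ARR_DELAY > delay_threshold | DEP_TIME >= hour_threshold)."""
    late = [r for r in rows if r["DEP_TIME"] >= hour_threshold]
    if not late:
        return None  # the condition never occurs, so the probability is undefined
    hits = sum(1 for r in late if r["ARR_DELAY"] > delay_threshold)
    return hits / len(late)


# DEP_TIME and ARR_DELAY from the five rows in the table preview.
rows = [
    {"DEP_TIME": 21.1, "ARR_DELAY": 22},
    {"DEP_TIME": 7.92, "ARR_DELAY": 2},
    {"DEP_TIME": 11.82, "ARR_DELAY": -20},
    {"DEP_TIME": 15.95, "ARR_DELAY": -6},
    {"DEP_TIME": 19.07, "ARR_DELAY": 30},
]
print(prob_delay_given_late_departure(rows))
```

The same pattern (filter on the condition, then count the outcome within that subset) applies to the severely-delayed-departure question as well.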
Limitations and Considerations
Historical data: This dataset is from January 2006. Aviation patterns, airline operations, and air traffic management have evolved significantly. Use caution when applying insights to current scenarios.
- No carrier information: Airline codes are not included, so performance cannot be compared by carrier
- No route details: Origin and destination airports are absent, which prevents route-level analysis
- No cancellation data: Only completed flights; cancelled flights are not represented
- Extreme outliers: Some delays reach -1,197 minutes (likely data errors); consider filtering outliers beyond ±500 minutes
- Single month: January-only data may not capture seasonal patterns (summer thunderstorms, holiday travel)
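The outlier suggestion above could be applied with a small predicate. This is a sketch under the stated ±500-minute threshold; the function name `is_plausible_delay` is hypothetical, and the sample list combines the DEP_DELAY values from the preview with the -1,197 extreme mentioned in the list.

```python
def is_plausible_delay(minutes, limit=500):
    """Return True when a delay falls within +/- limit minutes."""
    return abs(minutes) <= limit


# Preview DEP_DELAY values plus the -1,197 extreme mentioned above.
delays = [5, 167, -7, -5, -3, -1197]
kept = [d for d in delays if is_plausible_delay(d)]
print(kept)  # [5, 167, -7, -5, -3]
```

Whether to drop such rows or cap them at the threshold depends on the analysis; filtering is shown here only because it is the option the text names.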
Data Quality Summary
The median departure delay is 0 minutes, and 75% of flights depart no more than 8 minutes behind schedule, which points to generally reliable airline operations.
Table Overview
flights_1m
Data Preview
| FL_DATE | DEP_DELAY | ARR_DELAY | AIR_TIME | DISTANCE | DEP_TIME | ARR_TIME |
|---|---|---|---|---|---|---|
| Fri Feb 03 2006 13:00:00 GMT+1300 (Ne... | 26 | 22 | 55 | 337 | 21.1 | 22.33 |
| Fri Feb 03 2006 13:00:00 GMT+1300 (Ne... | -5 | 2 | 122 | 762 | 7.92 | 10.33 |
| Fri Feb 03 2006 13:00:00 GMT+1300 (Ne... | -1 | -20 | 120 | 1,009 | 11.82 | 15.07 |
| Fri Feb 03 2006 13:00:00 GMT+1300 (Ne... | -3 | -6 | 154 | 1,009 | 15.95 | 17.73 |
| Fri Feb 03 2006 13:00:00 GMT+1300 (Ne... | 9 | 30 | 109 | 550 | 19.07 | 21.18 |
Showing 5 of 1,000,000 rows
Data Profile
- Rows: 1,000,000
- Columns: 7
- Completeness: 100%
- Estimated size: 333.8 MB
- Column types: 6 numeric, 1 text
Data Dictionary
flights_1m
| Column | Type | Example | Missing Values |
|---|---|---|---|
| FL_DATE | string | "Fri Feb 03 2006 13:0...", "Fri Feb 03 2006 13:0..." | 0 |
| DEP_DELAY | numeric | 26, -5 | 0 |
| ARR_DELAY | numeric | 22, 2 | 0 |
| AIR_TIME | numeric | 55, 122 | 0 |
| DISTANCE | numeric | 337, 762 | 0 |
| DEP_TIME | numeric | 21.100000381469727, 7.916666507720947 | 0 |
| ARR_TIME | numeric | 22.33333396911621, 10.333333015441895 | 0 |
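The long decimals in the DEP_TIME and ARR_TIME examples look like what happens when a 32-bit float is widened to 64 bits. That is an inference from the example values, not something the source states. The sketch below round-trips 21.1 through 32-bit storage using the standard `struct` module and shows that converting to whole minutes recovers a clean clock time; the helper name `as_float32` is hypothetical.

```python
import struct


def as_float32(value):
    """Round-trip a Python float through 32-bit IEEE 754 storage."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


# 21.1 stored as a 32-bit float picks up trailing digits when read back.
print(as_float32(21.1))

# Converting the stored value to whole minutes removes the noise:
# 7.916666507720947 hours is 475 minutes, i.e. 7:55.
minutes = round(7.916666507720947 * 60)
print(divmod(minutes, 60))  # (7, 55)
```

If this reading is right, rounding these columns to whole minutes (or to two decimal places for display) loses no information.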