Last updated: January 2, 2026
Time: Cross-sectional snapshot
Location: India
Created by Dataset Agent
Overview
The House Price Dataset is a curated collection of 545 residential property listings from the Indian real estate market, designed specifically for machine learning practitioners and data science students. Unlike many housing datasets that require extensive cleaning, this dataset arrives analysis-ready with zero missing values across all features—making it an ideal starting point for regression modeling and price prediction projects.
This dataset contains 545 property listings with 13 distinct features and 0% missing values across all columns.
Each record represents a unique residential property with comprehensive information spanning physical characteristics (area in square feet, bedrooms, bathrooms, stories), location attributes (main road access, preferred area designation), and amenities (air conditioning, basement, hot water heating, guest room, parking spaces, and furnishing status).
How This Dataset Compares
When selecting a housing dataset for machine learning projects, practitioners often compare multiple options. Here's how this dataset positions against other popular choices:
This dataset's strength lies in its clean structure and balanced feature mix—13 features provide enough complexity for meaningful analysis without overwhelming beginners, while the zero missing values eliminate preprocessing hurdles.
Key Statistics and Market Insights
Property prices range from ₹1,750,000 to ₹13,300,000 with a median of ₹4,340,000 and mean of ₹4.77 million—a 7.6x spread indicating significant market segmentation.
Property areas span 1,650 to 16,200 square feet with an average of 5,151 sq ft—nearly 10x variation suitable for exploring non-linear price relationships.
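The spread figures quoted for price and area are simple max-to-min ratios. As a quick check, the sketch below recomputes them from the minimum and maximum values stated in the text; the `spread` helper is a name introduced here for illustration.

```python
# Figures quoted in the text (prices in rupees, areas in square feet).
price_min, price_max = 1_750_000, 13_300_000
area_min, area_max = 1_650, 16_200


def spread(low: float, high: float) -> float:
    """Return how many times larger the maximum is than the minimum."""
    return high / low


price_spread = spread(price_min, price_max)
area_spread = spread(area_min, area_max)

print(f"price spread: {price_spread:.1f}x")  # 7.6x
print(f"area spread:  {area_spread:.1f}x")   # 9.8x
```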
Price Drivers and Feature Analysis
3-bedroom homes dominate with 300 listings (55% of dataset), followed by 2-bedroom (136, 25%) and 4-bedroom (95, 17%) properties.
Amenity Premium Analysis
Understanding which features command price premiums is essential for feature importance analysis and real-world valuation models:
Key Finding: Air conditioning is associated with a 43% price premium (₹6.01M vs ₹4.19M), and main road access with a 47% premium. Both are strong candidates for feature engineering in predictive models.
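A premium of this kind is the relative difference between two group means. The sketch below reproduces the air conditioning figure from the two averages quoted in the text; `premium_pct` is a hypothetical helper name, and the group means for the other amenities are not given here, so only the AC case is shown.

```python
def premium_pct(with_feature: float, without_feature: float) -> float:
    """Relative premium, in percent, of one group mean over another."""
    return (with_feature - without_feature) / without_feature * 100


# Average prices in millions of rupees, as quoted in the text.
ac_mean, no_ac_mean = 6.01, 4.19

ac_premium = premium_pct(ac_mean, no_ac_mean)
ac_gap = ac_mean - no_ac_mean  # absolute difference, in millions of rupees

print(f"AC premium: {ac_premium:.0f}% (gap of ₹{ac_gap:.2f}M)")
# AC premium: 43% (gap of ₹1.82M)
```

Relative and absolute differences can rank features differently, since a feature whose baseline group is cheaper can show a larger percentage premium with a smaller rupee gap. It is worth reporting both.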
Variable Correlations
The area-price correlation is 0.54, indicating a moderate positive relationship—larger homes command higher prices, but other factors significantly influence valuation.
Because the correlation is moderate rather than strong, the dataset is useful for demonstrating the limits of single-variable regression and the importance of categorical features in regression models. The boolean amenity features often explain variance that area alone cannot capture.
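For readers who want to reproduce an area-price correlation themselves, here is a minimal sketch of the Pearson coefficient in plain Python. The sample values are the five rows shown in this page's data preview; because they all come from the top of the price range, the result on these rows will not match the 0.54 reported for the full 545 listings.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Five preview rows only; not representative of the full dataset.
areas = [7_420, 8_960, 9_960, 7_500, 7_420]
prices = [13_300_000, 12_250_000, 12_250_000, 12_215_000, 11_410_000]

r_sample = pearson(areas, prices)
print(f"r on five sample rows: {r_sample:.2f}")
```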
Property Distribution Patterns
Furnished homes average ₹5.5 million versus ₹4.0 million for unfurnished—a 38% premium that demonstrates clear categorical feature impact on price.
Four-story properties command ₹7.2 million average—72% higher than single-story homes at ₹4.17 million. This non-linear relationship with stories makes the dataset excellent for polynomial feature exploration.
Sample Records
Top 5 Most Expensive Properties
| # | Price (₹) | Area (Sq Ft) | Bedrooms | Bathrooms | Stories | AC | Furnishing |
|---|---|---|---|---|---|---|---|
| 1 | 13,300,000 | 7,420 | 4 | 2 | 3 | Yes | Furnished |
| 2 | 12,250,000 | 8,960 | 4 | 4 | 4 | Yes | Furnished |
| 3 | 12,250,000 | 9,960 | 3 | 2 | 2 | No | Semi-furnished |
| 4 | 12,215,000 | 7,500 | 4 | 2 | 2 | Yes | Furnished |
| 5 | 11,410,000 | 7,420 | 4 | 1 | 2 | Yes | Furnished |
Note that the highest-priced property (₹13.3M) is not the largest by area—it's the 7,420 sq ft furnished home with AC. This illustrates why multi-feature models outperform single-variable approaches.
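The observation can be checked directly against the five rows in the table. The sketch below uses only those rows (the record layout is an illustrative choice, not the dataset's storage format) and compares the most expensive listing with the largest one.

```python
# The five most expensive listings, copied from the table above.
top_listings = [
    {"price": 13_300_000, "area": 7_420, "ac": True, "furnishing": "furnished"},
    {"price": 12_250_000, "area": 8_960, "ac": True, "furnishing": "furnished"},
    {"price": 12_250_000, "area": 9_960, "ac": False, "furnishing": "semi-furnished"},
    {"price": 12_215_000, "area": 7_500, "ac": True, "furnishing": "furnished"},
    {"price": 11_410_000, "area": 7_420, "ac": True, "furnishing": "furnished"},
]

most_expensive = max(top_listings, key=lambda row: row["price"])
largest = max(top_listings, key=lambda row: row["area"])

print(most_expensive["area"])  # 7420
print(largest["area"], largest["price"])  # 9960 12250000
```

Among these five, the largest home (9,960 sq ft) has no air conditioning and is semi-furnished, yet it sells for less than the smaller, fully furnished home with AC.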
Data Quality and Limitations
This dataset represents a cross-sectional snapshot of the Indian real estate market. Prices reflect historical data and should be used for educational and analytical purposes only—not for actual investment decisions.
When using this dataset, consider these characteristics:
- Zero Missing Values: All 545 records are complete across all 13 columns—no imputation required
- Sample Size: 545 records support traditional ML algorithms well; may be insufficient for deep learning without augmentation
- Geographic Scope: Properties appear concentrated in Indian metros; findings may not generalize to other markets
- No Temporal Data: Timestamps are absent, preventing time-series analysis or trend modeling
- Feature Gaps: Common variables like property age, lot size, neighborhood details, and school district ratings are not included
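The completeness claim is easy to verify after download. The following is a minimal sketch, assuming the data is available as CSV with the column names from the data dictionary; the inline sample, the `count_missing` helper, and the list of tokens treated as missing are all assumptions made for illustration.

```python
import csv
import io

# Two rows from the data preview, written as CSV. The exact on-disk encoding
# (for example "true"/"false" for booleans) is an assumption.
SAMPLE_CSV = """price,area,bedrooms,bathrooms,stories,mainroad,guestroom,basement,hotwaterheating,airconditioning,parking,prefarea,furnishingstatus
13300000,7420,4,2,3,true,false,false,false,true,2,true,furnished
12250000,8960,4,4,4,true,false,false,false,true,3,false,furnished
"""


def count_missing(rows, missing_tokens=("", "na", "nan", "null")):
    """Count empty or placeholder values per column."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            is_missing = value is None or value.strip().lower() in missing_tokens
            counts[column] = counts.get(column, 0) + int(is_missing)
    return counts


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
missing = count_missing(rows)
print(len(missing), "columns,", sum(missing.values()), "missing values")
# 13 columns, 0 missing values
```

To run the same check on the full file, replace the `io.StringIO(SAMPLE_CSV)` argument with an open file handle.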
Recommended Model Approaches
Based on the dataset's structure and feature mix, these algorithms typically perform well:
Feature engineering opportunities include: creating price-per-square-foot ratios, binning continuous variables like area into categories, generating interaction terms between amenities, and one-hot encoding the furnishing status variable.
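The four ideas listed above can be sketched in plain Python as follows. The area cut points, bin labels, and the choice of air conditioning with preferred area as the interaction pair are hypothetical and chosen only for illustration; the furnishing levels are the three that appear on this page.

```python
FURNISHING_LEVELS = ("furnished", "semi-furnished", "unfurnished")

# Hypothetical cut points in square feet: (low inclusive, high exclusive, label).
AREA_BINS = (
    (0, 4_000, "small"),
    (4_000, 8_000, "medium"),
    (8_000, float("inf"), "large"),
)


def engineer(record: dict) -> dict:
    """Return a copy of the record with the derived features added."""
    features = dict(record)
    # Ratio feature (derived from the target, see the note below).
    features["price_per_sqft"] = record["price"] / record["area"]
    # Binning a continuous variable into categories.
    features["area_bin"] = next(
        label for low, high, label in AREA_BINS if low <= record["area"] < high
    )
    # Interaction term between two boolean amenities.
    features["ac_and_prefarea"] = int(record["airconditioning"] and record["prefarea"])
    # One-hot encoding of the furnishing status.
    for level in FURNISHING_LEVELS:
        features[f"furnishing_{level}"] = int(record["furnishingstatus"] == level)
    return features


record = {
    "price": 13_300_000,
    "area": 7_420,
    "airconditioning": True,
    "prefarea": True,
    "furnishingstatus": "furnished",
}
features = engineer(record)
print(features["area_bin"], features["ac_and_prefarea"], features["furnishing_furnished"])
# medium 1 1
```

One caution: price per square foot is computed from the target itself, so it is useful for exploratory analysis but should not be fed to a model that predicts price, as that would leak the answer into the inputs.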
Table Overview
house_price
Data Preview
| price | area | bedrooms | bathrooms | stories | mainroad | guestroom | basement | hotwaterheating | airconditioning | parking | prefarea | furnishingstatus |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13,300,000 | 7,420 | 4 | 2 | 3 | true | false | false | false | true | 2 | true | furnished |
| 12,250,000 | 8,960 | 4 | 4 | 4 | true | false | false | false | true | 3 | false | furnished |
| 12,250,000 | 9,960 | 3 | 2 | 2 | true | false | true | false | false | 2 | true | semi-furnished |
| 12,215,000 | 7,500 | 4 | 2 | 2 | true | false | true | false | true | 3 | true | furnished |
| 11,410,000 | 7,420 | 4 | 1 | 2 | true | true | true | false | true | 2 | false | furnished |
Showing 5 of 545 rows
Data Profile
- 545 rows
- 13 columns
- 100% complete
- 345.9 KB estimated size
Column Types
6 Numeric, 1 Text, 6 Boolean
High-Cardinality Columns
Columns with many unique values (suitable for identifiers or categorical features)
- area (284 unique values)
Data Dictionary
house_price
| Column | Type | Example | Missing Values |
|---|---|---|---|
| price | numeric | 13300000, 12250000 | 0 |
| area | numeric | 7420, 8960 | 0 |
| bedrooms | numeric | 4, 4 | 0 |
| bathrooms | numeric | 2, 4 | 0 |
| stories | numeric | 3, 4 | 0 |
| mainroad | boolean | true, true | 0 |
| guestroom | boolean | false, false | 0 |
| basement | boolean | false, false | 0 |
| hotwaterheating | boolean | false, false | 0 |
| airconditioning | boolean | true, true | 0 |
| parking | numeric | 2, 3 | 0 |
| prefarea | boolean | true, false | 0 |
| furnishingstatus | string | "furnished", "furnished" | 0 |
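The data dictionary maps naturally onto a typed record. The sketch below is one possible schema, assuming all numeric columns hold whole numbers (as in the examples shown) and that values arrive as text; the `Listing` class and the `parse_bool` and `parse_listing` helpers are hypothetical names, and accepting "yes"/"no" alongside "true"/"false" is a defensive assumption about the source file.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Listing:
    price: int
    area: int
    bedrooms: int
    bathrooms: int
    stories: int
    mainroad: bool
    guestroom: bool
    basement: bool
    hotwaterheating: bool
    airconditioning: bool
    parking: int
    prefarea: bool
    furnishingstatus: str


def parse_bool(text: str) -> bool:
    """Convert a textual flag to bool, rejecting anything unexpected."""
    value = text.strip().lower()
    if value in ("true", "yes"):
        return True
    if value in ("false", "no"):
        return False
    raise ValueError(f"not a boolean: {text!r}")


def parse_listing(row: dict) -> Listing:
    """Build a Listing from a mapping of column name to raw text."""
    converted = {}
    for field in fields(Listing):
        raw = row[field.name]
        if field.type is bool:
            converted[field.name] = parse_bool(raw)
        elif field.type is int:
            # Tolerate thousands separators such as "13,300,000".
            converted[field.name] = int(raw.replace(",", ""))
        else:
            converted[field.name] = raw.strip()
    return Listing(**converted)


# First row of the data preview, as text.
first_row = {
    "price": "13,300,000", "area": "7,420", "bedrooms": "4", "bathrooms": "2",
    "stories": "3", "mainroad": "true", "guestroom": "false", "basement": "false",
    "hotwaterheating": "false", "airconditioning": "true", "parking": "2",
    "prefarea": "true", "furnishingstatus": "furnished",
}
listing = parse_listing(first_row)
print(listing.price, listing.mainroad, listing.furnishingstatus)
# 13300000 True furnished
```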