Agents for Data

House Price Dataset

545 Indian residential property listings with 13 features including price, area, bedrooms, and amenities. Clean dataset with no missing values—ideal for house price prediction, regression modeling, and real estate ML projects.

Tags: house-price-dataset, real-estate-data, regression-dataset, machine-learning, price-prediction, property-data, supervised-learning, indian-housing, clean-dataset, beginner-ml
1 table, 545 rows
Last updated: January 2, 2026
Source: Kaggle
Version: 1.0
Time: Cross-sectional snapshot
Location: India
Created by Dataset Agent

Overview

The House Price Dataset is a curated collection of 545 residential property listings from the Indian real estate market, designed specifically for machine learning practitioners and data science students. Unlike many housing datasets that require extensive cleaning, this dataset arrives analysis-ready with zero missing values across all features—making it an ideal starting point for regression modeling and price prediction projects.
This dataset contains 545 property listings with 13 distinct features and 0% missing values across all columns.
SQL
SELECT COUNT(*) AS row_count FROM house_price.csv
Data
Row Count
545
1 row
Each record represents a unique residential property with comprehensive information spanning physical characteristics (area in square feet, bedrooms, bathrooms, stories), location attributes (main road access, preferred area designation), and amenities (air conditioning, basement, hot water heating, guest room, parking spaces, and furnishing status).
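The row count and the zero-missing-values claim can be checked with a short script. The following is a minimal sketch using only Python's standard library; the helper name `profile_csv` and the in-memory sample are illustrative assumptions, not part of the dataset or its tooling.

```python
import csv
import io

def profile_csv(handle):
    """Return (row_count, column_names, missing_by_column) for a CSV file object."""
    reader = csv.DictReader(handle)
    columns = reader.fieldnames or []
    missing = {name: 0 for name in columns}
    row_count = 0
    for row in reader:
        row_count += 1
        for name in columns:
            value = row.get(name)
            # Treat absent or blank cells as missing.
            if value is None or value.strip() == "":
                missing[name] += 1
    return row_count, columns, missing

# Tiny in-memory stand-in for the real file (hand-typed, a subset of columns;
# the "semi-furnished" spelling is an assumption).
SAMPLE = io.StringIO(
    "price,area,bedrooms,furnishingstatus\n"
    "13300000,7420,4,furnished\n"
    "12250000,9960,3,semi-furnished\n"
)
rows, cols, missing = profile_csv(SAMPLE)
print(rows, cols, missing)
```

To run it against the actual file, open it with `open("house_price.csv", newline="")` (the file name is assumed to match the one used in the SQL queries) and pass the handle to `profile_csv`.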

How This Dataset Compares

When selecting a housing dataset for machine learning projects, practitioners often compare multiple options. Here's how this dataset positions against other popular choices:
This dataset's strength lies in its clean structure and balanced feature mix—13 features provide enough complexity for meaningful analysis without overwhelming beginners, while the zero missing values eliminate preprocessing hurdles.

Key Statistics and Market Insights

Property prices range from ₹1,750,000 to ₹13,300,000 with a median of ₹4,340,000 and mean of ₹4.77 million—a 7.6x spread indicating significant market segmentation.
SQL
SELECT MIN(price) AS min_price, MAX(price) AS max_price, AVG(price) AS avg_price, MEDIAN(price) AS median_price FROM house_price.csv
Data
Min Price | Max Price | Avg Price | Median Price
1,750,000 | 13,300,000 | 4,766,729.25 | 4,340,000
1 row
Property areas span 1,650 to 16,200 square feet with an average of 5,151 sq ft—nearly 10x variation suitable for exploring non-linear price relationships.
SQL
SELECT MIN(area) AS min_area, MAX(area) AS max_area, AVG(area) AS avg_area FROM house_price.csv
Data
Min Area | Max Area | Avg Area
1,650 | 16,200 | 5,150.54
1 row

Price Drivers and Feature Analysis

SQL
SELECT bedrooms, COUNT(*) AS count, ROUND(AVG(price)) AS avg_price FROM house_price.csv GROUP BY bedrooms ORDER BY bedrooms
Data
Bedrooms | Count | Avg Price (₹)
1 BR | 2 | 2,712,500
2 BR | 136 | 3,632,022
3 BR | 300 | 4,954,598
4 BR | 95 | 5,729,758
5 BR | 10 | 5,819,800
6 BR | 2 | 4,791,500
6 rows
3-bedroom homes dominate with 300 listings (55% of dataset), followed by 2-bedroom (136, 25%) and 4-bedroom (95, 17%) properties.
SQL
SELECT bedrooms, COUNT(*) AS count, ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 1) AS percentage FROM house_price.csv GROUP BY bedrooms ORDER BY count DESC
Data
Bedrooms | Count | Percentage
3 | 300 | 55
2 | 136 | 25
4 | 95 | 17.4
3 rows
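The percentage column is each count divided by the 545-row total. As a rough sketch, the same shares can be recomputed in Python from the per-bedroom counts listed in the table above; the variable names are illustrative.

```python
# Listing counts per bedroom number, copied from the table above.
bedroom_counts = {1: 2, 2: 136, 3: 300, 4: 95, 5: 10, 6: 2}

total = sum(bedroom_counts.values())
shares = {
    beds: round(100.0 * count / total, 1)
    for beds, count in bedroom_counts.items()
}

# Print the groups from most to least common.
for beds, pct in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{beds} BR: {bedroom_counts[beds]} listings, {pct}%")
```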

Amenity Premium Analysis

Understanding which features command price premiums is essential for feature importance analysis and real-world valuation models:
SQL
Multiple queries comparing average prices with and without each amenity
Data
Feature | With Feature (₹) | Without Feature (₹) | Premium %
Air Conditioning | 6,013,221 | 4,191,940 | +43%
Preferred Area | 5,879,046 | 4,425,299 | +33%
Main Road Access | 4,991,777 | 3,398,905 | +47%
Basement | 5,242,615 | 4,509,966 | +16%
4 rows
Key Finding: Main road access carries the highest percentage premium at 47% (₹4.99M vs ₹3.40M), while air conditioning shows the largest absolute difference at about ₹1.82M (₹6.01M vs ₹4.19M, a 43% premium). Both are useful signals for feature engineering in predictive models.
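The premium figures follow directly from the two averages in each table row. The sketch below recomputes both the percentage premium and the absolute rupee difference, which do not rank the amenities in the same order; the function name `premium` is an illustrative choice.

```python
# Average prices (in rupees) with and without each amenity, from the table above.
amenity_prices = {
    "Air Conditioning": (6_013_221, 4_191_940),
    "Preferred Area": (5_879_046, 4_425_299),
    "Main Road Access": (4_991_777, 3_398_905),
    "Basement": (5_242_615, 4_509_966),
}

def premium(with_price, without_price):
    """Return (absolute difference, percentage premium) of with over without."""
    diff = with_price - without_price
    return diff, 100.0 * diff / without_price

results = {name: premium(w, wo) for name, (w, wo) in amenity_prices.items()}
top_pct = max(results, key=lambda n: results[n][1])
top_abs = max(results, key=lambda n: results[n][0])

for name, (diff, pct) in results.items():
    print(f"{name}: +{diff:,} rupees, +{pct:.0f}%")
print("Largest percentage premium:", top_pct)
print("Largest absolute difference:", top_abs)
```

These are raw group averages, so they describe association only; a regression model is needed to separate each amenity's effect from the others.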

Variable Correlations

The area-price correlation is 0.54, indicating a moderate positive relationship—larger homes command higher prices, but other factors significantly influence valuation.
SQL
SELECT ROUND(CORR(area, price), 3) AS area_price_correlation FROM house_price.csv
Data
Area Price Correlation
0.54
1 row
Because the correlation is moderate rather than strong (r² ≈ 0.29), area alone leaves most of the price variance unexplained. That makes the dataset useful for demonstrating the limits of single-variable regression and the importance of categorical features in regression models. The boolean amenity features often explain variance that area alone cannot capture.
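For readers working outside SQL, the coefficient that `CORR` reports can be approximated with a small hand-written Pearson function. This is a sketch, not the platform's implementation, and the five input rows are taken from the sample records on this page, so the printed value will not match the full-dataset 0.54.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Five rows from the sample records; far too few to reproduce the full-dataset value.
areas = [7420, 8960, 9960, 7500, 7420]
prices = [13_300_000, 12_250_000, 12_250_000, 12_215_000, 11_410_000]
print(round(pearson(areas, prices), 3))
```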

Property Distribution Patterns

SQL
SELECT furnishingstatus, COUNT(*) AS count FROM house_price.csv GROUP BY furnishingstatus
Data
Status | Count
Semi-furnished | 227
Unfurnished | 178
Furnished | 140
3 rows
Furnished homes average ₹5.5 million versus ₹4.0 million for unfurnished—a 38% premium that demonstrates clear categorical feature impact on price.
SQL
SELECT furnishingstatus, ROUND(AVG(price)) AS avg_price FROM house_price.csv GROUP BY furnishingstatus
Data
Furnishingstatus | Avg Price
furnished | 5,500,000
unfurnished | 4,000,000
2 rows
SQL
SELECT stories, COUNT(*) AS count, ROUND(AVG(price)) AS avg_price FROM house_price.csv GROUP BY stories ORDER BY stories
Data
Stories | Count | Avg Price (₹)
1 Story | 227 | 4,170,659
2 Stories | 238 | 4,764,074
3 Stories | 39 | 5,685,436
4 Stories | 41 | 7,208,450
4 rows
Four-story properties average ₹7.2 million, about 73% higher than single-story homes at ₹4.17 million. The price increase per additional story grows with each step (roughly ₹0.59M, ₹0.92M, then ₹1.52M), and this non-linear relationship makes the dataset a good fit for polynomial feature exploration.
SQL
SELECT stories, ROUND(AVG(price)) AS avg_price FROM house_price.csv WHERE stories IN (1, 4) GROUP BY stories
Data
Stories | Avg Price
1 | 4,170,659
4 | 7,208,450
2 rows
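The non-linearity can be seen by looking at how much the average price rises with each additional story. The sketch below works only from the four averages in the stories table above; the variable names are illustrative.

```python
# Average price (rupees) by number of stories, from the table above.
avg_price_by_stories = {1: 4_170_659, 2: 4_764_074, 3: 5_685_436, 4: 7_208_450}

# Premium of four-story over single-story homes, in percent.
uplift_pct = 100.0 * (avg_price_by_stories[4] / avg_price_by_stories[1] - 1)

# Increase in average price for each additional story.
steps = [
    avg_price_by_stories[s + 1] - avg_price_by_stories[s]
    for s in (1, 2, 3)
]

print(f"4 vs 1 story: +{uplift_pct:.1f}%")
print("Step increases:", steps)
```

A linear relationship would give roughly equal steps; here each step is larger than the one before it.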

Sample Records

Top 5 Most Expensive Properties
# | Price (₹) | Area (Sq Ft) | Bedrooms | Bathrooms | Stories | AC | Furnishing
1 | 13,300,000 | 7,420 | 4 | 2 | 3 | Yes | Furnished
2 | 12,250,000 | 8,960 | 4 | 4 | 4 | Yes | Furnished
3 | 12,250,000 | 9,960 | 3 | 2 | 2 | No | Semi-furnished
4 | 12,215,000 | 7,500 | 4 | 2 | 2 | Yes | Furnished
5 | 11,410,000 | 7,420 | 4 | 1 | 2 | Yes | Furnished
5 rows
SQL
SELECT price, area, bedrooms, bathrooms, stories, airconditioning, furnishingstatus FROM house_price.csv ORDER BY price DESC LIMIT 5
Note that the highest-priced property (₹13.3M) is not the largest by area—it's the 7,420 sq ft furnished home with AC. This illustrates why multi-feature models outperform single-variable approaches.

Data Quality and Limitations

This dataset represents a cross-sectional snapshot of the Indian real estate market. Prices reflect historical data and should be used for educational and analytical purposes only—not for actual investment decisions.
When using this dataset, consider these characteristics:
  • Zero Missing Values: All 545 records are complete across all 13 columns—no imputation required
  • Sample Size: 545 records support traditional ML algorithms well; may be insufficient for deep learning without augmentation
  • Geographic Scope: Properties appear concentrated in Indian metros; findings may not generalize to other markets
  • No Temporal Data: Timestamps are absent, preventing time-series analysis or trend modeling
  • Feature Gaps: Common variables like property age, lot size, neighborhood details, and school district ratings are not included
Based on the dataset's structure and feature mix, these algorithms typically perform well:
Feature engineering opportunities include: creating price-per-square-foot ratios, binning continuous variables like area into categories, generating interaction terms between amenities, and one-hot encoding the furnishing status variable.
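A minimal sketch of these four transformations in plain Python follows. The two records use values shown elsewhere on this page; the area cut points, the derived column names, and the spellings of the non-"furnished" category values are assumptions made for illustration.

```python
# Two hand-typed records shaped like rows of the dataset.
records = [
    {"price": 13_300_000, "area": 7420, "airconditioning": True,
     "prefarea": True, "furnishingstatus": "furnished"},
    {"price": 12_250_000, "area": 8960, "airconditioning": True,
     "prefarea": False, "furnishingstatus": "furnished"},
]

# Assumed spellings of the three furnishing categories.
FURNISHING_LEVELS = ("furnished", "semi-furnished", "unfurnished")

def area_bucket(area_sqft):
    """Bin area into coarse categories; the cut points are illustrative only."""
    if area_sqft < 5000:
        return "small"
    if area_sqft < 8000:
        return "medium"
    return "large"

def engineer(record):
    """Return a new dict with derived features added to a listing record."""
    out = dict(record)
    out["price_per_sqft"] = record["price"] / record["area"]
    out["area_bucket"] = area_bucket(record["area"])
    # Interaction term: 1 only when both amenities are present.
    out["ac_and_prefarea"] = int(record["airconditioning"] and record["prefarea"])
    # One-hot encode furnishing status.
    for level in FURNISHING_LEVELS:
        out[f"furnishing_{level}"] = int(record["furnishingstatus"] == level)
    return out

features = [engineer(r) for r in records]
for row in features:
    print(row)
```

Note that price per square foot is derived from the target, so it suits exploratory analysis rather than use as an input when predicting price.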

Table Overview

house_price

Contains 545 rows and 13 columns. Column types: 6 numeric, 1 text, 6 boolean.


Data Preview

Row | price | area | bedrooms | bathrooms | stories
1 | 13,300,000 | 7,420 | 4 | 2 | 3
2 | 12,250,000 | 8,960 | 4 | 4 | 4
3 | 12,250,000 | 9,960 | 3 | 2 | 2
(+8 more columns per row)

Data Profile

545 rows
13 columns
100% complete
345.9 KB estimated size

Column Types

6 Numeric, 1 Text, 6 Boolean

High-Cardinality Columns

Columns with many unique values (suitable for identifiers or categorical features)

  • area (284 unique values)

Data Dictionary

house_price

Column | Type | Example | Missing Values
price | numeric | 13300000, 12250000 | 0
area | numeric | 7420, 8960 | 0
bedrooms | numeric | 4, 4 | 0
bathrooms | numeric | 2, 4 | 0
stories | numeric | 3, 4 | 0
mainroad | boolean | true, true | 0
guestroom | boolean | false, false | 0
basement | boolean | false, false | 0
hotwaterheating | boolean | false, false | 0
airconditioning | boolean | true, true | 0
parking | numeric | 2, 3 | 0
prefarea | boolean | true, false | 0
furnishingstatus | string | "furnished", "furnished" | 0
Last updated: January 2, 2026
Created: January 2, 2026