Determinants of Housing Price: An Exploratory and Statistical Analysis-2391398

Task A – Data Acquisition and Exploratory Data Analysis

Data Acquisition

The dataset was obtained from Kaggle (Housing Price Prediction Dataset). It contains 545 observations of houses and combines numerical variables (price, area, bedrooms, parking) with categorical variables (furnishing status, access to main road, preferred area). This combination supports both descriptive and inferential statistical analysis. The data can be used to understand how housing characteristics relate to price, which is relevant to real-life housing and urban planning decisions.

Research Question

Which structural and locational characteristics most strongly affect house prices?

This question guides the statistical analysis and data visualization, which focus on area, bedrooms, parking, main road access, furnishing status and preferred area.

Summary Statistics

The dataset contains 545 houses. Prices range from 1.75 million to 13.3 million, with a mean of about 4.77 million (SD = 1.87 million). House area ranges from 1,650 sq. ft to 16,200 sq. ft, with a mean of about 5,151 sq. ft (SD = 2,170 sq. ft). The housing stock therefore varies widely in both price and property size.

Variable       | N   | Minimum | Maximum  | Mean       | Std. Deviation | Variance
Price of House | 545 | 1750000 | 13300000 | 4766729.25 | 1870439.616    | 3498544355820.582
Area Coverage  | 545 | 1650    | 16200    | 5150.54    | 2170.141       | 4709512.058
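
As a quick consistency check on the table above, the reported variance should equal the square of the reported standard deviation. The Python sketch below verifies this from the published figures. It also includes a small helper, `summarize` (a hypothetical name, not part of the original analysis), that would compute the same columns from raw values using the sample (n - 1) formulas.

```python
import statistics

def summarize(values):
    """Return N, min, max, mean, sample SD and sample variance (n - 1 denominator)."""
    return {
        "N": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "variance": statistics.variance(values),
    }

# Values copied from the descriptive statistics table: (SD, variance)
reported = {
    "Price of House": (1870439.616, 3498544355820.582),
    "Area Coverage": (2170.141, 4709512.058),
}

for name, (sd, variance) in reported.items():
    # Relative gap between SD squared and the reported variance (rounding only)
    rel_gap = abs(sd ** 2 - variance) / variance
    print(f"{name}: SD^2 = {sd ** 2:.3f}, reported variance = {variance:.3f}, "
          f"relative gap = {rel_gap:.1e}")
```

The relative gaps are tiny, so the two columns agree up to rounding of the published standard deviations.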

Frequency Distributions

  • Bedrooms: Most houses have 3 bedrooms (55%), followed by 2 bedrooms (24.95%), while larger homes (5–6 bedrooms) are rare (<3%).
  • Main Road Access: A significant majority (85.87%) of properties are connected to the main road, highlighting accessibility as a common feature.
  • Furnishing Status: The largest category is semi-furnished (41.65%), followed by unfurnished (32.66%) and furnished (25.69%).
  • Parking: Over half the houses (54.86%) have no parking, 23.12% have a single parking space, 19.82% have two parking spaces and only 2.2% offer space for three vehicles.
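
The percentages in the list above can be checked against the sample size. The short Python sketch below backs out the whole-number counts implied by the parking percentages and confirms that they add up to the 545 houses in the dataset. The counts are derived from the rounded percentages, so they are implied values and not figures taken from the original output.

```python
N = 545  # total houses reported in the dataset

# Parking percentages as reported in the frequency list (spaces -> percent)
parking_pct = {0: 54.86, 1: 23.12, 2: 19.82, 3: 2.20}

# Back out the nearest whole-number count for each category
implied_counts = {spaces: round(pct / 100 * N) for spaces, pct in parking_pct.items()}

for spaces, count in implied_counts.items():
    print(f"{spaces} parking spaces: about {count} houses ({parking_pct[spaces]}%)")

print("Total implied houses:", sum(implied_counts.values()))
```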

TASK B – STATISTICAL ANALYSIS

Test 1: Correlation Analysis (Area vs. House Price)

  • Hypotheses: There is a significant positive relationship between area and house price.
  • Method: Pearson correlation (both variables are continuous).
  • Assumptions: Linear relationship, normally distributed residuals.

Results:

Correlations

                                     | Price of House | Area Coverage
Price of House | Pearson Correlation | 1              | .536**
               | Sig. (2-tailed)     |                | .000
               | N                   | 545            | 545
Area Coverage  | Pearson Correlation | .536**         | 1
               | Sig. (2-tailed)     | .000           |
               | N                   | 545            | 545

A Pearson correlation was used to test the relationship between house price and area coverage. The result was a moderate positive correlation (r = 0.536, N = 545, p < 0.001): larger houses tend to have higher prices. Area alone is associated with roughly 29% of the variance in price (r² ≈ 0.29). Because the correlation is statistically significant, the hypothesis (H1) that area is positively related to price is supported.
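
To show where the significance of the correlation comes from, the sketch below converts the reported r into its t statistic, using the standard formula t = r·√(n − 2) / √(1 − r²) with n − 2 degrees of freedom, and also computes r². Only the reported r and N are used. The variable names are illustrative.

```python
import math

r = 0.536   # Pearson correlation reported for price vs. area
n = 545     # number of houses

# Share of the variance in price that is associated with area
r_squared = r ** 2

# Test statistic for H0: no linear correlation, with n - 2 degrees of freedom
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

print(f"r^2 = {r_squared:.3f}")
print(f"t({n - 2}) = {t_stat:.2f}")
```

A t value of this size with 543 degrees of freedom lies far beyond the usual critical values, which is consistent with the reported Sig. of .000.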

Test 2: Independent Samples t-test (Main Road Access vs. House Price)

  • Hypotheses: Houses with main road access have significantly different prices.
  • Method: Independent t-test (group = main road access, dependent = price).
  • Assumptions: Normality, homogeneity of variance.

Results:

T-Test

Group Statistics (Price of House)

Accessibility to Main Road | N   | Mean       | Std. Deviation | Std. Error Mean
yes                        | 468 | 4991777.33 | 1893639.113    | 87533.499
no                         | 77  | 3398904.55 | 894735.465     | 101964.569

Independent Samples Test (Price of House)

Statistic                                           | Equal variances assumed | Equal variances not assumed
Levene’s Test for Equality of Variances: F          | 30.624                  |
Levene’s Test for Equality of Variances: Sig.       | .000                    |
t                                                   | 7.245                   | 11.853
df                                                  | 543                     | 210.676
Sig. (2-tailed)                                     | .000                    | .000
Mean Difference                                     | 1592872.784             | 1592872.784
Std. Error Difference                               | 219854.418              | 134383.358
95% Confidence Interval of the Difference: Lower    | 1161003.429             | 1327964.463
95% Confidence Interval of the Difference: Upper    | 2024742.138             | 1857781.105

Independent Samples Effect Sizes (Price of House)

Measure            | Standardizer (a) | Point Estimate | 95% CI Lower | 95% CI Upper
Cohen’s d          | 1787743.612      | .891           | .644         | 1.137
Hedges’ correction | 1790217.618      | .890           | .643         | 1.136
Glass’s delta      | 894735.465       | 1.780          | 1.406        | 2.149
a. The denominator used in estimating the effect sizes. Cohen’s d uses the pooled standard deviation. Hedges’ correction uses the pooled standard deviation, plus a correction factor. Glass’s delta uses the sample standard deviation of the control group.

An independent samples t-test compared house prices by main road access. Houses with main road access had a higher mean price (4,991,777.33) than houses without access (3,398,904.55), a difference of 1,592,872.78. Levene’s test was significant (F = 30.624, p < 0.001), so equal variances cannot be assumed and the unequal-variances row applies: t(210.68) = 11.853, p < 0.001. The equal-variances row leads to the same conclusion (t(543) = 7.245, p < 0.001). The effect size was large (Cohen’s d = 0.891). The null hypothesis is therefore rejected and H2 is supported: houses with main road access are significantly more expensive on average.
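
The t-test figures can be reproduced from the group statistics alone. The following Python sketch recomputes the pooled t statistic, the Welch (unequal variances) t statistic with its Welch-Satterthwaite degrees of freedom, and Cohen’s d, using only the N, mean and standard deviation of each group as reported. It is an illustrative recalculation under those inputs and not the procedure used to produce the original output.

```python
import math

# Group statistics copied from the table: main road access = yes / no
n1, mean1, sd1 = 468, 4991777.33, 1893639.113
n2, mean2, sd2 = 77, 3398904.55, 894735.465

diff = mean1 - mean2

# Pooled (equal variances assumed) t-test
pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
pooled_sd = math.sqrt(pooled_var)
t_pooled = diff / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))

# Welch (equal variances not assumed) t-test with Welch-Satterthwaite df
v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
t_welch = diff / math.sqrt(v1 + v2)
df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Cohen's d uses the pooled standard deviation as the standardizer
cohens_d = diff / pooled_sd

print(f"Pooled: t({n1 + n2 - 2}) = {t_pooled:.3f}")
print(f"Welch:  t({df_welch:.3f}) = {t_welch:.3f}")
print(f"Cohen's d = {cohens_d:.3f}")
```

Both t values, the Welch degrees of freedom and the effect size agree with the output tables up to rounding.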

Test 3: One-Way ANOVA (Furnishing Status vs. House Price)

  • Hypotheses: Mean house price differs significantly across furnishing categories.
  • Method: One-Way ANOVA (independent = furnishing status, dependent = price).
  • Assumptions: Normality, homogeneity of variance.

Results:

Oneway

Descriptives
Price of Houses

Group          | N   | Mean       | Std. Deviation | Std. Error | 95% CI for Mean: Lower Bound | 95% CI for Mean: Upper Bound | Minimum | Maximum
furnished      | 141 | 5496435.74 | 2110297.458    | 177719.106 | 5145075.53                   | 5847795.96                   | 1750000 | 13300000
semi-furnished | 227 | 4907524.23 | 1596687.757    | 105975.889 | 4698697.02                   | 5116351.44                   | 1767150 | 12250000
unfurnished    | 177 | 4004870.06 | 1720955.307    | 129354.922 | 3749583.67                   | 4260156.44                   | 1750000 | 10150000
Total          | 545 | 4766729.25 | 1870439.616    | 80120.830  | 4609345.15                   | 4924113.35                   | 1750000 | 13300000

Tests of Homogeneity of Variances
Price of Houses

Basis                                | Levene Statistic | df1 | df2     | Sig.
Based on Mean                        | 8.337            | 2   | 542     | .000
Based on Median                      | 7.288            | 2   | 542     | .001
Based on Median and with adjusted df | 7.288            | 2   | 530.937 | .001
Based on trimmed mean                | 8.121            | 2   | 542     | .000

ANOVA
Price of Houses

Source         | Sum of Squares       | df  | Mean Square        | F      | Sig.
Between Groups | 182314372711905.120  | 2   | 91157186355952.560 | 28.710 | .000
Within Groups  | 1720893756854486.000 | 542 | 3175080732203.849  |        |
Total          | 1903208129566391.000 | 544 |                    |        |

Post Hoc Tests

Multiple Comparisons
Dependent Variable: Price of Houses
Tukey HSD

(I) furnishingstatus | (J) furnishingstatus | Mean Difference (I-J) | Std. Error | Sig. | 95% CI Lower Bound | 95% CI Upper Bound
furnished            | semi-furnished       | 588911.516*           | 191063.976 | .006 | 139879.15          | 1037943.88
furnished            | unfurnished          | 1491565.688*          | 201138.291 | .000 | 1018857.00         | 1964274.38
semi-furnished       | furnished            | -588911.516*          | 191063.976 | .006 | -1037943.88        | -139879.15
semi-furnished       | unfurnished          | 902654.173*           | 178676.940 | .000 | 482733.42          | 1322574.93
unfurnished          | furnished            | -1491565.688*         | 201138.291 | .000 | -1964274.38        | -1018857.00
unfurnished          | semi-furnished       | -902654.173*          | 178676.940 | .000 | -1322574.93        | -482733.42
*. The mean difference is significant at the 0.05 level.

Homogeneous Subsets

Price of Houses
Tukey HSD (a, b)

furnishingstatus | N   | Subset 1 (alpha = 0.05) | Subset 2 (alpha = 0.05) | Subset 3 (alpha = 0.05)
unfurnished      | 177 | 4004870.06              |                         |
semi-furnished   | 227 |                         | 4907524.23              |
furnished        | 141 |                         |                         | 5496435.74
Sig.             |     | 1.000                   | 1.000                   | 1.000
Means for groups in homogeneous subsets are displayed.
a. Uses Harmonic Mean Sample Size = 174.956.
b. The group sizes are unequal. The harmonic mean of the group sizes is used. Type I error levels are not guaranteed.

A one-way ANOVA tested whether mean house price differs across furnishing categories. The difference in mean prices was statistically significant (F(2, 542) = 28.71, p < 0.001). Furnished houses had the highest mean price (5,496,435.74), followed by semi-furnished (4,907,524.23) and unfurnished houses (4,004,870.06). Post-hoc Tukey HSD tests confirmed that all three pairs of groups differ significantly (p < 0.05). Levene’s test was also significant (p ≤ 0.001), so the homogeneity of variance assumption was not met and the F-test should be read with some caution. Overall, hypothesis (H3) is supported: mean house prices differ significantly between furnishing categories.
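
The F statistic can be rebuilt from the descriptives table, which makes the link between the group means and the ANOVA result explicit. The Python sketch below computes the between-groups and within-groups sums of squares from each group’s N, mean and standard deviation, then derives F. It also reports eta squared (between-groups sum of squares divided by the total), an effect size that is not part of the original output and is added here only as an illustration.

```python
# Group descriptives copied from the table: (N, mean, SD)
groups = {
    "furnished": (141, 5496435.74, 2110297.458),
    "semi-furnished": (227, 4907524.23, 1596687.757),
    "unfurnished": (177, 4004870.06, 1720955.307),
}

n_total = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * mean for n, mean, _ in groups.values()) / n_total

# Between-groups SS: weighted squared distance of each group mean from the grand mean
ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups.values())
# Within-groups SS: each group's sample variance times its degrees of freedom
ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups.values())

df_between = len(groups) - 1
df_within = n_total - len(groups)

f_stat = (ss_between / df_between) / (ss_within / df_within)
eta_squared = ss_between / (ss_between + ss_within)  # not in the original output

print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
print(f"eta squared = {eta_squared:.3f}")
```

The recomputed F matches the ANOVA table up to rounding. On this calculation, furnishing status is associated with a little under 10% of the variance in price.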

Interpretation Across Tests

The three statistical tests together highlight the key drivers of house price variation in the dataset.

  1. Correlation (Area vs. Price): The Pearson correlation showed a moderate positive relationship (r = 0.536, p < 0.01), meaning larger houses tend to have higher prices. This indicates that area coverage is an important structural factor in property value.
  2. T-test (Main Road Access vs. Price): Results revealed a significant difference in mean prices between houses with road access (M ≈ 4.99M) and those without (M ≈ 3.40M), with a large effect size (Cohen’s d ≈ 0.89). This suggests that location and accessibility matter for price, which is consistent with buyers paying more for well-connected properties.
  3. ANOVA (Furnishing Status vs. Price): The ANOVA confirmed that furnishing levels significantly impact house prices. Furnished houses had the highest mean price (≈ 5.5M), followed by semi-furnished (≈ 4.9M) and unfurnished (≈ 4.0M). Post-hoc comparisons showed that all groups differ significantly (p < 0.05). This underlines how value-added amenities and presentation enhance property worth.

DATA VISUALIZATIONS

  • Max Price by Bedrooms: Shows the highest house prices observed for each bedroom count. Houses with 4 bedrooms recorded the highest maximum prices.
  • Average Price by Bedrooms: Illustrates how mean prices rise with additional bedrooms, peaking around 4–5 bedrooms.
  • Min Price by Bedrooms: Highlights the lowest prices in each bedroom category. Interestingly, 6-bedroom houses show higher minimum prices compared to smaller houses.
  • Average Price Based on Main Road Proximity: Demonstrates that houses with main road access have higher average prices than those without.
  • Average Price by Furnishing Status: Furnished houses have the highest average prices, followed by semi-furnished, with unfurnished being the lowest.
  • Average Price by Number of Parkings: Indicates that more parking spaces are linked to higher house prices, with 2–3 parking slots showing the highest averages.

CONCLUSION

This study examined the determinants of house prices using three statistical tests, and the findings together indicate that size, location and furnishing are all associated with price. First, the correlation analysis showed a moderate positive correlation between area coverage and price, so larger houses tend to be priced higher. Second, the independent samples t-test supported the hypothesis that houses with main road access cost considerably more than houses without it, which highlights the value of location and accessibility to buyers. Lastly, the one-way ANOVA showed significant differences in price between furnishing categories: furnished homes had the highest mean price, followed by semi-furnished and then unfurnished homes. Together, the results support all three hypotheses and show that house prices in this dataset reflect structural features, accessibility and furnishing status. This information is useful to homeowners, investors and policymakers who want to make informed decisions in the housing market.