Home appraisals are expensive, and most people would rather enter their home's square footage into a website than have someone poke around with a measuring tape. This is where Bayesian linear regression can help. Housing data is a good candidate for linear regression because the mean price is approximately linear in several factors, such as square footage.
For my analysis I used the King County Housing Dataset available on Kaggle. The dataset includes quantitative details of each house, such as square footage, number of bedrooms, number of bathrooms and zip code. Because I am performing Bayesian linear regression, the response variable, price, is modeled as normally distributed, with a mean that is a linear function of these predictors and a common variance (parameterized in BUGS by the precision tau = 1/variance). I used standard non-informative prior distributions for all of my parameters and fit the model on over twenty thousand rows of data (n = 21,613). That is large enough for the likelihood to dominate the posterior, so the results are insensitive to the choice of priors.
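The claim that a large sample makes the priors unimportant can be checked on a toy problem. The sketch below is a hedged illustration, not part of the original analysis. It uses synthetic data rather than the King County data, treats the noise precision tau as known so that the posterior mean has a closed form, and compares a vague prior (precision 0.0001, as in the BUGS model) with a deliberately strong prior (precision 1) at n = 10 and n = 20,000. The function name `posterior_mean` is hypothetical.

```python
import numpy as np


def posterior_mean(X, y, tau, prior_mean, prior_prec):
    """Posterior mean of beta for y ~ N(X @ beta, 1/tau) and
    beta ~ N(prior_mean, I / prior_prec), with tau treated as known."""
    p = X.shape[1]
    # Posterior precision = data precision + prior precision
    Q = tau * X.T @ X + prior_prec * np.eye(p)
    return np.linalg.solve(Q, tau * X.T @ y + prior_prec * prior_mean)


rng = np.random.default_rng(1)
true_beta = np.array([2.0, 3.0])  # synthetic intercept and slope
tau = 1.0                         # known noise precision (sd = 1)
gaps = {}
for n in (10, 20_000):
    X = np.column_stack([np.ones(n), rng.standard_normal(n)])
    y = X @ true_beta + rng.standard_normal(n)
    vague = posterior_mean(X, y, tau, np.zeros(2), 1e-4)   # sd 100 prior
    strong = posterior_mean(X, y, tau, np.zeros(2), 1.0)   # sd 1 prior centred at 0
    gaps[n] = np.abs(vague - strong).max()
    print(f"n={n}: largest difference between posterior means = {gaps[n]:.5f}")
```

With ten observations the strong prior visibly pulls the estimates toward zero. With twenty thousand the two posteriors are practically identical, which is the sense in which the likelihood dominates.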
In this project I have demonstrated the application of Bayesian linear regression to predicting housing prices. To check the validity of the model, I predicted the price of an example home and compared it to that home's recent appraisal of $1.9 million (1.9E+6). The mean of the model's posterior predictions was a good approximation of the appraised price. However, the model's deviance was quite high, suggesting that the proposed model is a poor fit for the data. If I were to continue this work, I would assign informative priors to certain parameters based on prior knowledge of their distributions.
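For reference, the deviance that BUGS reports is -2 times the log-likelihood of the data, evaluated at each posterior draw. The sketch below is a hedged illustration on synthetic data, not the housing data, and computes it for the normal likelihood used here. The function name `deviance` is hypothetical. It also shows that the absolute value depends on the sample size and on the units of price, so deviance is most informative when compared between models fit to the same data (for example through DIC).

```python
import numpy as np


def deviance(y, mu, tau):
    """-2 * log-likelihood of y under independent N(mu, 1/tau) errors."""
    n = y.size
    rss = np.sum((y - mu) ** 2)
    return n * np.log(2 * np.pi) - n * np.log(tau) + tau * rss


rng = np.random.default_rng(2)
x = rng.standard_normal(500)
y = 2.0 + 3.0 * x + rng.standard_normal(500)  # synthetic data, noise sd = 1

good = deviance(y, 2.0 + 3.0 * x, tau=1.0)            # correct mean function
bad = deviance(y, np.full_like(y, y.mean()), tau=1.0)  # intercept-only mean
# The same fit expressed in thousands (y / 1000, precision scaled by 1e6)
# has a deviance lower by exactly n * log(1e6).
scaled = deviance(y / 1000, (2.0 + 3.0 * x) / 1000, tau=1.0e6)
print(good, bad, scaled)
```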
model{
  # Likelihood: each sale price is normal with a linear mean and common precision tau
  for (i in 1:n) {
    price[i] ~ dnorm(mu[i], tau)
    mu[i] <- alpha + beta[1] * date[i] + beta[2] * bedrooms[i] + beta[3] * bathrooms[i] +
      beta[4] * sqft_living[i] + beta[5] * sqft_lot[i] + beta[6] * floors[i] +
      beta[7] * waterfront[i] + beta[8] * view[i] + beta[9] * condition[i] +
      beta[10] * grade[i] + beta[11] * sqft_above[i] + beta[12] * sqft_basement[i] +
      beta[13] * yr_built[i] + beta[14] * yr_renovated[i] + beta[15] * zipcode[i] +
      beta[16] * lat[i] + beta[17] * long[i] + beta[18] * sqft_living15[i] +
      beta[19] * sqft_lot15[i]
  }
  #Priors (BUGS dnorm takes a precision: 0.0001 corresponds to sd = 100)
  alpha ~ dnorm(0, 0.0001)
  for (j in 1:19) {   # one coefficient per predictor
    beta[j] ~ dnorm(0, 0.0001)
  }
  tau ~ dgamma(0.01, 0.01)
  #Prediction for the example home (inputs must use the same encoding as the data)
  new.date <- 12062020
  new.bedrooms <- 5
  new.bathrooms <- 4
  new.sqft_living <- 2402
  new.sqft_lot <- 11325
  new.floors <- 2
  new.waterfront <- 0
  new.view <- 1
  new.condition <- 4
  new.grade <- 8
  new.sqft_above <- 1402
  new.sqft_basement <- 1000
  new.yr_built <- 1947
  new.yr_renovated <- 2010
  new.zipcode <- 98177
  new.lat <- 47.706
  new.long <- -122.379
  new.sqft_living15 <- 2402
  new.sqft_lot15 <- 11325
  mu.pred <- alpha + beta[1] * new.date + beta[2] * new.bedrooms + beta[3] * new.bathrooms +
    beta[4] * new.sqft_living + beta[5] * new.sqft_lot + beta[6] * new.floors +
    beta[7] * new.waterfront + beta[8] * new.view + beta[9] * new.condition +
    beta[10] * new.grade + beta[11] * new.sqft_above + beta[12] * new.sqft_basement +
    beta[13] * new.yr_built + beta[14] * new.yr_renovated + beta[15] * new.zipcode +
    beta[16] * new.lat + beta[17] * new.long + beta[18] * new.sqft_living15 +
    beta[19] * new.sqft_lot15
  # Posterior predictive draw of the example home's price
  price_pred ~ dnorm(mu.pred, tau)
}
#Data
# n only; price and the 19 predictor columns are loaded separately
list(n=21613)
#Inits
list(alpha=0, beta=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0), tau=1, price_pred=600000)
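BUGS fits this model by Markov chain Monte Carlo. To make the mechanics concrete, here is a minimal Gibbs sampler in Python/NumPy for the same structure: a normal likelihood, independent normal priors with precision 0.0001 on the intercept and coefficients, a Gamma(0.01, 0.01) prior on tau (taken as shape 0.01, rate 0.01, matching BUGS's `dgamma`), and a posterior predictive draw analogous to `price_pred`. It is a sketch under stated assumptions, not the author's code and not necessarily the sampler BUGS uses internally. It runs on synthetic data with two made-up predictors, and the name `gibbs_blr` is hypothetical.

```python
import numpy as np


def gibbs_blr(X, y, n_iter=2000, burn_in=500, prior_prec=1e-4, a0=0.01, b0=0.01, seed=0):
    """Gibbs sampler for y ~ N(X @ beta, 1/tau),
    beta_j ~ N(0, 1/prior_prec), tau ~ Gamma(shape=a0, rate=b0)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    beta, tau = np.zeros(p), 1.0
    beta_draws, tau_draws = [], []
    for it in range(n_iter):
        # beta | tau, y ~ MVN(Q^-1 tau X'y, Q^-1) with Q = tau X'X + prior_prec I
        Q = tau * XtX + prior_prec * np.eye(p)
        L = np.linalg.cholesky(Q)  # Q = L @ L.T
        mean = np.linalg.solve(Q, tau * Xty)
        beta = mean + np.linalg.solve(L.T, rng.standard_normal(p))
        # tau | beta, y ~ Gamma(a0 + n/2, rate = b0 + RSS/2)
        resid = y - X @ beta
        rate = b0 + 0.5 * (resid @ resid)
        tau = rng.gamma(a0 + 0.5 * n, 1.0 / rate)  # numpy uses shape and scale
        if it >= burn_in:
            beta_draws.append(beta)
            tau_draws.append(tau)
    return np.array(beta_draws), np.array(tau_draws)


# Synthetic data: intercept column (alpha) plus two predictors, noise sd 0.5
rng = np.random.default_rng(42)
n = 2000
X = np.column_stack([np.ones(n), rng.standard_normal(n), rng.standard_normal(n)])
true_beta = np.array([2.0, 3.0, -1.0])
y = X @ true_beta + rng.normal(0.0, 0.5, n)

beta_draws, tau_draws = gibbs_blr(X, y)
print(beta_draws.mean(axis=0))  # posterior means of (alpha, beta1, beta2)

# Posterior predictive draws for a new point, mirroring price_pred
x_new = np.array([1.0, 0.5, -0.5])
pred = x_new @ beta_draws.T + rng.normal(0.0, 1.0 / np.sqrt(tau_draws))
print(pred.mean(), np.percentile(pred, [2.5, 97.5]))
```

Averaging `pred` plays the same role as averaging the monitored `price_pred` node in BUGS, and its percentiles give a predictive interval to compare against an observed price.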



KC Housing Data - https://www.kaggle.com/harlfoxem/housesalesprediction