The "Boston Housing" data set, part of the MASS package




Question: The "Boston Housing" data set, part of the MASS package, records properties of 506 housing zones in the Greater Boston area. For a description of the data, see Moodle2 (housing data and attribute information). Typically one is interested in predicting MEDV (median home value) based on other attributes.

Try to fit an MLR to this dataset, with MEDV as the dependent variable. MEDV has a somewhat long tail and is not very Gaussian-like, so we will take a log transform (use LMEDV = log(MEDV)) and then predict LMEDV instead. (You should convince yourself that this is a better idea by looking at the histograms and quantile plots to assess normality; however, there is no need to submit such plots.) Keep the first 300 records as a training set (call it Bostrain), which you will use to fit the model; the remaining 206 will be used as a test set (Bostest). Use only the following variables in your model:

LMEDV = LSTAT + RM + CRIM + ZN + CHAS.

(a) Report the coefficients obtained by your model. Would you drop any of the variables used in your model (based on the t-scores or p-values)?

(b) Report the MSE obtained on Bostrain. How much does this increase when you score your model on Bostest?
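The workflow in the question (log-transform the response, keep the first 300 rows for training, fit by least squares, compare training and test MSE) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the intended MASS-based solution. The helper names `fit_ols`, `predict`, `mse` and `split_fit_score` are made up for this sketch. It assumes the predictor rows (LSTAT, RM, CRIM, ZN, CHAS) and the MEDV values have already been loaded into lists. It does not compute the t-scores or p-values needed for part (a), which a statistics package would normally report.

```python
import math

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def predict(beta, X):
    """Fitted values for each row of X given coefficients beta."""
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], r)) for r in X]

def mse(beta, X, y):
    """Mean squared error of the fitted model on (X, y)."""
    preds = predict(beta, X)
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)

def split_fit_score(X, medv, n_train=300):
    """Fit log(MEDV) on the first n_train rows; return (beta, train MSE, test MSE)."""
    lmedv = [math.log(v) for v in medv]  # LMEDV = log(MEDV)
    X_train, y_train = X[:n_train], lmedv[:n_train]
    X_test, y_test = X[n_train:], lmedv[n_train:]
    beta = fit_ols(X_train, y_train)
    return beta, mse(beta, X_train, y_train), mse(beta, X_test, y_test)
```

With the real data, `X` would hold 506 rows of the five predictors and `medv` the 506 median home values; `split_fit_score(X, medv)` then returns the coefficients together with the Bostrain and Bostest MSE, whose difference answers part (b).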


