Question:

All hypothesis tests should include hypotheses, test statistic, p-value or critical value, decision, and conclusion.

A policy analyst for the Ministry of Education wanted to determine what relationships between income and the aggregate level of education might be used to encourage students to stay in school. Although there were potential problems with interpreting relationships based on aggregate data, she decided to begin with newly released data from the 2011 National Household Survey.

She collected and manipulated the following data for the census tracts in the Ottawa area:

CTname: identifying code for the census tract
P_hsgrad: the proportion of adults who graduated from high school
P_trades: the proportion of adults with qualifications in a trade
P_collcert: the proportion of adults with a college certificate
P_univdipl: the proportion of adults with a university diploma (no degree)
P_univdegr: the proportion of adults with a university degree
Pop15+: the number of adults with employment income
MedInc: the median employment income for persons above 15 years
AvgInc: the average employment income for persons above 15 years
MedInc*: the median employment income, with missing values

The data are in the files OttGatNHS.mtw and OttGatNHS.xls.

Please try to minimize the listing of computer output or the excessive use of appendices in reporting your results. Summarize the results of each regression by displaying the regression equation, the coefficients and their standard errors, as well as the usual summary statistics such as R-square and R-square(adj).

(a) Plot the average incomes and the median incomes, either separately or together. What one word describes the shape of these income distributions?
(b) Perform a multiple regression analysis using the five educational variables as predictor variables and the median income (MedInc) as the response variable.
(c) For the regression model in (b), graph the standardized residuals against the fitted values and comment on whether the linear regression model assumptions are warranted.
(d) The MedInc* variable copies the data from the MedInc variable, but a missing value code has been inserted for six census tracts. Examine the MedInc* data and describe the nature of these six census tracts (hint: look at the standardized residual values for the unusual observations).
(e) Since these six census tracts are eliminated from further regression modeling, what limitations do subsequent models have?

The remaining questions pertain to regression models based on the MedInc* variable and not the original MedInc variable.

(f) Re-estimate the multiple regression model using MedInc* as the new response variable. For this regression, save the standardized residuals and the fitted values, calculate the Variance Inflation Factors (click the Options button), and plot the residuals against the fitted values. Are there any particular problems with multicollinearity?
(g) What changes do you notice, comparing this model with the previous model?
(h) Examine the standardized residuals using a histogram, a boxplot or a normal probability plot. What do you conclude?
(i) Plot the standardized residuals against the fits. Do you see any other problems with the model assumptions?
(j) Calculate the correlation coefficient between the fitted values and the MedInc* variable. Show the relationship between this correlation coefficient and the value of R-square.
(k) Perform an F-test for the overall usefulness of the model, using the 1% level of significance. What do you conclude?
(l) Using the model developed in part (f), test the marginal usefulness or importance of the P_univdegr variable, given the other variables in the model, using the 1% level of significance.
(m) If the proportion of university graduates were to increase by 0.1 in a set of census tracts (that is, from 0.1 to 0.2 or from 0.3 to 0.4, as the case may be), assuming the other predictor variables remain constant, what is the estimated average increase in the median income for these census tracts? (Give an estimate using a 99% confidence level.) Would you conclude that a university degree is beneficial in terms of increasing aggregate incomes?
(n) Plot MedInc* against the P_univdegr variable and find the estimated slope of the regression line. Is the coefficient of the P_univdegr variable in the simple regression model consistent with the slope estimate in the multiple regression model? Explain briefly why they might differ.
(o) Finally, use the model developed in part (f) to calculate a 99% prediction interval for the actual median income in census tract 110.00. (Click the Options button in the student version of Minitab, and copy and paste the values of the predictor variables for this census tract, making sure there are only spaces between the numerals. For Minitab 17, select Predict under Regression.) Show manually how the standard error for the prediction interval is calculated using the standard error for the confidence interval and the standard error of the regression estimate.
(p) Explain why you would not expect the prediction interval to cover the actual median income for this census tract.
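
Parts (j) and (o) both rest on standard least-squares identities, which can be checked numerically before working them by hand. The sketch below is only an illustration under stated assumptions: it uses NumPy rather than Minitab, the data are synthetic (not the OttGatNHS file), and the helper names fit_ols and se_fit_and_pred, the coefficient values, and the point x_new are all hypothetical. It shows that the squared correlation between the fitted values and the response equals R-square, and that the prediction standard error combines the regression standard error s with the standard error of the fit as sqrt(s^2 + SE_fit^2).

```python
import numpy as np


def fit_ols(X, y):
    """Least squares with an intercept (hypothetical helper, not Minitab output)."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])           # design matrix with intercept column
    XtX_inv = np.linalg.inv(A.T @ A)
    beta = XtX_inv @ A.T @ y                        # coefficient estimates
    fitted = A @ beta
    resid = y - fitted
    df_error = n - A.shape[1]                       # n minus number of estimated coefficients
    s = np.sqrt(resid @ resid / df_error)           # standard error of the regression
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, fitted, s, r2, XtX_inv


def se_fit_and_pred(x_new, s, XtX_inv):
    """Standard errors for the mean response (CI) and for a new observation (PI)."""
    a = np.concatenate([[1.0], x_new])
    se_fit = s * np.sqrt(a @ XtX_inv @ a)           # SE used for the confidence interval
    se_pred = np.sqrt(s ** 2 + se_fit ** 2)         # SE used for the prediction interval
    return se_fit, se_pred


# Synthetic stand-in for five education proportions and a median income (assumed values).
rng = np.random.default_rng(0)
X = rng.uniform(0.05, 0.5, size=(200, 5))
true_coef = np.array([5000.0, 8000.0, 12000.0, 15000.0, 40000.0])
y = 20000.0 + X @ true_coef + rng.normal(0.0, 3000.0, size=200)

beta, fitted, s, r2, XtX_inv = fit_ols(X, y)

# Part (j): the squared correlation between fits and response matches R-square.
corr = np.corrcoef(fitted, y)[0, 1]
print("corr^2 =", corr ** 2, " R-square =", r2)

# Part (o): combine s and SE(fit) to get the prediction standard error.
x_new = np.array([0.25, 0.10, 0.20, 0.05, 0.30])    # hypothetical census tract
se_fit, se_pred = se_fit_and_pred(x_new, s, XtX_inv)
print("SE(fit) =", se_fit, " SE(pred) =", se_pred)
```

The 99% prediction interval would then be the fitted value plus or minus the t critical value (error degrees of freedom, 0.005 in each tail) times SE(pred); the critical value is left out here so the sketch depends only on NumPy.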