#### Description of this paper

##### Stats and Probability: Data Analysis #5, ST314 - Cost of Developing Software Data...

**Description**

solution

**Question**

**1) (5 points) Cost of developing Software Data.** The three basic structural elements of a data processing system are files, flows, and processes. Files are collections of permanent records in the system, flows are data interfaces between the system and the environment, and processes are functionally defined logical manipulations of the data. An investigation of the cost of developing software as related to files, flows, and processes was conducted. The ANOVA table below is the partial output from the multiple linear regression analysis.

| Source | Degrees of freedom | Sum of Squares | Mean Squares | F | p-val |
|---|---|---|---|---|---|
| Regression | 3 | | | | <0.0001 |
| Residual | | 3945.1 | | | |
| Total | 24 | 18023.2 | | | |

- a) (1 point) Fill out the rest of the ANOVA table based on the provided information.
- b) (1 point) Use the table to calculate the coefficient of determination, R^2. Interpret.
- c) (3 points) Perform a model utility test (F test) from the ANOVA table. Significance level = 0.01.
    - i. State the null and alternative hypotheses.
    - ii. State the F statistic along with the numerator and denominator degrees of freedom and p-value.
    - iii. What can you conclude from the test?

**2) (17 points) Multiple Linear Regression.** The electric power (Power) consumed each month at a chemical plant is under study. It is assumed that the monthly electric consumption is related to four factors: average ambient temperature (Temp), number of days in the month (Days), average product purity in percent (Purity), and the volume of production in tons (Volume). Use the data and code provided under the Data Sets and Code tab to do the following.

- a) (3 points) Provide and describe a scatterplot matrix. Are there any relationships between the response variable and the predictors? Are there any visible relationships between the predictors?
- b) (3 points) Fit a least squares regression model using all variables. From the overall F test (model utility test), is there evidence that at least one variable is a significant predictor of power consumption? Assume a significance level of 0.05. Include R summary output.
- c) (2 points) How well does the model fit the data? In other words, what is R^2?
- d) (2 points) Give the estimated least squares regression equation.
- e) (2 points) Interpret the estimated slope of temperature.
- f) (2 points) Provide a power prediction in kW for when Temp = 50, Days = 25, Purity = 90%, and Volume = 100 tons.
- g) (1 point) In part e) the observed power for the above predictor values is 288 kW. How far off is the predicted value from the observed value?
- h) (2 points) Plot the residuals from the model. Are conditions satisfied? Briefly describe the plot.

**3) (28 points) Model Selection.** In the power consumption example, not all predictors are significant or needed in the model. When choosing a model we want to take the simplest model possible, so dropping variables that show no evidence of predictive power is common practice. (See Beers vs BAC code for help on all parts.)

- a) (2 points) Using the output from 2b), perform individual t tests on the slopes. Which individual variables are significant at the 0.05 significance level?
- b) (4 points) Fit a new model after removing the explanatory variable with the largest non-significant p-value (only one at a time). Continue this process until you have all significant predictors. What is your new model? Provide the R output for the model summary. (Hint: Should reduce to only one explanatory variable.)
- c) (5 points) Interpret the slope of your model, include a 95% confidence interval for the slope, and interpret.
- d) (2 points) Plot the residuals from the model. Are conditions satisfied? Briefly describe the plot.
- e) (2 points) How well does the model fit the data? In other words, what is R^2? How does this compare to 1c?
- f) (2 points) Provide a power prediction in kW for when Temp = 50, Days = 25, Purity = 90%, or Volume = 100 tons. (Choose the one that applies to your model.)
- g) (1 point) In part f) the observed power is 288 kW. How far off is the predicted value from the observed value?
- h) (5 points) Provide a scatterplot with the best fit line, confidence interval, and prediction interval bands.
- i) (5 points) Calculate the confidence interval and prediction interval for the predicted power consumption given the same value from part g). Interpret both.

**Attachment Preview**

Name: __________________________

Data Analysis #5, ST314. Total: 50 points. Due: Tuesday, December 2nd, at 11:59pm PST.

Please download, complete, and upload as a PDF or Word document. No other format will be accepted. Typing or entering answers by hand is accepted as long as solutions are neatly given and the document is uploaded as a PDF. Give the solutions in the space provided. Material from Week 8 and 9 Course Notes, Chapters 12-13, are covered on this analysis.

Jager
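The missing ANOVA entries in question 1 follow from the three values that are given, so the arithmetic can be sketched in a few lines of Python. This is only an illustration, not the assignment's official solution. It assumes that 3945.1 is the residual sum of squares and 18023.2 is the total sum of squares; reading 3945.1 as a mean square on 21 degrees of freedom would imply a residual sum of squares larger than the total. The function name `complete_anova` and its dictionary keys are hypothetical.

```python
# Sketch: fill in a regression ANOVA table from the values given in question 1.
# Assumption: 3945.1 is the residual sum of squares (SSE) and
# 18023.2 is the total sum of squares (SST).

def complete_anova(df_regression, df_total, ss_residual, ss_total):
    """Return the remaining ANOVA entries plus R^2 as a dictionary."""
    df_residual = df_total - df_regression      # degrees of freedom add up
    ss_regression = ss_total - ss_residual      # SST = SSR + SSE
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    f_statistic = ms_regression / ms_residual   # model utility test statistic
    r_squared = ss_regression / ss_total        # coefficient of determination
    return {
        "df_residual": df_residual,
        "ss_regression": ss_regression,
        "ms_regression": ms_regression,
        "ms_residual": ms_residual,
        "f_statistic": f_statistic,
        "r_squared": r_squared,
    }


table = complete_anova(df_regression=3, df_total=24,
                       ss_residual=3945.1, ss_total=18023.2)
for name, value in table.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions the residual degrees of freedom come out to 21, R^2 is roughly 0.78, and the F statistic is about 25 on 3 and 21 degrees of freedom. The p-value is not computed here; the table already reports it as <0.0001.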

Paper#61265 | Written in 18-Jul-2015

Price :*$37*