# Statistics Assignment Help-171502

Assignment 1

(Refer to the Plots, Summary Output and Data section at the end of the assignment)

1. Refer to plot c and the seasonal exponential data for Sally-Stenography Source.

a. What type of model is appropriate for the data? Explain.

b. Use that model, and discuss its efficacy. (How well does it perform?)

c. Offer any suggestions to improve the model performance.

2. The table below features three exponential smoothing models used on the same set of data.

| | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
| Type | Exponential Smoothing | Trend | Trend with Seasonality |
| MSE | 8755.3 | 4876.2 | 5945.8 |

a. Based solely on the information in this output, what would you conclude about the underlying data set? Explain.

b. Are there any other possible explanations for the values in the table? Explain.

For Questions 3 to 9, refer to the attached regression output (refer Plots and Data section). Here we are using years of experience, college GPA, and company entrance score to predict employee salaries at a local firm.

3. We would conclude that the overall model (using all three explanatory variables) is statistically significant at the .05 level.

a. true

b. false

4. None of the explanatory variables are useful predictors

a. true

b. false

5. The most useful predictor in the presence of the other explanatory variables is

6. Multicollinearity is

(i) Severe.

(ii) Mild.

(iii) Nonexistent.

7. Which of the following combinations would be expected to yield the highest pay? Values are years, GPA, and score.

1. 5, 3.7, 80
2. 7, 3.2, 85
3. 8, 3.6, 85
4. 5, 3.8, 85

8. Interpret the coefficient of determination for the overall (3-variable) model.

9. The model in Summary Output 2 uses only entrance score to predict salary.

a. Write the least squares regression equation.

b. Is this a useful model? Explain.

c. Is the output consistent with the other output? Explain.

10. Given the information, which answer is BEST?

| | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
| X-variables | 6 | 4 | 3 |
| R2 | .9344 | .9277 | .8761 |
| Adjusted R2 | .9058 | .9133 | .8497 |
| MSE | 5867.53 | 5746.09 | 5844.78 |

1. Model 1 performs the best in all areas.
2. Model 3 performs better than Model 2.
3. We would most likely prefer Model 1.
4. We would most likely prefer Model 2.
5. We would most likely prefer Model 3.

11. Refer again to the Sally-Stenographer data (refer Plots and Data section, at the end). Use exponential smoothing with a smoothing constant of 0.2. This model

1. is effective
2. is appropriate
3. tends to under predict
4. tends to over predict

12. In the previous problem, suppose we used a different smoothing constant value. This change would

1. generate an appropriate model
2. possibly generate a better model (with respect to prediction)
3. tend to under predict
4. tend to over predict

13. You are tracking mall sales over a two-year period.

1. The data will surely contain a trend component.
2. The data will likely contain a verifiable seasonal component.

14. In Question #13, suppose we track sales over a one-year period.

1. The data will surely contain an irregular (random) component.
2. The data will likely contain a verifiable seasonal component.

15. Discuss how a method on this assignment can be used to catch data errors such as data entry mistakes.

##### Plots and Data section #####

# SUMMARY OUTPUT 1

| Regression Statistics | |
|---|---|
| Multiple R | 0.686507 |
| R Square | 0.4712919 |
| Adjusted R Square | 0.3655502 |
| Standard Error | 18.962716 |
| Observations | 19 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 3 | 4808.020459 | 1602.673 | 4.457014 | 0.019857133 |
| Residual | 15 | 5393.769015 | 359.5846 | | |
| Total | 18 | 10201.78947 | | | |

| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | -59.69295 | 38.48117376 | -1.55122 | 0.141687 |
| Years | -0.510186 | 1.100900645 | -0.46343 | 0.649712 |
| GPA | 24.510667 | 12.31973076 | 1.989546 | 0.065194 |
| Score | 0.7600061 | 0.295486746 | 2.572048 | 0.021248 |

CORRELATION MATRIX

| | Salary | Years | GPA | Score |
|---|---|---|---|---|
| Salary | 1 | | | |
| Years | 0.192517816 | 1 | | |
| GPA | 0.484367567 | 0.280138 | 1 | |
| Score | 0.575980821 | 0.341399 | 0.226652 | 1 |

# SUMMARY OUTPUT 2

| Regression Statistics | |
|---|---|
| Multiple R | 0.5759808 |
| R Square | 0.3317539 |
| Adjusted R Square | 0.2924453 |
| Standard Error | 20.025434 |
| Observations | 19 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 3384.483506 | 3384.484 | 8.43973 | 0.009855012 |
| Residual | 17 | 6817.305968 | 401.018 | | |
| Total | 18 | 10201.78947 | | | |

| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 8.103183 | 18.48029361 | 0.438477 | 0.666562 |
| Score | 0.8430371 | 0.290189996 | 2.905121 | 0.009855 |

# Sally’s Data

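| Quarter | Sales |
|---|---|
| 1 | 6455 |
| 2 | 8779 |
| 3 | 13897 |
| 4 | 18920 |
| 5 | 24225 |
| 6 | 26190 |
| 7 | 27440 |
| 8 | 37562 |
| 9 | 29895 |
| 10 | 29120 |
| 11 | 28540 |
| 12 | 39985 |
| 13 | 33255 |
| 14 | 32110 |
| 15 | 30875 |
| 16 | 41234 |
| 17 | 36476 |
| 18 | 34860 |
| 19 | 32197 |
| 20 | 43940 |
| 21 | 39723 |
| 22 | 37890 |
| 23 | 35230 |
| 24 | 46115 |
| 25 | 41432 |
| 26 | 39243 |
| 27 | 36922 |
| 28 | 49340 |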

Q1

1. A linear trend model is appropriate for the data, as sales generally increase over the quarters.
2. The linear trend model fits the data with an R square of 70.5%, which means about 30% of the variation in sales is not explained by the model.
3. A higher-degree (polynomial) model may be fit to the data to obtain better estimates.
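The R square quoted for the linear trend model can be checked directly from Sally's data. The following is a minimal sketch in plain Python, assuming an ordinary least squares fit of sales on the quarter number; the function name `fit_linear_trend` is made up for this illustration and is not part of the original solution.

```python
# Quarterly sales copied from the "Sally's Data" table (quarters 1 to 28).
sales = [6455, 8779, 13897, 18920, 24225, 26190, 27440, 37562,
         29895, 29120, 28540, 39985, 33255, 32110, 30875, 41234,
         36476, 34860, 32197, 43940, 39723, 37890, 35230, 46115,
         41432, 39243, 36922, 49340]
quarters = list(range(1, len(sales) + 1))


def fit_linear_trend(x, y):
    """Least squares fit of y = intercept + slope * x; returns slope, intercept, R square."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R square = 1 - SSE / SST
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - sse / sst


slope, intercept, r_squared = fit_linear_trend(quarters, sales)
print(f"Sales = {intercept:.1f} + {slope:.1f} * Quarter, R square = {r_squared:.3f}")
```

The part of the variation not captured by the trend line is what a seasonal or higher-degree model would try to explain.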

Q2

1. Since the MSE is lowest for the trend model (Model 2), the data appear to be steadily increasing or decreasing, with little seasonality.
2. No, there are no other likely explanations apart from the underlying data increasing or decreasing steadily without much seasonality.

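Q3      True, as the Significance F (the p-value of the F test) in the ANOVA table is 0.0199, which is less than 0.05

Q4      False. Score (p-value = 0.021) is significant at the 5% level

Q5      Score, as it has the smallest p-value (0.021) and is the only predictor significant at the 5% level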

Q6      Mild, as the standard errors are not too high and the pairwise correlations among the explanatory variables in the correlation matrix are low (0.23 to 0.34)

Q7      5, 3.8, 85 (the multiple linear regression equation yields the highest predicted pay for this combination)
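The comparison can be reproduced by plugging each combination into the fitted equation from Summary Output 1. This is a small sketch; the helper name `predict_salary` and the `options` dictionary are introduced here for illustration only.

```python
# Coefficients copied from Summary Output 1 (three-variable model).
INTERCEPT = -59.69295
B_YEARS = -0.510186
B_GPA = 24.510667
B_SCORE = 0.7600061


def predict_salary(years, gpa, score):
    """Predicted salary from the fitted three-variable regression equation."""
    return INTERCEPT + B_YEARS * years + B_GPA * gpa + B_SCORE * score


# The four answer choices, as (years, GPA, score).
options = {1: (5, 3.7, 80), 2: (7, 3.2, 85), 3: (8, 3.6, 85), 4: (5, 3.8, 85)}
predictions = {key: predict_salary(*values) for key, values in options.items()}
best_option = max(predictions, key=predictions.get)

for key, value in predictions.items():
    print(f"Option {key} {options[key]}: predicted salary = {value:.2f}")
print("Highest predicted pay: option", best_option)
```

Because the Years coefficient is negative in this model, more experience lowers the prediction slightly, so the option with the highest GPA and score and the fewest years comes out on top.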

Q8      The coefficient of determination (R square) is 0.471, which means that only about 47% of the variation in salary is explained by the three predictor variables
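The R square value, and the related adjusted R square and F statistic, follow from the sums of squares in the ANOVA table of Summary Output 1. The sketch below uses the standard textbook formulas; the variable names are chosen for this illustration.

```python
# Values copied from the ANOVA table of Summary Output 1.
ss_regression = 4808.020459
ss_residual = 5393.769015
n_obs = 19        # observations
k = 3             # explanatory variables

ss_total = ss_regression + ss_residual

# R square: share of total variation explained by the regression.
r_squared = ss_regression / ss_total

# Adjusted R square penalizes for the number of explanatory variables.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - k - 1)

# F statistic: mean square regression over mean square residual.
f_stat = (ss_regression / k) / (ss_residual / (n_obs - k - 1))

print(f"R square = {r_squared:.4f}")
print(f"Adjusted R square = {adj_r_squared:.4f}")
print(f"F = {f_stat:.4f}")
```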

Q9.

1. Salary = 8.103 + 0.843 * Score
2. Although the model is statistically significant (p-value = 0.0099), its R square is just 33%, so it is not a very useful model.
3. Yes, the output is consistent with the other output with respect to the sign of the Score coefficient and its importance.
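Two further consistency checks can be made from the printed numbers alone. In a one-variable regression, R square equals the squared correlation between the response and the predictor, and the F statistic equals the squared t statistic of the slope. The sketch below simply compares the values reported in the outputs.

```python
# Correlation between Salary and Score, from the correlation matrix.
corr_salary_score = 0.575980821

# Values reported in Summary Output 2 (Score-only model).
r_squared_reported = 0.3317539
t_stat_score = 2.905121
f_reported = 8.43973

# In simple regression, R square is the squared correlation.
r_squared_from_corr = corr_salary_score ** 2

# In simple regression, F is the squared t statistic of the slope.
f_from_t = t_stat_score ** 2

print(f"r^2 = {r_squared_from_corr:.6f} vs reported R square {r_squared_reported}")
print(f"t^2 = {f_from_t:.5f} vs reported F {f_reported}")
```

The total sum of squares (10201.78947) is also the same in both outputs, as expected when the same 19 salaries are modeled.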

Q10    d. We would most likely prefer Model 2, as it has the highest adjusted R square and the lowest MSE

Q11    b. is appropriate

Q12    a. generate an appropriate model, because exponential smoothing with any value of the smoothing constant would yield an appropriate model for this data
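The answers to Q11 and Q12 can be checked numerically by computing one-step-ahead forecasts for several smoothing constants and comparing the errors. This is a minimal sketch of simple exponential smoothing, assuming the first forecast is set equal to the first observation (a common convention, not stated in the assignment); the function names are made up for this example. A positive mean error (actual minus forecast) would indicate that the model tends to under predict, a negative one that it tends to over predict.

```python
# Quarterly sales copied from the "Sally's Data" table (quarters 1 to 28).
sales = [6455, 8779, 13897, 18920, 24225, 26190, 27440, 37562,
         29895, 29120, 28540, 39985, 33255, 32110, 30875, 41234,
         36476, 34860, 32197, 43940, 39723, 37890, 35230, 46115,
         41432, 39243, 36922, 49340]


def exponential_smoothing_forecasts(series, alpha):
    """One-step-ahead forecasts: F[t+1] = alpha * Y[t] + (1 - alpha) * F[t].

    Assumption: the forecast for the first period equals the first observation.
    """
    forecasts = [series[0]]
    for actual in series[:-1]:
        forecasts.append(alpha * actual + (1 - alpha) * forecasts[-1])
    return forecasts


def evaluate(series, alpha):
    """Return (mean error, MSE), skipping period 1 where the error is zero by construction."""
    forecasts = exponential_smoothing_forecasts(series, alpha)
    errors = [a - f for a, f in zip(series[1:], forecasts[1:])]
    mean_error = sum(errors) / len(errors)
    mse = sum(e ** 2 for e in errors) / len(errors)
    return mean_error, mse


for alpha in (0.2, 0.5, 0.8):
    mean_error, mse = evaluate(sales, alpha)
    print(f"alpha = {alpha}: mean error = {mean_error:.1f}, MSE = {mse:.1f}")
```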

Q13    b. The data will likely contain a verifiable seasonal component, because mall sales are generally high during festival and holiday seasons

Q14    a. The data will surely contain an irregular (random) component

Q15    Scatter plots will help to identify outliers in the data that are not expected. Those could be data entry errors.
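A numeric counterpart to inspecting a scatter plot is to fit a trend line and flag points whose residuals are unusually large. The sketch below is one such approach, not the original solution method: it assumes a linear trend and a cutoff of two residual standard errors, and the corrupted value (an extra zero typed for quarter 15) is a hypothetical data entry mistake invented for the example.

```python
import math

# Quarterly sales copied from the "Sally's Data" table (quarters 1 to 28).
sales = [6455, 8779, 13897, 18920, 24225, 26190, 27440, 37562,
         29895, 29120, 28540, 39985, 33255, 32110, 30875, 41234,
         36476, 34860, 32197, 43940, 39723, 37890, 35230, 46115,
         41432, 39243, 36922, 49340]


def flag_outliers(y, threshold=2.0):
    """Return the quarters whose linear-trend residual exceeds threshold standard errors."""
    n = len(y)
    x = list(range(1, n + 1))
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual standard error with n - 2 degrees of freedom.
    std_error = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    return [xi for xi, r in zip(x, residuals) if abs(r) > threshold * std_error]


# Hypothetical data entry error: 30875 typed as 308750 for quarter 15.
corrupted = list(sales)
corrupted[14] = 308750

print("Flagged quarters in corrupted data:", flag_outliers(corrupted))
```

Flagged points are only candidates for review. On clean seasonal data, a legitimate seasonal peak can also produce a large residual, so each flagged value should be checked against the source records.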