Statistics Assignment Help-171502

Assignment 1

(Refer to the Plots, Summary Output and Data section at the end of the assignment.)

1. Refer to plot c and the seasonal exponential data for Sally-Stenography Source.

a. What type of model is appropriate for the data? Explain.

b. Use that model, and discuss its efficacy. (How well does it perform?)

c. Offer any suggestions to improve the model performance.

2. The table below features three exponential smoothing models used on the same set of data.

 

| | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
| Type | Exponential Smoothing | Trend | Trend with Seasonality |
| MSE | 8755.3 | 4876.2 | 5945.8 |

a. Based solely on the information in this output, what would you conclude about the underlying data set? Explain.

b. Are there any other possible explanations for the values in the table? Explain.

For Questions 3 to 9, refer to the attached regression output (refer Plots and Data section). Here we are using years of experience, college GPA, and company entrance score to predict employee salaries at a local firm.

3. We would conclude that the overall model (using all three explanatory variables) is statistically significant at the .05 level.

a. true

b. false

4. None of the explanatory variables are useful predictors

a. true

b. false

5. The most useful predictor in the presence of the other explanatory variables is

6. Multicollinearity is

  1. Severe.
  2. Mild.
  3. Nonexistent.

7. Which of the following combinations would be expected to yield the highest pay? Values are years, GPA, and score.

  1. 5, 3.7, 80
  2. 7, 3.2, 85
  3. 8, 3.6, 85
  4. 5, 3.8, 85

8. Interpret the coefficient of determination for the overall (3-variable) model.

9. The model in Summary Output 2 uses only entrance score to predict salary.

a. Write the least squares regression equation.

b. Is this a useful model? Explain.

c. Is the output consistent with the other output? Explain.

10. Given the information, which answer is BEST?

 

| | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
| X-variables | 6 | 4 | 3 |
| R2 | .9344 | .9277 | .8761 |
| Adjusted R2 | .9058 | .9133 | .8497 |
| MSE | 5867.53 | 5746.09 | 5844.78 |

 

  1. Model 1 performs the best in all areas.
  2. Model 3 performs better than Model 2.
  3. We would most likely prefer Model 1.
  4. We would most likely prefer Model 2.
  5. We would most likely prefer Model 3.

11. Refer again to the Sally-Stenographer data (refer Plots and Data section, at the end). Use exponential smoothing with a smoothing constant of 0.2. This model

  1. is effective
  2. is appropriate
  3. tends to under predict
  4. tends to over predict

12. In the previous problem, suppose we used a different smoothing constant value. This change would

  1. generate an appropriate model
  2. possibly generate a better model (with respect to prediction)
  3. tend to under predict
  4. tend to over predict

13. You are tracking mall sales over a two-year period.

  1. The data will surely contain a trend component.
  2. The data will likely contain a verifiable seasonal component.

14. In Question #13, suppose we track sales over a one-year period.

  1. The data will surely contain an irregular (random) component.
  2. The data will likely contain a verifiable seasonal component.

15. Discuss how a method on this assignment can be used to catch data errors such as data entry mistakes.

##### Plots and Data section #####

image

image1

# SUMMARY OUTPUT 1

Regression Statistics

| | |
|---|---|
| Multiple R | 0.686507 |
| R Square | 0.4712919 |
| Adjusted R Square | 0.3655502 |
| Standard Error | 18.962716 |
| Observations | 19 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 3 | 4808.020459 | 1602.673 | 4.457014 | 0.019857133 |
| Residual | 15 | 5393.769015 | 359.5846 | | |
| Total | 18 | 10201.78947 | | | |

| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | -59.69295 | 38.48117376 | -1.55122 | 0.141687 |
| Years | -0.510186 | 1.100900645 | -0.46343 | 0.649712 |
| GPA | 24.510667 | 12.31973076 | 1.989546 | 0.065194 |
| Score | 0.7600061 | 0.295486746 | 2.572048 | 0.021248 |

CORRELATION MATRIX

| | Salary | Years | GPA | Score |
|---|---|---|---|---|
| Salary | 1 | | | |
| Years | 0.192517816 | 1 | | |
| GPA | 0.484367567 | 0.280138 | 1 | |
| Score | 0.575980821 | 0.341399 | 0.226652 | 1 |

# SUMMARY OUTPUT 2

Regression Statistics

| | |
|---|---|
| Multiple R | 0.5759808 |
| R Square | 0.3317539 |
| Adjusted R Square | 0.2924453 |
| Standard Error | 20.025434 |
| Observations | 19 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 3384.483506 | 3384.484 | 8.43973 | 0.009855012 |
| Residual | 17 | 6817.305968 | 401.018 | | |
| Total | 18 | 10201.78947 | | | |

| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 8.103183 | 18.48029361 | 0.438477 | 0.666562 |
| Score | 0.8430371 | 0.290189996 | 2.905121 | 0.009855 |

# Sally’s Data

 

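| Quarter | Sales |
|---|---|
| 1 | 6455 |
| 2 | 8779 |
| 3 | 13897 |
| 4 | 18920 |
| 5 | 24225 |
| 6 | 26190 |
| 7 | 27440 |
| 8 | 37562 |
| 9 | 29895 |
| 10 | 29120 |
| 11 | 28540 |
| 12 | 39985 |
| 13 | 33255 |
| 14 | 32110 |
| 15 | 30875 |
| 16 | 41234 |
| 17 | 36476 |
| 18 | 34860 |
| 19 | 32197 |
| 20 | 43940 |
| 21 | 39723 |
| 22 | 37890 |
| 23 | 35230 |
| 24 | 46115 |
| 25 | 41432 |
| 26 | 39243 |
| 27 | 36922 |
| 28 | 49340 |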

Q1

  1. A linear trend model is appropriate for the data, as sales rise overall across the quarters.
  2. The linear trend model fits the data with an R square of 70.5%, which means about 30% of the variation is not explained by the model.
  3. A higher-degree model may be fit to the data to obtain better estimates.
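The linear trend fit described in Q1 can be checked with a short calculation. The following is a minimal sketch, assuming an ordinary least squares line of Sales against Quarter number; the function name `fit_linear_trend` is illustrative and not part of the assignment.

```python
# Quarterly sales from "Sally's Data" (quarters 1 to 28).
sales = [6455, 8779, 13897, 18920, 24225, 26190, 27440, 37562, 29895, 29120,
         28540, 39985, 33255, 32110, 30875, 41234, 36476, 34860, 32197, 43940,
         39723, 37890, 35230, 46115, 41432, 39243, 36922, 49340]
quarters = list(range(1, len(sales) + 1))


def fit_linear_trend(x, y):
    """Least squares fit of y = intercept + slope * x.

    Returns (intercept, slope, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R square = 1 - (unexplained variation / total variation)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1 - sse / sst


intercept, slope, r_squared = fit_linear_trend(quarters, sales)
print(f"Sales = {intercept:.1f} + {slope:.1f} * Quarter, R square = {r_squared:.3f}")
```

The printed R square can be compared with the figure quoted in the answer; one minus that value is the share of variation the trend line leaves unexplained.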

Q2

  1. Since the MSE is lowest for the trend model, the data appear to increase or decrease steadily over time, without strong seasonality.
  2. No, there are no other likely explanations: the underlying data are steadily increasing or decreasing without much seasonality.

Q3      True, as the Significance F from the ANOVA table (0.0199) is less than 0.05

Q4      False; Score (p-value = 0.021) is significant at the 5% level

Q5      Score, as its p-value (0.021) is the lowest and is less than 5%

Q6      Mild, as the standard errors are not too high and the correlations among the explanatory variables are low (0.23 to 0.34)

Q7      5, 3.8, 85 (the multiple linear regression equation yields the highest predicted pay for this combination, about 95.5)
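The comparison in Q7 is plain arithmetic on the fitted equation. As a minimal sketch, the coefficients below are copied from Summary Output 1; the function name `predict_salary` is an illustrative assumption.

```python
# Coefficients copied from Summary Output 1.
INTERCEPT = -59.69295
B_YEARS = -0.510186
B_GPA = 24.510667
B_SCORE = 0.7600061


def predict_salary(years, gpa, score):
    """Predicted salary from the three-variable regression equation."""
    return INTERCEPT + B_YEARS * years + B_GPA * gpa + B_SCORE * score


# The four (years, GPA, score) combinations listed in the question.
candidates = [(5, 3.7, 80), (7, 3.2, 85), (8, 3.6, 85), (5, 3.8, 85)]
predictions = {c: predict_salary(*c) for c in candidates}
best = max(predictions, key=predictions.get)

for combo, salary in predictions.items():
    print(combo, round(salary, 2))
print("highest predicted pay:", best)
```

Because the Years coefficient is negative in this output, extra years of experience lower the prediction slightly, which is why the combinations with 7 or 8 years do not come out on top.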

Q8      R square is about 47%, which means that the three predictor variables explain only about 47% of the variation in salary

Q9.

  1. Salary = 8.103 + 0.843 * Score
  2. Although the model is significant (p-value = 0.0099), its R square is just 33%, so it is not a very useful model.
  3. Yes, the output is consistent with the other output: the Score coefficient is positive and significant in both.
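The R square values quoted in Q8 and Q9 can be reproduced from the ANOVA tables. This is a minimal sketch using the standard definitions (R square = SS regression / SS total, with the usual degrees-of-freedom adjustment); the function names are illustrative.

```python
def r_squared(ss_regression, ss_total):
    """Share of total variation explained by the regression."""
    return ss_regression / ss_total


def adjusted_r_squared(r2, n_obs, n_predictors):
    """R square penalized for the number of predictors."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Values copied from the ANOVA tables; both outputs share the same total SS.
SS_TOTAL = 10201.78947
N_OBS = 19

r2_full = r_squared(4808.020459, SS_TOTAL)    # Summary Output 1 (3 predictors)
r2_score = r_squared(3384.483506, SS_TOTAL)   # Summary Output 2 (Score only)
adj_full = adjusted_r_squared(r2_full, N_OBS, 3)
adj_score = adjusted_r_squared(r2_score, N_OBS, 1)

print(f"3-variable model: R square = {r2_full:.4f}, adjusted = {adj_full:.4f}")
print(f"Score-only model: R square = {r2_score:.4f}, adjusted = {adj_score:.4f}")
```

Both outputs report the same total sum of squares, so the two R square values are directly comparable.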

Q10    d. We would most likely prefer Model 2, as it has the highest adjusted R square and the lowest MSE

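Q11    c. tends to under predict, because simple exponential smoothing lags behind the upward trend in the data

Q12    b. possibly generate a better model (with respect to prediction), because a different smoothing constant changes how quickly the forecasts respond to recent sales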
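The behaviour of exponential smoothing on Sally's data can be examined directly. The following is a minimal sketch of simple exponential smoothing with a smoothing constant of 0.2; setting the first forecast equal to the first observation is an assumed initialization, and the function name is illustrative. A positive mean error means the forecasts fall below actual sales on average, and a negative one means they run above.

```python
# Quarterly sales from "Sally's Data" (quarters 1 to 28).
sales = [6455, 8779, 13897, 18920, 24225, 26190, 27440, 37562, 29895, 29120,
         28540, 39985, 33255, 32110, 30875, 41234, 36476, 34860, 32197, 43940,
         39723, 37890, 35230, 46115, 41432, 39243, 36922, 49340]


def exponential_smoothing_forecasts(y, alpha):
    """One-step-ahead forecasts: F[t+1] = alpha * y[t] + (1 - alpha) * F[t].

    Assumption: the forecast for the first period equals the first observation.
    """
    forecasts = [y[0]]
    for actual in y[:-1]:
        forecasts.append(alpha * actual + (1 - alpha) * forecasts[-1])
    return forecasts


alpha = 0.2
forecasts = exponential_smoothing_forecasts(sales, alpha)

# Skip period 1, where the forecast equals the observation by construction.
errors = [actual - forecast for actual, forecast in zip(sales[1:], forecasts[1:])]
mean_error = sum(errors) / len(errors)
mse = sum(e ** 2 for e in errors) / len(errors)

print(f"alpha = {alpha}: mean error = {mean_error:.1f}, MSE = {mse:.1f}")
```

Re-running the sketch with other values of `alpha` shows how the mean error and MSE respond to the smoothing constant.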

Q13    b. The data will likely contain a verifiable seasonal component, because mall sales are generally high during festival and holiday seasons

Q14    a. The data will surely contain an irregular (random) component

Q15    Scatter plots help to identify unexpected outliers in the data. Those could be data entry errors
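The outlier check in Q15 also has a numeric counterpart. As a minimal sketch, the code below flags points whose residual from a fitted linear trend is unusually large. The cutoff of three residual standard deviations is an assumed convention, the function name is illustrative, and the mistyped value is hypothetical, introduced only to demonstrate the idea.

```python
import statistics

# Quarterly sales from "Sally's Data" (quarters 1 to 28).
sales = [6455, 8779, 13897, 18920, 24225, 26190, 27440, 37562, 29895, 29120,
         28540, 39985, 33255, 32110, 30875, 41234, 36476, 34860, 32197, 43940,
         39723, 37890, 35230, 46115, 41432, 39243, 36922, 49340]


def flag_trend_outliers(y, threshold=3.0):
    """Return 1-based positions whose residual from a least squares
    trend line exceeds `threshold` residual standard deviations."""
    n = len(y)
    x = list(range(1, n + 1))
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    spread = statistics.pstdev(residuals)
    return [xi for xi, r in zip(x, residuals) if abs(r) > threshold * spread]


# Hypothetical data entry mistake: quarter 10 (29120) typed with an extra zero.
sales_with_typo = list(sales)
sales_with_typo[9] = 291200

print("flagged in original data:", flag_trend_outliers(sales))
print("flagged with typo:", flag_trend_outliers(sales_with_typo))
```

A flagged quarter is only a candidate for review; a genuine spike in sales would be flagged in the same way as a typing mistake.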