Assignment 1
(Refer to the Plots, Summary Output, and Data section at the end of the assignment.)
1. Refer to plot c and the seasonal exponential data for SallyStenography Source.
a. What type of model is appropriate for the data? Explain.
b. Use that model, and discuss its efficacy. (How well does it perform?)
c. Offer any suggestions to improve the model performance.
2. The table below features three exponential smoothing models used on the same set of data.
        Model 1                  Model 2    Model 3
Type    Exponential Smoothing    Trend      Trend with Seasonality
MSE     8755.3                   4876.2     5945.8
a. Based solely on the information in this output, what would you conclude about the underlying data set? Explain.
b. Are there any other possible explanations for the values in the table? Explain.
For Questions 3 to 9, refer to the attached regression output (refer to the Plots and Data section). Here we are using years of experience, college GPA, and company entrance score to predict employee salaries at a local firm.
3. We would conclude that the overall model (using all three explanatory variables) is statistically significant at the .05 level.
a. true
b. false
4. None of the explanatory variables are useful predictors
a. true
b. false
5. The most useful predictor in the presence of the other explanatory variables is
6. Multicollinearity is
(i) Severe.
(ii) Mild.
(iii) Nonexistent.
7. Which of the following combinations would be expected to yield the highest pay? Values are years, GPA, and score.
a. 5, 3.7, 80
b. 7, 3.2, 85
c. 8, 3.6, 85
d. 5, 3.8, 85
8. Interpret the coefficient of determination for the overall (3-variable) model.
9. The model in Summary Output 2 uses only entrance score to predict salary.
a. Write the least squares regression equation.
b. Is this a useful model? Explain.
c. Is the output consistent with the other output? Explain.
10. Given the information, which answer is BEST?
                  Model 1    Model 2    Model 3
X variables       6          4          3
R^{2}             .9344      .9277      .8761
Adjusted R^{2}    .9058      .9133      .8497
MSE               5867.53    5746.09    5844.78
a. Model 1 performs the best in all areas.
b. Model 3 performs better than Model 2.
c. We would most likely prefer Model 1.
d. We would most likely prefer Model 2.
e. We would most likely prefer Model 3.
11. Refer again to the SallyStenographer data (refer to the Plots and Data section at the end). Use exponential smoothing with a smoothing constant of 0.2. This model
a. is effective
b. is appropriate
c. tends to under predict
d. tends to over predict
12. In the previous problem, suppose we used a different smoothing constant value. This change would
a. generate an appropriate model
b. possibly generate a better model (with respect to prediction)
c. tend to under predict
d. tend to over predict
13. You are tracking mall sales over a two-year period.
a. The data will surely contain a trend component.
b. The data will likely contain a verifiable seasonal component.
14. In Question #13, suppose we track sales over a one-year period.
a. The data will surely contain an irregular (random) component.
b. The data will likely contain a verifiable seasonal component.
15. Discuss how a method on this assignment can be used to catch data errors such as data entry mistakes.
##### Plots and Data section #####
# SUMMARY OUTPUT 1
Regression Statistics
Multiple R           0.686507
R Square             0.4712919
Adjusted R Square    0.3655502
Standard Error       18.962716
Observations         19

ANOVA
              df    SS             MS          F           Significance F
Regression    3     4808.020459    1602.673    4.457014    0.019857133
Residual      15    5393.769015    359.5846
Total         18    10201.78947

              Coefficients    Standard Error    t Stat      P-value
Intercept     59.69295        38.48117376       1.55122     0.141687
Years         0.510186        1.100900645       0.46343     0.649712
GPA           24.510667       12.31973076       1.989546    0.065194
Score         0.7600061       0.295486746       2.572048    0.021248
CORRELATION MATRIX
          Salary         Years       GPA         Score
Salary    1
Years     0.192517816    1
GPA       0.484367567    0.280138    1
Score     0.575980821    0.341399    0.226652    1
# SUMMARY OUTPUT 2
Regression Statistics
Multiple R           0.5759808
R Square             0.3317539
Adjusted R Square    0.2924453
Standard Error       20.025434
Observations         19

ANOVA
              df    SS             MS          F          Significance F
Regression    1     3384.483506    3384.484    8.43973    0.009855012
Residual      17    6817.305968    401.018
Total         18    10201.78947

              Coefficients    Standard Error    t Stat      P-value
Intercept     8.103183        18.48029361       0.438477    0.666562
Score         0.8430371       0.290189996       2.905121    0.009855
# Sally’s Data
Quarter Sales
1 6455
2 8779
3 13897
4 18920
5 24225
6 26190
7 27440
8 37562
9 29895
10 29120
11 28540
12 39985
13 33255
14 32110
15 30875
16 41234
17 36476
18 34860
19 32197
20 43940
21 39723
22 37890
23 35230
24 46115
25 41432
26 39243
27 36922
28 49340
Q1
 a. A linear trend model is appropriate for the data, as sales rise broadly with the quarter number.
 b. The linear trend model fits the data with an R square of 70.5%, which means roughly 30% of the variation in sales is not explained by the model.
 c. A higher-degree (polynomial) model could be fitted to the data to obtain better estimates.
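The linear trend fit described in Q1 can be checked by hand. The following is a minimal sketch, assuming ordinary least squares of Sales on the quarter number; the helper name `fit_linear_trend` is hypothetical, and the data are copied from the Sally's Data table.

```python
# Quarterly sales copied from the "Sally's Data" table (quarters 1 to 28).
sales = [
    6455, 8779, 13897, 18920, 24225, 26190, 27440, 37562,
    29895, 29120, 28540, 39985, 33255, 32110, 30875, 41234,
    36476, 34860, 32197, 43940, 39723, 37890, 35230, 46115,
    41432, 39243, 36922, 49340,
]
quarters = list(range(1, len(sales) + 1))


def fit_linear_trend(x, y):
    """Least squares fit of y = intercept + slope * x.

    Returns (intercept, slope, r_squared).
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # For a one-predictor regression, R^2 is the squared correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


intercept, slope, r_squared = fit_linear_trend(quarters, sales)
print(f"Sales = {intercept:.1f} + {slope:.1f} * Quarter, R^2 = {r_squared:.3f}")
```

The printed R^2 can be compared with the roughly 70% figure quoted in the Q1 answer; the share of variation left unexplained is 1 minus that value.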
Q2
 a. Since the MSE is lowest for the trend model, the data appear to be monotonically increasing or decreasing, with little seasonality.
 b. No. There is no other likely explanation apart from the underlying data being monotonically increasing or decreasing without much seasonality.
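Q3 True, as the Significance F value in the ANOVA table (0.0199) is less than 0.05
Q4 False. Score (p-value = 0.021) is significant at the 5% level
Q5 Score, as its p-value (0.021) is the smallest of the three predictors and is below 5%
Q6 Mild, as the standard errors are not too high and the signs of the coefficients are as expected; the pairwise correlations among the predictors in the correlation matrix are also low (0.23 to 0.34)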
Q7 5, 3.8, 85 (the multiple linear regression equation yields the highest pay for this combination)
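The comparison in Q7 amounts to plugging each combination into the fitted equation. A minimal sketch, assuming the coefficients of Summary Output 1 with the signs exactly as printed there; the function name `predict_salary` is hypothetical.

```python
# Coefficients copied from Summary Output 1 (signs as printed there).
INTERCEPT = 59.69295
B_YEARS = 0.510186
B_GPA = 24.510667
B_SCORE = 0.7600061


def predict_salary(years, gpa, score):
    """Predicted salary from the three-variable regression equation."""
    return INTERCEPT + B_YEARS * years + B_GPA * gpa + B_SCORE * score


# The four (years, GPA, score) combinations listed in the question.
candidates = [(5, 3.7, 80), (7, 3.2, 85), (8, 3.6, 85), (5, 3.8, 85)]
predictions = {combo: predict_salary(*combo) for combo in candidates}
best = max(predictions, key=predictions.get)

for combo, value in predictions.items():
    print(combo, round(value, 2))
print("Highest predicted salary:", best)
```

Because the GPA coefficient is much larger than the Years coefficient, a 0.2 increase in GPA outweighs three extra years of experience in this equation.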
Q8 The coefficient of determination is only about 47%, which means that only 47% of the variation in salary is explained by the predictor variables
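As a worked check of the Q8 figures, the R Square and Adjusted R Square values can be recovered from the ANOVA table of Summary Output 1. The symbols n (observations, 19) and k (explanatory variables, 3) are notation introduced here, and the adjusted R Square formula is the standard one, assumed to be what the regression tool reports.

```latex
R^{2} = \frac{SS_{\text{Regression}}}{SS_{\text{Total}}}
      = \frac{4808.020459}{10201.78947} \approx 0.4713

R^{2}_{\text{adj}} = 1 - \left(1 - R^{2}\right)\frac{n - 1}{n - k - 1}
                   = 1 - (1 - 0.4713)\,\frac{18}{15} \approx 0.3656
```

Both values agree with the Regression Statistics block, and the gap between them reflects the penalty for using three predictors with only 19 observations.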
Q9.
 a. Salary = 8.103 + 0.843 * Score
 b. Although the model is statistically significant, its R square is only 33%, so it is not a very useful model.
 c. Yes, the output is consistent with the other output with respect to the sign of the Score coefficient and its importance.
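The consistency claimed in Q9 can also be checked numerically. The sketch below relies on standard identities for a one-predictor regression (Multiple R equals the absolute correlation, R Square equals its square, and F equals the squared t statistic); all numbers are copied from Summary Output 2 and the correlation matrix, and the tolerances are arbitrary choices that allow for rounding in the printed output.

```python
import math

# Values copied from the printed output.
corr_salary_score = 0.575980821  # correlation matrix, Salary vs Score
multiple_r_model2 = 0.5759808    # Summary Output 2, Multiple R
r_square_model2 = 0.3317539      # Summary Output 2, R Square
t_stat_score = 2.905121          # Summary Output 2, t Stat for Score
f_stat_model2 = 8.43973          # Summary Output 2, F

# Each identity should hold up to rounding in the printed figures.
checks = {
    "Multiple R equals |corr(Salary, Score)|": math.isclose(
        multiple_r_model2, abs(corr_salary_score), abs_tol=1e-6
    ),
    "R Square equals corr squared": math.isclose(
        r_square_model2, corr_salary_score ** 2, abs_tol=1e-6
    ),
    "F equals t squared": math.isclose(
        f_stat_model2, t_stat_score ** 2, abs_tol=1e-4
    ),
}

for name, passed in checks.items():
    print(f"{name}: {passed}")
```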
Q10 d. We would most likely prefer Model 2, as it has the highest adjusted R square and the lowest MSE
Q11 b. is appropriate
Q12 a. generate an appropriate model, because exponential smoothing with any smoothing constant would yield an appropriate model for this data
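One way to examine Q11 and Q12 empirically is to run simple exponential smoothing on Sally's data and inspect the forecast errors. This is a minimal sketch, not a definitive implementation: seeding the first forecast with the first observation is an assumption, the alternative constants 0.5 and 0.8 are illustrative only, and the function names are hypothetical.

```python
# Quarterly sales copied from the "Sally's Data" table (quarters 1 to 28).
sales = [
    6455, 8779, 13897, 18920, 24225, 26190, 27440, 37562,
    29895, 29120, 28540, 39985, 33255, 32110, 30875, 41234,
    36476, 34860, 32197, 43940, 39723, 37890, 35230, 46115,
    41432, 39243, 36922, 49340,
]


def exponential_smoothing(series, alpha):
    """One-step-ahead forecasts: F[t+1] = alpha * y[t] + (1 - alpha) * F[t].

    The first forecast is seeded with the first observation (an assumption).
    """
    forecasts = [series[0]]
    for actual in series[:-1]:
        forecasts.append(alpha * actual + (1 - alpha) * forecasts[-1])
    return forecasts


def summarize(series, alpha):
    """Return (mean error, MSE), with error defined as actual minus forecast."""
    forecasts = exponential_smoothing(series, alpha)
    # Skip the seeded first period, whose error is zero by construction.
    errors = [a - f for a, f in zip(series[1:], forecasts[1:])]
    mean_error = sum(errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return mean_error, mse


results = {alpha: summarize(sales, alpha) for alpha in (0.2, 0.5, 0.8)}
for alpha, (mean_error, mse) in results.items():
    print(f"alpha={alpha}: mean error={mean_error:.1f}, MSE={mse:.1f}")
```

With the error defined as actual minus forecast, a positive mean error indicates forecasts that sit below the actual values on average, and comparing the MSE across constants shows whether a different value predicts better.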
Q13 b. The data will likely contain a verifiable seasonal component, because mall sales are generally high during festival and holiday seasons
Q14 a. The data will surely contain an irregular (random) component
Q15 Scatter plots help to identify unexpected outliers in the data. Those could be data entry errors