Assignment 1
(Refer to the Plots, Summary Output and Data section at the end of the assignment)
1. Refer to plot c and the seasonal exponential data for Sally-Stenography Source.
a. What type of model is appropriate for the data? Explain.
b. Use that model, and discuss its efficacy. (How well does it perform?)
c. Offer any suggestions to improve the model performance.
2. The table below features three exponential smoothing models used on the same set of data.
| | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
| Type | Exponential Smoothing | Trend | Trend with Seasonality |
| MSE | 8755.3 | 4876.2 | 5945.8 |
a. Based solely on the information in this output, what would you conclude about the underlying data set? Explain.
b. Are there any other possible explanations for the values in the table? Explain.
For Questions 3 to 9, refer to the attached regression output (refer to the Plots and Data section). Here we are using years of experience, college GPA, and company entrance score to predict employee salaries at a local firm.
3. We would conclude that the overall model (using all three explanatory variables) is statistically significant at the .05 level.
a. true
b. false
4. None of the explanatory variables are useful predictors
a. true
b. false
5. The most useful predictor in the presence of the other explanatory variables is
6. Multicollinearity is
(i) Severe.
(ii) Mild.
(iii) Nonexistent.
7. Which of the following combinations would be expected to yield the highest pay? Values are years, GPA, and score.
- 5, 3.7, 80
- 7, 3.2, 85
- 8, 3.6, 85
- 5, 3.8, 85
8. Interpret the coefficient of determination for the overall (3-variable) model.
9. The model in Summary Output 2 uses only entrance score to predict salary.
a. Write the least squares regression equation.
b. Is this a useful model? Explain.
c. Is the output consistent with the other output? Explain.
10. Given the information, which answer is BEST?
| | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
| X-variables | 6 | 4 | 3 |
| R2 | .9344 | .9277 | .8761 |
| Adjusted R2 | .9058 | .9133 | .8497 |
| MSE | 5867.53 | 5746.09 | 5844.78 |
- Model 1 performs the best in all areas.
- Model 3 performs better than Model 2.
- We would most likely prefer Model 1.
- We would most likely prefer Model 2.
- We would most likely prefer Model 3.
11. Refer again to the Sally-Stenographer data (refer to the Plots and Data section at the end). Use exponential smoothing with a smoothing constant of 0.2. This model
- is effective
- is appropriate
- tends to under predict
- tends to over predict
12. In the previous problem, suppose we used a different smoothing constant value. This change would
- generate an appropriate model
- possibly generate a better model (with respect to prediction)
- tend to under predict
- tend to over predict
13. You are tracking mall sales over a two-year period.
- The data will surely contain a trend component.
- The data will likely contain a verifiable seasonal component.
14. In Question #13, suppose we track sales over a one-year period.
- The data will surely contain an irregular (random) component.
- The data will likely contain a verifiable seasonal component.
15. Discuss how a method on this assignment can be used to catch data errors such as data entry mistakes.
##### Plots and Data section #####
# SUMMARY OUTPUT 1
| Regression Statistics | |
|---|---|
| Multiple R | 0.686507 |
| R Square | 0.4712919 |
| Adjusted R Square | 0.3655502 |
| Standard Error | 18.962716 |
| Observations | 19 |
ANOVA
| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 3 | 4808.020459 | 1602.673 | 4.457014 | 0.019857133 |
| Residual | 15 | 5393.769015 | 359.5846 | | |
| Total | 18 | 10201.78947 | | | |
| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | -59.69295 | 38.48117376 | -1.55122 | 0.141687 |
| Years | -0.510186 | 1.100900645 | -0.46343 | 0.649712 |
| GPA | 24.510667 | 12.31973076 | 1.989546 | 0.065194 |
| Score | 0.7600061 | 0.295486746 | 2.572048 | 0.021248 |
CORRELATION MATRIX
| | Salary | Years | GPA | Score |
|---|---|---|---|---|
| Salary | 1 | | | |
| Years | 0.192517816 | 1 | | |
| GPA | 0.484367567 | 0.280138 | 1 | |
| Score | 0.575980821 | 0.341399 | 0.226652 | 1 |
# SUMMARY OUTPUT 2
| Regression Statistics | |
|---|---|
| Multiple R | 0.5759808 |
| R Square | 0.3317539 |
| Adjusted R Square | 0.2924453 |
| Standard Error | 20.025434 |
| Observations | 19 |
ANOVA
| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 3384.483506 | 3384.484 | 8.43973 | 0.009855012 |
| Residual | 17 | 6817.305968 | 401.018 | | |
| Total | 18 | 10201.78947 | | | |
| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 8.103183 | 18.48029361 | 0.438477 | 0.666562 |
| Score | 0.8430371 | 0.290189996 | 2.905121 | 0.009855 |
# Sally’s Data
Quarter Sales
1 6455
2 8779
3 13897
4 18920
5 24225
6 26190
7 27440
8 37562
9 29895
10 29120
11 28540
12 39985
13 33255
14 32110
15 30875
16 41234
17 36476
18 34860
19 32197
20 43940
21 39723
22 37890
23 35230
24 46115
25 41432
26 39243
27 36922
28 49340
Q1
- A linear trend model is appropriate for the data, as sales rise fairly steadily across the quarters
- The linear trend model fits the data with an R square of 70.5%, which means that about 30% of the variation is not explained by the model
- A higher-degree model may be fit to the data to obtain better estimates
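The R square quoted for the linear trend can be checked with a short script. This is a minimal sketch, assuming an ordinary least-squares fit of Sales on Quarter using Sally's data from the data section; the helper name `fit_linear_trend` is illustrative and not part of the assignment.

```python
# Sally's quarterly sales, copied from the data section (quarters 1..28).
sales = [6455, 8779, 13897, 18920, 24225, 26190, 27440, 37562,
         29895, 29120, 28540, 39985, 33255, 32110, 30875, 41234,
         36476, 34860, 32197, 43940, 39723, 37890, 35230, 46115,
         41432, 39243, 36922, 49340]


def fit_linear_trend(y):
    """Least-squares fit of y on t = 1..n; returns (intercept, slope, r_squared)."""
    n = len(y)
    t = list(range(1, n + 1))
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    ss_total = sum((yi - y_mean) ** 2 for yi in y)
    ss_resid = sum((yi - (intercept + slope * ti)) ** 2 for ti, yi in zip(t, y))
    return intercept, slope, 1 - ss_resid / ss_total


intercept, slope, r_squared = fit_linear_trend(sales)
print(f"Sales = {intercept:.1f} + {slope:.1f} * Quarter, R^2 = {r_squared:.3f}")
```

Plotting the residuals from this fit against the quarter number is one way to judge whether a pattern remains that a straight line does not capture.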
Q2
- Since the MSE is lowest for the trend model, the data appear to be monotonically increasing or decreasing.
- No, there are no other possible explanations: the underlying data are either monotonically increasing or decreasing, without much seasonality
Q3 True, as the Significance F in the ANOVA table (0.0199) is less than 0.05
Q4 False, Score (p-value = 0.021) is significant at the 5% level
Q5 Score, as it has the lowest p-value (0.021) and is the only predictor significant at the 5% level
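Q6 Mild, as the standard errors are not too high and the pairwise correlations among the predictors are low (0.23 to 0.34); note, however, that the Years coefficient is negative even though Years is positively correlated with Salary (0.19)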
Q7 5, 3.8, 85 (from the multiple linear regression equation, this combination yields the highest predicted pay)
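The comparison behind Q7 can be reproduced by plugging each combination into the fitted equation. The sketch below assumes the coefficients from Summary Output 1; the function name `predicted_salary` is illustrative only.

```python
# Coefficients copied from Summary Output 1 (intercept, Years, GPA, Score).
INTERCEPT = -59.69295
B_YEARS = -0.510186
B_GPA = 24.510667
B_SCORE = 0.7600061


def predicted_salary(years, gpa, score):
    """Fitted salary from the three-variable regression equation."""
    return INTERCEPT + B_YEARS * years + B_GPA * gpa + B_SCORE * score


# The four (years, GPA, score) combinations offered in the question.
candidates = [(5, 3.7, 80), (7, 3.2, 85), (8, 3.6, 85), (5, 3.8, 85)]
predictions = {combo: predicted_salary(*combo) for combo in candidates}
for combo, salary in predictions.items():
    print(combo, round(salary, 2))

best = max(predictions, key=predictions.get)
print("Highest predicted salary:", best)  # (5, 3.8, 85)
```

Because the Years coefficient is negative in this output, the combination with the most experience does not come out on top.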
Q8 The coefficient of determination (R Square) is about 0.47, which means that only 47% of the variation in salary is explained by the three predictor variables
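As a quick check of where the 47% comes from, R Square can be recomputed from the sums of squares in the ANOVA table of Summary Output 1. The sketch below also applies the standard adjusted R Square formula; the variable names are illustrative.

```python
# Values copied from Summary Output 1.
ss_regression = 4808.020459
ss_total = 10201.78947
n_obs = 19        # Observations
n_predictors = 3  # Years, GPA, Score

# R^2 is the share of total variation explained by the regression.
r_squared = ss_regression / ss_total

# Adjusted R^2 penalises R^2 for the number of predictors used.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

print(f"R^2 = {r_squared:.4f}, adjusted R^2 = {adj_r_squared:.4f}")
```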
Q9.
- Salary = 8.103 + 0.843 * Score
- Although the model is statistically significant, its R square is just 33%, so it is not a very useful model
- Yes, the output is consistent with Summary Output 1: the Score coefficient is positive and statistically significant in both
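A few numerical identities give a further consistency check between Summary Output 2 and the other output. This is a sketch using only figures printed in the outputs; the tolerances are assumptions chosen to allow for rounding in the printed values.

```python
import math

# Figures copied from Summary Output 2 and the correlation matrix.
corr_salary_score = 0.575980821  # correlation matrix, Salary vs Score
r_square_output2 = 0.3317539
t_stat_score = 2.905121
f_stat_output2 = 8.43973
slope, slope_se = 0.8430371, 0.290189996

checks = {
    # With one predictor, R^2 equals the squared simple correlation.
    "R^2 == corr^2": math.isclose(corr_salary_score ** 2, r_square_output2, abs_tol=1e-6),
    # With one predictor, the overall F equals the square of the slope's t.
    "F == t^2": math.isclose(t_stat_score ** 2, f_stat_output2, abs_tol=1e-4),
    # The t statistic is the coefficient divided by its standard error.
    "t == slope / SE": math.isclose(slope / slope_se, t_stat_score, abs_tol=1e-4),
}
for name, ok in checks.items():
    print(name, ok)
```

The Total sum of squares (10201.78947) is also identical in both ANOVA tables, as expected when the same 19 salaries are modelled.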
Q10 d. We would most likely prefer Model 2, as it has the highest adjusted R square and the lowest MSE
Q11 b. is appropriate
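One way to examine the options in Question 11 is to run the smoother and look at the sign of the average forecast error: a positive mean of actual minus forecast indicates under-prediction, a negative mean over-prediction. The sketch below assumes simple exponential smoothing seeded with the first observation, which is a common convention but not stated in the assignment.

```python
# Sally's quarterly sales, copied from the data section (quarters 1..28).
sales = [6455, 8779, 13897, 18920, 24225, 26190, 27440, 37562,
         29895, 29120, 28540, 39985, 33255, 32110, 30875, 41234,
         36476, 34860, 32197, 43940, 39723, 37890, 35230, 46115,
         41432, 39243, 36922, 49340]

ALPHA = 0.2  # smoothing constant given in Question 11


def exponential_smoothing_forecasts(y, alpha):
    """One-step-ahead forecasts; the first forecast is seeded with y[0]."""
    forecasts = [y[0]]
    for actual in y[:-1]:
        # Next forecast blends the latest actual with the previous forecast.
        forecasts.append(alpha * actual + (1 - alpha) * forecasts[-1])
    return forecasts


forecasts = exponential_smoothing_forecasts(sales, ALPHA)
errors = [a - f for a, f in zip(sales, forecasts)]  # actual minus forecast
mse = sum(e ** 2 for e in errors) / len(errors)
mean_error = sum(errors) / len(errors)
print(f"MSE = {mse:.1f}, mean error (bias) = {mean_error:.1f}")
```

Rerunning with other values of `ALPHA` and comparing the MSE relates directly to Question 12.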
Q12 a. generate an appropriate model, because exponential smoothing with any smoothing constant would yield an appropriate model for this data
Q13 b. The data will likely contain a verifiable seasonal component, because mall sales are generally high during festival and holiday seasons
Q14 a. The data will surely contain an irregular (random) component
Q15 Scatter plots help to identify unexpected outliers in the data; those could be data entry errors
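A numeric counterpart to eyeballing a scatter plot is to flag points that sit far from a fitted trend line. The sketch below is illustrative only: the corrupted value is a hypothetical data-entry error invented for the demonstration, and both the helper `flag_outliers` and the cutoff of three standard deviations are assumptions rather than part of the assignment.

```python
# Sally's quarterly sales, copied from the data section (quarters 1..28).
sales = [6455, 8779, 13897, 18920, 24225, 26190, 27440, 37562,
         29895, 29120, 28540, 39985, 33255, 32110, 30875, 41234,
         36476, 34860, 32197, 43940, 39723, 37890, 35230, 46115,
         41432, 39243, 36922, 49340]

# HYPOTHETICAL data-entry error for illustration only: an extra zero
# typed into quarter 14 (32110 entered as 321100).
corrupted = list(sales)
corrupted[13] = 321100


def flag_outliers(y, z_cutoff=3.0):
    """Quarters whose residual from a linear trend exceeds z_cutoff std devs."""
    n = len(y)
    t = list(range(1, n + 1))
    t_mean, y_mean = sum(t) / n, sum(y) / n
    slope = (sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
             / sum((ti - t_mean) ** 2 for ti in t))
    intercept = y_mean - slope * t_mean
    residuals = [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]
    resid_sd = (sum(r ** 2 for r in residuals) / n) ** 0.5
    return [ti for ti, r in zip(t, residuals) if abs(r) > z_cutoff * resid_sd]


print(flag_outliers(corrupted))  # quarters flagged in the corrupted series
print(flag_outliers(sales))      # quarters flagged in the original series
```

A flagged point is only a candidate for review; a genuine seasonal peak can also sit far from a straight trend line.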