MAS284: Applied Statistics and Process Management
External Assignment 4
Internal assignment 3
Due: Wednesday, May 23.
Semester 1, 2012
Where appropriate, you may use MINITAB/EXCEL or any suitable statistical package; not
all questions will need the use of a statistical package. Should you use a statistical package,
you should cut and paste the relevant part of the output next to your answer. If you hand
in an unedited output, it should be annotated and be placed immediately before or after your
comments/answers to the question; outputs placed as ‘appendices’ or not properly identified
may not be marked.
The data for questions 1, 2 and 4 can be found in the MINITAB/EXCEL file
a4 2012.MTW/a4 2012.xlxs in LMS.
1. (14 marks)
Monthly loan applications for a local bank over a three-year period were as follows (read
across from left to right):
20 22 26 30 36 35 40 41 36 32 34 22
19 19 22 30 31 30 36 39 37 36 36 32
29 30 28 32 38 37 39 44 48 47 33 28
(a) Graph the time series and identify any distinctive features of the plot.
(b) (i) Fit a linear trend line to the data.
(ii) Is the trend line significant? Justify.
(iii) Explain briefly the reasons for the significance or nonsignificance of the trend
line and the small R² value?
(c) An AR(3) model is fitted to the loan data with the results given below. Based on
these results, would it be reasonable to fit a smaller order autoregressive model?
Justify.
Final Estimates of Parameters
Type Coef SE Coef T P
AR 1 1.3677 0.1738 7.87 0.000
AR 2 0.4707 0.2996 1.57 0.126
AR 3 0.1007 0.2094 0.48 0.634
Number of observations: 36
Residuals: SS = 697.749 (backforecasts excluded)
MS = 21.144 DF = 33
2. (12 marks)
In a random sample of 33 junior executives, each is classified by potential for promotion
as good, poor, or uncertain. Each is given a test to measure his or her level of anxiety.
The coded results (the higher the value the higher the level of anxiety) are as follows:
Good 4 5 4 3 2 5 3 2 5 4 4 4 3
Poor 4 8 7 5 5 4 9 9 6 7
Uncertain 5 4 7 4 7 5 4 5 3 5
What are your conclusions about this study? Write a brief report which should include
the appropriate hypotheses and an evaluation of assumptions.
3. (12 marks)
Suppose that a golf association wants to compare the mean distances travelled by four
different brands (A, B, C and D) of golf balls when struck with a driver. A randomised
block design is employed utilising a random sample of 8 golfers, with each golfer using a
driver to hit four balls, one from each brand, in a random order. The distance travelled is
to be recorded for each hit.
(a) Why is a randomised block design employed and why is randomisation necessary?
(b) The experiment was carried out as designed but the results were erroneously analysed
as if the experiment had been completely randomized (i.e. no blocking) with the one-way
ANOVA output given below. Based on this output, is there evidence that the
mean distances associated with the brands differ?
(c) Suppose the sum of squares due to the golfers is 4264. Set up the ANOVA table that
will incorporate this information. Is the conclusion the same as that in (b)?
Analysis of Variance for dist
Source DF SS MS F P
brand 3 2652 884 2.51 0.079
Error 28 9832 351
Total 31 12484
4. (12 marks)
The temperature and pressure used in molding a certain plastic affect its tensile strength.
The following table shows the tensile strength (coded) of specimens of plastic molded at 3
different temperatures (coded 1,2,3) and under 3 different pressures (coded A,B,C).
                   pressure
temperature    A     B     C
     1        10    10     8
     1        11    10     8
     1        12     9     8
     1        12     8     9
     2         8    12     9
     2         9    12     8
     2         9    11     9
     2         8    11     9
     3         8     9    10
     3         9     9    10
     3         9    10    11
     3         9     9    11
(a) Calculate the means for the four observations in each temperature-pressure group.
Plot the means of the nine groups on a graph with tensile strength on the y axis and
temperature on the x axis. For each pressure, connect the three points corresponding
to the different temperatures. What does the plot tell you regarding interaction and
main effects? [Note that you can alternatively plot the nine points on a strength vs
pressure graph and for each temperature, connect the three points corresponding to
the different pressures.]
(b) Run a two-way ANOVA on these data. Summarize the results of the significance
tests.
SOLUTION
1.
a) The time series for all the three years together is as follows:
Fig. 1 – No. of monthly loan applications for three consecutive years
Shown below is the graph with three plots – one for each year.
Fig. 2 – No. of monthly loan applications for three consecutive years arranged year wise
As can be seen from the graphs, the number of monthly loan applications fluctuates with a clear seasonal pattern.
In each year, applications generally increase from January to August-September and then decline until December.
The number of monthly loan applications is also noticeably higher in the third year than in the first two, which suggests an upward trend.
b)
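The equation of the linear trend line is y = 0.301x + 27.02, where x is the month number (1 to 36) and y is the number of applications. The R-squared value is 0.185, so the line explains only about 18.5% of the variation in the monthly figures.
A small R-squared does not by itself make the trend line insignificant. Significance is judged by testing whether the slope differs from zero. With n = 36 observations, F = (n - 2)R²/(1 - R²) = 34(0.185)/0.815 ≈ 7.7 on 1 and 34 degrees of freedom, which exceeds the 5% critical value of about 4.13. The upward trend is therefore significant at the 5% level.
The trend is significant because applications rise fairly consistently over the three years (by about 0.3 per month), and 36 observations are enough to detect this. The R-squared value is small because most of the variation comes from the pattern within each year, the rise to August-September and the fall towards December, which a straight line cannot capture.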
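The trend-line figures in b) can be checked with a short least-squares calculation. The following is a minimal sketch in plain Python, not the Excel/MINITAB route used for the assignment; the function name `fit_linear_trend` and the constant `F_CRIT_5PCT` (an approximate table value for F(1, 34)) are assumptions made for illustration.

```python
# Monthly loan applications, read across (36 months), as printed in Question 1.
loans = [
    20, 22, 26, 30, 36, 35, 40, 41, 36, 32, 34, 22,
    19, 19, 22, 30, 31, 30, 36, 39, 37, 36, 36, 32,
    29, 30, 28, 32, 38, 37, 39, 44, 48, 47, 33, 28,
]


def fit_linear_trend(y):
    """Least-squares fit of y on t = 1..n; returns slope, intercept, R^2 and F."""
    n = len(y)
    t = list(range(1, n + 1))
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    s_tt = sum((ti - t_bar) ** 2 for ti in t)
    s_ty = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    slope = s_ty / s_tt
    intercept = y_bar - slope * t_bar
    r_squared = s_ty ** 2 / (s_tt * s_yy)
    # F statistic for H0: slope = 0, on (1, n - 2) degrees of freedom.
    f_stat = (n - 2) * r_squared / (1 - r_squared)
    return slope, intercept, r_squared, f_stat


slope, intercept, r_squared, f_stat = fit_linear_trend(loans)
F_CRIT_5PCT = 4.13  # assumed: approximate 5% critical value of F(1, 34) from tables

print(f"slope={slope:.3f} intercept={intercept:.2f} R^2={r_squared:.3f} F={f_stat:.2f}")
print("trend significant at 5%:", f_stat > F_CRIT_5PCT)
```

The F statistic here is the square of the t statistic for the slope, so comparing it with the F(1, 34) critical value is equivalent to the usual two-sided t-test on the slope.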
2.
A graph showing the levels of anxiety for each group of employees (classified by potential for promotion) is shown below.
Within each group, the employees are plotted in the order given in the data. This order is arbitrary and does not affect the analysis, since only the levels of anxiety (y-axis) are of interest, not the position of the points on the x-axis[1].
Fig. 3 – Level of Anxiety for employees with different potential for promotion
From the graph, it can be seen that employees with a Good promotion potential are the least anxious. The level of anxiety, on average, increases as the potential for promotion decreases.
A single factor ANOVA table is shown below (see “2.xlsx”):
Source of Variation   SS         df   MS         F          P-value    F crit
Between Groups        42.20047    2   21.10023   10.08467   0.000447   3.31583
Within Groups         62.76923   30   2.092308
Total                 104.9697   32
Table 1 – A single factor ANOVA for the given data
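The p-value (0.000447) is far below 0.05, and F = 10.08 exceeds the critical value of 3.32. Hence, the null hypothesis that the mean level of anxiety is the same for the three groups is rejected in favour of the alternative that at least one group mean differs. We can conclude that the level of anxiety is associated with the promotion prospects of an employee; the differences between the group means are larger than chance variation alone would explain.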
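The single-factor ANOVA can also be sketched directly from the data, together with the group variances needed for a rough check of the equal-variance assumption. The sketch below is plain Python and uses the scores exactly as printed in Question 2; the function name `one_way_anova` is an assumption for illustration. Run on the printed values, the sums of squares come out slightly different from those in Table 1 (which was taken from the spreadsheet), but F remains close to 10 and well above the critical value of 3.32, so the conclusion is the same.

```python
import statistics

# Anxiety scores as printed in Question 2.
groups = {
    "Good": [4, 5, 4, 3, 2, 5, 3, 2, 5, 4, 4, 4, 3],
    "Poor": [4, 8, 7, 5, 5, 4, 9, 9, 6, 7],
    "Uncertain": [5, 4, 7, 4, 7, 5, 4, 5, 3, 5],
}


def one_way_anova(samples):
    """Return (ss_between, ss_within, df_between, df_within, f_stat)."""
    all_values = [v for values in samples.values() for v in values]
    n_total = len(all_values)
    grand_mean = sum(all_values) / n_total
    ss_between = 0.0
    ss_within = 0.0
    for values in samples.values():
        mean = sum(values) / len(values)
        ss_between += len(values) * (mean - grand_mean) ** 2
        ss_within += sum((v - mean) ** 2 for v in values)
    df_between = len(samples) - 1
    df_within = n_total - len(samples)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


ss_b, ss_w, df_b, df_w, f_stat = one_way_anova(groups)

for name, values in groups.items():
    # Sample mean and variance per group, for a rough look at equal spread.
    print(name, round(statistics.mean(values), 2), round(statistics.variance(values), 2))
print(f"SS between={ss_b:.3f} (df {df_b}), SS within={ss_w:.3f} (df {df_w}), F={f_stat:.2f}")
```

A common rule of thumb treats the equal-variance assumption as reasonable when the largest group standard deviation is no more than about twice the smallest; the printed variances allow that comparison to be made.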
3.
a) A randomised block design is employed because golfers differ in strength and technique, so the distance a ball travels depends heavily on who hits it. The golfer is a nuisance factor: it is not of interest in itself, but it adds variation to the distances. Treating each golfer as a block, with every golfer hitting all four brands, allows the golfer-to-golfer variation to be separated from the error term, so that the brands (the factor of interest) are compared more precisely. Randomising the order in which each golfer hits the four balls is necessary to guard against order effects, such as warming up or fatigue, being confounded with the brand.
b) The p-value is 0.079, which is higher than the usual significance level of 0.05. Hence, based on the one-way analysis, there is insufficient evidence at the 5% level to conclude that the mean distance of at least one brand differs from the others. The result is borderline, however: the difference would be significant at the 10% level.
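c) See the file “3.xlsx”, which contains the required formulas in the cells as well. The golfers are the blocks, so their sum of squares (4264, with 8 - 1 = 7 degrees of freedom) is taken out of the one-way error term: the error sum of squares becomes 9832 - 4264 = 5568 on 28 - 7 = 21 degrees of freedom. The brand and total lines are unchanged. The table is as shown below:
Source    DF   SS      MS       F      P
brand      3    2652   884.00   3.33   0.04[2]
golfer     7    4264   609.14   2.30
error     21    5568   265.14
Total     31   12484
Table 2 – The ANOVA table for the data
For brands, F = 884/265.14 = 3.33 on 3 and 21 degrees of freedom, which exceeds the 5% critical value of about 3.07; the p-value is approximately 0.04. Hence, there is sufficient evidence at the 5% level to conclude that the mean distance of at least one brand differs from the others. This result is different from the one obtained in b): removing the golfer-to-golfer variation reduces the error mean square from 351 to 265, which makes the test for brands more sensitive.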
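One way to set up the table in c) is to subtract the golfer (block) sum of squares and its degrees of freedom from the one-way error line and then recompute the F ratios. The following is a minimal sketch in plain Python using only the sums of squares given in the question; the function name `randomised_block_anova` and the dictionary layout are assumptions for illustration.

```python
def randomised_block_anova(ss_treatment, ss_block, ss_total, n_treatments, n_blocks):
    """Build the ANOVA rows for a randomised block design from sums of squares."""
    df_treatment = n_treatments - 1
    df_block = n_blocks - 1
    df_total = n_treatments * n_blocks - 1
    df_error = df_total - df_treatment - df_block
    ss_error = ss_total - ss_treatment - ss_block
    ms_error = ss_error / df_error

    def row(df, ss, with_f=True):
        ms = ss / df
        return {"DF": df, "SS": ss, "MS": ms, "F": ms / ms_error if with_f else None}

    return {
        "brand": row(df_treatment, ss_treatment),
        "golfer": row(df_block, ss_block),
        "error": row(df_error, ss_error, with_f=False),
        "total": {"DF": df_total, "SS": ss_total, "MS": None, "F": None},
    }


# Brand SS and total SS come from the one-way output; golfer SS is given in (c).
table = randomised_block_anova(2652, 4264, 12484, n_treatments=4, n_blocks=8)

for source, r in table.items():
    ms = "" if r["MS"] is None else f"{r['MS']:.2f}"
    f = "" if r["F"] is None else f"{r['F']:.2f}"
    print(f"{source:<8}{r['DF']:>4}{r['SS']:>8}{ms:>10}{f:>8}")
```

The brand F ratio is then compared with the F distribution on 3 and 21 degrees of freedom rather than 3 and 28, since the blocks have used up 7 of the original error degrees of freedom.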
4.
a) The Pivot table in “4.xlsx” gives the averages/means of the four observations for the different pressures for each temperature. The table is reproduced below:
Temperature   Average of A   Average of B   Average of C
1             11.25          9.25           8.25
2             8.5            11.5           8.75
3             8.75           9.25           10.5
Grand Total   9.5            10             9.1667
The plot of tensile strength vs. temperature for different pressures is shown below:
Fig. 4 – Tensile strength vs. Temperature for different pressures
From the plot, it is clear that
 For pressure A, the tensile strength is highest at temperature 1 (11.25) and is lower at temperatures 2 and 3, where it is approximately the same (8.5 and 8.75).
 For pressure B, the tensile strength at temperatures 1 and 3 is the same (9.25), but is higher (11.5) at temperature 2.
 For pressure C, the tensile strength increases as the temperature changes from 1 to 3 (8.25, 8.75 and 10.5).
The three lines are far from parallel and cross one another, so there is a strong interaction between temperature and pressure: the effect of temperature on tensile strength depends on the pressure used. The main effects, by contrast, appear small. Averaged over temperature, the pressure means are 9.5, 10 and 9.17, and averaged over pressure, the temperature means are almost identical (9.58, 9.58 and 9.5).
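The cell means behind the plot can be reproduced without a pivot table. The sketch below is plain Python on the data as printed in Question 4; the nested-dictionary layout and the names `strength`, `cell_means` and `a_minus_c` are assumptions for illustration. It also prints the gap between pressures A and C at each temperature: if there were no interaction, the lines would be parallel and this gap would be roughly constant.

```python
# strength[temperature][pressure] -> the four replicate observations (Question 4).
strength = {
    1: {"A": [10, 11, 12, 12], "B": [10, 10, 9, 8], "C": [8, 8, 8, 9]},
    2: {"A": [8, 9, 9, 8], "B": [12, 12, 11, 11], "C": [9, 8, 9, 9]},
    3: {"A": [8, 9, 9, 9], "B": [9, 9, 10, 9], "C": [10, 10, 11, 11]},
}

# Mean of the four observations in each temperature-pressure cell.
cell_means = {
    temp: {pressure: sum(obs) / len(obs) for pressure, obs in row.items()}
    for temp, row in strength.items()
}

# Difference between the pressure A and pressure C lines at each temperature.
a_minus_c = {temp: means["A"] - means["C"] for temp, means in cell_means.items()}

print("temp      A      B      C   A-C")
for temp, means in cell_means.items():
    print(f"{temp:>4}{means['A']:>7.2f}{means['B']:>7.2f}{means['C']:>7.2f}{a_minus_c[temp]:>6.2f}")
```

The A minus C gap changes sign between temperature 1 and temperature 3, which is the numerical counterpart of the crossing lines in the plot.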
b) The two-way ANOVA table is shown below. In the Excel output, “Sample” refers to temperature (rows) and “Columns” refers to pressure:
ANOVA
Source of Variation   SS         df   MS         F          P-value    F crit
Sample                0.055556    2   0.027778   0.065217   0.937011   3.354131
Columns               4.222222    2   2.111111   4.956522   0.014672   3.354131
Interaction           43.11111    4   10.77778   25.30435   8.56E-09   2.727765
Within                11.5       27   0.425926
Total                 58.88889   35
The p-value of the interaction is very small (8.56E-09). Hence, the null hypothesis of no interaction is rejected: the effect of temperature on tensile strength is not the same at the different pressures.
The main effect of pressure is also significant (p = 0.0147 < 0.05), so the mean tensile strength, averaged over temperature, differs between the pressures.
On the other hand, the p-value for temperature (0.937) is greater than 0.05, so the null hypothesis of no temperature main effect cannot be rejected. Because the interaction is so strong, the main effects should be interpreted with caution: temperature clearly matters, but its effect depends on the pressure and averages out across the three pressures.
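The sums of squares in the two-way table can be checked by hand-style arithmetic. The following is a minimal sketch in plain Python for a balanced two-factor design with replication, using the data as printed in Question 4; the function name `two_way_anova` and the returned dictionary are assumptions for illustration, and p-values are left to Excel or statistical tables.

```python
# strength[temperature][pressure] -> the four replicate observations (Question 4).
strength = {
    1: {"A": [10, 11, 12, 12], "B": [10, 10, 9, 8], "C": [8, 8, 8, 9]},
    2: {"A": [8, 9, 9, 8], "B": [12, 12, 11, 11], "C": [9, 8, 9, 9]},
    3: {"A": [8, 9, 9, 9], "B": [9, 9, 10, 9], "C": [10, 10, 11, 11]},
}


def two_way_anova(data):
    """Sums of squares, df and F ratios for a balanced two-way layout."""
    rows = list(data)                       # temperatures ("Sample" in Excel)
    cols = list(data[rows[0]])              # pressures ("Columns" in Excel)
    a, b = len(rows), len(cols)
    r = len(data[rows[0]][cols[0]])         # replicates per cell (assumed equal)

    all_obs = [y for t in rows for p in cols for y in data[t][p]]
    grand = sum(all_obs) / len(all_obs)
    cell = {(t, p): sum(data[t][p]) / r for t in rows for p in cols}
    row_mean = {t: sum(cell[t, p] for p in cols) / b for t in rows}
    col_mean = {p: sum(cell[t, p] for t in rows) / a for p in cols}

    ss_rows = b * r * sum((row_mean[t] - grand) ** 2 for t in rows)
    ss_cols = a * r * sum((col_mean[p] - grand) ** 2 for p in cols)
    ss_cells = r * sum((m - grand) ** 2 for m in cell.values())
    ss_inter = ss_cells - ss_rows - ss_cols
    ss_within = sum((y - cell[t, p]) ** 2 for t in rows for p in cols for y in data[t][p])

    df = {"rows": a - 1, "cols": b - 1, "inter": (a - 1) * (b - 1), "within": a * b * (r - 1)}
    ss = {"rows": ss_rows, "cols": ss_cols, "inter": ss_inter, "within": ss_within}
    ms_within = ss_within / df["within"]
    f = {k: (ss[k] / df[k]) / ms_within for k in ("rows", "cols", "inter")}
    return ss, df, f


ss, df, f = two_way_anova(strength)
for key, label in (("rows", "Temperature"), ("cols", "Pressure"), ("inter", "Interaction")):
    print(f"{label:<12} SS={ss[key]:.4f} df={df[key]} F={f[key]:.4f}")
print(f"{'Within':<12} SS={ss['within']:.4f} df={df['within']}")
```

The interaction sum of squares is obtained here as the between-cells sum of squares minus the two main-effect sums of squares, which is valid because every cell has the same number of observations.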
[1] An alternative way of doing this analysis is to draw a Pivot Table in Excel. However, the method used here is shorter and simpler and leads to much the same conclusion.
[2] Calculated using the FDIST function in Excel, P = FDIST(F, df1, df2), where df1 and df2 are the degrees of freedom of the brand and error lines. See “3.xlsx”.