DATA ANALYSIS WITH SPSS

QUESTION

Assignment 3
Due Date: 17 October, 2011
Weighting: 25%
STA2300—Data Analysis 29
• This assignment is important in providing feedback and helping to establish
competency in essential skills.
• Answer all the questions. The questions are not of equal weight, and some
questions are worth much more than others.
• The questions relate to material up to and including Module 8.
• Read the Notes Concerning Assignments (page 12) before starting this
Assignment.
• When you are asked to comment on a finding, usually a short paragraph is all
that is required.
• For all graphs, label the axes correctly, include a contextual title and the units
of measurement.
• In many cases, spss output contains much more information than is required
for a correct and complete answer. In those cases just reproducing the output
may not attract any marks. Make sure you report only the information from the
spss output relevant to your answer.
• Unless instructed otherwise, show all working and formulae used in calculating
confidence intervals and performing hypothesis tests. (Answers may of course
be checked where possible using computer software).
• This assessment item consists of 4 questions.
Question 1 (30 marks)
Use the information in the dataset auto.sav to answer the following questions. For
parts (b) and (d), do the calculations by hand, using a calculator and the results from
part (a).
(a)
Use spss to calculate the mean and standard deviation of time to accelerate from 0
to 100km/hr of eight-cylinder cars from this sample. Select only those cars which
are eight-cylinders for this question.
(b)
Use the results in part (a) to estimate the mean time to accelerate from 0 to
100km/hr for eight-cylinder cars in general, using a 98% confidence interval (show
all working).
(c)
Justify the use of the confidence interval formula in part (b) by checking the appropriate
conditions and assumptions (include an appropriate graph to support your
answer).
(d)
Perform a hypothesis test to see if the true mean time to accelerate from 0 to
100km/hr for eight-cylinder cars is less than 14 seconds. In performing this test, include:
(i)
State appropriate hypotheses (define any symbols used).
(ii)
State (but do not check) the assumptions for carrying out this test. Describe
the assumptions in the context of this question.
(iii)
Calculate a suitable test statistic for this test.
(iv)
Calculate the P-value of this test.
(v)
Write a meaningful conclusion at the 5% level of significance.
(e)
Check your answers for part (d) by finding the value of the test statistic and the
p-value using spss. [Include spss output in your answer.]
Question 2 (14 marks)
In a random sample of 550 Data Analysis students, 385 claimed to log onto the social
networking site Facebook at least five times per week.
(a)
Estimate, with 95% confidence, the true population proportion of Data Analysis
students who log onto the social networking site Facebook at least five times per
week.
(b)
Check the procedure you used in part (a) is appropriate by checking all the necessary
conditions and assumptions.
(c)
What is the minimum sample size required if we wish to estimate the population
proportion of Data Analysis students who log onto the social networking site Facebook
at least five times per week, to within plus or minus 2%, with 95% confidence?
Use a conservative method in determining the sample size.
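As a rough cross-check of the hand working for Question 2, the sketch below uses the large-sample z interval for a proportion and the conservative sample-size formula with p = 0.5. The choice of these formulas is an assumption about the method taught in the course, and the variable names are illustrative only.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Sample information taken from the question text.
successes, n = 385, 550
p_hat = successes / n

# z critical value for 95% confidence (two-sided).
z = NormalDist().inv_cdf(0.975)

# Large-sample confidence interval for a proportion (assumed method).
margin = z * sqrt(p_hat * (1 - p_hat) / n)
interval = (p_hat - margin, p_hat + margin)

# Conservative sample size: p(1 - p) is largest at p = 0.5, i.e. 0.25.
required_n = ceil((z / 0.02) ** 2 * 0.25)

print(f"95% CI: ({interval[0]:.4f}, {interval[1]:.4f})")
print(f"Minimum sample size: {required_n}")
```

The sample size is rounded up because a fractional respondent cannot be sampled.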
Question 3 (26 marks)
This question uses information from the data file auto.sav available under Assignments
with Data Sets on the Course StudyDesk.
A researcher is interested in the performance of American and Japanese cars, in particular,
do Japanese cars have a better fuel consumption than American cars?
(a)
Using spss and the sample information in auto.sav, determine the mean and
standard deviation of fuel consumption for American and Japanese cars.
(b)
Use an appropriate graph to compare the distribution of fuel consumption for
Japanese and American cars.
(c)
To answer the question ‘do Japanese cars have a better fuel consumption than
American cars?’ perform a hypothesis test by completing the following:
(i)
State appropriate hypotheses, clearly defining all symbols.
(ii)
Check the assumptions for carrying out this test.
(iii)
Without using spss, calculate a suitable test statistic (you can use the results
from part (a) in this calculation).
(iv)
Without using spss, find the P-value of the test.
(v)
Interpret the P-value and describe the outcome of the original question.
(vi)
Now use spss to check your results for this hypothesis test. Attach or copy
and paste the relevant output from spss for this test to your assignment.
(vii)
Briefly comment on how the test statistic and P-value from spss output are
similar to or differ from your hand calculations.
(d)
Thoroughly investigate in 100 words or less, any issues you can see in conducting
this test. Include tables or diagrams if needed.
Question 4 (30 marks)
A fitness clinic claims to be able to reduce a participant’s 3km run time in only 4
weeks, by participating in a series of intense training schedules. A random sample of
9 participants who completed the intense training schedules had their 3km run times
recorded before and after completing the schedule. Data on the run times can be seen
below (all times are recorded in minutes).
Participant Before After
1 15.8 13.2
2 12.1 12.2
3 21.6 17.8
4 18.8 12.6
5 29.3 15.6
6 20.7 16.4
7 18.9 13.3
8 9.2 9.2
9 10.7 10.5
Is there evidence that participating in a series of intense training schedules can
reduce the average 3km run time?
(a)
Use a parametric test to answer this question by completing the following:
(i)
State appropriate hypotheses (define any symbols used).
(ii)
State (but do not check) the assumptions for carrying out this test. Describe
the assumptions in the context of this question.
(iii)
Calculate a suitable test statistic for this test.
(iv)
Calculate the P-value of this test.
(v)
Interpret the P-value and describe the outcome of the test in the context of
this question.
(vi)
Check your results for this test by using spss to carry out the analysis. Copy
and paste your spss output to your assignment.
(b)
If the assumptions for the test in part (a) were not satisfied, what alternative test
could you perform? Perform this non-parametric test by completing the following:
(i)
State appropriate hypotheses.
(ii)
Calculate a suitable test statistic for this test.
(iii)
Calculate the P-value of this test.
(iv)
Interpret the P-value and describe the outcome of the test in the context of
this question.
(c)
Comment on any differences found between the two tests and why this may be the
case.
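As a possible cross-check of the hand working for Question 4(a), the sketch below computes the paired differences and a paired t statistic from the table of run times. Taking each difference as before minus after is an assumption about direction, and the variable names are illustrative. The P-value is left to t tables or spss, since the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

# Run times in minutes, copied from the table in Question 4.
before = [15.8, 12.1, 21.6, 18.8, 29.3, 20.7, 18.9, 9.2, 10.7]
after = [13.2, 12.2, 17.8, 12.6, 15.6, 16.4, 13.3, 9.2, 10.5]

# Assumption: difference = before - after, so a positive value means
# the participant ran faster after the training schedule.
differences = [b - a for b, a in zip(before, after)]

n = len(differences)
mean_diff = mean(differences)
sd_diff = stdev(differences)          # sample standard deviation (n - 1)
t_statistic = mean_diff / (sd_diff / sqrt(n))
degrees_of_freedom = n - 1

print(f"mean difference = {mean_diff:.3f} min, sd = {sd_diff:.3f} min")
print(f"t = {t_statistic:.3f} on {degrees_of_freedom} degrees of freedom")
```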

SOLUTION

SOLUTION 1
a. For a small business in its first year of operation, survival depends mainly on a few
criteria: Equity (or Net Worth of the Firm), Type of Industry (Capital Intensive or
Service), the Debt-Service Coverage ratio and the Interest Coverage ratio.
The two coverage ratios indicate whether Cash Flow is sufficient to meet interest
payments and principal repayments. This debt will be primarily for Working Capital
requirements; hence, the equity infused becomes an important parameter in the model.
The model used will be a probit model for the probability of the firm surviving its first
year of operation, or a logistic regression in which the dependent variable takes the
value 0 or 1 and the independent variables are those specified above. Equity and the
Debt-Service Coverage ratio are scale variables, and Type of Industry is a nominal
variable. These will be simulated under the assumption that the probability of failure is 10%.
The experiment considered here is a Bernoulli experiment in which the probability of
failure, given as 10%, is used to answer the remaining questions.

b. The probability that 2 or fewer will fail in a random sample of 12 is given by

P(X <= 2) = 0.9^12 + 12 * 0.1^1 * 0.9^11 + 66 * 0.1^2 * 0.9^10 = 0.8891
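The three-term binomial sum in part b can be checked with a short sketch such as the one below. The helper name binomial_cdf is illustrative and not part of the original solution.

```python
from math import comb

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# 12 businesses, each failing independently with probability 0.1.
prob_at_most_two = binomial_cdf(2, 12, 0.1)
print(round(prob_at_most_two, 4))
```

The coefficients 1, 12 and 66 in the hand calculation are comb(12, 0), comb(12, 1) and comb(12, 2).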
c. Mean number of small businesses expected to fail is 12*0.1=1.2

d. Mean number of small businesses expected to fail is 200*0.1=20

The variance of the number of small businesses that fail is 200*0.1*0.9 = 18, so the
std. dev. is sqrt(18) = 4.24

On average 20 firms will fail, with a std. dev. of about 4.24

e. The probability that 15 or fewer will fail in a random sample of 200 is approximately 14.3%
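A minimal sketch of the calculations for the sample of 200 follows. It computes the binomial mean and standard deviation, the exact binomial probability of 15 or fewer failures, and a normal approximation with a continuity correction. The original solution does not state which method was used for part e, so both are shown as assumptions.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 200, 0.1

# Mean and standard deviation of a Binomial(n, p) count.
mean_failures = n * p
sd_failures = sqrt(n * p * (1 - p))

# Exact P(X <= 15), summing the binomial probabilities for 0..15.
exact = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(16))

# Normal approximation with continuity correction (assumed alternative).
approx = NormalDist(mean_failures, sd_failures).cdf(15.5)

print(f"mean = {mean_failures:.1f}, sd = {sd_failures:.2f}")
print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```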

f. The assumption that might be of concern here is random sampling: it is not always
possible to obtain a stratified random sample of 200 firms in which each industry is
properly represented.
SOLUTION 2
a. The variable of interest is the speed of cars and unit of measurement is km/hour

b. The percentage of drivers that could get a speeding ticket (i.e. >110 km/h) is found from
Z = (110 - 105)/4 = 1.25

Therefore P(Z > 1.25) = 1 - normsdist(1.25) = 1 - 0.8944 = 0.1056 = 10.56%
c. The speed below which the slowest 6% of drivers are travelling is given by

105 + normsinv(0.06)*4 = 105 + (-1.5548)*4 = 98.78 km/h
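The two normal-distribution calculations in parts b and c can be reproduced with the sketch below. The mean of 105 km/h and standard deviation of 4 km/h are taken from the calculation in part c; the variable names are illustrative.

```python
from statistics import NormalDist

# Speeds assumed normal with mean 105 km/h and std dev 4 km/h (from part c).
speed = NormalDist(mu=105, sigma=4)

# Part b: share of drivers above the 110 km/h limit.
share_over_limit = 1 - speed.cdf(110)

# Part c: speed below which 6% of drivers travel (the 6th percentile).
sixth_percentile = speed.inv_cdf(0.06)

print(f"P(speed > 110) = {share_over_limit:.4f}")
print(f"6th percentile = {sixth_percentile:.2f} km/h")
```

Here cdf plays the role of normsdist after standardising, and inv_cdf the role of normsinv.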

SOLUTION 3
a. The contingency table from SPSS is as follows:-

Number of cylinders * Origin of Car Crosstabulation (Count)

                         Origin of Car
Number of cylinders   America   European   Japanese   Total
4                          65         58         67     190
6                          64          4          6      74
8                         101          0          0     101
Total                     230         62         73     365

b. The percentage of 4-cylinder cars that are Japanese is (67/190) = 35.26%

c. The percentage of cars that are both 4-cylinder and Japanese is (67/365) = 18.36%

d. The number of cylinders and the origin of the car are associated. If the two variables
were independent, the split between origins would be similar for every number of
cylinders. Instead, all 8-cylinder cars and most 6-cylinder cars are American, whereas
4-cylinder cars are spread fairly evenly across the three origins. The same could be
confirmed using a Chi-Square test for independence.
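The percentages in parts b and c, and the Chi-Square statistic mentioned in part d, can be sketched from the crosstabulation counts as follows. The expected counts use the usual row total times column total divided by grand total. The comparison value 9.488 is the 5% critical value for 4 degrees of freedom; no P-value is computed here.

```python
# Observed counts from the SPSS crosstabulation (cylinders by origin).
observed = {
    4: {"America": 65, "European": 58, "Japanese": 67},
    6: {"America": 64, "European": 4, "Japanese": 6},
    8: {"America": 101, "European": 0, "Japanese": 0},
}
origins = ["America", "European", "Japanese"]

row_totals = {cyl: sum(row.values()) for cyl, row in observed.items()}
col_totals = {o: sum(observed[cyl][o] for cyl in observed) for o in origins}
grand_total = sum(row_totals.values())

# Part b: Japanese share among 4-cylinder cars (conditional percentage).
pct_japanese_given_4cyl = 100 * observed[4]["Japanese"] / row_totals[4]
# Part c: cars that are both 4-cylinder and Japanese (joint percentage).
pct_4cyl_and_japanese = 100 * observed[4]["Japanese"] / grand_total

# Part d: Chi-Square statistic for independence.
chi_square = 0.0
for cyl, row in observed.items():
    for o in origins:
        expected = row_totals[cyl] * col_totals[o] / grand_total
        chi_square += (row[o] - expected) ** 2 / expected

degrees_of_freedom = (len(observed) - 1) * (len(origins) - 1)
print(f"{pct_japanese_given_4cyl:.2f}%  {pct_4cyl_and_japanese:.2f}%")
print(f"chi-square = {chi_square:.1f} on {degrees_of_freedom} df")
```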

SOLUTION 4
a. The graph is as follows:-

b. As evident from the distribution, the median lies to the left of the mean and the
curve is slightly skewed to the right, i.e. the mean is greater than the median; hence most of
the cars take less time to accelerate to 100 km/h than the average car in the
sample. There are outliers on the right side that take more than 25 sec, and the distribution
deviates slightly from normal as its tails have less mass than those of a normal distribution.
It is a platykurtic curve.

c. The mean is 16.55 sec and std dev is 2.37 sec

d. The median is 16.05 sec and IQR is 3.225 sec

e. The best statistics to measure the centre and spread are the mean and variance, as they
also define a normal distribution, and the variable is a scale measure, so a normal
distribution is a reasonable model. A truncated normal distribution would be better,
however, as it would not be feasible for a car to accelerate to 100 km/h in less than 5 sec.
SOLUTION 5
a. Both the variables are scale variables and in particular are ratio variables too

b. The graph is as follows:-

c. The graph above shows that there is one outlier among the cars in the 1000 to 1500 kg
weight range, as its HP is more than that of any other vehicle in this range. The rest of
them follow a linear relationship with constant spread in HP across the trend in vehicle
weight.

d. Correlation between the two variables is 0.836
e. Regression equation is
HP = -21.839 + 0.092 * Vehicle weight
All the coefficients are significant.

f. As r-squared is roughly 70%, the model explains about 70% of the variance in the
dependent variable.
The predicted engine power for a 3000 kg vehicle weight is 254.16 HP
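The arithmetic behind parts d to f can be sketched as below, using the correlation and regression coefficients reported above. The function name predicted_hp is illustrative only.

```python
# Values reported in parts d and e of the solution.
r = 0.836
intercept, slope = -21.839, 0.092

def predicted_hp(weight_kg):
    """Engine power predicted by the fitted line for a given vehicle weight."""
    return intercept + slope * weight_kg

# In simple linear regression, r-squared is the square of the correlation.
r_squared = r ** 2

print(f"r-squared = {r_squared:.3f}")
print(f"predicted HP at 3000 kg = {predicted_hp(3000):.2f}")
```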

SOLUTION 6
a. The study was observational, as no treatment and control groups were formed; it is
based on deaths as they were observed, and nothing was administered to a set of
participants to understand cause and effect

b. The response variable is premature death and the explanatory variable is whether or not
the person is suffering from heart disease

c. The lurking variable is geography, i.e. an extraneous variable describing the
surroundings, as these deaths are observed for a particular locality or region.

d. No, it is not possible to say that most of the deaths are due to heart disease, as we would
have to test whether the differences in means between Ipswich, Brisbane and Australia are
significant; i.e. an ANOVA is required to establish whether there are distinct differences
between the groups
