QUESTION

Assignment 3

Due Date: 17 October, 2011

Weighting: 25%

STA2300—Data Analysis 29

This assignment is important in providing feedback and helping to establish

competency in essential skills.

Answer all the questions. The questions are not of equal weight, and some

questions are worth much more than others.

The questions relate to material up to and including Module 8.

Read the Notes Concerning Assignments (page 12) before starting this Assignment.

When you are asked to comment on a finding, usually a short paragraph is all

that is required.

For all graphs, label the axes correctly, include a contextual title and the units

of measurement.

In many cases, spss output contains much more information than is required

for a correct and complete answer. In those cases just reproducing the output

may not attract any marks. Make sure you report only the information from the

spss output relevant to your answer.

Unless instructed otherwise, show all working and formulae used in calculating

confidence intervals and performing hypothesis tests. (Answers may of course

be checked where possible using computer software).

This assessment item consists of 4 questions.

Question 1 (30 marks)

Use the information in the dataset auto.sav to answer the following questions. For

parts (b) and (d), do the calculations by hand, using a calculator and the results from

part (a).

(a)

Use spss to calculate the mean and standard deviation of time to accelerate from 0

to 100km/hr of eight-cylinder cars from this sample. Select only those cars which

are eight-cylinders for this question.

(b)

Use the results in part (a) to estimate the mean time to accelerate from 0 to

100km/hr for eight-cylinder cars in general, using a 98% confidence interval (show

all working).

(c)

Justify the use of the confidence interval formula in part (b) by checking the appropriate

conditions and assumptions (include an appropriate graph to support your

answer).

(d)

Perform a hypothesis test to see if the true mean time to accelerate from 0 to

100km/hr for eight-cylinder cars is less than 14 seconds. In performing this test, include:

(i)

State appropriate hypotheses (define any symbols used).

(ii)

State (but do not check) the assumptions for carrying out this test. Describe

the assumptions in the context of this question.

(iii)

Calculate a suitable test statistic for this test.

(iv)

Calculate the P-value of this test.

(v)

Write a meaningful conclusion at the 5% level of significance.

(e)

Check your answers for part (d) by finding the value of the test statistic and the

p-value using spss. [Include spss output in your answer.]

Question 2 (14 marks)

In a random sample of 550 Data Analysis students, 385 claimed to log onto the social

networking site Facebook at least five times per week.

(a)

Estimate, with 95% confidence, the true population proportion of Data Analysis

students who log onto the social networking site Facebook at least five times per

week.

(b)

Check the procedure you used in part (a) is appropriate by checking all the necessary

conditions and assumptions.

(c)

What is the minimum sample size required if we wish to estimate the population

proportion of Data Analysis students who log onto the social networking site Facebook

at least five times per week, to within plus or minus 2%, with 95% confidence?

Use a conservative method in determining the sample size.

Question 3 (26 marks)

This question uses information from the data file auto.sav available under Assignments

with Data Sets on the Course StudyDesk.

A researcher is interested in the performance of American and Japanese cars, in particular,

do Japanese cars have a better fuel consumption than American cars?

(a)

Using spss and the sample information in auto.sav, determine the mean and

standard deviation of fuel consumption for American and Japanese cars.

(b)

Use an appropriate graph to compare the distribution of fuel consumption for

Japanese and American cars.

(c)

To answer the question 'do Japanese cars have a better fuel consumption than

American cars?' perform a hypothesis test by completing the following:

(i)

State appropriate hypotheses, clearly defining all symbols.

(ii)

Check the assumptions for carrying out this test.

(iii)

Without using spss, calculate a suitable test statistic (you can use the results

from part (a) in this calculation).

(iv)

Without using spss, find the P-value of the test.

(v)

Interpret the P-value and describe the outcome of the original question.

(vi)

Now use spss to check your results for this hypothesis test. Attach or copy

and paste the relevant output from spss for this test to your assignment.

(vii)

Briefly comment on how the test statistic and P-value from spss output are

similar to or differ from your hand calculations.

(d)

Thoroughly investigate in 100 words or less, any issues you can see in conducting

this test. Include tables or diagrams if needed.

Question 4 (30 marks)

A fitness clinic claims to be able to reduce a participant’s 3km run time in only 4

weeks, by participating in a series of intense training schedules. A random sample of

9 participants who completed the intense training schedules had their 3km run times

recorded before and after completing the schedule. Data on the run times can be seen

below (all times are recorded in minutes).

Participant   Before   After
1             15.8     13.2
2             12.1     12.2
3             21.6     17.8
4             18.8     12.6
5             29.3     15.6
6             20.7     16.4
7             18.9     13.3
8              9.2      9.2
9             10.7     10.5

Is there evidence that participating in a series of intense training schedules reduces the

average 3km run time?

(a)

Use a parametric test to answer this question by completing the following:

(i)

State appropriate hypotheses (define any symbols used).

(ii)

State (but do not check) the assumptions for carrying out this test. Describe

the assumptions in the context of this question.

(iii)

Calculate a suitable test statistic for this test.

(iv)

Calculate the P-value of this test.

(v)

Interpret the P-value and describe the outcome of the test in the context of

this question.

(vi)

Check your results for this test by using spss to carry out the analysis. Copy

and paste your spss output to your assignment.

(b)

If the assumptions for the test in part (a) were not satisfied, what alternative test

could you perform? Perform this non-parametric test by completing the following:

(i)

State appropriate hypotheses.

(ii)

Calculate a suitable test statistic for this test.

(iii)

Calculate the P-value of this test.

(iv)

Interpret the P-value and describe the outcome of the test in the context of

this question.

(c)

Comment on any differences found between the two tests and why this may be the

case.

SOLUTION

SOLUTION 1

a. For a small business in its first year of operation, survival is assumed to depend on a small set of criteria.

The criteria are Equity (or net worth of the firm), Type of Industry (capital intensive or service) and the Debt-Service Coverage ratio.

The Debt-Service Coverage ratio and the Interest Coverage ratio indicate whether cash flow is sufficient to meet interest payments and principal repayments. This debt will be primarily for working capital requirements. Hence, the equity infused becomes an important parameter in the model.

The model used could be a probit model for the probability of the firm surviving its first year of operation, or a logistic regression where the dependent variable takes the value 0 or 1 and the independent variables are those specified above: Equity and Debt-Service Coverage ratio are scale variables and Type of Industry is a nominal variable. These will be simulated under the assumption that the probability of failure is 10%.

The experiment considered here is a Bernoulli experiment, and the given probability of failure of 10% is used to answer the remaining questions.

b. The probability that 2 or fewer will fail in a random sample of 12 is given by

P(X <= 2) = 0.9^12 + 12*0.1^1*0.9^11 + 66*0.1^2*0.9^10 = 0.8891

c. Mean number of small businesses expected to fail is 12*0.1=1.2

d. Mean number of small businesses expected to fail is 200*0.1 = 20

Std. dev. of the number of small businesses expected to fail is sqrt(200*0.1*0.9) = sqrt(18) = 4.24

On average 20 firms will fail, with a std. dev. of about 4.24

e. The probability that 15 or fewer will fail in a random sample of 200 is approximately 14.3%

f. The assumption that might be alarming here is that it is not always possible to obtain a

stratified random sample of 200 firms in which each industry is properly

represented.
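
The binomial calculations in parts b to e can be checked with a short script. This is only a sketch under the solution's stated assumption that each firm fails independently with probability 10%; the helper name binom_cdf and the variable names are illustrative and not part of the original answer.

```python
from math import comb, sqrt


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


p_fail = 0.10  # failure probability assumed in the solution

# Part b: at most 2 failures among 12 firms
p_b = binom_cdf(2, 12, p_fail)

# Parts c and d: mean n*p and standard deviation sqrt(n*p*(1-p))
mean_12 = 12 * p_fail
mean_200 = 200 * p_fail
sd_200 = sqrt(200 * p_fail * (1 - p_fail))

# Part e: at most 15 failures among 200 firms (exact binomial sum)
p_e = binom_cdf(15, 200, p_fail)

print(round(p_b, 4), round(mean_12, 1), round(mean_200, 1),
      round(sd_200, 2), round(p_e, 3))
```

The exact sum is used for part e rather than a normal approximation, so no continuity correction is needed.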

SOLUTION 2

a. The variable of interest is the speed of cars and unit of measurement is km/hour

b. The percentage of drivers that could get a speeding ticket (i.e. >110 km/h) is found from

Z = (110 - 105)/4 = 1.25

Therefore P(Z > 1.25) = 1 - normsdist(1.25) = 1 - 0.8944 = 0.1056 = 10.56%

c. The speed below which these drivers are travelling is given by

105 + normsinv(0.06)*4 = 105 + (-1.555)*4 = 98.78 km/h
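
The two normal-distribution calculations can be reproduced with Python's standard library in place of the spreadsheet functions normsdist and normsinv. This is a minimal sketch, assuming speeds are normally distributed with mean 105 km/h and standard deviation 4 km/h, the values that appear in part c; the variable names are illustrative.

```python
from statistics import NormalDist

# Assumed model: speeds ~ Normal(mean 105 km/h, sd 4 km/h), as used in part c
speeds = NormalDist(mu=105, sigma=4)

# Part b: proportion of drivers above the 110 km/h limit
z = (110 - speeds.mean) / speeds.stdev
p_ticket = 1 - speeds.cdf(110)

# Part c: speed below which 6% of drivers travel (the 6th percentile)
cutoff = speeds.inv_cdf(0.06)

print(round(z, 2), round(p_ticket, 4), round(cutoff, 2))
```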

SOLUTION 3

a. The contingency table from SPSS is as follows:-

Number of cylinders * Origin of Car Crosstabulation (Count)

                          Origin of Car
Number of cylinders   America   European   Japanese   Total
4                          65         58         67     190
6                          64          4          6      74
8                         101          0          0     101
Total                     230         62         73     365

b. The percentage of 4-cylinder cars that are Japanese is (67/190) = 35.26%

c. The percentage of cars that are both 4-cylinder and Japanese is (67/365) = 18.36%

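d. The number of cylinders and the origin of cars are associated. If they were independent, the expected count in each cell would be (row total * column total)/grand total; for example, about 63.6 American 8-cylinder cars would be expected (101*230/365), whereas 101 are observed and no 8-cylinder cars are European or Japanese. 8-cylinder and 6-cylinder cars are associated with America, whereas 4-cylinder cars are spread fairly evenly across the three origins. The same could be confirmed using a Chi-Square test for independence.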
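
The Chi-Square test for independence mentioned in part d can be sketched from the counts in the contingency table. The code below is illustrative only: it computes the expected counts under independence and the test statistic by hand, without reproducing SPSS output. The variable names are assumptions; the 5% critical value for 4 degrees of freedom is about 9.49.

```python
# Observed counts: rows are 4, 6 and 8 cylinders;
# columns are America, European, Japanese (from the crosstabulation)
observed = [
    [65, 58, 67],
    [64, 4, 6],
    [101, 0, 0],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
dof = (len(row_totals) - 1) * (len(col_totals) - 1)

print(round(expected[2][0], 1), round(chi_sq, 1), dof)
```

A statistic far above the critical value would support the conclusion that cylinders and origin are not independent.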

SOLUTION 4

a. The graph is as follows:-

b. As evident from the distribution, the median lies to the left of the mean and the curve is slightly skewed to the right, i.e. the mean is greater than the median; hence most of the cars take less time to accelerate to 100km/h than the average car in the sample. There are outliers on the right side that take more than 25 sec, and the distribution deviates slightly from normal as the tails have less mass than a normal distribution. It is a platykurtic curve.

c. The mean is 16.55 sec and std dev is 2.37 sec

d. The median is 16.05 sec and IQR is 3.225 sec

e. The mean and variance are used here to measure the centre and spread, as they also define a normal distribution; the variable is a scale measure, so a normal distribution is a reasonable description. A truncated normal distribution would be better still, as it would not be feasible for a car to accelerate to 100 km/h in less than 5 sec.

SOLUTION 5

a. Both the variables are scale variables and in particular are ratio variables too

b. The graph is as follows:-

c. The graph above shows that there is an outlier: one car in the 1000-1500 kg weight range has more HP than any other vehicle in this range. The rest of them follow a linear relationship with constant spread for HP across the trend in weight of vehicle.

d. Correlation between the two variables is 0.836

e. Regression equation is

HP = -21.839+0.092* Vehicle weight

All the regression coefficients are significant.

f. As r-squared is roughly 70%, the model is able to explain about 70% of the variance in the dependent

variable.

The predicted engine power for a 3000 kg vehicle weight is 254.16 HP
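
The r-squared value and the prediction in parts d to f follow directly from the reported correlation and regression coefficients. The sketch below simply re-does that arithmetic; the function name predict_hp is illustrative, and the coefficients are taken from the solution rather than re-estimated from the data.

```python
# Values reported in the solution (not re-estimated from auto.sav)
intercept = -21.839  # HP
slope = 0.092        # HP per kg of vehicle weight
r = 0.836            # correlation between weight and HP


def predict_hp(weight_kg):
    """Predicted engine power from the fitted line HP = a + b * weight."""
    return intercept + slope * weight_kg


r_squared = r ** 2            # share of variance in HP explained by weight
hp_3000 = predict_hp(3000)    # prediction for a 3000 kg vehicle

print(round(r_squared, 3), round(hp_3000, 2))
```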

SOLUTION 6

a. The study was observational, as no treatment and control groups were formed; it is based on deaths as they were observed, and no treatment was administered to a set of participants to understand cause and effect.

b. The response variable is premature death and the explanatory variable is whether the person is suffering from heart disease or not

c. The lurking variable is geography, i.e. the surroundings are an extraneous variable, as these deaths are observed for a particular locality or region.

d. No, it is not possible to say that most of the deaths are due to heart disease, as we have to test whether the differences in means between Ipswich, Brisbane and Australia are significant; i.e. ANOVA is required to determine whether there are distinct differences between the groups.
