STAT131 Understanding Variation & Uncertainty Assessment 3 Version 1 Probability & Discrete Random Variables-72855

Instructions

  • Complete your test in this document at home in your own time
  • Do not confer with other students – you may post questions to the forum.
  • You may respond to questions in the forum with reference to appropriate materials
  • Save with your name as the file name, Surname, Firstname, StudentNumber Ass3V1 & upload to ELearning.
  • Format MUST be .doc or .docx
  • Download your Labtest data from ELearning.
  • In all instances where you are asked to provide output, use the paste special command and paste your output as a picture.
  • Submit by 10pm, Week 8, Saturday 2nd May, 2015. Once the first of these tests has been marked (i.e. solutions effectively released), no further tests can be marked. Where possible, marking will be done in order of submission.

The DATA

  • CIRCUITS_2015.SAV This data set has the outcomes for testing 140 circuit boards each with six chips. The data is the number of chips working out of the set of six on the circuit boards.

 

Task 1:

 

  1. A test application has been developed to detect whether attachments contain malware or not. T+ indicates that the test for malware is positive; T- indicates that the test is negative, i.e. it does not detect malware. The application is tested on a sample set of 498 attachments, some of which have malware (M) and some of which do not (~M).

 

Email Attachments Reveals

Test                    Malware (M)   No Malware (~M)   Total
Test Malware (T+)       220           8                 228
Test No Malware (T-)    11            259               270
Total                   231           267               498

 

Your background search for information involves identifying from the “health” literature the definition of predictive value positive as distinct from predictive value negative (Book of Learning Resources has a video clip on this). Predictive Value Positive of a test is the probability the person or email has the disease or malware (M) given that the test is positive for the presence of disease or malware (T+). That is Predictive Value Positive =P(M|T+)
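
As a rough check of the definitions above, both predictive values can be read directly from the table counts. The sketch below is illustrative only; the variable names (tp, fp, fn, tn) are assumptions introduced here and are not part of the assessment.

```python
from fractions import Fraction

# Counts from the table: rows are test results, columns are true status.
tp, fp = 220, 8     # T+ row: malware (M), no malware (~M)
fn, tn = 11, 259    # T- row: malware (M), no malware (~M)

# Predictive value positive: P(M | T+) = count(T+ and M) / count(T+)
ppv = Fraction(tp, tp + fp)

# Predictive value negative: P(~M | T-) = count(T- and ~M) / count(T-)
npv = Fraction(tn, tn + fn)

print(f"P(M | T+)  = {tp}/{tp + fp} = {float(ppv):.4f}")
print(f"P(~M | T-) = {tn}/{tn + fn} = {float(npv):.4f}")
```

Both probabilities condition on the test result, so the denominators are the row totals (228 and 270), not the column totals.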

Submit only the following pages

Assessment 3 V1 Probability and Discrete random variables

Name:

Student Number:

Mark:

1 70% is 21/30 marks for this section

Leave answers as fractions as derived from the table without simplification

a) Reading from the table specify the predictive value Positive  P(M | T+) of the test.

Answer: (220/498)/(228/498) = 220/228 = 0.9649

3

b) The Predictive Value Negative of a test may also be written in symbols in a form similar to a). Write Predictive Value negative as a conditional probability. (Do a search for meaning of the term)

Answer: Predictive Value Negative = P(~M | T-)

3

c) Using your answer to b) read from the table the Predictive Value Negative of the test.

Answer: (259/498)/(270/498) = 259/270 = 0.9593

3

d) What is the probability that a randomly selected attachment tests negative for malware?

Answer: 270/498 = 0.5422

3

e) What is the probability that a randomly selected attachment does not have malware?

Answer: 267/498 = 0.5361

3

f) What is the probability that a randomly selected attachment both tests positive and has malware?

Answer: 220/498 = 0.4418

3

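g) Would you expect the outcomes of the test to be independent of whether or not the attachment has malware? Explain.

Answer: No. If the test outcome and malware status were independent, P(M | T+) would equal P(M). Here P(M | T+) = 220/228 = 0.9649 while P(M) = 231/498 = 0.4639, so knowing the test result changes the probability of malware and the outcomes are not independent.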

 

3

h) Produce a stacked bar chart for these data. What does it suggest about the independence of the test and the presence or absence of malware in the attachment?

Answer: The bar chart is given below this table.

3

i) Enter the data into SPSS and produce the output necessary to determine if the test application and attachment outcomes are statistically independent. What do you conclude? Make sure you supply all statistical information required and a Plain English interpretation. (Hint: 3 columns of data, Test (positive or negative), Attachment (Malware or not Malware) and Count, AND then Weight by count.)

Answer: The test is given below the table:

3

j) Using the P(T+) and P(M) explain how the expected count for the cell which has the intersection of T+ and M is determined.

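Answer: Under independence, P(T+ and M) = P(T+) × P(M), so the expected count is n × P(T+) × P(M) = 498 × (228/498) × (231/498) = (228 × 231)/498 = 105.759. The chi-square test output with all expected counts is given below: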

3

 

 

Chi-Square Test

Observed Frequencies

                        Column variable
Test                    Malware (M)    No Malware (~M)   Total
Test Malware (T+)       220            8                 228
Test No Malware (T-)    11             259               270
Total                   231            267               498

Expected Frequencies

                        Column variable
Test                    Malware (M)    No Malware (~M)   Total
Test Malware (T+)       105.7590361    122.2409639       228
Test No Malware (T-)    125.2409639    144.7590361       270
Total                   231            267               498

Data

Level of Significance       0.05
Number of Rows              2
Number of Columns           2
Degrees of Freedom          1

Results

Critical Value              3.841459149
Chi-Square Test Statistic   424.5314735
p-Value                     2.51843E-94

Reject the null hypothesis
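
The chi-square output above can be reproduced by hand from the observed counts. The following is a minimal sketch using only the Python standard library rather than SPSS; the variable names are assumptions, and the p-value shortcut via erfc applies only because this table has 1 degree of freedom.

```python
import math

# Observed counts: rows are T+ and T-, columns are M and ~M.
observed = [[220, 8], [11, 259]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: n * P(row) * P(column)
# which simplifies to (row total * column total) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic: sum over cells of (O - E)^2 / E.
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# With 1 degree of freedom the upper-tail probability of the
# chi-square distribution equals erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_sq / 2))

print(f"Expected count for (T+, M): {expected[0][0]:.4f}")
print(f"Chi-square statistic: {chi_sq:.4f}")
print(f"p-value: {p_value:.3e}")
```

The statistic far exceeds the critical value of 3.841, which is why the null hypothesis of independence is rejected.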

 


Task 2:

 

The following table summarises a discrete random variable X.

x 0 1 2 3
P(X=x) .1 .4 .4 .1

 

2 70% is 7/10 for this section

Show formula and working

a) For X to be a random variable, two conditions must hold. Do they hold for X?

Answer 1) 0 ≤ P(X=x) ≤ 1 for every x, and ∑P(X=x) = 1 (0.1 + 0.4 + 0.4 + 0.1 = 1)

Answer 2) Yes, both conditions are met.

4

b) Calculate the mean for the random variable X.

Answer: E(X) = ∑ x P(X=x) = 0(0.1) + 1(0.4) + 2(0.4) + 3(0.1) = 1.5

2

c) Calculate the variance for the random variable X.

Answer: Var(X) = ∑ (x - mean)^2 P(X=x) = 0.225 + 0.1 + 0.1 + 0.225 = 0.65

2

d) Calculate the standard deviation for the random Variable X.

Answer: SD(X) = sqrt(Var(X)) = sqrt(0.65) = 0.8062

2

 

Calculations:

 

X    P(X)   XP(X)   (X-mean)   (X-mean)^2   P(X)*(X-mean)^2
0    0.1    0       -1.5       2.25         0.225
1    0.4    0.4     -0.5       0.25         0.1
2    0.4    0.8     0.5        0.25         0.1
3    0.1    0.3     1.5        2.25         0.225

Mean = 1.5
Variance = 0.65
Standard Deviation = 0.806225775
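
The calculation table above can be checked with a few lines of code. This is a small sketch under the assumption that the distribution is stored as two parallel lists; the names xs and ps are illustrative.

```python
import math

# Values of X and their probabilities from the Task 2 table.
xs = [0, 1, 2, 3]
ps = [0.1, 0.4, 0.4, 0.1]

# The two conditions for a valid probability distribution.
each_in_range = all(0 <= p <= 1 for p in ps)
sums_to_one = math.isclose(sum(ps), 1.0)

# Mean: sum of x * P(X = x).
mean = sum(x * p for x, p in zip(xs, ps))

# Variance: sum of (x - mean)^2 * P(X = x).
variance = sum(p * (x - mean) ** 2 for x, p in zip(xs, ps))

# Standard deviation: square root of the variance.
sd = math.sqrt(variance)

print(f"Valid distribution: {each_in_range and sums_to_one}")
print(f"Mean = {mean:.4f}, Variance = {variance:.4f}, SD = {sd:.4f}")
```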

 

 

 


Task 3  Part A:

An experimental circuit board has six chips. The probability of a chip for the circuit board working is 0.9.

For these questions keep three significant figures after any leading zeros (e.g. 0.00104 and 0.00000345 each have three significant figures, 104 and 345; i.e. do not round to 0.0). Show formula, first substitution and answer where appropriate.

3A 70% is 21/30 for this section

Show all formula & working

   
a) Why might the number of working chips be considered to be a binomially distributed random variable? Provide all four assumptions AND express in terms of the variable/data provided.

Answer: The number of working chips can be treated as a binomially distributed random variable because all four assumptions hold: (1) there is a fixed number of trials, n = 6 chips per board; (2) each trial has only two outcomes, the chip works (success) or does not work (failure); (3) the trials are independent, so whether one chip works does not affect another; (4) the probability of success is constant, p = 0.9 for every chip.

  6
b) What is the probability that exactly six chips will work correctly?

Answer: P(X=6) = C(6,6)(0.9)^6(0.1)^0 = 0.531441

  4
c) What is the probability that exactly five chips will work correctly?

Answer: P(X=5) = C(6,5)(0.9)^5(0.1)^1 = 6 × 0.59049 × 0.1 = 0.354294

  4
d) What is the probability that less than five chips will work correctly?

Answer: P(X<5) = 1 - P(X=5) - P(X=6) = 1 - 0.354294 - 0.531441 = 0.114265

  4
e) Calculate the mean number of chips that should work correctly.

Answer: mean = 6*0.9 = 5.4

  4
f) Calculate the variance in the number of chips that should work correctly.

Answer: variance = npq = 6*0.9*0.1 = 0.54

  4
g) Calculate the standard deviation of the number of chips that should work correctly.

Answer: standard deviation = sqrt(npq) = sqrt(0.54) = 0.734847

  4
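
The Part A answers follow from the binomial formula P(X=k) = C(n,k) p^k (1-p)^(n-k) with n = 6 and p = 0.9. The sketch below checks them with the standard library; the helper name binom_pmf is an assumption introduced for illustration.

```python
from math import comb, sqrt

n, p = 6, 0.9  # six chips per board, each works with probability 0.9


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_six = binom_pmf(6, n, p)                                   # exactly six work
p_five = binom_pmf(5, n, p)                                  # exactly five work
p_less_than_five = sum(binom_pmf(k, n, p) for k in range(5)) # zero to four work

mean = n * p                 # np
variance = n * p * (1 - p)   # npq
sd = sqrt(variance)          # sqrt(npq)

print(f"P(X=6) = {p_six:.6f}")
print(f"P(X=5) = {p_five:.6f}")
print(f"P(X<5) = {p_less_than_five:.6f}")
print(f"mean = {mean:.2f}, variance = {variance:.2f}, sd = {sd:.6f}")
```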


Task 3 Part B:

To test whether the chips met specification, 140 of the six-chip circuit boards were tested. Data were collected on the number of chips that worked correctly on each circuit board (cicuit.sav). You are to conduct a goodness of fit test to see if the Binomial(6, 0.9) model fits the data as suggested in Part A.

3B 70% is 21/30 for this section

Show formula & working

   
h) Specify the null hypothesis

Answer: H0: The number of working chips per circuit board follows a Binomial(6, 0.9) distribution.

  3
i) Specify the alternative hypothesis

Answer: H1: The number of working chips per circuit board does not follow a Binomial(6, 0.9) distribution.

  3
j) What assumption must hold regarding the expected frequency in each cell?

Answer: The expected frequency in each cell should be at least 5; cells with smaller expected frequencies should be pooled with adjacent cells.

  4
k) Calculate the chi-square goodness of fit test statistic. (Show formula and first substitution as well as the answer.)

Answer:

  4
l) How many degrees of freedom do you have when testing this model?

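Answer: df = k - 1, where k is the number of cells after pooling; no parameter is estimated because p = 0.9 is specified by the model. If boards with four or fewer working chips are pooled so that every expected frequency is at least 5, the cells are ≤4, 5 and 6, giving df = 3 - 1 = 2.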

  4
m) State your decision rule for testing the hypothesis when α = 0.05.

Answer: Reject the null hypothesis if the p-value is less than α = 0.05; do not reject the null hypothesis if the p-value is greater than α.

  4
n) What do you conclude from this test? Specify all statistical information required and the Plain English statement of your findings.

Answer: Here the p-value is less than 0.05, so we reject the null hypothesis. In plain English, the data provide evidence that the number of working chips per board does not follow the Binomial(6, 0.9) model.

  4
o) Demonstrate using the data how you would calibrate the model if the data were not distributed Binomial(6,0.9).

Answer:

  4
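
The goodness of fit steps in parts j) to o) can be sketched in code. Everything below is illustrative: the observed counts are HYPOTHETICAL placeholders, not the values in the circuit data file, and pooling boards with four or fewer working chips is an assumption made so that every expected frequency is at least 5.

```python
from math import comb

N_BOARDS, N_CHIPS, P_MODEL = 140, 6, 0.9


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# HYPOTHETICAL counts of boards with 0, 1, ..., 6 working chips.
# Replace these with the frequencies from the real data set.
observed = [0, 0, 1, 4, 20, 50, 65]


def pooled_chi_square(observed, p):
    """Chi-square statistic with cells pooled as: <=4, 5, 6 working chips."""
    expected = [N_BOARDS * binom_pmf(k, N_CHIPS, p) for k in range(N_CHIPS + 1)]
    obs_cells = [sum(observed[:5]), observed[5], observed[6]]
    exp_cells = [sum(expected[:5]), expected[5], expected[6]]
    stat = sum((o - e) ** 2 / e for o, e in zip(obs_cells, exp_cells))
    return stat, exp_cells


# Test of the stated model, Binomial(6, 0.9).
stat, exp_cells = pooled_chi_square(observed, P_MODEL)

# Calibration: estimate p as total working chips / total chips tested,
# then refit the model with the estimated value.
p_hat = sum(k * count for k, count in enumerate(observed)) / (N_BOARDS * N_CHIPS)
stat_cal, exp_cells_cal = pooled_chi_square(observed, p_hat)

print("Expected (<=4, 5, 6) under p = 0.9:", [round(e, 2) for e in exp_cells])
print(f"Chi-square under p = 0.9: {stat:.3f}")
print(f"Estimated p: {p_hat:.4f}, chi-square after calibration: {stat_cal:.3f}")
```

With three pooled cells and p fixed at 0.9, the statistic has 3 - 1 = 2 degrees of freedom; estimating p from the data uses up one more, leaving 1.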

The answers for parts k to o are given in the following output: