Statistics

Question 1

$L(\theta \mid Y) \sim N(60, \sigma^2/n)$

Prior: $\theta \sim N(10, \tau^2)$

Since the standard deviation of the population is known, it follows that the variance of the population is also known.

$\sigma = 20$

Variance: $\sigma^2 = 20^2 = 400$

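Posterior mean: $\mu_{\text{post}} = \dfrac{\mu_0/\tau^2 + n\bar{y}/\sigma^2}{1/\tau^2 + n/\sigma^2}$, where $\mu_0$ and $\tau^2$ are the prior mean and variance, $\bar{y}$ is the sample mean, $\sigma^2$ is the population variance and $n$ is the sample size.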

From the information given:

$\sigma^2 = 20^2 = 400$

$\bar{y} = 60$

= = 4

$\mu_0 = 10$

=

$n = 100$

Posterior mean, =

=

=

=

= 4(1501)

= 6004

The posterior mean is 6004.

No, this prior cannot be justified: the posterior mean is far too large compared with the mean of the likelihood function.
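The conjugate normal update can be checked numerically. Below is a minimal R sketch, assuming the standard precision-weighted formula for a normal mean with known population variance. The function name `normal_posterior` is illustrative, and the prior variance `tau2` is left as an argument because its value is not legible above; the value 4 in the example is an assumption chosen only to show the call.

```r
# Conjugate normal-normal update for a mean with known population variance.
# mu0, tau2: prior mean and variance; ybar: sample mean;
# sigma2: known population variance; n: sample size.
normal_posterior <- function(mu0, tau2, ybar, sigma2, n) {
  prior_prec <- 1 / tau2          # precision of the prior
  data_prec  <- n / sigma2        # precision of the sample mean
  post_var   <- 1 / (prior_prec + data_prec)
  post_mean  <- post_var * (prior_prec * mu0 + data_prec * ybar)
  list(mean = post_mean, var = post_var)
}

# Values from above (mu0 = 10, ybar = 60, sigma2 = 400, n = 100),
# with an assumed prior variance tau2 = 4:
normal_posterior(mu0 = 10, tau2 = 4, ybar = 60, sigma2 = 400, n = 100)
# mean 35, var 2
```

Because the posterior mean is a precision-weighted average of the prior mean and the sample mean, for any `tau2 > 0` it lies strictly between 10 and 60.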

$L(\theta \mid Y) \sim N(60, \sigma^2/n)$

Prior: $\theta \sim N(40, \tau^2)$

Since the standard deviation of the population is known, it follows that the variance of the population is also known.

$\sigma = 20$

Variance: $\sigma^2 = 20^2 = 400$

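Posterior mean: $\mu_{\text{post}} = \dfrac{\mu_0/\tau^2 + n\bar{y}/\sigma^2}{1/\tau^2 + n/\sigma^2}$, where $\mu_0$ and $\tau^2$ are the prior mean and variance, $\bar{y}$ is the sample mean, $\sigma^2$ is the population variance and $n$ is the sample size.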

From the information given:

$\sigma^2 = 20^2 = 400$

$\bar{y} = 60$

= = 4

$\mu_0 = 40$

= 9

$n = 100$

Posterior mean = 183777

The posterior mean is 183,777.

No, it is not negligible.

When variance = 8000

=

=

= 499937

When variance = 9000

=

=

= 499944

When variance = 10000

=

=

= 499950

The choice of variance must be justified, because different variances yield different posterior means: the posterior mean for a variance of 10,000 is higher than for 9,000 and 8,000.
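The sensitivity check above can be scripted by looping over the candidate prior variances. This is a sketch in R that assumes the standard normal-normal posterior mean formula. The inputs `mu0`, `ybar`, `sigma2` and `n` are hypothetical placeholders, since the data for this part are not legible here. It illustrates the general pattern: when the prior mean is below the sample mean, a larger prior variance moves the posterior mean closer to the sample mean.

```r
# Posterior mean of a normal mean under a N(mu0, tau2) prior, known sigma2.
post_mean <- function(mu0, tau2, ybar, sigma2, n) {
  (mu0 / tau2 + n * ybar / sigma2) / (1 / tau2 + n / sigma2)
}

# Hypothetical placeholder inputs, for illustration only
mu0 <- 450; ybar <- 500; sigma2 <- 400; n <- 10
tau2_grid <- c(8000, 9000, 10000)

# One posterior mean per candidate prior variance
means <- sapply(tau2_grid, function(t2) post_mean(mu0, t2, ybar, sigma2, n))
means
```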

95% Bayesian credible interval $= \text{posterior mean} \pm 1.96 \times \text{posterior SD}$

$= 500 \pm 1.96 \times \text{posterior SD}$

$= 500 \pm 16.53$

$= (483.47,\ 516.53)$

Classical 95% confidence interval $= \text{sample mean} \pm 1.96 \times \text{SE}$

$= 500 \pm 1.96 \times 30$

$= 500 \pm 58.8$

$= (441.2,\ 558.8)$

The classical 95% confidence interval does not approximate the 95% Bayesian credible interval. As the calculations above show, the classical interval (width 117.6) is much wider than the Bayesian interval (width 33.06).
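Both intervals are symmetric normal intervals of the form center ± z × sd, so a single helper covers them. The following is a minimal R sketch. The name `normal_interval` is illustrative, and the posterior SD of about 8.43 is back-calculated as 16.53 / 1.96 from the half-width reported above rather than taken from the source. It uses `qnorm(0.975)` ≈ 1.959964 in place of the rounded 1.96.

```r
# Central interval for a normal distribution with the given center and sd.
# Bayesian case: sd is the posterior standard deviation.
# Classical case: sd is the standard error of the estimate.
normal_interval <- function(center, sd, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)   # about 1.959964 for a 95% interval
  c(lower = center - z * sd, upper = center + z * sd)
}

normal_interval(500, 30)            # classical: about (441.2, 558.8)
normal_interval(500, 16.53 / 1.96)  # Bayesian: about (483.47, 516.53)
```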

Question 2

The prior Beta(2, 2) expresses our prior belief about the success probability θ. Its mean is 2/(2 + 2) = 0.5, so success and failure are believed to be equally likely.

The prior Beta(11, 41) expresses our prior belief about θ. Its mean is 11/(11 + 41) = 11/52 ≈ 0.212, so the prior probability of success is about 0.212 and the probability of failure is about 41/52 ≈ 0.788.

The prior Beta(21, 81) expresses our prior belief about θ. Its mean is 21/(21 + 81) = 21/102 ≈ 0.206, so the prior probability of success is about 0.206 and the probability of failure is about 81/102 ≈ 0.794.
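The prior means above follow from α/(α + β). A short R sketch tabulates them together with α + β, which is often read as the prior's "effective sample size" (how many pseudo-observations the prior is worth). That reading is a common interpretation and is not taken from the source.

```r
# Prior mean alpha / (alpha + beta) and alpha + beta for the three priors
priors <- data.frame(alpha = c(2, 11, 21), beta = c(2, 41, 81))
priors$mean <- priors$alpha / (priors$alpha + priors$beta)
priors$ess  <- priors$alpha + priors$beta   # prior "pseudo-observations"
priors
```

Beta(11, 41) and Beta(21, 81) have almost the same mean, but Beta(21, 81) is about twice as concentrated.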

Calculating the means

$n = 20$

$y = 1$

With a Beta(α, β) prior and y successes in n trials, the conjugate (beta-binomial) posterior is Beta(α + y, β + n - y). The simpler form Beta(y + 1, n - y + 1) holds only for the uniform Beta(1, 1) prior.

__For the Beta(2, 2) prior__

Posterior: Beta(2 + 1, 2 + 20 - 1) = Beta(3, 21)

Posterior mean, $E(\theta \mid Y) = \frac{3}{3 + 21} = \frac{3}{24} = 0.125$

__For the Beta(11, 41) prior__

Posterior: Beta(11 + 1, 41 + 20 - 1) = Beta(12, 60)

Posterior mean, $E(\theta \mid Y) = \frac{12}{12 + 60} = \frac{12}{72} = 0.1667$

__For the Beta(21, 81) prior__

Posterior: Beta(21 + 1, 81 + 20 - 1) = Beta(22, 100)

Posterior mean, $E(\theta \mid Y) = \frac{22}{22 + 100} = \frac{22}{122} = 0.1803$

Calculating $\Pr(\theta < 0.2 \mid Y)$

__For the Beta(2, 2) prior__

$\Pr(\theta < 0.2 \mid Y) = \int_0^{0.2} \frac{\theta^{2}(1-\theta)^{20}}{B(3, 21)}\, d\theta = 0.867$

__For the Beta(11, 41) prior__

$\Pr(\theta < 0.2 \mid Y) = \int_0^{0.2} \frac{\theta^{11}(1-\theta)^{59}}{B(12, 60)}\, d\theta = 0.785$

__For the Beta(21, 81) prior__

$\Pr(\theta < 0.2 \mid Y) = \int_0^{0.2} \frac{\theta^{21}(1-\theta)^{99}}{B(22, 100)}\, d\theta = 0.725$

If I had to choose between the three priors, I would go for Beta(21, 81), since it yields the highest posterior mean of the three (0.1803).
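The conjugate update and the tail probability can both be computed directly in R. This is a minimal sketch, assuming the standard Beta(α + y, β + n - y) posterior. It uses `pbeta()` for Pr(θ < 0.2 | Y). The helper name `beta_binomial_summary` is illustrative.

```r
# Conjugate beta-binomial update: a Beta(alpha, beta) prior combined with
# y successes in n trials gives a Beta(alpha + y, beta + n - y) posterior.
beta_binomial_summary <- function(alpha, beta, y, n, cutoff = 0.2) {
  a_post <- alpha + y
  b_post <- beta + n - y
  data.frame(prior     = sprintf("Beta(%g, %g)", alpha, beta),
             posterior = sprintf("Beta(%g, %g)", a_post, b_post),
             post_mean = a_post / (a_post + b_post),
             p_below   = pbeta(cutoff, a_post, b_post),  # Pr(theta < cutoff | y)
             stringsAsFactors = FALSE)
}

alphas <- c(2, 11, 21)
betas  <- c(2, 41, 81)

# One row per prior, for n = 20 trials with y = 1 success
results <- do.call(rbind, Map(beta_binomial_summary, alphas, betas, y = 1, n = 20))
results
```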

Calculating the means

$n = 120$

$y = 24$

With a Beta(α, β) prior, the posterior is Beta(α + y, β + n - y) = Beta(α + 24, β + 96).

__For the Beta(2, 2) prior__

Posterior: Beta(2 + 24, 2 + 96) = Beta(26, 98)

Posterior mean, $E(\theta \mid Y) = \frac{26}{26 + 98} = \frac{26}{124} = 0.2097$

__For the Beta(11, 41) prior__

Posterior: Beta(11 + 24, 41 + 96) = Beta(35, 137)

Posterior mean, $E(\theta \mid Y) = \frac{35}{35 + 137} = \frac{35}{172} = 0.2035$

__For the Beta(21, 81) prior__

Posterior: Beta(21 + 24, 81 + 96) = Beta(45, 177)

Posterior mean, $E(\theta \mid Y) = \frac{45}{45 + 177} = \frac{45}{222} = 0.2027$

With n = 120 the three posterior means are close (0.2097, 0.2035 and 0.2027), because the data now carry far more weight than any of the priors. Applying the same highest-posterior-mean criterion as before, I would recommend the Beta(2, 2) prior, although the choice makes little practical difference here.
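With the larger sample, the same conjugate update can be vectorised over the three priors. This is a short R sketch, assuming the Beta(α + y, β + n - y) posterior. It shows the posterior means clustering near the sample proportion 24/120 = 0.2.

```r
# Posterior means under each prior for n = 120 trials with y = 24 successes,
# using the conjugate update Beta(alpha + y, beta + n - y).
alphas <- c(2, 11, 21)
betas  <- c(2, 41, 81)
y <- 24; n <- 120

a_post <- alphas + y
b_post <- betas + n - y
post_mean <- a_post / (a_post + b_post)

round(post_mean, 4)  # 0.2097 0.2035 0.2027
```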

Question 3

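a)

```r
library(readxl)

# Load the data (the path is specific to the author's machine)
Data <- read_excel("C:/Users/Myles/Downloads/Desktop/Data.xlsx")
View(Data)

# Count respondents with zero awards and with exactly one award
count0 <- sum(Data$nawards == 0)
count1 <- sum(Data$nawards == 1)
total  <- count0 + count1

# Proportions among respondents with zero or one award
prob_noaward <- count0 / total
prob_award   <- count1 / total

count0
count1
total
prob_noaward
prob_award
```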

R Output

```
> library(readxl)
> Data <- read_excel("C:/Users/Myles/Downloads/Desktop/Data.xlsx")
> View(Data)
> noaward
[1] 0
> count0<- length(which(nawards == 0))
> count1<-length(which(nawards == 1))
> total<-count0+count1
> prob_noaward<-count0/total
> prob_award<-count1/total
> count0
[1] 451
> count1
[1] 34
> total
[1] 485
> prob_noaward
[1] 0.9298969
> prob_award
[1] 0.07010309
```

Therefore, among respondents with zero or one award, the proportion who received an award is 0.0701 and the proportion who did not is 0.9299.
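The same two proportions can be obtained in one step with `table()` and `prop.table()`. This is a sketch in R. The helper name `award_shares` is hypothetical. It keeps the restriction used above (only respondents with 0 or 1 award), so people with two or more awards are excluded from the denominator.

```r
# Share of respondents with 0 vs. 1 award, restricted to those two values
# (mirrors count0 / total and count1 / total above).
award_shares <- function(nawards) {
  kept <- nawards[nawards %in% c(0, 1)]
  prop.table(table(factor(kept, levels = c(0, 1))))
}

# Usage with the loaded data: award_shares(Data$nawards)
```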

Salary scale 2 and awarded

R code

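```r
# Note: without a family argument, glm() fits a Gaussian (ordinary linear)
# model, not a logistic one, despite the object name.
logit <- glm(nawards ~ salscale2 + salscale3 + salscale4 + disat + sat +
               age1 + age2 + age3 + age4, data = Data)
logit
```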

R Output

```
> logit<-glm(nawards~salscale2+salscale3+salscale4+disat+sat+age1+age2+age3+age4)
> logit

Call:  glm(formula = nawards ~ salscale2 + salscale3 + salscale4 + disat + 
    sat + age1 + age2 + age3 + age4)

Coefficients:
(Intercept)    salscale2    salscale3    salscale4        disat          sat         age1  
    1.88929      0.12285      0.38808     -0.80646      0.04113      2.52378     -1.36387  
       age2         age3         age4  
   -1.63379     -0.87256     -0.55708  

Degrees of Freedom: 600 Total (i.e. Null);  591 Residual
Null Deviance:      6529 
Residual Deviance: 5762     AIC: 3086
> coef(logit)
(Intercept)   salscale2   salscale3   salscale4       disat         sat        age1        age2 
  1.8892880   0.1228489   0.3880804  -0.8064585   0.0411348   2.5237785  -1.3638693  -1.6337914 
       age3        age4 
 -0.8725600  -0.5570784 
```

At α = 0.05, the p-values of all predictors exceed 0.05 except that of the dissatisfied variable (disat). Therefore, only dissatisfaction significantly affects the dependent variable nawards.
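Printing a `glm` object shows coefficients only. P-values such as those cited here come from `summary()`, whose coefficient table has four columns: estimate, standard error, test statistic and p-value. This small R helper extracts them. The name `coef_pvalues` is hypothetical. A true logistic regression would additionally need `family = binomial` and a binary response.

```r
# p-values for each coefficient of a fitted glm (4th column of the
# summary coefficient table: Pr(>|t|) for Gaussian, Pr(>|z|) for binomial).
coef_pvalues <- function(fit) {
  summary(fit)$coefficients[, 4]
}

# Usage with the model above: coef_pvalues(logit) < 0.05
```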

```
> logit2<-glm(nawards~disat+sat+age1+age2+age3+age4+age5+age6)
> logit2

Call:  glm(formula = nawards ~ disat + sat + age1 + age2 + age3 + age4 + 
    age5 + age6)

Coefficients:
(Intercept)        disat          sat         age1         age2         age3         age4  
   1.680745     0.056792     2.642757    -1.228719    -1.465412    -0.739807    -0.603963  
       age5         age6  
  -0.007327           NA  

Degrees of Freedom: 600 Total (i.e. Null);  593 Residual
Null Deviance:      6529 
Residual Deviance: 5892     AIC: 3095
> p<-prop.table(award>2.3*noaward)
> p
[1] 1
> count0<- length(which(sat== 0))
> count1<-length(which(disat== 1))
> total1<-count0+count1
> prob_disat<-count0/total1
> prob_sat<-count1/total1
> total1
[1] 729
> prob_sat
[1] 0.266118
> prob_disat
[1] 0.73388
```
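The `NA` coefficient for `age6` is what `glm()` reports for an aliased term: a column that is an exact linear combination of the others. This presumably happens because the age dummies plus the intercept are collinear if `age1` to `age6` cover every age group. The R sketch below lists such terms. The helper `aliased_terms` and the toy data frame are hypothetical illustrations, not the author's data.

```r
# Names of coefficients that glm() could not estimate (reported as NA)
aliased_terms <- function(fit) {
  names(coef(fit))[is.na(coef(fit))]
}

# Toy data (hypothetical): three dummies that always sum to 1,
# so together with the intercept one of them is redundant
toy <- data.frame(y  = c(1, 2, 3, 4, 5, 6),
                  a1 = c(1, 1, 0, 0, 0, 0),
                  a2 = c(0, 0, 1, 1, 0, 0),
                  a3 = c(0, 0, 0, 0, 1, 1))
aliased_terms(glm(y ~ a1 + a2 + a3, data = toy))  # "a3"
```

Dropping one dummy (the reference category) removes the `NA`.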

The prior distribution used is a beta distribution, since the outcome has two possible values (0 and 1).

Prior : Beta()

Posterior median estimate of satisfaction = probability of satisfaction = 0.266.
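Under a beta prior, the posterior for a binary outcome is again beta, so its median can be read off with `qbeta()`. This is a sketch in R. The helper name `posterior_median` is hypothetical. The uniform Beta(1, 1) default is an assumption, since the prior parameters used above are not legible.

```r
# Posterior median of a proportion under a Beta(a, b) prior after observing
# `successes` out of `n` binary outcomes (conjugate beta-binomial update).
posterior_median <- function(successes, n, a = 1, b = 1) {
  qbeta(0.5, a + successes, b + n - successes)
}

# Usage (hypothetical): posterior_median(sum(Data$sat == 1), nrow(Data))
```

For skewed posteriors the median differs from the posterior mean (a + successes) / (a + b + n) and from the raw sample proportion.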

f)

P(sat > 2.3 × disat) = 1

The probability that satisfaction exceeds 2.3 times dissatisfaction is 1, so this outcome is certain to happen.
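A posterior probability such as Pr(θ_sat > 2.3 θ_disat | data) is commonly estimated by simulation from the posteriors. Note that `prop.table()` simply divides a vector by its sum, so applied to a single `TRUE` value it returns 1 regardless of the data. This is a sketch in R. The helper name `prob_ratio_exceeds` is hypothetical. Treating the two proportions as independent, each with a Beta(1, 1) prior, is a simplifying assumption.

```r
# Monte Carlo estimate of Pr(theta_sat > ratio * theta_disat | data), drawing
# each proportion from an independent Beta posterior (Beta(a, b) prior).
prob_ratio_exceeds <- function(sat_yes, disat_yes, n, ratio = 2.3,
                               draws = 1e5, a = 1, b = 1) {
  theta_sat   <- rbeta(draws, a + sat_yes,   b + n - sat_yes)
  theta_disat <- rbeta(draws, a + disat_yes, b + n - disat_yes)
  mean(theta_sat > ratio * theta_disat)   # share of draws satisfying the event
}

set.seed(1)
# Usage (hypothetical): prob_ratio_exceeds(sum(Data$sat == 1), sum(Data$disat == 1), nrow(Data))
```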

g)

The R code uses the function y to create a vector of zeros, and this vector is then used to plot a graph.

h)