Statistics: 876066

Question 1

Likelihood: L(θ|Y) ~ N(60, σ²/n)

Prior: θ ~ N(10, τ₀²)

Since the standard deviation of the population is known, the variance of the population is also known.

σ = 20

Variance = σ² = 20² = 400

Posterior mean,  =

From the information given:

σ² = 20² = 400

ȳ = 60

σ²/n = 400/100 = 4

Prior mean, μ₀ = 10

 =

N= 100

Posterior mean = 4(1501) = 6004

The posterior mean is 6004.

No, this prior cannot be justified: the posterior mean is far larger than the mean of the likelihood function (60).
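For reference, the standard conjugate result for a normal likelihood with known variance and a normal prior is written out below. The symbols μ₀ and τ₀² (prior mean and prior variance) are notation assumed here for illustration. The result is a precision-weighted average, so a posterior mean computed this way always lies between the prior mean and the sample mean.

```latex
\mu_{\text{post}}
  = \frac{\dfrac{\mu_0}{\tau_0^2} + \dfrac{n\,\bar{y}}{\sigma^2}}
         {\dfrac{1}{\tau_0^2} + \dfrac{n}{\sigma^2}},
\qquad
\tau_{\text{post}}^2
  = \left(\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}\right)^{-1}
```

With ȳ = 60 and a prior mean of 10, any value obtained from this formula falls between 10 and 60, whatever the prior variance.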

Likelihood: L(θ|Y) ~ N(60, σ²/n)

Prior: θ ~ N(40, τ₀²)

Since the standard deviation of the population is known, the variance of the population is also known.

σ = 20

Variance = σ² = 20² = 400

Posterior mean,  =

From the information given:

σ² = 20² = 400

ȳ = 60

σ²/n = 400/100 = 4

Prior mean, μ₀ = 40

 = 9

N= 100

Posterior mean = 183,777

The posterior mean is 183,777.

No, it is not negligible.
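The conjugate normal update can be checked numerically. The following is a minimal sketch, not the working used above: the function name `normal_posterior` is hypothetical, and reading the "= 9" line as the prior variance is an assumption.

```r
# Conjugate normal update with known sampling variance.
# mu0, tau0_sq: prior mean and variance; ybar: sample mean;
# se_sq: variance of the sample mean, sigma^2 / n.
normal_posterior <- function(mu0, tau0_sq, ybar, se_sq) {
  prior_prec <- 1 / tau0_sq   # precision contributed by the prior
  data_prec  <- 1 / se_sq     # precision contributed by the data
  post_var   <- 1 / (prior_prec + data_prec)
  post_mean  <- post_var * (prior_prec * mu0 + data_prec * ybar)
  list(mean = post_mean, var = post_var)
}

# ybar = 60, sigma^2 = 400 and n = 100 come from the text;
# tau0_sq = 9 is an ASSUMED reading of the prior variance.
post <- normal_posterior(mu0 = 40, tau0_sq = 9, ybar = 60, se_sq = 400 / 100)
post$mean   # about 53.85
post$var    # about 2.77
```

Under these assumptions the posterior mean sits between the prior mean (40) and the sample mean (60), closer to the sample mean because the data carry more precision than the prior.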

When variance = 8000:

Posterior mean = 499.937

When variance = 9000:

Posterior mean = 499.944

When variance = 10000:

Posterior mean = 499.950

The choice of variance needs to be justified, because different variances yield different posterior means: the posterior mean for a variance of 10,000 is higher than for 9,000, which in turn is higher than for 8,000.
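The sensitivity of the posterior mean to the prior variance can be explored with a short calculation. This is only a sketch: the prior mean of 0, the sample mean of 500 and the unit variance of the sample mean are all assumptions made for illustration, since the inputs are not stated in the text.

```r
# Posterior mean of a normal mean as a function of the prior variance.
# ASSUMED inputs (not stated in the text): prior mean 0, sample mean 500,
# and variance of the sample mean equal to 1.
mu0   <- 0
ybar  <- 500
se_sq <- 1

prior_vars <- c(8000, 9000, 10000)

# Precision-weighted average of the prior mean and the sample mean
post_means <- (mu0 / prior_vars + ybar / se_sq) / (1 / prior_vars + 1 / se_sq)

data.frame(prior_variance = prior_vars,
           posterior_mean = round(post_means, 4))
```

Under these assumptions the posterior mean rises toward the sample mean as the prior variance grows, but the change across the three settings is only a few hundredths.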

95% Bayesian credible interval = posterior mean ± 1.96 × posterior standard deviation

                               = 500 ± 16.53

                               = (483.47, 516.53)

Classical 95% CI = ȳ ± 1.96 × σ/√n

                 = 500 ± 1.96 × 30

                 = 500 ± 58.8

                 = (441.2, 558.8)

The classical 95% confidence interval does not approximate the 95% Bayesian credible interval. As the calculations above show, the classical interval (width 117.6) is much wider than the Bayesian credible interval (width 33.06).
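Both intervals have the form centre ± z × standard error, which is easy to reproduce. In this sketch the posterior standard deviation is an assumption: it is backed out of the reported half-width of 16.53 rather than taken from the text.

```r
# 95% intervals of the form centre +/- z * standard error.
z <- qnorm(0.975)             # about 1.96

centre       <- 500
classical_se <- 30            # standard error used in the classical interval
posterior_sd <- 16.53 / 1.96  # ASSUMED: backed out of the reported half-width

classical_ci <- centre + c(-1, 1) * z * classical_se
credible_int <- centre + c(-1, 1) * z * posterior_sd

round(classical_ci, 2)
round(credible_int, 2)
diff(classical_ci) / diff(credible_int)   # ratio of the two widths
```

The last line gives the ratio of the interval widths, which makes the difference between the two intervals explicit.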

Question 2

The Beta(2, 2) prior encodes the belief that success and failure are equally likely: the prior mean is 2/(2 + 2) = 2/4 = 0.5.

The Beta(11, 41) prior encodes the belief that the probability of success is 11/52 ≈ 0.212 and the probability of failure is 41/52 ≈ 0.788.

The Beta(21, 81) prior encodes the belief that the probability of success is 21/102 ≈ 0.206 and the probability of failure is 81/102 ≈ 0.794.
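The prior means can be computed directly from the Beta parameters, and the prior standard deviation shows how strongly each prior is held. A minimal sketch using the standard Beta mean and variance formulas:

```r
# Prior mean and standard deviation for each Beta(a, b) prior.
a <- c(2, 11, 21)
b <- c(2, 41, 81)

prior_mean <- a / (a + b)
prior_sd   <- sqrt(a * b / ((a + b)^2 * (a + b + 1)))

data.frame(prior = sprintf("Beta(%g, %g)", a, b),
           mean  = round(prior_mean, 3),
           sd    = round(prior_sd, 3))
```

The larger a + b is, the smaller the standard deviation, so Beta(21, 81) expresses a much firmer belief than Beta(2, 2).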

Calculating the posterior means

n = 20

y = 1

With a Beta(a, b) prior and y successes in n trials, the posterior is Beta(a + y, b + n − y). The form Beta(y + 1, n − y + 1) is the special case of a uniform Beta(1, 1) prior.

For the Beta(2, 2) prior:

Posterior: Beta(2 + 1, 2 + 20 − 1) = Beta(3, 21)

Posterior mean, E(θ|Y) = 3/(3 + 21) = 3/24 = 0.125

For the Beta(11, 41) prior:

Posterior: Beta(11 + 1, 41 + 20 − 1) = Beta(12, 60)

Posterior mean, E(θ|Y) = 12/(12 + 60) = 12/72 = 0.167

For the Beta(21, 81) prior:

Posterior: Beta(21 + 1, 81 + 20 − 1) = Beta(22, 100)

Posterior mean, E(θ|Y) = 22/(22 + 100) = 22/122 = 0.180
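The Beta-Binomial update is simple enough to script. The sketch below assumes the standard conjugate rule, in which a Beta(a, b) prior combined with y successes in n trials gives a Beta(a + y, b + n − y) posterior. The helper name `beta_update` is hypothetical.

```r
# Conjugate Beta-Binomial update: prior Beta(a, b), y successes in n trials.
beta_update <- function(a, b, y, n) {
  c(a = a + y, b = b + n - y)
}

n <- 20
y <- 1
priors <- list(c(2, 2), c(11, 41), c(21, 81))

# Posterior mean a / (a + b) for each prior
post_means <- sapply(priors, function(p) {
  post <- beta_update(p[1], p[2], y, n)
  unname(post["a"] / (post["a"] + post["b"]))
})

round(post_means, 4)   # 0.1250 0.1667 0.1803
```

Calling `beta_update(1, 1, y, n)` recovers the uniform-prior form Beta(y + 1, n − y + 1).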

For the Beta(2, 2) prior:

Pr(θ < 0.2 | Y) = 0.0003

For the Beta(11, 41) prior:

Pr(θ < 0.2 | Y) = 4.9*

For the Beta(21, 81) prior:

Pr(θ < 0.2 | Y) = 0.00
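A posterior probability such as Pr(θ < 0.2 | Y) is the Beta cumulative distribution function evaluated at 0.2, which R provides as `pbeta`. The sketch below assumes the standard conjugate posterior Beta(a + y, b + n − y) with y = 1 and n = 20, so its results need not agree with the figures reported above. Under that assumption the first value is roughly 0.87.

```r
# Posterior Pr(theta < 0.2 | Y) under each prior, with y = 1 and n = 20.
n <- 20
y <- 1
a <- c(2, 11, 21)
b <- c(2, 41, 81)

# ASSUMED posterior: Beta(a + y, b + n - y)
prob_below <- pbeta(0.2, shape1 = a + y, shape2 = b + n - y)
names(prob_below) <- sprintf("Beta(%g, %g) prior", a, b)
prob_below

# Monte Carlo cross-check for the first prior
set.seed(1)
mean(rbeta(1e5, a[1] + y, b[1] + n - y) < 0.2)
```

The simulation line is only a sanity check: the share of draws below 0.2 should be close to the first `pbeta` value.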

If I had to choose between the three priors, I would choose Beta(21, 81), since it gives the highest posterior expectation of the three.

Calculating the posterior means

n = 120

y = 24

With a Beta(a, b) prior and y successes in n trials, the posterior is Beta(a + y, b + n − y).

For the Beta(2, 2) prior:

Posterior: Beta(2 + 24, 2 + 120 − 24) = Beta(26, 98)

Posterior mean, E(θ|Y) = 26/(26 + 98) = 26/124 = 0.210

For the Beta(11, 41) prior:

Posterior: Beta(11 + 24, 41 + 120 − 24) = Beta(35, 137)

Posterior mean, E(θ|Y) = 35/(35 + 137) = 35/172 = 0.203

For the Beta(21, 81) prior:

Posterior: Beta(21 + 24, 81 + 120 − 24) = Beta(45, 177)

Posterior mean, E(θ|Y) = 45/(45 + 177) = 45/222 = 0.203

By the criterion of the highest posterior mean, Beta(2, 2) would be chosen here. With n = 120, however, the three posterior means are nearly equal (0.210, 0.203 and 0.203), so the choice of prior has little effect.
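One way to see how much each prior matters is to write the posterior mean as a weighted average of the prior mean and the sample proportion y/n, with prior weight (a + b)/(a + b + n). The sketch below assumes the standard conjugate posterior Beta(a + y, b + n − y) and uses n = 120 and y = 24 from the text.

```r
# Posterior mean as a weighted average of prior mean and sample proportion.
n <- 120
y <- 24
a <- c(2, 11, 21)
b <- c(2, 41, 81)

prior_mean   <- a / (a + b)
prior_weight <- (a + b) / (a + b + n)   # share of the posterior due to the prior

post_mean <- prior_weight * prior_mean + (1 - prior_weight) * (y / n)

data.frame(prior        = sprintf("Beta(%g, %g)", a, b),
           prior_weight = round(prior_weight, 3),
           post_mean    = round(post_mean, 3))
```

The weighted average is algebraically the same as (a + y)/(a + b + n). The prior weight grows with a + b, so Beta(21, 81) pulls the estimate toward its prior mean far more than Beta(2, 2) does.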

Question 3

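a)

library(readxl)
Data <- read_excel("C:/Users/Myles/Downloads/Desktop/Data.xlsx")
View(Data)
attach(Data)  # assumed step: makes columns such as nawards visible by name

noaward

# Number of people with no award and with exactly one award
count0 <- length(which(nawards == 0))
count1 <- length(which(nawards == 1))
total <- count0 + count1

# Proportions with no award and with an award
prob_noaward <- count0 / total
prob_award <- count1 / total

count0
count1
total
prob_noaward
prob_award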

R Output

> library(readxl)
> Data <- read_excel("C:/Users/Myles/Downloads/Desktop/Data.xlsx")
> View(Data)
> noaward
[1] 0
> count0<- length(which(nawards == 0))
> count1<-length(which(nawards == 1))
> total<-count0+count1
> prob_noaward<-count0/total
> prob_award<-count1/total
> count0
[1] 451
> count1
[1] 34
> total
[1] 485
> prob_noaward
[1] 0.9298969
> prob_award
[1] 0.07010309

Therefore, the probability of receiving an award is 0.0701, while the probability of not receiving an award is 0.9299.

Salary scale 2 and awarded

R code

logit <- glm(nawards ~ salscale2 + salscale3 + salscale4 + disat + sat + age1 + age2 + age3 + age4)
logit

Output

> logit<-glm(nawards~salscale2+salscale3+salscale4+disat+sat+age1+age2+age3+age4)
> logit

Call:  glm(formula = nawards ~ salscale2 + salscale3 + salscale4 + disat +
    sat + age1 + age2 + age3 + age4)

Coefficients:
(Intercept)    salscale2    salscale3    salscale4        disat          sat         age1
    1.88929      0.12285      0.38808     -0.80646      0.04113      2.52378     -1.36387
       age2         age3         age4
   -1.63379     -0.87256     -0.55708

Degrees of Freedom: 600 Total (i.e. Null);  591 Residual
Null Deviance:     6529
Residual Deviance: 5762        AIC: 3086
> coef(logit)
(Intercept)   salscale2   salscale3   salscale4       disat         sat        age1        age2
  1.8892880   0.1228489   0.3880804  -0.8064585   0.0411348   2.5237785  -1.3638693  -1.6337914
       age3        age4
 -0.8725600  -0.5570784

At α = 0.05, the p-values of all variables except the dissatisfied variable (disat) are greater than 0.05. Therefore, only the dissatisfied variable significantly affects the dependent variable nawards.
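Printing a `glm` object shows only the coefficients. The p-values come from `summary()`, and a logistic model needs `family = binomial`, because `glm` fits a Gaussian model by default. The sketch below uses simulated data as a stand-in for Data.xlsx: the data frame `sim`, the binary outcome `awarded` and the effect sizes are all hypothetical.

```r
set.seed(42)

# Hypothetical data standing in for Data.xlsx; column names mirror the text.
n <- 500
sim <- data.frame(disat = rbinom(n, 1, 0.3),
                  sat   = rbinom(n, 1, 0.4))
# Binary outcome whose log-odds depend on disat only (assumed effect sizes)
sim$awarded <- rbinom(n, 1, plogis(-2 + 0.8 * sim$disat))

# Logistic regression: family = binomial gives the logit link
fit <- glm(awarded ~ disat + sat, family = binomial, data = sim)

# Coefficient table with columns Estimate, Std. Error, z value, Pr(>|z|)
coefs <- summary(fit)$coefficients
coefs

# Which terms are significant at the 5% level
coefs[, "Pr(>|z|)"] < 0.05
```

Passing `data = sim` keeps the model independent of whatever objects happen to be attached in the session.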

> logit2<-glm(nawards~disat+sat+age1+age2+age3+age4+age5+age6)
> logit2

Call:  glm(formula = nawards ~ disat + sat + age1 + age2 + age3 + age4 +
    age5 + age6)

Coefficients:
(Intercept)        disat          sat         age1         age2         age3         age4
   1.680745     0.056792     2.642757    -1.228719    -1.465412    -0.739807    -0.603963
       age5         age6
  -0.007327           NA

Degrees of Freedom: 600 Total (i.e. Null);  593 Residual
Null Deviance:     6529
Residual Deviance: 5892        AIC: 3095
> p<-prop.table(award>2.3*noaward)
> p
[1] 1
> count0<- length(which(sat== 0))
> count1<-length(which(disat== 1))
> total1<-count0+count1
> prob_disat<-count0/total1
> prob_sat<-count1/total1
> total1
[1] 729
> prob_sat
[1] 0.266118
> prob_disat
[1] 0.73388

A Beta prior is used because the outcome is binary (0 or 1).

Prior : Beta()

Posterior median estimate of the probability of satisfaction = 0.266.
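A posterior median for a proportion can be read off the Beta quantile function `qbeta`. In the sketch below the counts 194 and 535 are the ones implied by the reported proportions (0.266118 × 729 and 0.73388 × 729). The uniform Beta(1, 1) prior is an assumption, since the prior's parameters are not given.

```r
# Counts implied by the reported proportions and total (729)
sat_count   <- 194
disat_count <- 535

# ASSUMED prior: uniform Beta(1, 1); the text does not give its parameters
a0 <- 1
b0 <- 1

post_a <- a0 + sat_count
post_b <- b0 + disat_count

post_median <- qbeta(0.5, post_a, post_b)              # posterior median
post_ci     <- qbeta(c(0.025, 0.975), post_a, post_b)  # 95% credible interval

round(post_median, 3)
round(post_ci, 3)
```

With this many observations the posterior median is driven almost entirely by the data, so a different weak prior would change it very little.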

f)

P(sat > 2.3 × disat) = 1

The probability that satisfaction is more than 2.3 times dissatisfaction is 1, so the event is certain.

g)

The R code creates a vector of zeros from the function y, and that vector is then used to plot a graph.

h)