Statistics
Question 1
$L(\mu \mid Y) \sim N(60, \sigma^2/N)$
Prior: $\mu \sim N(10, \tau_0^2)$
Since the standard deviation of the population is known, it follows that the variance of the population is also known.
$\sigma = 20$
Variance $= \sigma^2 = 20^2 = 400$
Posterior mean, =
We know the following from the information given
$\sigma^2 = 20^2 = 400$
$\bar{y} = 60$
$\sigma^2/N = 400/100 = 4$
$\mu_0 = 10$
=
N= 100
Posterior mean = 4(1501) = 6004
The posterior mean is 6004.
No, this prior cannot be justified: the posterior mean is far too large compared with the mean of the likelihood function.
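For reference, the standard conjugate result for a normal likelihood with known variance and a normal prior can be written as follows. The symbols $\mu_0$ and $\tau_0^2$ (prior mean and prior variance) and $\bar{y}$ (sample mean) are assumed names, since the notation is not fixed in the text above.

```latex
\[
E(\mu \mid Y)
  = \frac{\dfrac{\mu_0}{\tau_0^2} + \dfrac{N\bar{y}}{\sigma^2}}
         {\dfrac{1}{\tau_0^2} + \dfrac{N}{\sigma^2}},
\qquad
\operatorname{Var}(\mu \mid Y)
  = \left(\frac{1}{\tau_0^2} + \frac{N}{\sigma^2}\right)^{-1}.
\]
```

Because this is a weighted average of $\mu_0$ and $\bar{y}$, the result always lies between the prior mean and the sample mean, which gives a quick sanity check on any hand calculation.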
$L(\mu \mid Y) \sim N(60, \sigma^2/N)$
Prior: $\mu \sim N(40, \tau_0^2)$
Since the standard deviation of the population is known, it follows that the variance of the population is also known.
$\sigma = 20$
Variance $= \sigma^2 = 20^2 = 400$
Posterior mean, =
We know the following from the information given
$\sigma^2 = 20^2 = 400$
$\bar{y} = 60$
$\sigma^2/N = 400/100 = 4$
$\mu_0 = 40$
= 9
N= 100
Posterior mean, = = 183777
The posterior mean is 183,777
No, it is not negligible.
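As a cross-check, the precision-weighted update for a normal mean with known sampling variance can be sketched in R. The function and argument names below are illustrative, and the sketch assumes that the value 9 listed above is the prior variance (if it is the prior standard deviation, pass 81 instead).

```r
# Conjugate update for a normal mean with known sampling variance.
# All names here are illustrative and do not come from the original text.
normal_posterior <- function(ybar, se2, prior_mean, prior_var) {
  prior_prec <- 1 / prior_var   # precision of the prior
  data_prec  <- 1 / se2         # precision of the sample mean (se2 = sigma^2 / N)
  post_var   <- 1 / (prior_prec + data_prec)
  post_mean  <- post_var * (prior_prec * prior_mean + data_prec * ybar)
  list(mean = post_mean, var = post_var)
}

# ASSUMPTION: the value 9 is read as the prior variance.
post <- normal_posterior(ybar = 60, se2 = 4, prior_mean = 40, prior_var = 9)
print(post$mean)
print(post$var)
```

Under this reading the posterior mean is pulled from the sample mean of 60 toward the prior mean of 40, and it stays between the two.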
When variance = 8000
Posterior mean = 499.937
When variance = 9000
Posterior mean = 499.944
When variance = 10000
Posterior mean = 499.950
One variance needs to be chosen and justified, because different variances yield different posterior means: the posterior mean for a variance of 10,000 is higher than for 9,000, which in turn is higher than for 8,000.
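The sensitivity of the posterior mean to the variance can be explored with a short R sketch. Only the three variances come from the text; the sample mean of 500 is taken from the centre of the intervals reported in this question, while the prior mean of 0 and the sampling variance of 1 are hypothetical values chosen purely for illustration. The sketch also assumes that the variance being varied is the prior variance.

```r
# Posterior mean of a normal mean under a normal prior, for several prior variances.
# ASSUMPTIONS (not stated in the text): prior mean 0 and sampling variance of
# the mean equal to 1. The sample mean 500 is the centre of the reported intervals.
ybar       <- 500
se2        <- 1
prior_mean <- 0
prior_var  <- c(8000, 9000, 10000)

w         <- prior_var / (prior_var + se2)   # weight placed on the sample mean
post_mean <- w * ybar + (1 - w) * prior_mean

print(data.frame(prior_var, post_mean))
```

A larger prior variance puts more weight on the data, so the posterior mean moves closer to the sample mean.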
95% Bayesian credible interval = posterior mean ± 1.96 × posterior standard deviation
= 500 ± 16.53
= (483.47, 516.53)
Classical 95% CI = sample mean ± 1.96 × standard error
= 500 ± 1.96 × 30
= 500 ± 58.8
= (441.2, 558.8)
The classical 95% confidence interval does not approximate the 95% Bayesian credible interval. As the calculations above show, the classical interval (half-width 58.8) is wider than the Bayesian credible interval (half-width 16.53).
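Both intervals have the form centre ± 1.96 × spread, so they can be reproduced with one small helper. This is a minimal sketch: the function name is illustrative, and the posterior standard deviation is back-calculated from the reported half-width of 16.53 because the original expression is not available.

```r
# Symmetric normal interval: centre +/- z * sd.
normal_interval <- function(centre, sd, z = 1.96) {
  c(lower = centre - z * sd, upper = centre + z * sd)
}

# Classical interval: the text uses 1.96 * 30 as the half-width.
classical <- normal_interval(500, 30)

# Bayesian interval. ASSUMPTION: posterior sd recovered as 16.53 / 1.96.
bayesian <- normal_interval(500, 16.53 / 1.96)

print(round(classical, 2))
print(round(bayesian, 2))
```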
Question 2
The Beta(2, 2) prior encodes what we believe about the success probability before seeing the data. Here it expresses the belief that failure and success are equally likely, with prior mean 2/4 = 0.5.
The Beta(11, 41) prior expresses the belief that the probability of success is 11/52 ≈ 0.212 and the probability of failure is 41/52 ≈ 0.788.
The Beta(21, 81) prior expresses the belief that the probability of success is 21/102 ≈ 0.206 and the probability of failure is 81/102 ≈ 0.794.
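The prior means can be checked in R using the fact that a Beta(a, b) distribution has mean a / (a + b). The column names below are illustrative.

```r
# Mean of a Beta(a, b) distribution is a / (a + b).
priors <- data.frame(a = c(2, 11, 21), b = c(2, 41, 81))
priors$mean_success <- priors$a / (priors$a + priors$b)
priors$mean_failure <- 1 - priors$mean_success
# a + b acts like a prior sample size: larger values mean a more informative prior.
priors$prior_size <- priors$a + priors$b

print(round(priors, 4))
```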
Calculating the posterior means
N = 20
Y = 1
For a Beta(a, b) prior with Y successes in N trials, the posterior is Beta(a + Y, b + N - Y), with mean (a + Y)/(a + b + N).
For the Beta(2, 2) prior:
Posterior: Beta(2 + 1, 2 + 20 - 1) = Beta(3, 21)
Posterior mean, $E(\theta \mid Y)$ = 3/(3 + 21) = 3/24 = 0.125
For the Beta(11, 41) prior:
Posterior: Beta(11 + 1, 41 + 20 - 1) = Beta(12, 60)
Posterior mean, $E(\theta \mid Y)$ = 12/(12 + 60) = 12/72 = 0.1667
For the Beta(21, 81) prior:
Posterior: Beta(21 + 1, 81 + 20 - 1) = Beta(22, 100)
Posterior mean, $E(\theta \mid Y)$ = 22/(22 + 100) = 22/122 = 0.1803
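The standard Beta-Binomial conjugate update, in which a Beta(a, b) prior combined with y successes in n trials gives a Beta(a + y, b + n - y) posterior, can be sketched in R as a check on the hand calculations. The function name is illustrative.

```r
# Conjugate Beta-Binomial update: returns posterior shapes and posterior mean.
beta_posterior <- function(a, b, y, n) {
  c(shape1 = a + y, shape2 = b + n - y, mean = (a + y) / (a + b + n))
}

# The three priors from the text, updated with Y = 1 success in N = 20 trials.
priors <- list(c(2, 2), c(11, 41), c(21, 81))
res <- t(sapply(priors, function(p) beta_posterior(p[1], p[2], y = 1, n = 20)))

print(round(res, 4))
```

Each row of `res` corresponds to one prior, in the order listed.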
For the Beta(2, 2) prior:
$\Pr(\theta < 0.2 \mid Y)$ = 0.0003
For the Beta(11, 41) prior:
$\Pr(\theta < 0.2 \mid Y)$ = 4.9*
For the Beta(21, 81) prior:
$\Pr(\theta < 0.2 \mid Y)$ = 0.00
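A posterior probability of this kind is the Beta cumulative distribution function evaluated at 0.2, which R provides as `pbeta()`. The sketch below assumes the posterior shapes follow the conjugate update Beta(a + y, b + n - y) with y = 1 and n = 20; other shapes can be substituted directly.

```r
# Posterior probability that theta < 0.2, via the Beta CDF.
# ASSUMPTION: posterior shapes are a + y and b + n - y (conjugate update).
a <- c(2, 11, 21)
b <- c(2, 41, 81)
y <- 1
n <- 20

p_below <- pbeta(0.2, shape1 = a + y, shape2 = b + n - y)
names(p_below) <- c("Beta(2,2)", "Beta(11,41)", "Beta(21,81)")

print(round(p_below, 4))
```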
If I had to choose among the three priors, I would choose Beta(21, 81), since it gives the highest posterior mean of the three.
Calculating the posterior means
N = 120
Y = 24
For a Beta(a, b) prior with Y successes in N trials, the posterior is Beta(a + Y, b + N - Y), with mean (a + Y)/(a + b + N).
For the Beta(2, 2) prior:
Posterior: Beta(2 + 24, 2 + 120 - 24) = Beta(26, 98)
Posterior mean, $E(\theta \mid Y)$ = 26/(26 + 98) = 26/124 = 0.2097
For the Beta(11, 41) prior:
Posterior: Beta(11 + 24, 41 + 120 - 24) = Beta(35, 137)
Posterior mean, $E(\theta \mid Y)$ = 35/(35 + 137) = 35/172 = 0.2035
For the Beta(21, 81) prior:
Posterior: Beta(21 + 24, 81 + 120 - 24) = Beta(45, 177)
Posterior mean, $E(\theta \mid Y)$ = 45/(45 + 177) = 45/222 = 0.2027
I would recommend the Beta(2, 2) prior since it yields the highest posterior mean of the three priors, although with N = 120 the three posterior means differ by less than 0.01.
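One way to see how the prior and the data combine is to write the Beta posterior mean as a weighted average of the prior mean and the sample proportion. The symbols a and b for the prior parameters are assumed names.

```latex
\[
E(\theta \mid Y) = \frac{a + Y}{a + b + N}
  = \underbrace{\frac{a + b}{a + b + N}}_{\text{prior weight}} \cdot \frac{a}{a + b}
  \;+\;
  \underbrace{\frac{N}{a + b + N}}_{\text{data weight}} \cdot \frac{Y}{N}.
\]
```

As N grows relative to a + b, the data weight approaches 1 and the posterior mean approaches the sample proportion Y/N regardless of the prior.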
Question 3
a)
library(readxl)
Data <- read_excel("C:/Users/Myles/Downloads/Desktop/Data.xlsx")
View(Data)
noaward
count0<- length(which(nawards == 0))
count1<-length(which(nawards == 1))
total<-count0+count1
prob_noaward<-count0/total
prob_award<-count1/total
count0
count1
total
prob_noaward
prob_award
R Output
> library(readxl)
> Data <- read_excel("C:/Users/Myles/Downloads/Desktop/Data.xlsx")
> View(Data)
> noaward
[1] 0
> count0<- length(which(nawards == 0))
> count1<-length(which(nawards == 1))
> total<-count0+count1
> prob_noaward<-count0/total
> prob_award<-count1/total
> count0
[1] 451
> count1
[1] 34
> total
[1] 485
> prob_noaward
[1] 0.9298969
> prob_award
[1] 0.07010309
Therefore, the probability of receiving an award is 0.0701, while the probability of not receiving an award is 0.9299.
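The same proportions can be obtained more compactly with `table()` and `prop.table()`. Since the spreadsheet is not available here, the sketch rebuilds a stand-in `nawards` vector from the counts reported in the output (451 zeros and 34 ones); with the real data, `Data$nawards` would be used instead, assuming that is the column name.

```r
# Stand-in for the nawards column, rebuilt from the reported counts.
# ASSUMPTION: with the real data this would be Data$nawards.
nawards <- c(rep(0, 451), rep(1, 34))

counts <- table(nawards)        # frequency of each value
props  <- prop.table(counts)    # relative frequencies

print(counts)
print(round(props, 4))
```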
Salary scale 2 and awarded
R code
logit<-glm(nawards~salscale2+salscale3+salscale4+disat+sat+age1+age2+age3+age4)
logit
output
> logit<-glm(nawards~salscale2+salscale3+salscale4+disat+sat+age1+age2+age3+age4)
> logit

Call: glm(formula = nawards ~ salscale2 + salscale3 + salscale4 + disat + sat + age1 + age2 + age3 + age4)

Coefficients:
(Intercept)    salscale2    salscale3    salscale4        disat          sat         age1
    1.88929      0.12285      0.38808     -0.80646      0.04113      2.52378     -1.36387
       age2         age3         age4
   -1.63379     -0.87256     -0.55708

Degrees of Freedom: 600 Total (i.e. Null); 591 Residual
Null Deviance: 6529
Residual Deviance: 5762    AIC: 3086
> coef(logit)
(Intercept)   salscale2   salscale3   salscale4       disat         sat        age1        age2
  1.8892880   0.1228489   0.3880804  -0.8064585   0.0411348   2.5237785  -1.3638693  -1.6337914
       age3        age4
 -0.8725600  -0.5570784
At alpha = 0.05, the p-values of all variables except the dissatisfied variable (disat) are greater than 0.05. Therefore, only disat significantly affects the dependent variable nawards.
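Note that `glm()` uses `family = gaussian` by default, so a logistic regression needs `family = binomial` and a binary response, and p-values are reported by `summary()` rather than by printing the fitted object. The sketch below illustrates this on simulated data; the variable names and coefficients are hypothetical and are not taken from the assignment data.

```r
set.seed(1)
n <- 500

# Hypothetical binary predictors and outcome, simulated for illustration only.
sat   <- rbinom(n, 1, 0.3)
age1  <- rbinom(n, 1, 0.2)
award <- rbinom(n, 1, plogis(-2 + 1.0 * sat - 0.5 * age1))

# family = binomial gives a logistic regression (logit link by default).
fit <- glm(award ~ sat + age1, family = binomial)

# The coefficient table includes estimates, standard errors, z values and p-values.
coef_table <- coef(summary(fit))
print(round(coef_table, 4))
```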
logit2<-glm(nawards~disat+sat+age1+age2+age3+age4+age5+age6)
logit2

Call: glm(formula = nawards ~ disat + sat + age1 + age2 + age3 + age4 + age5 + age6)

Coefficients:
(Intercept)        disat          sat         age1         age2         age3         age4
   1.680745     0.056792     2.642757    -1.228719    -1.465412    -0.739807    -0.603963
       age5         age6
  -0.007327           NA

Degrees of Freedom: 600 Total (i.e. Null); 593 Residual
Null Deviance: 6529
Residual Deviance: 5892    AIC: 3095
p<-prop.table(award>2.3*noaward)
> p
[1] 1
count0<- length(which(sat== 0))
count1<-length(which(disat== 1))
total1<-count0+count1
prob_disat<-count0/total1
prob_sat<-count1/total1
total1
[1] 729
> prob_sat
[1] 0.266118
> prob_disat
[1] 0.73388
The prior distribution used is a beta distribution, since the outcome has two possible values (0 and 1).
Prior : Beta()
Posterior median estimate of satisfaction = probability of satisfaction = 0.266.
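A posterior median under a beta prior can be computed with `qbeta()`. This is only a sketch under stated assumptions: the prior parameters are missing from the text, so a uniform Beta(1, 1) prior is assumed, and the count of 194 is back-calculated from the reported proportion 0.266118 of 729 observations.

```r
# ASSUMPTIONS: uniform Beta(1, 1) prior (prior parameters are missing in the
# text) and a count recovered from the reported proportion 0.266118 of 729.
prior_a <- 1
prior_b <- 1
n_total <- 729
n_sat   <- round(0.266118 * n_total)

# Conjugate update, then posterior median and a 95% credible interval.
post_a <- prior_a + n_sat
post_b <- prior_b + n_total - n_sat
post_median <- qbeta(0.5, post_a, post_b)
post_ci     <- qbeta(c(0.025, 0.975), post_a, post_b)

print(round(post_median, 4))
print(round(post_ci, 4))
```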
f)
P(sat> 2.3*disat) = 1
The probability that satisfaction exceeds 2.3 times dissatisfaction is 1, so it is certain to happen.
g)
The R code creates a vector of zeros from the function y. The resulting vector is then used to plot a graph.
h)