For the variable “Do they like the TV show”:
a) phat1 = proportion of people that liked the TV show with ending 1
phat2 = proportion of people that liked the TV show with ending 2
phat3 = proportion of people that liked the TV show with ending 3
c) Give a chart that compares the proportions for the 3 different TV show endings
xbar1 = the average amount people would pay for the DVD of the TV show with ending 1
xbar2 = the average amount people would pay for the DVD of the TV show with ending 2
xbar3 = the average amount people would pay for the DVD of the TV show with ending 3
BUS105 Business Statistics
Name of the Student:
Student ID: 11501885
Name of the University:
Table of Contents
Section 1
Section 2
Section 3
Question 1
Question 2
Question 3
Section 4
Section 5
Section 6
Section 7
Section 8
This report is based on a computing assignment in which several statistical concepts are applied and interpreted. The data provided covers 100,000 people, from which a sample of 100 was selected for each student according to their roll number. Each respondent was shown the TV show with one of three different endings and was asked whether they liked the show. The assignment analyses these responses to compare the results across the endings and the amount people are willing to pay for the DVD of the show.
Aims of the Assignment
The primary goal of the assignment is to understand the concepts needed to draw conclusions from a sample. The assignment studies questions that can be answered with numerical output computed from the sample responses. However, descriptive statistics alone are not a reliable basis for conclusions, because a different sample would yield different values, which raises issues of reliability and validity. This reliability issue can be addressed with the theoretical large-sample distribution known as the Z distribution. The Z distribution is also the basis of hypothesis testing, in which the sample is used to draw conclusions about the population as a whole.
The computing assignment is divided into two sections, namely section 1 and section 2. The first section deals with describing the data and interpreting it. The latter section deals with the reliability of the results, using hypothesis testing to illustrate applications of the Z distribution.
Data set Description
a) The data set is based on a survey questionnaire answered by 100,000 people. Because it is difficult to study the data set as a whole, it is divided amongst the students so that each student studies a sample of 100 respondents, each of whom was shown the TV show with one of three endings labelled ending 1, ending 2 and ending 3. All respondents were asked two brief questions. The first question is “Do they like the movie?” and the second is “How much would they pay for the DVD?” The first question is categorical, with answers of “Yes” or “No”; for analysis, Yes was coded as 2 and No as 1. The DVD pay question has numerical responses.
b) Other questions that could be asked alongside these two would contrast the endings directly, such as “Comparing ending 1 with ending 2, or ending 2 with ending 3, which ending do you find more likeable?” and “Based on the contrast of the endings, what is the maximum amount that you would pay for the DVD?”
The existing questions only capture responses to a single ending of the TV show and do not ask for any comparison. Adding a direct comparison would make the endings easier to evaluate and would help in generalizing the study as a whole.
Review of the Data set
Liking for the TV show
a) Proportions of responses
→ TV show with Ending 1: phat1 = 19/35 = 0.542857
→ TV show with Ending 2: phat2 = 17/29 = 0.586207
→ TV show with Ending 3: phat3 = 24/36 = 0.666667
b) Difference between proportions of responses
→ phat1 – phat2 = 0.542857 – 0.586207 = -0.04335
→ phat2 – phat3 = 0.586207 – 0.666667 = -0.08046
c) Chart for proportions of three endings of a show
[Chart: proportions for Ending 1, Ending 2 and Ending 3]
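The proportions and their differences can be reproduced with a few lines of Python. This is a minimal sketch that uses only the counts reported above (19 of 35, 17 of 29 and 24 of 36); the dictionary and function names are illustrative and not part of the assignment.

```python
# Counts of "Yes" responses and sample sizes per ending, as reported above.
liked = {"Ending 1": 19, "Ending 2": 17, "Ending 3": 24}
shown = {"Ending 1": 35, "Ending 2": 29, "Ending 3": 36}

def proportions(liked, shown):
    """Return the sample proportion of "Yes" responses for each ending."""
    return {ending: liked[ending] / shown[ending] for ending in liked}

phat = proportions(liked, shown)
diff_12 = phat["Ending 1"] - phat["Ending 2"]  # phat1 - phat2
diff_23 = phat["Ending 2"] - phat["Ending 3"]  # phat2 - phat3

for ending, p in phat.items():
    print(f"{ending}: phat = {p:.6f}")
print(f"phat1 - phat2 = {diff_12:.5f}")
print(f"phat2 - phat3 = {diff_23:.5f}")
```

Working from the raw counts rather than the rounded proportions avoids carrying rounding error into the differences.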
a) Average amount people would pay for the DVD with the three different endings
→ x̅1 = 5.734286 (Ending 1)
→ x̅2 = 5.565517 (Ending 2)
→ x̅3 = 6.661111 (Ending 3)
b) Difference between means
→ x̅1 – x̅2 = 5.734286 – 5.565517 = 0.168768
→ x̅2 – x̅3 = 5.565517 – 6.661111 = -1.09559
a) The histogram of DVD pay for each ending is given below. A histogram shows the spread of the data and allows the assumption of a normal distribution to be checked.
d) Ending 3
[Table: Summary Statistics of Ending 3 (sample standard deviation)]
Endings on TV Show
The proportions for the three endings show that ending 3 received the highest share of respondents who liked the TV show. The lowest share was for ending 1, where only 54.2857% answered “Yes.” The responses for ending 1 and ending 2 do not differ much, but ending 3 is roughly 8 to 12 percentage points above them. The people who were shown ending 1 therefore liked the show less than those shown either of the other endings. The gap between ending 2 and ending 3 is also noticeable: ending 3 had the highest share of “Yes” responses at 66.67%, in contrast to 58.62% for ending 2.
The share of positive responses rises in the order of the endings, Ending 1 < Ending 2 < Ending 3. In terms of the proportions liking the TV show, this is 0.542857 < 0.586207 < 0.666667.
On the other hand, although ending 2 was liked more than ending 1, the mean DVD pay was lower for ending 2 than for ending 1, so the ordering of mean DVD pay is Ending 2 < Ending 1 < Ending 3, that is 5.565517 < 5.734286 < 6.661111.
For ending 1, the results are mixed, with the largest concentration of responses in the first bin (responses between 0 and 1). This bin holds 16 responses, or 45.71% of the DVD pay values, which matches the 16 out of 35 respondents who did not like the TV show. This value indicates that all the respondents who did not like the TV show would pay $1 or less, while the other 19 respondents, in the “Yes” category, would pay between $7 and $14, with most opting for $9 and $11.
On the histogram, the distribution appears unimodal but leans towards bimodal, with many troughs and crests and a kurtosis of -1.724. The skewness is reported as positive, although the mean is less than the median and the mode.
For ending 2, the results on DVD pay show that 41.38% of responses, a frequency of 12, fall in bin 1 (values between 0 and 1). This matches the 12 respondents in the “No” category for liking the TV show, so the respondents who did not like the TV show would, on the whole, pay $1 or less.
On the other hand, 17 respondents liked the TV show, and their responses are spread over pay of $6 to $14. The most frequent DVD pay value is $4, chosen by 4 respondents, followed by $6 and $11 with 3 respondents each. The mean DVD pay is $5.565517, which places the centre of the responses near a DVD pay of $6.
On the histogram, the distribution appears unimodal but leans towards bimodal, with a kurtosis of -1.644. The skewness is reported as positive, although the mean is less than the median and the mode.
For ending 3, the results on DVD pay show that 33.33% of the responses came from the 12 respondents who did not like the TV show, that is, those in the “No” category. All of these responses were under $1. On the other hand, the 24 respondents who liked ending 3 of the TV show are willing to pay well for the DVD. Their pay values range from $6 to $14 and are concentrated at $10, with 8 respondents, followed by $9, with 6.
On the histogram, the distribution is unimodal and roughly normal, with a slight inclination towards negative skewness, as the skewness value came out to be -0.47047.
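The skewness and kurtosis figures discussed for the three histograms can be computed as in the sketch below. It assumes the reported values came from the bias-adjusted sample formulas that spreadsheets commonly use, which is an assumption because the report does not state the formula. The raw responses are not reproduced in the report, so the list `two_cluster_pay` is hypothetical data chosen only to show how a distribution with one cluster near $1 and another near $9 produces a strongly negative excess kurtosis.

```python
import math

def _mean_and_sample_sd(values):
    """Return the mean and the (n - 1) sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return mean, sd

def sample_skewness(values):
    """Bias-adjusted sample skewness (assumed spreadsheet-style formula)."""
    n = len(values)
    mean, sd = _mean_and_sample_sd(values)
    cubed = sum(((x - mean) / sd) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed

def sample_excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis (assumed spreadsheet-style formula)."""
    n = len(values)
    mean, sd = _mean_and_sample_sd(values)
    fourth = sum(((x - mean) / sd) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction

# Hypothetical pay values, NOT the survey data: two equal clusters.
two_cluster_pay = [1, 1, 1, 1, 9, 9, 9, 9]
skew = sample_skewness(two_cluster_pay)
kurt = sample_excess_kurtosis(two_cluster_pay)
print(f"skewness = {skew:.4f}, excess kurtosis = {kurt:.4f}")
```

A clearly negative excess kurtosis together with two separated clusters is the pattern the report describes as "inclined towards bimodal".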
Problem of survey data in the real world
When a business analyses the results of a real-world survey, it needs to consider carefully whether the people responding to the survey actually belong to the target market. If the respondents do not represent the right target market, the results will be biased, however large the sample is. The problem is most severe if the survey only reaches a certain type of person, one who is always available and has a lot of free time. In addition, a real-world survey is likely to suffer from a low response rate.
1a) “Variable ending type is independent of the variable like TV show yes/no”
i.e., Ho: p1 = p2 (Independent)
H1: p1 – p2 ≠ 0 (Not independent)
Given: n1 = 35 and n2 = 29, phat1 = 0.542857 and phat2 = 0.586207, p* = (x1 + x2)/(n1 + n2) = (19 + 17)/(35 + 29) = 36/64 = 0.5625
Sample Error = sqrt(p* × (1 – p*) × (1/n1 + 1/n2)) = sqrt(0.5625 × 0.4375 × (1/35 + 1/29)) = sqrt(0.015517) = 0.124568
→ | Z | = |phat1 – phat2| / Sample Error = |0.542857 – 0.586207| / 0.124568 = 0.3480
→ p value = Prob (z > 0.3480) + Prob (z < -0.3480) = 0.3639 + 0.3639 = 0.7278
As the Z statistic is lower than the table value of Z (1.96 at the 95% level), and the p value is well above 0.05, the null hypothesis is not rejected. The sample therefore gives no evidence that liking the TV show depends on whether ending 1 or ending 2 was shown; the “Yes or No” responses are consistent with independence from the ending for these two endings.
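The two-proportion z test for ending 1 and ending 2 can be checked with a short Python sketch. It assumes the conventional pooled standard error, sqrt(p* × (1 – p*) × (1/n1 + 1/n2)), and a two-sided p value from the standard normal distribution; the function name `two_proportion_z_test` is illustrative and not part of the assignment.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z test; returns (z, standard error, two-sided p value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # p* in the text
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, se, p_value

# Ending 1: 19 of 35 liked the show; ending 2: 17 of 29 liked it.
z, se, p_value = two_proportion_z_test(19, 35, 17, 29)
print(f"SE = {se:.6f}, z = {z:.4f}, p value = {p_value:.4f}")
print("reject Ho at the 95% level" if abs(z) > 1.96 else "do not reject Ho at the 95% level")
```

The same function can be called with the counts for ending 2 and ending 3 (17 of 29 and 24 of 36) to check the second comparison.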
b) “Test the claim that there is a difference in the amount people would pay for ending 1 and ending 2”
Ho: µ1 = µ2 (No difference)
H1: µ1 ≠ µ2 (There is a difference)
Given: n1 = 35 and n2 = 29, x̅1 = 5.734286 and x̅2 = 5.565517, σ1 = 5 and σ2 = 4.9
→ Sample Error = sqrt(σ1²/n1 + σ2²/n2) = sqrt(5²/35 + 4.9²/29) = sqrt(0.714286 + 0.827931) = sqrt(1.542217)
→ Sample Error = 1.241860
| Z | = |x̅1 – x̅2| / Sample Error = (5.734286 – 5.565517) / 1.241860 = 0.168768 / 1.241860 = 0.1359
As the Z statistic is lower than the table value of Z, which is 1.96, the null hypothesis is not rejected. It can be concluded that there is no significant difference in the amount people would pay for ending 1 and ending 2 of the TV show.
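The test on the mean DVD pay can be sketched in the same way. This is a minimal sketch that assumes the usual two-sample z test with the given standard deviations treated as known, so that the standard error is sqrt(σ1²/n1 + σ2²/n2); the function name `two_mean_z_test` is illustrative and not part of the assignment.

```python
import math

def two_mean_z_test(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample z test for means; returns (z, standard error, two-sided p value)."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean1 - mean2) / se
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, se, p_value

# Ending 1 versus ending 2, using the summary values given in the text.
z, se, p_value = two_mean_z_test(5.734286, 5.0, 35, 5.565517, 4.9, 29)
print(f"SE = {se:.6f}, z = {z:.4f}, p value = {p_value:.4f}")
print("reject Ho at the 95% level" if abs(z) > 1.96 else "do not reject Ho at the 95% level")
```

Because the standard error already divides each variance by its sample size, no further factor of sqrt(1/n1 + 1/n2) is applied.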
2a) “Variable ending type is independent of the variable like TV show yes/no”
i.e., Ho: p2 = p3 (Independent)
H1: p2 – p3 ≠ 0 (Not independent)
Given: n2 = 29 and n3 = 36, phat2 = 0.586207 and phat3 = 0.666667, p* = (x2 + x3)/(n2 + n3) = (17 + 24)/(29 + 36) = 41/65 = 0.630769
Sample Error = sqrt(p* × (1 – p*) × (1/n2 + 1/n3)) = sqrt(0.630769 × 0.369231 × (1/29 + 1/36)) = sqrt(0.014500) = 0.120418
→ | Z | = |phat2 – phat3| / Sample Error = |0.586207 – 0.666667| / 0.120418 = 0.6682
→ p value = Prob (z > 0.6682) + Prob (z < -0.6682) = 0.2520 + 0.2520 = 0.5040
As the Z statistic is lower than the table value of Z (1.96 at the 95% level), and the p value is well above 0.05, the null hypothesis is not rejected. The sample therefore gives no evidence that liking the TV show depends on whether ending 2 or ending 3 was shown; the “Yes or No” responses are consistent with independence from the ending for these two endings.
b) “Test the claim that there is a difference in the amount people would pay for ending 2 and ending 3”
Ho: µ2 = µ3 (No difference)
H1: µ2 ≠ µ3 (There is a difference)
Given: n2 = 29 and n3 = 36, x̅2 = 5.565517 and x̅3 = 6.661111, σ2 = 4.9 and σ3 = 4.6
→ Sample Error = sqrt(σ2²/n2 + σ3²/n3) = sqrt(4.9²/29 + 4.6²/36) = sqrt(0.827931 + 0.587778) = sqrt(1.415709)
→ Sample Error = 1.189836
| Z | = |x̅2 – x̅3| / Sample Error = (6.661111 – 5.565517) / 1.189836 = 1.095594 / 1.189836 = 0.9208
As the Z statistic is lower than the table value of Z at the 95% level, which is 1.96, the null hypothesis is not rejected. Although the sample mean for ending 3 is about $1.10 higher, there is no significant difference in the amount of DVD pay for ending 2 and ending 3.
Concept of Sampling Distribution
The sampling distribution describes how a sample estimate would vary from sample to sample, and using it is an example of deductive reasoning. The theory is worked out from the population from which the sample has been taken, and hypothesis testing then relies on this existing theory to assess the estimates obtained from the sample.
“phat1 – phat2 is considered to be an estimate of p1 – p2 (the difference between the population proportions for ending 1 and 2)
phat2 – phat3 is considered to be an estimate of p2 – p3 (the difference between the population proportions for ending 2 and 3)
xbar1 – xbar2 is considered to be an estimate of µ1 – µ2 (the difference between the population means of DVD pay of ending 1 and ending 2)
xbar2 – xbar3 is considered to be an estimate of µ2 – µ3 (the difference between the population means of DVD pay of ending 2 and ending 3).”
Despite the skewness and kurtosis results, these estimates are approximately normally distributed because the sample size is large, so the shape of the actual data matters little. The normal distribution is supported by the Normal Q-Q plots, which were based on the 1000 samples in the Excel file. In the normal Q-Q plots, the z distribution is confirmed by finding the z scores.
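The idea that the difference between two sample proportions is approximately normal for samples of this size can be illustrated with a small simulation. This is only a sketch: the population proportions 0.54 and 0.59 are hypothetical stand-ins chosen near the sample estimates, and the number of repeats and the seed are arbitrary.

```python
import math
import random

def simulate_differences(p1, n1, p2, n2, repeats, seed=0):
    """Draw repeated pairs of samples and return the differences in sample proportions."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(repeats):
        yes1 = sum(rng.random() < p1 for _ in range(n1))
        yes2 = sum(rng.random() < p2 for _ in range(n2))
        diffs.append(yes1 / n1 - yes2 / n2)
    return diffs

# Hypothetical population proportions (assumptions, not survey results).
p1, p2 = 0.54, 0.59
n1, n2 = 35, 29
diffs = simulate_differences(p1, n1, p2, n2, repeats=5000)

mean_diff = sum(diffs) / len(diffs)
theoretical_se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
# Share of simulated differences within 1.96 standard errors of the true difference.
within = sum(abs(d - (p1 - p2)) <= 1.96 * theoretical_se for d in diffs) / len(diffs)

print(f"mean of simulated differences = {mean_diff:.4f} (true difference {p1 - p2:.4f})")
print(f"share within 1.96 standard errors = {within:.3f}")
```

If the normal approximation holds, the share within 1.96 standard errors should be close to 95%, which is the basis for using the table value 1.96 in the hypothesis tests.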
Working out the sampling distribution of the estimates is considered difficult, even for professors who do a lot of research. Even a postgraduate student doing research therefore generally relies on published journal articles. Quantitative research not only provides a base for the data but also helps in reading the work of other researchers. Readers are then able to understand how accurately the sample reflects the population and what difference could be expected in the numerical summary if another sample were considered. Checking the reliability of the samples is thus based on existing methods rather than on methods the researcher devises.
To conclude, a sample of 100 drawn from the population of 100,000 will yield different results from one sample to the next. After calculating the proportions, it was seen that the responses differ with the change in the ending of the TV show, as does the amount to be paid for the DVD. On hypothesis testing, however, no difference was demonstrated between the endings of the TV show or the amount of pay for the two pairs, that is ending 1 and 2 and ending 2 and 3. The sample statistics showed that the proportions liking the TV show increase across the endings for both “1 and 2” and “2 and 3.” The differences in mean DVD pay between the endings point in a similar direction, suggesting a link between the ending of the TV show and the DVD pay.