- The researcher should use the sample survey method, collecting sample data from each section of society. Complete enumeration is unnecessary because it adds cost and time, and a good representative sample can serve as an alternative to the entire population.
- The researcher may use Simple Random Sampling within each section of society, since simple random sampling assigns an equal probability of selection to every unit. Here the sampling units within a section are alike in nature and hence should have an equal probability of selection.
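The sampling scheme described above can be sketched in a few lines of Python. This is only an illustration: the section names, household IDs, and sample size are hypothetical and are not taken from the study.

```python
import random


def sample_each_section(sections, n_per_section, seed=0):
    """Draw a simple random sample (without replacement) from every section."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    return {
        name: rng.sample(households, n_per_section)
        for name, households in sections.items()
    }


# Hypothetical sampling frame: household IDs grouped by section of society.
sections = {
    "urban": list(range(1, 101)),
    "suburban": list(range(101, 181)),
    "rural": list(range(181, 241)),
}

samples = sample_each_section(sections, n_per_section=10)
for name, chosen in samples.items():
    print(name, sorted(chosen))
```

Within a section, `random.sample` gives every household the same chance of being chosen, which is the equal-probability property the answer relies on.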
3. To study the effect of television on the different problems faced by people, the researcher may collect data from households on attributes such as the number of hours a household spends on television and the number of obese children in the household. Both variables are treated here as count data and are thus discrete. To study how television causes violence in society, the researcher may collect data on the number of television sets sold in a year and the number of crimes committed in that same year. Here again the data are discrete, as both attributes are counts. To study how television causes debt, the researcher may collect data from households on the number of hours spent on television and the debt of each household. Here the former variable is discrete (when recorded in whole hours) and the latter is continuous.
4. The researcher may face several problems: a household may be unwilling to reveal the amount of its debt, and it may be unable to provide accurate information on the hours spent on television.
- The researcher must first find the difference between the smallest and the largest value of each variable; denote this difference (the range) by R. The researcher then chooses the width of each class, c; narrower class intervals give a more detailed picture of the data. The number of class intervals l is then obtained from l = R/c, rounded up to a whole number. Once the number of class intervals is known, the class intervals for both variables can be constructed.
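The procedure above can be sketched as follows. The data values and the class width of 10 are hypothetical, and treating the last class as closed on the right (so that the maximum is counted) is an assumption of this sketch, not something stated in the answer.

```python
import math


def class_intervals(values, width):
    """Return the range R, the number of classes l, and the class limits."""
    low, high = min(values), max(values)
    r = high - low                          # range R
    n_classes = max(1, math.ceil(r / width))  # l = R / c, rounded up
    intervals = [
        (low + i * width, low + (i + 1) * width) for i in range(n_classes)
    ]
    return r, n_classes, intervals


def frequencies(values, intervals):
    """Count values per class; classes are [lower, upper), last one closed."""
    counts = [0] * len(intervals)
    last = len(intervals) - 1
    for v in values:
        for i, (lower, upper) in enumerate(intervals):
            if lower <= v < upper or (i == last and v == upper):
                counts[i] += 1
                break
    return counts


# Hypothetical weekly television hours for ten households.
hours = [5, 12, 18, 22, 25, 30, 31, 36, 40, 45]

r, n_classes, intervals = class_intervals(hours, width=10)
counts = frequencies(hours, intervals)
for (lower, upper), count in zip(intervals, counts):
    print(f"{lower}-{upper}: {count}")
```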
- The histograms for the two variables are given below.
Histogram of Hours spent on Television
We observe the following about the shape of the histogram:
- The histogram is positively skewed, i.e. its skewness is greater than zero.
- The central tendency of the data is somewhere between 12000 and 150000.
- The data are well dispersed, and the excess kurtosis may be assumed to be around zero.
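The shape measures mentioned above can be checked numerically rather than read off the histogram. The sketch below uses the moment-based definitions (skewness m3 / m2^1.5 and excess kurtosis m4 / m2^2 - 3); the data are hypothetical, and statistical packages may apply small-sample corrections that give slightly different values.

```python
def shape_measures(values):
    """Return (skewness, excess kurtosis) based on central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    return skewness, excess_kurtosis


# Hypothetical right-skewed data: one large value stretches the right tail.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 13]
skew, kurt = shape_measures(data)
print(f"skewness = {skew:.3f}, excess kurtosis = {kurt:.3f}")
```

A positive skewness corresponds to the longer right tail described above, and an excess kurtosis near zero corresponds to tails similar to those of a normal distribution.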
- The appropriate plot for investigating the relationship between the two variables is a scatterplot. We plot the hours spent on television against total debt and obtain the following plot.
We generally place the independent (explanatory) variable on the X-axis and the dependent (response) variable on the Y-axis. Here Hours spent on television is the explanatory variable and Debt is the response variable, because we are interested in how the former affects the latter.
The linear trend line, along with its equation and the coefficient of determination, is also shown on the graph. We obtain:
Linear Trend Equation: y = 2531x + 49430
Coefficient of determination: R² = 0.306
- The numerical summary of the two variables is given below.
|Measures|Hours Spent on Television|Total Debt|
We thus observe that:
- The central tendency for Television Hours is around 30 and that for the Total Debt is around 126500.
- Both variables have a large range and are well dispersed.
- The five-number summary of the two variables is represented in the boxplots given below.
Boxplot for Television Hours
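The five values that a boxplot displays can be computed directly. The sketch below uses hypothetical hours data and the "inclusive" quartile method of Python's `statistics.quantiles`; quartile conventions differ between textbooks and software, so the quartiles may not match another tool exactly.

```python
import statistics


def five_number_summary(values):
    """Return the minimum, quartiles, and maximum of the data."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


# Hypothetical weekly television hours for seven households.
hours = [10, 20, 25, 30, 35, 40, 60]
summary = five_number_summary(hours)
for name, value in summary.items():
    print(f"{name}: {value}")
```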
- The strength of the relationship between the two variables is given by the correlation coefficient between them. Let X denote the Hours spent on Television and Y denote the Total Debt. Then the correlation between X and Y is given by
$$R = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \, \sum_{i=1}^{n}(y_i - \bar{y})^2}},$$
where $\bar{x}$ and $\bar{y}$ denote the sample means of x and y respectively.
Thus we obtain the value of the correlation coefficient as R = 0.554.
- The value of the correlation coefficient R is positive, hence there is a positive relationship between the two variables. Simply put, as the hours spent on television increase, the total debt also tends to increase.
- The magnitude of R is neither very high nor very low, implying that the relationship between the two variables is moderately strong.
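The correlation coefficient discussed above is the ratio of the sum of cross-products of deviations to the square root of the product of the sums of squared deviations. A minimal sketch of that computation follows; the five data pairs are hypothetical and do not reproduce the reported value of 0.554.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    x_bar = sum(xs) / n  # sample mean of x
    y_bar = sum(ys) / n  # sample mean of y
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical observations: television hours and total debt per household.
hours = [10, 20, 30, 40, 50]
debt = [80000, 95000, 150000, 120000, 180000]

r = pearson_r(hours, debt)
print(f"R = {r:.3f}")
```

The sign of the result gives the direction of the relationship and its magnitude (between 0 and 1) gives the strength.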
- Here the researcher is interested in how the Hours spent on television affect the Total Debt of the households. Television Hours is therefore the independent variable, used to predict Total Debt, which is the dependent variable in this analysis.
- Let X denote the Hours spent on Television and Y denote the Total Debt.
The simple linear regression model of Y on X is
y = a + b*x + e,
where a and b are unknown parameters estimated by the method of least squares,
y and x are the observed values of Y and X respectively, and
e is the random error associated with the prediction.
The fitted model is then
Y1 = a1 + b1*x,
where Y1 is the predicted value of y, and a1 and b1 are the estimated values of the parameters.
We obtain the estimated model as
Y1 = 2531x + 49430
- As b1 (= 2531) > 0, the slope of the regression line is positive and hence there is a positive relationship between the two variables: as X increases, Y also increases.
- b1 = 2531 implies that a unit increase in x (one additional hour of television) is associated with an increase of 2531 units in the predicted y.
- a1 = 49430 is the y-intercept of the regression line, i.e. the predicted Total Debt when x = 0.
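The least squares estimates follow from b1 = Sxy / Sxx and a1 = ȳ - b1·x̄. The sketch below fits a line to hypothetical data (not the study's data) and then uses the estimated coefficients reported above, a1 = 49430 and b1 = 2531, to predict the debt for a household watching 30 hours.

```python
def fit_line(xs, ys):
    """Least squares estimates (a1, b1) for the model y = a + b*x + e."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx             # estimated slope
    a1 = y_bar - b1 * x_bar    # estimated intercept
    return a1, b1


def predict(a1, b1, x):
    """Predicted value Y1 = a1 + b1*x."""
    return a1 + b1 * x


# Hypothetical observations, used only to demonstrate the fitting step.
hours = [10, 20, 30, 40, 50]
debt = [80000, 95000, 150000, 120000, 180000]
a1_hat, b1_hat = fit_line(hours, debt)
print(f"fitted on hypothetical data: Y1 = {a1_hat:.0f} + {b1_hat:.0f}*x")

# Prediction with the coefficients reported in the text.
print(predict(49430, 2531, 30))  # 49430 + 2531*30 = 125360
```

The prediction of 125360 at x = 30 is close to the central values reported in the numerical summary (about 30 hours and about 126500 in debt), as expected, since a least squares line passes through the point of means.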
- The coefficient of determination is given by R², which in this case is 0.306. It measures the goodness of fit of the regression equation: about 30.6% of the variation in Total Debt is explained by Television Hours.
We observe that the coefficient of determination is moderately low. Hence the fitted linear regression equation is not very efficient in describing the relationship between the two variables, and we may consider a regression equation of higher degree.
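The coefficient of determination can be computed as R² = 1 - SSE/SST, where SSE is the sum of squared residuals and SST is the total sum of squares. In simple linear regression it equals the square of the correlation coefficient, which agrees with the reported figures up to rounding (0.554² ≈ 0.307). The sketch below checks this identity on hypothetical data.

```python
import math


def r_squared_and_r(xs, ys):
    """Return (R squared from SSE/SST, correlation coefficient r)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sst = sum((y - y_bar) ** 2 for y in ys)   # total sum of squares
    b1 = sxy / sxx                             # least squares slope
    a1 = y_bar - b1 * x_bar                    # least squares intercept
    sse = sum((y - (a1 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * sst)
    return 1 - sse / sst, r


# Hypothetical observations: television hours and total debt per household.
hours = [10, 20, 30, 40, 50]
debt = [80000, 95000, 150000, 120000, 180000]

r2, r = r_squared_and_r(hours, debt)
print(f"R^2 = {r2:.4f}, r^2 = {r ** 2:.4f}")
```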
- To compare the sales of the various car models in Queensland in 2010 and 2011, we should use a bar diagram showing the sales of each model for both years side by side. Since the number of cars sold is a discrete variable, a bar diagram is a suitable representation, and plotting both years on the same graph lets us compare each model's sales across the two years. The bar diagram is shown below:
We observe the following from the graph:
- The sales of Toyota, Holden, Ford, Mitsubishi, Subaru and Honda decreased from 2010 to 2011.
- The sales of Mazda, Hyundai, Nissan and Volkswagen increased from 2010 to 2011.
- The magnitude of the increases and decreases in sales between the two years is not very large.
- The sales of Toyota were the highest among all models in both 2010 and 2011.
- The sales of Volkswagen and Honda were the lowest among all models in 2010 and 2011 respectively.
- To compare the market share of the different models, we may use either a pie diagram of sales by model for each of the years 2010 and 2011, or a subdivided bar diagram for both years. Both graphs show the percentage of sales of each model in Queensland, giving a pictorial representation of each model's share of the market and helping us compare market shares. We provide the subdivided bar diagram below:
We observe the following points:
- The market share of Toyota was the highest in both 2010 (25.22%) and 2011 (22.36%).
- However, the market share of Toyota decreased from 2010 to 2011.
- For the other models there has not been any large increase or decrease in market share over the two years.
- The market shares of Toyota, Holden, Ford, Mitsubishi, Subaru and Honda decreased from 2010 to 2011.
- The market shares of Mazda, Hyundai, Nissan and Volkswagen increased from 2010 to 2011.
- The market shares of Volkswagen and Honda were the lowest in 2010 and 2011 respectively.
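The percentages behind a pie diagram or subdivided bar diagram are each model's sales divided by total sales for the year. The sketch below shows that calculation and the year-on-year change in share; the model names and sales figures are hypothetical placeholders, not the Queensland data.

```python
def market_shares(sales):
    """Percentage share of total sales for each model."""
    total = sum(sales.values())
    return {model: 100 * units / total for model, units in sales.items()}


# Hypothetical unit sales by model for two years.
sales_2010 = {"Model A": 500, "Model B": 300, "Model C": 200}
sales_2011 = {"Model A": 450, "Model B": 350, "Model C": 200}

shares_2010 = market_shares(sales_2010)
shares_2011 = market_shares(sales_2011)

# Change in share, in percentage points, from 2010 to 2011.
changes = {m: shares_2011[m] - shares_2010[m] for m in shares_2010}

for model in shares_2010:
    print(
        f"{model}: {shares_2010[model]:.2f}% -> {shares_2011[model]:.2f}% "
        f"({changes[model]:+.2f} points)"
    )
```

Comparing shares rather than raw sales removes the effect of a change in the overall size of the market between the two years.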