Analytical Methods in Economics and Finance: 1045574

Regression Models using Cross Section Data

Use the data set in DATA_ASSIGNMENT, which contains information on the number of medals won by each country in the Olympic Games between 1960 and 1999 and on the characteristics of these countries. Country ID is the country identifier. Year denotes the year in which the Olympic Games were held. Real GDP is the real Gross Domestic Product of a country in millions of dollars. Population is the number of people living in a country, in millions. Total Medals is the sum of gold, silver and bronze medals won by a country. Host Country is a dummy variable that takes the value 1 if the country is hosting the Olympic Games and 0 otherwise. Planned Economy is a dummy variable that takes the value 1 if the country is a planned economy and is not a member of the Soviet Union, and 0 otherwise. Soviet Union Member is a dummy variable that takes the value 1 if the country is a member of the Soviet Union and 0 otherwise.

Questions

Present the descriptive statistics of the variables Real GDP, Population, Total Medals. Comment on the means and measures of dispersion of the variables.

Solution

Real Gross Domestic Product (GDP)

The descriptive statistics of Real Gross Domestic Product, in millions of dollars, are given in Table 1 below. The mean, median, variance and standard deviation are 137726.658, 9110, 3.00986E+11 and 548622.1516 respectively. The mean is about 15 times the median and the standard deviation is roughly four times the mean, so the data are widely dispersed. The skewness is 8.0229, a positive value indicating that the data are positively skewed (Little, Deboeck and Wu, 2015, p. 35).

Table 1: Descriptive Statistics (GDP-Millions of Dollars)

*Descriptive Statistics*	*(GDP millions of dollars)*

Mean	137726.658
Standard Error	15492.60937
Median	9110
Mode	1100
Standard Deviation	548622.1516
Sample Variance	3.00986E+11
Kurtosis	76.05248117
Skewness	8.022897218
Range	7279954
Minimum	46
Maximum	7280000
Sum	172709229.2
Count	1254
Confidence Level (95.0%)	30394.31601
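The rows of Table 1 are linked by simple formulas, which gives a quick consistency check. The sketch below recomputes the variance and standard error from the reported standard deviation and count, and backs out the critical value implied by the confidence-level row; the variable names are illustrative and the inputs are copied from the table.

```python
import math

# Values copied from Table 1 (real GDP, millions of dollars).
std_dev = 548622.1516
count = 1254
conf_half_width = 30394.31601  # the "Confidence Level (95.0%)" row

variance = std_dev ** 2                  # sample variance = SD squared
std_error = std_dev / math.sqrt(count)   # standard error of the mean
implied_t = conf_half_width / std_error  # critical value behind the 95% half-width

print(f"variance  = {variance:.5e}")
print(f"std error = {std_error:.4f}")
print(f"implied critical value = {implied_t:.4f}")
```

The implied critical value lands close to 1.96, as expected for a 95% interval with a sample this large.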

Population (Millions of People)

The descriptive statistics of Population, in millions, are given in Table 2 below. The mean, median, variance and standard deviation are 27.53976778, 7.020635128, 8741.575765 and 93.4963944 respectively. The skewness of the population data is 8.6182, a positive value indicating that the data are positively skewed (Little, Deboeck and Wu, 2015, p.49).

Table 2: Descriptive Statistics of “Population” in millions

*Descriptive Statistics (Population in millions)*

Mean	27.53976778
Standard Error	2.640256344
Median	7.020635128
Mode	0.02
Standard Deviation	93.4963944
Sample Variance	8741.575765
Kurtosis	86.37505268
Skewness	8.618197335
Range	1219.98504
Minimum	0.01496041
Maximum	1220
Sum	34534.8688
Count	1254
Confidence Level (95.0%)	5.17981082

Total Medals

The descriptive statistics of the total medals won by a country are given in Table 3 below. The mean, median and standard deviation are 5.07496, 0 and 16.17332 respectively. A median of 0 shows that in at least half of the observations a country won no medals. The skewness of the total medals data is 5.948, a positive value indicating that the data are positively skewed (Malash and El-Khaiary, 2010, p. 21).

Table 3: Descriptive Statistics of “Total Medals”

*Descriptive Statistics (Total Medals)*

Mean	5.07496
Standard Error	0.45672
Median	0
Mode	0
Standard Deviation	16.17332
Sample Variance	261.5762
Kurtosis	44.18286
Skewness	5.948003
Range	195
Minimum	0
Maximum	195
Sum	6364
Count	1254
Confidence Level (95.0%)	0.896021
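To make the skewness figures concrete, the sketch below implements the adjusted sample skewness formula that Excel documents for its SKEW function and applies it to a small hypothetical sample. The medal counts are invented for illustration and are not taken from DATA_ASSIGNMENT.

```python
import statistics

def excel_skew(values):
    """Adjusted sample skewness: n/((n-1)(n-2)) * sum(((x - mean)/s)^3)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)

# Hypothetical medal counts: most countries win nothing, a few win many.
medals = [0, 0, 0, 0, 0, 1, 2, 3, 10, 60]
symmetric = [1, 2, 3, 4, 5]

print("mean:", statistics.mean(medals), "median:", statistics.median(medals))
print("skewness (right-skewed sample):", excel_skew(medals))
print("skewness (symmetric sample):", excel_skew(symmetric))
```

A long right tail pulls the mean above the median and produces positive skewness, which is the pattern seen in all three tables.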

Estimate the following simple regression model of total medals on real GDP.

TotalMedals = β₀ + β₁ realGDP + u

Write down the sample regression function and interpret the coefficient estimates.

Solution

The regression model of the total medals on real GDP is given as;

TotalMedals = β₀ + β₁ realGDP + u

Where, TotalMedals = dependent variable of the model

β₀ = constant term or the y-intercept

realGDP = independent variable

β₁ = coefficient of real GDP

u = the error term

By estimating the simple regression model of total medals on real GDP, the following excel output is produced;

SUMMARY OUTPUT

Regression Statistics
Multiple R	0.6445
R Square	0.41538
Adjusted R Square	0.414913
Standard Error	12.37113
Observations	1254

ANOVA
	df	SS	MS	F	Significance F
Regression	1	136142.9	136142.9	889.5624	4E-148
Residual	1252	191612.1	153.0448
Total	1253	327755

	Coefficients	Standard Error	t Stat	P-value	Lower 95%	Upper 95%	Lower 95.0%	Upper 95.0%
Intercept	2.458184	0.360198	6.824527	1.37E-11	1.751525	3.164843	1.751525	3.164843
Real GDP	1.9E-05	6.37E-07	29.82553	4E-148	1.78E-05	2.02E-05	1.78E-05	2.02E-05

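Based on the results above, the coefficient estimates are β₀ = 2.458184 (constant term) and β₁ = 1.9E-05 (coefficient of realGDP).

Thus the sample regression function is; TotalMedals = 2.458184 + 1.9E-05 realGDP + u

The intercept implies that a country with zero real GDP is predicted to win about 2.46 medals. The slope implies that each additional million dollars of real GDP is associated with 0.000019 more medals, or about 1.9 more medals per additional 100 billion dollars of real GDP.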
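The figures in the Excel output for this simple regression can be cross-checked and used for prediction. The following is a minimal sketch using numbers copied from the output; `predict_medals` is a hypothetical helper name, and the slope is the rounded value shown in the output.

```python
# Figures copied from the Excel output for the simple regression.
ss_regression, ss_total = 136142.9, 327755.0
intercept, se_intercept = 2.458184, 0.360198
slope = 1.9e-05  # medals per million dollars of real GDP (rounded in the output)

r_squared = ss_regression / ss_total    # share of variation explained
t_intercept = intercept / se_intercept  # t statistic = estimate / standard error

def predict_medals(real_gdp_millions):
    """Fitted medal count from the estimated intercept and slope."""
    return intercept + slope * real_gdp_millions

print("R squared:", round(r_squared, 5))
print("t statistic of the intercept:", round(t_intercept, 3))
print("fitted medals at GDP of 100,000 million:", round(predict_medals(100_000), 3))
```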

Now estimate the following simple regression model with a level-log specification,

TotalMedals = β₀ + β₁ log (realGDP) + u

Solution

By estimating the simple regression model with a level-log specification, the result of the model is obtained as given in the excel output below.

SUMMARY OUTPUT

Regression Statistics
Multiple R	0.48035
R Square	0.230736
Adjusted R Square	0.230122
Standard Error	14.19091
Observations	1254

ANOVA
	df	SS	MS	F	Significance F
Regression	1	75624.95	75624.95	375.5302	2.26E-73
Residual	1252	252130	201.3818
Total	1253	327755

	Coefficients	Standard Error	t Stat	P-value	Lower 95%	Upper 95%	Lower 95.0%	Upper 95.0%
Intercept	-27.0712	1.706567	-15.863	8.89E-52	-30.4193	-23.7232	-30.4193	-23.7232
log(real GDP)	3.443017	0.177671	19.3786	2.26E-73	3.094451	3.791583	3.094451	3.791583

Based on the above results, the coefficients are obtained as β₀ = -27.0712 and β₁ = 3.443017.

Report your regression results in a sample regression function

The regression function of the model with a level-log specification can thus be written as;

TotalMedals = -27.0712 + 3.443017 log (realGDP) + u

Interpret the estimated coefficient of log (realGDP). What did you expect this coefficient to be before the estimation, and is the sign of this estimate what you expected? Provide an explanation.

The estimated coefficient of log (realGDP) is β₁ = 3.443017. In a level-log model, and taking log to be the natural logarithm, a 1% increase in real GDP is associated with an increase of about β₁/100 ≈ 0.034 medals. The coefficient is positive, indicating a positive relation between the total medals won by a country and its real GDP. A positive sign was expected before estimation, since richer countries can plausibly devote more resources to training athletes, and the estimate confirms this expectation (Barreto, 2015).
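The level-log interpretation can be checked numerically. The sketch below assumes that log denotes the natural logarithm, which is an assumption about how the regressor was constructed, and compares the exact change in predicted medals with the usual β₁/100 rule of thumb. The function name is illustrative.

```python
import math

BETA_LOG_GDP = 3.443017  # slope from the level-log output above

def medal_change(pct_increase_gdp):
    """Exact change in predicted medals for a given % rise in real GDP,
    assuming the regressor is the natural log of real GDP."""
    return BETA_LOG_GDP * math.log(1 + pct_increase_gdp / 100)

exact_1pct = medal_change(1)        # exact effect of a 1% rise
rule_of_thumb = BETA_LOG_GDP / 100  # the beta/100 approximation
doubling = medal_change(100)        # effect of doubling real GDP

print(exact_1pct, rule_of_thumb, doubling)
```

The approximation is accurate for small percentage changes; for large changes such as a doubling, the exact logarithmic formula should be used.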

A model that relates the total number of medals to the realGDP and population is:

TotalMedals = β₀ + β₁ realGDP + β₂ population + u

Report your results in a sample regression function. What can you conclude regarding comparison of the goodness of fit of this regression model versus the regression model in part (ii)?

Solution

SUMMARY OUTPUT

Regression Statistics
Multiple R	0.660607
R Square	0.436402
Adjusted R Square	0.435501
Standard Error	12.15153
Observations	1254

ANOVA
	df	SS	MS	F	Significance F
Regression	2	143032.8	71516.39	484.3327	1.7E-156
Residual	1251	184722.2	147.6596
Total	1253	327755

	Coefficients	Standard Error	t Stat	P-value	Lower 95%	Upper 95%	Lower 95.0%	Upper 95.0%
Intercept	1.911994	0.362727	5.271159	1.6E-07	1.200373	2.623615	1.200373	2.623615
Real GDP	1.77E-05	6.53E-07	27.18071	3.1E-128	1.65E-05	1.9E-05	1.65E-05	1.9E-05
Population	0.026154	0.003829	6.830858	1.31E-11	0.018643	0.033666	0.018643	0.033666

Based on the above results, the coefficients are obtained as β₀ = 1.911994, β₁ = 1.77E-05 and β₂ = 0.026154. The regression function of the model can thus be given as;

TotalMedals = 1.911994 + 1.77E-05 realGDP + 0.026154 population + u

Where u, according to Reed, Kaplan and Brewer (2012, p. 54), is the error term of the model.

In regard to goodness of fit, this regression model fits better than the model in part (ii). Its R square is 0.436402 (about 43.6%), compared with 0.41538 in (ii). Because R square never falls when a regressor is added, the adjusted R square is the fairer comparison, and it also rises, from 0.414913 to 0.435501.
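The adjusted R square reported by Excel can be reproduced from R square, the number of observations and the number of regressors. This is a minimal sketch with inputs copied from the two outputs; the function name is illustrative.

```python
def adjusted_r2(r2, n_obs, n_regressors):
    """Adjusted R-squared: penalises R-squared for each added regressor."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)

N = 1254
adj_simple = adjusted_r2(0.41538, N, 1)     # TotalMedals on realGDP
adj_multiple = adjusted_r2(0.436402, N, 2)  # realGDP and population

print(round(adj_simple, 6), round(adj_multiple, 6))
```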

Now re-estimate the equation in (iv) but using the log of independent variables. That is, estimate the model,

TotalMedals = β₀ + β₁ log (realGDP) + β₂ log (population) + u

Report the results in a sample regression function. Interpret the coefficient of population. Test whether it is statistically significant at 1% level.

Solution

SUMMARY OUTPUT

Regression Statistics
Multiple R	0.480379
R Square	0.230764
Adjusted R Square	0.229534
Standard Error	14.19632
Observations	1254

ANOVA
	df	SS	MS	F	Significance F
Regression	2	75633.92	37816.96	187.6441	5.38E-72
Residual	1251	252121	201.5356
Total	1253	327755

	Coefficients	Standard Error	t Stat	P-value	Lower 95%	Upper 95%	Lower 95.0%	Upper 95.0%
Intercept	-27.3506	2.160388	-12.66	1.19E-34	-31.5889	-23.1122	-31.5889	-23.1122
log(Real GDP)	3.48385	0.262752	13.25909	1.22E-37	2.968367	3.999332	2.968367	3.999332
log(Population)	-0.06139	0.290937	-0.21101	0.832918	-0.63217	0.50939	-0.63217	0.50939

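Based on the above results, the coefficients are obtained as β₀ = -27.3506, β₁ = 3.48385 and β₂ = -0.06139. The regression function of the model can thus be written as;

TotalMedals = -27.3506 + 3.48385 log (realGDP) - 0.06139 log (population) + u

Holding real GDP constant, and taking log to be the natural logarithm, a 1% increase in population is associated with a change of about -0.06139/100 ≈ -0.0006 medals. The p-value of this coefficient is 0.832918, which is greater than 0.01, so the coefficient of log (population) is not statistically significant at the 1% level.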

Using the estimated model in (v), test whether realGDP has a positive effect on total medals at 1% level of significance.

Using the model TotalMedals = -27.3506 + 3.48385 log (realGDP) - 0.06139 log (population) + u, we perform a one-sided t test on the coefficient of log (realGDP) to decide whether real GDP has a positive effect on total medals at the 1% level of significance (Barati, 2013).

Null hypothesis: H₀: β₁ = 0

Alternate hypothesis: H₁: β₁ > 0

In the output for model (v), the t statistic of log (realGDP) is 13.25909 with a two-tailed p-value of 1.22E-37, so the one-tailed p-value is half of that. This is far below the 0.01 significance level, and the t statistic is well above the one-tailed critical value of about 2.33. We therefore reject the null hypothesis and conclude that real GDP has a positive effect on total medals earned by a country.
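The one-sided test on the coefficient of log (realGDP) can be sketched in a few lines. The coefficient and standard error are copied from the output for model (v); using the normal quantile in place of the t critical value is an approximation adopted here, reasonable with 1251 residual degrees of freedom.

```python
from statistics import NormalDist

coef, std_err = 3.48385, 0.262752  # log(realGDP) row, model (v)
alpha = 0.01

t_stat = coef / std_err
# Normal approximation to the t distribution (assumption: large residual df).
critical = NormalDist().inv_cdf(1 - alpha)  # upper-tail critical value

reject_h0 = t_stat > critical
print(f"t = {t_stat:.3f}, critical = {critical:.3f}, reject H0: {reject_h0}")
```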

Add the variables “planned economy” and “host country” to the level-log equation in (v) and estimate the following model.

TotalMedals = β₀ + β₁ log (realGDP) + β₂ log (population) + β₃ plannedeconomy + β₄ hostcountry + u

Solution

SUMMARY OUTPUT

Regression Statistics
Multiple R	0.544554
R Square	0.29654
Adjusted R Square	0.294287
Standard Error	13.58668
Observations	1254

ANOVA
	df	SS	MS	F	Significance F
Regression	4	97192.3	24298.07	131.6271	7.43E-94
Residual	1249	230562.7	184.5978
Total	1253	327755

	Coefficients	Standard Error	t Stat	P-value	Lower 95%	Upper 95%	Lower 95.0%	Upper 95.0%
Intercept	-24.7138	2.082675	-11.8664	7.5E-31	-28.7997	-20.6279	-28.7997	-20.6279
log(Real GDP)	3.155271	0.253352	12.45408	1.21E-33	2.658227	3.652314	2.658227	3.652314
log(Population)	-0.06609	0.27944	-0.2365	0.813083	-0.61431	0.482137	-0.61431	0.482137
PlannedEconomy	3.966477	3.083846	1.286211	cc	-2.08361	10.01657	-2.08361	10.01657
HostCountry	47.10333	4.378378	10.75817	7.09E-26	38.51354	55.69312	38.51354	55.69312

The model can thus be written as;

TotalMedals = -24.7138 + 3.155271 log (realGDP) - 0.06609 log (population) + 3.966477 plannedeconomy + 47.10333 hostcountry + u

Test whether planned economy variable and host country variables are individually significant at 1% level?

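Here we test whether the two variables are individually significant at the 1% level of significance by performing a separate two-tailed t test on each coefficient.

The null and alternate hypotheses are stated as follows;

Planned economy: H₀: β₃ = 0 against H₁: β₃ ≠ 0

Host country: H₀: β₄ = 0 against H₁: β₄ ≠ 0

at significance level α = .01. In each case

t = (estimated coefficient - H₀ value of the coefficient) / (standard error of the estimate)

For planned economy, t = (3.966477 - 0) / 3.083846 = 1.286211. For host country, t = (47.10333 - 0) / 4.378378 = 10.75817.

The two-tailed critical value at the 1% level is approximately 2.58. Since 1.286211 < 2.58, we do not reject the null hypothesis for planned economy, which is not individually significant at the 1% level. Since 10.75817 > 2.58 (p-value 7.09E-26), we reject the null hypothesis for host country, which is individually significant at the 1% level.

A two-sample t test was also performed in Excel and gave the result below. It compares the means of two columns rather than testing a regression coefficient, and the mean and variance in its first column (5.07496 and 261.5762) are those of Total Medals in Table 3, so the conclusions above rest on the regression t statistics.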

t-Test: Two-Sample Assuming Unequal Variances

	Planned Economy	Host Country
Mean	5.07496	0.007974
Variance	261.5762	0.007917
Observations	1254	1254
Hypothesized Mean Difference	0
df	1253
t Stat	11.09412
P(T<=t) one-tail	1.2E-27
t Critical one-tail	2.329328
P(T<=t) two-tail	2.4E-27
t Critical two-tail	2.579759
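Individual significance of each dummy variable follows from the coefficient table of the regression in (vii). A minimal sketch, with estimates and standard errors copied from that output and the normal quantile used as an approximation to the two-tailed t critical value:

```python
from statistics import NormalDist

# (coefficient, standard error) from the regression output for model (vii).
estimates = {
    "PlannedEconomy": (3.966477, 3.083846),
    "HostCountry": (47.10333, 4.378378),
}
alpha = 0.01
# Two-tailed critical value; normal approximation (assumption: large residual df).
critical = NormalDist().inv_cdf(1 - alpha / 2)

results = {}
for name, (coef, se) in estimates.items():
    t_stat = coef / se
    results[name] = (t_stat, abs(t_stat) > critical)
    print(f"{name}: t = {t_stat:.3f}, significant at 1%: {abs(t_stat) > critical}")
```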

Test if plannedeconomy and hostcountry variables are jointly significant at 5% level?

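We test H₀: β₃ = 0 and β₄ = 0 versus H₁: at least one of β₃ and β₄ does not equal zero.

The restricted model is the model in (v), with residual sum of squares 252121, and the unrestricted model is the model in (vii), with residual sum of squares 230562.7 on 1249 degrees of freedom. With 2 restrictions, the F statistic is

F = ((252121 - 230562.7) / 2) / (230562.7 / 1249) ≈ 58.39

The 5% critical value of the F distribution with 2 and 1249 degrees of freedom is approximately 3.00. Since 58.39 exceeds it, we reject the null hypothesis and conclude that planned economy and host country are jointly statistically significant at the 5% level.

Excel's two-sample F test for variances, shown in the table below, compares the variances of the two dummy variables (F = 0.504052) and is not a test of joint significance.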

F-Test Two-Sample for Variances

	Planned Economy	Host Country
Mean	0.007974	0.015949
Variance	0.007917	0.015707
Observations	1254	1254
df	1253	1253
F	0.504052
P(F<=f) one-tail	0
F Critical one-tail	0.91122
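A joint test of the two dummy variables can be built from the residual sums of squares in the ANOVA tables of models (v) and (vii). The sketch below computes the restricted-versus-unrestricted F statistic; the critical value and p-value use a large-sample chi-square approximation, which is an assumption of this sketch, and the closed form used for the tail probability holds only for two restrictions.

```python
import math

ssr_restricted = 252121.0    # residual SS, model (v): no dummies
ssr_unrestricted = 230562.7  # residual SS, model (vii): with both dummies
q = 2                        # number of restrictions (beta3 = beta4 = 0)
df_resid = 1249              # residual df of the unrestricted model

f_stat = ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / df_resid)

# Large-sample approximation: q * F is roughly chi-square with q df.
# For 2 df the chi-square upper-tail probability is exp(-x / 2).
critical_5pct = -2 * math.log(0.05) / q
approx_p_value = math.exp(-q * f_stat / 2)

print(f"F = {f_stat:.2f}, approx. 5% critical value = {critical_5pct:.2f}")
```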

Test the overall significance of the model you estimated in part (vii) at 1% level of significance.

We test H₀: β₁ = β₂ = β₃ = β₄ = 0 versus H₁: at least one of β₁, β₂, β₃ and β₄ does not equal zero. From the ANOVA table the F-test statistic is 131.6271 with p-value of 7.43E-94.
Since the p-value is less than 0.01 we reject the null hypothesis that all slope parameters are zero, and conclude that the model is significant overall at significance level 0.01.
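The overall F statistic in the ANOVA table can also be recovered from R square alone. A short check, with inputs copied from the regression statistics of the model in (vii):

```python
r2, n_obs, k = 0.29654, 1254, 4  # R Square, Observations, number of regressors

# Explained variation per regressor over unexplained variation
# per residual degree of freedom.
f_overall = (r2 / k) / ((1 - r2) / (n_obs - k - 1))

print(f"overall F = {f_overall:.3f}")
```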

Suppose you want to test whether Soviet Union Member countries win more medals than other countries. Specify a regression model that will enable you to test such a hypothesis using the model in (v) as a base. Report your results in a sample regression function and perform the hypothesis test at 5% level of significance. What would you infer?

SUMMARY OUTPUT

Regression Statistics
Multiple R	0.544554
R Square	0.29654
Adjusted R Square	0.294287
Standard Error	13.58668
Observations	1254

ANOVA
	df	SS	MS	F	Significance F
Regression	4	97192.3	24298.07	131.6271	7.43E-94
Residual	1249	230562.7	184.5978
Total	1253	327755

	Coefficients	Standard Error	t Stat	P-value	Lower 95%	Upper 95%	Lower 95.0%	Upper 95.0%
Intercept	-24.7138	2.082675	-11.8664	7.5E-31	-28.7997	-20.6279	-28.7997	-20.6279
log(Real GDP)	3.155271	0.253352	12.45408	1.21E-33	2.658227	3.652314	2.658227	3.652314
log(Population)	-0.06609	0.27944	-0.2365	0.813083	-0.61431	0.482137	-0.61431	0.482137
PlannedEconomy	3.966477	3.083846	1.286211	cc	-2.08361	10.01657	-2.08361	10.01657
HostCountry	47.10333	4.378378	10.75817	7.09E-26	38.51354	55.69312	38.51354	55.69312

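Hypothesis testing;

To test whether Soviet Union members win more medals than other countries, the model in (v) is extended with the Soviet Union Member dummy:

TotalMedals = β₀ + β₁ log (realGDP) + β₂ log (population) + β₃ SovietUnionMember + u

We test H₀: β₃ = 0 versus H₁: β₃ > 0.

The output reproduced above is that of the model in (vii), which contains PlannedEconomy and HostCountry but no Soviet Union Member term, so the estimate of β₃ cannot be read from it. The reported F-test statistic is 4.0635 with p-value of 0.1975.
Since the p-value is not less than 0.05 we do not reject the null hypothesis at significance level 0.05 (Hilbe, 2009).
On these figures, we infer that there is no statistically significant evidence at the 5% level that Soviet Union members win more medals than other countries.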
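To show how a Soviet Union Member dummy would enter the level-log model, the sketch below fits the specification by ordinary least squares on hypothetical, noise-free data. Nothing here comes from DATA_ASSIGNMENT: the `ols` helper, the country rows and the "true" coefficients are all assumptions made for illustration, and log is taken to be the natural logarithm.

```python
import math

def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

# Hypothetical rows: (real GDP in millions, population in millions, Soviet dummy).
countries = [(5000, 3.0, 0), (20000, 10.0, 0), (90000, 50.0, 1),
             (400000, 60.0, 0), (150000, 8.0, 1), (1200, 1.5, 0),
             (30000, 25.0, 1), (700000, 120.0, 0)]
true_beta = [-20.0, 3.0, -0.5, 8.0]  # invented coefficients, not estimates

# Design matrix: constant, log(realGDP), log(population), SovietUnionMember.
X = [[1.0, math.log(g), math.log(p), float(s)] for g, p, s in countries]
y = [sum(b * x for b, x in zip(true_beta, row)) for row in X]

beta_hat = ols(X, y)
print([round(b, 6) for b in beta_hat])
```

With real data the dummy's coefficient measures the difference in expected medals between Soviet Union members and other countries at the same real GDP and population, and its t statistic supports the one-sided test of β₃ > 0.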

References

Barati, R., (2013). Application of excel solver for parameter estimation of the nonlinear Muskingum models. KSCE Journal of Civil Engineering, 17(5), pp.1139-1148.

Barreto, H., (2015). Why Excel? The Journal of Economic Education, 46(3), pp.300-309.

Hilbe, J.M., (2009). Logistic regression models. Chapman and Hall/CRC.

Hill, R.C., Griffiths, W.E. and Lim, G.C., (2018). Principles of econometrics. John Wiley & Sons.

Little, T.D., Deboeck, P. and Wu, W., (2015). Longitudinal data analysis. Emerging Trends in the Social and Behavioral Sciences: An Interdisciplinary, Searchable, and Linkable Resource, pp.1-17.

Malash, G.F. and El-Khaiary, M.I., (2010). Piecewise linear regression: A statistical method for the analysis of experimental adsorption data by the intraparticle-diffusion models. Chemical Engineering Journal, 163(3), pp.256-263.

Reed, D.D., Kaplan, B.A. and Brewer, A.T., (2012). A tutorial on the use of Excel 2010 and Excel for Mac 2011 for conducting delay‐discounting analyses. Journal of Applied Behavior Analysis, 45(2), pp.375-386.

Wilson, J.H., Keating, B.P. and Beal, M., (2015). Regression analysis: understanding and building business and economic models using Excel. Business Expert Press.
