Analytical Methods in Economics and Finance: 1045574

Regression Models using Cross Section Data

Use the data set in DATA_ASSIGNMENT which contains information on number of medals won by each country between 1960 and 1999 in the Olympic Games and the characteristics of these countries. Country ID is the country identifier. Year denotes the year when the Olympics games held. Real GDP is the Real Gross Domestic Product of a country in millions of dollars. Population is the number of people living in a country in millions of people. Total Medals in the sum of gold, silver and bronze medals won by a country. Host Country is a dummy variable that takes the value 1 if the country is hosting the Olympic Games and takes the value 0 if the country is not hosting the games. Planned Economy is a dummy variable that takes the value 1 if the country is a planned economy and is not a member of Soviet Union and 0 otherwise. Soviet Union Member is a dummy variable that takes the value 1 if the country is a member of Soviet Union and takes the value 0 if the country is not a member.

Questions

  • Present the descriptive statistics of the variables Real GDP, Population, Total Medals. Comment on the means and measures of dispersion of the variables.

Solution

  • Real Gross Domestic Product (GDP)

The descriptive statistics of the Real Gross Domestic Product in millions of dollars is given in table 1 below. The mean, median, variance and standard deviation are 137726.658, 9110, 3.00986E+11 and 548622.1516 respectively. The skewness of the real gross domestic data is 8.0229. This is a positive value indicating that the data is positively skewed (Little, Deboek and Wu, 2015, p. 35).

Table 1: Descriptive Statistics (GDP-Millions of Dollars)

 Descriptive Statistics   (GDP millions of dollars)
  
Mean137726.658
Standard Error15492.60937
Median9110
Mode1100
Standard Deviation548622.1516
Sample Variance3.00986E+11
Kurtosis76.05248117
Skewness8.022897218
Range7279954
Minimum46
Maximum7280000
Sum172709229.2
Count1254
Confidence Level (95.0%)30394.31601
  • Population (Millions of People)

The descriptive statistics of the Population in millions is given as shown in the table 2 below. The mean, median, variance and standard deviation are 27.53976778, 2.640256344, 8741.575765 and 7.020635128 respectively. The skewness of the population data is positive value suggesting a relative skewness in the data (Little, Deboeck and Wu, 2015, p.49).

Table 2: Descriptive Statistics of “Population” in millions

 Descriptive Statistics (Population in millions)
  
Mean27.53976778
Standard Error2.640256344
Median7.020635128
Mode0.02
Standard Deviation93.4963944
Sample Variance8741.575765
Kurtosis86.37505268
Skewness8.618197335
Range1219.98504
Minimum0.01496041
Maximum1220
Sum34534.8688
Count1254
Confidence Level (95.0%)5.17981082
  • Total Medals

The descriptive statistics of the total medals earned by a given country is given as shown in the table 3 below. The mean, median and standard deviation are 5.07496, 0 and 16.17332 respectively. The skewness of the population data is positive value suggesting a relative skewness in the data (Malash and El-Khaiary, 2010, p. 21).

Table 3: Descriptive Statistics of “Total Medals”

 Descriptive Statistics (Total Medals)
  
Mean5.07496
Standard Error0.45672
Median0
Mode0
Standard Deviation16.17332
Sample Variance261.5762
Kurtosis44.18286
Skewness5.948003
Range195
Minimum0
Maximum195
Sum6364
Count1254
Confidence Level (95.0%)0.896021
  • Estimate the following simple regression model of total medals on real GDP.

TotalMedals=β0+ β1realGDP + u

Write down the sample regression function and interpret the coefficient estimates.  

Solution

 The regression model of the total medals on real GDP is given as;

TotalMedals =β0+ β1realGDP+u

Where, TotalMedals = Dependent variable of the model

β0 = constant term or the y-intercept

realGDP = independent variable

β1 =   coefficient of real GDP

u = the error term

By estimating the simple regression model of total medals on real GDP, the following excel output is produced;

SUMMARY OUTPUT       
         
Regression Statistics       
Multiple R0.6445       
R Square0.41538       
Adjusted R Square0.414913       
Standard Error12.37113       
Observations1254       
         
ANOVA        
 dfSSMSFSignificance F   
Regression1136142.9136142.9889.56244E-148   
Residual1252191612.1153.0448     
Total1253327755      
         
 CoefficientsStandard Errort StatP-valueLower 95%Upper 95%Lower 95.0%Upper 95.0%
Intercept2.4581840.3601986.8245271.37E-111.7515253.1648431.7515253.164843
Real GDP 1.9E-056.37E-0729.825534E-1481.78E-052.02E-051.78E-052.02E-05

Based on the results above, the coefficients; β0 = 2.458184 (constant term) and β1 = 1.9E-05 (coefficient of the realGDP).

Thus the sample regression function is; TotalMedals = 2.458184 + 1.9E-05(realGDP) + u

  • Now estimate the following simple regression model with a level-log specification,

TotalMedals=β0+ β1log (realGDP) + u

Solution

By estimating the simple regression model with a level-log specification, the result of the model is obtained as given in the excel output below.

SUMMARY OUTPUT        
         
Regression Statistics       
Multiple R0.48035       
R Square0.230736       
Adjusted R Square0.230122       
Standard Error14.19091       
Observations1254       
         
ANOVA        
 dfSSMSFSignificance F   
Regression175624.9575624.95375.53022.26E-73   
Residual1252252130201.3818     
Total1253327755      
         
 CoefficientsStandard Errort StatP-valueLower 95%Upper 95%Lower 95.0%Upper 95.0%
Intercept-27.07121.706567-15.8638.89E-52-30.4193-23.7232-30.4193-23.7232
log(real GDP)3.4430170.17767119.37862.26E-733.0944513.7915833.0944513.791583

Based on the above results, the coefficients are obtained as;  β0 = -27.0712, β1 = 3.443017.

Report your regression results in a sample regression function

The regression function of the model with a level -log specification can thus be written as;

TotalMedals = –27.0712 + 3.443017log (realGDP) + u

Interpret the estimated coefficient of log (realGDP).What did you expect this coefficient to be before the estimation and is the sign of this estimate what you expect it to be? Provide an explanation.

The estimated coefficient of log (realGDP) according the results is β1 = 3.443017. The coefficient is positive value thus indicating a positive relation between the Total medals earned by a country and the real Gross Domestic Product (GDP). According to me, this coefficient ought to be a positive value and indeed after the estimation, the results confirms this. This is because the total medals earned may only be positively contributed or associated with the real GDP and not otherwise since GDP is a continuous variable (Barreto, 2015)

  • A model that relates the total number of medals to the realGDP and population is:

TotalMedals=β0+ β1realGDP+ β2population+u

Report your results in a sample regression function. What can you conclude regarding comparison of the goodness of fit of this regression model versus the regression model in part (ii)?

Solution

SUMMARY OUTPUT      
         
Regression Statistics       
Multiple R0.660607       
R Square0.436402       
Adjusted R Square0.435501       
Standard Error12.15153       
Observations1254       
         
ANOVA        
 dfSSMSFSignificance F   
Regression2143032.871516.39484.33271.7E-156   
Residual1251184722.2147.6596     
Total1253327755      
         
 CoefficientsStandard Errort StatP-valueLower 95%Upper 95%Lower 95.0%Upper 95.0%
Intercept1.9119940.3627275.2711591.6E-071.2003732.6236151.2003732.623615
Real GDP1.77E-056.53E-0727.180713.1E-1281.65E-051.9E-051.65E-051.9E-05
Population0.0261540.0038296.8308581.31E-110.0186430.0336660.0186430.033666

Based on the above results, the coefficients are obtained as; β0 = 1.911994, β1 = 1.77E-05 and β2 = 0.026154. The regression function of the model can thus be given as;

TotalMedals= 1.911994 + 1.77E-05realGDP+ 0.026154Population +u

  Where u according to (Reed, Kaplan and Brewer, 2012, p. 54) is the error term to the model.

In regard to the goodness of fit of this regression model versus the regression model in part (ii), this regression model has a better fit than that of (ii). That is; R square of 0.436402 i.e. 43% as compared to 0.41538 of the (ii) above.

  • Now re-estimate the equation in (IV) but using the log of independent variables. That is, estimate the model,  

TotalMedals 0+ β1log (realGDP) + β2log (population) +u

Report the results in a sample regression function. Interpret the coefficient of population. Test whether it is statistically significant at 1% level.  

Solution

SUMMARY OUTPUT       
         
Regression Statistics       
Multiple R0.480379       
R Square0.230764       
Adjusted R Square0.229534       
Standard Error14.19632       
Observations1254       
         
ANOVA        
 dfSSMSFSignificance F   
Regression275633.9237816.96187.64415.38E-72   
Residual1251252121201.5356     
Total1253327755      
         
 CoefficientsStandard Errort StatP-valueLower 95%Upper 95%Lower 95.0%Upper 95.0%
Intercept-27.35062.160388-12.661.19E-34-31.5889-23.1122-31.5889-23.1122
log(Real GDP)3.483850.26275213.259091.22E-372.9683673.9993322.9683673.999332
log(Population)-0.061390.290937-0.211010.832918-0.632170.50939-0.632170.50939

Based on the above results, the coefficients are obtained as; β0 = -27.3506, β1 = 3.48385 and β2 = -0.06139. The regression function of the model can thus be written as;

TotalMedals = -27.3506 + 3.48385 log (realGDP) + -0.06139log (population) +u

  • Using the estimated model in (v), test whether realGDP has a positive effect on total medals at 1% level of significance.

By using the model; TotalMedals=-27.3506 + 3.48385 log (realGDP) + -0.06139log (Population) +u, we can perform a hypothesis test of the “significance of the correlation coefficient” to decide whether there exist a positive effect on total medals at 1% level of significance by considering the p-value (Barati, 2013).

Null hypothesis; β1 = 0

 Alternate Hypothesis: β1 ≠ 0

In the model above, the p value is obtained to be 5.38E-72. This value is less than critical value at 1% level of significance and thus we reject the null hypothesis hence we can conclude that real GDP has a positive effect on total medals earned by a country.

  • Add the variables “planned economy” and “host country” to the level-log equation in (v) and estimate the following model.

TotalMedals0+ β1log (realGDP) + β2log (population) + β3plannedeconomy+ β4hostcountry+ u

Solution

SUMMARY OUTPUT       
         
Regression Statistics       
Multiple R0.544554       
R Square0.29654       
Adjusted R Square0.294287       
Standard Error13.58668       
Observations1254       
         
ANOVA        
 dfSSMSFSignificance F   
Regression497192.324298.07131.62717.43E-94   
Residual1249230562.7184.5978     
Total1253327755      
         
 CoefficientsStandard Errort StatP-valueLower 95%Upper 95%Lower 95.0%Upper 95.0%
Intercept-24.71382.082675-11.86647.5E-31-28.7997-20.6279-28.7997-20.6279
log(Real GDP)3.1552710.25335212.454081.21E-332.6582273.6523142.6582273.652314
log(Population)-0.066090.27944-0.23650.813083-0.614310.482137-0.614310.482137
PlannedEconomy3.9664773.0838461.286211cc-2.0836110.01657-2.0836110.01657
HostCountry47.103334.37837810.758177.09E-2638.5135455.6931238.5135455.69312

The model can thus be written as;

 TotalMedals=-24.7138+ 3.155271 log (realGDP) + -0.06609 log (population) +3.966477 plannedeconomy+47.10333 hostcountry+ u

Test whether planned economy variable and host country variables are individually significant at 1% level?

In this case we perform a hypothesis testing to determine if the two variables are individually significant at 1% level of significance.

The null and alternate hypotheses are thus stated as follows;

Null hypothesis: H0 = β3 = β4

Alternate hypothesis: H1= β3 ≠ β4

`Here we test whether planned economy and host country are individually significant by performing a t test.

Example: H0: β3 = β4 against H1: β3 ≠ β4 at significance level α = .01.

Then 
  t = (b2 – H0 value of β1) / (standard error of b2)
    = (12.33647 – 1.0) / 1.41270    = 11.09412

By using the p-value approach, t-value = 2.579759. Thus we need to reject the null hypothesis and conclude that both planned economy and host country are not individually independence at 1% significance level.

Also, by performing a t test for this in excel, the following result is obtained;

t-Test: Two-Sample Assuming Unequal Variances
   
 Planned Economy Host Country
Mean5.074960.007974
Variance261.57620.007917
Observations12541254
Hypothesized Mean Difference0 
df1253 
t Stat11.09412 
P(T<=t) one-tail1.2E-27 
t Critical one-tail2.329328 
P(T<=t) two-tail2.4E-27 
t Critical two-tail2.579759 

Test if plannedeconomy and hostcountry variables are jointly significant at 5% level?

We test H0: β3 = 0 and β4 ≠ 0 versus Ha: at least one of β1 and β2 does not equal zero. 

From the ANOVA table the F-test statistic is 0.504052 with p-value of 0.91122. 
Since the p-value is not less than 0.05 we do not reject the null hypothesis and hence conclude that both planned economy and host country are jointly statistically significance at 5% level.

The excel output for the analysis also give the results as shown in the table below

F-Test Two-Sample for Variances
   
 Planned Economy Host Country
Mean0.0079740.015949
Variance0.0079170.015707
Observations12541254
df12531253
F0.504052 
P(F<=f) one-tail0 
F Critical one-tail0.91122 

Test the overall significance of the model you estimated in part (vii) at 1% level of significance.

We test H0: β2 = 0 and β3 = 0 versus H1: at least one of β2 and β3 does not equal zero. From the ANOVA table the F-test statistic is 131.6271 with p-value of 7.43E-94. 
Since the p-value is less than 0.01 we reject the null hypothesis that the regression parameters are zero at significance level 0.01. 

  • Suppose you want to test whether Soviet Union Member countries win more medals than other countries. Specify a regression model that will enable you to test such a hypothesis using the model in (v) as a base. Report your results in a sample regression function and perform the hypothesis test at 5% level of significance. What would you infer?
SUMMARY OUTPUT       
         
Regression Statistics       
Multiple R0.544554       
R Square0.29654       
Adjusted R Square0.294287       
Standard Error13.58668       
Observations1254       
         
ANOVA        
 dfSSMSFSignificance F   
Regression497192.324298.07131.62717.43E-94   
Residual1249230562.7184.5978     
Total1253327755      
         
 CoefficientsStandard Errort StatP-valueLower 95%Upper 95%Lower 95.0%Upper 95.0%
Intercept-24.71382.082675-11.86647.5E-31-28.7997-20.6279-28.7997-20.6279
log(Real GDP)3.1552710.25335212.454081.21E-332.6582273.6523142.6582273.652314
log(Population)-0.066090.27944-0.23650.813083-0.614310.482137-0.614310.482137
PlannedEconomy3.9664773.0838461.286211cc-2.0836110.01657-2.0836110.01657
HostCountry47.103334.37837810.758177.09E-2638.5135455.6931238.5135455.69312

Hypothesis testing;

We test H0: β1 = 0 and β2 ≥ 0 versus Ha: at least one of β1 and β2 does not equal zero. 

From the ANOVA table the F-test statistic is 4.0635 with p-value of 0.1975. 
Since the p-value is not less than 0.05 we do not reject the null hypothesis that the regression parameters are zero at significance level 0.05 (Hilbe, 2009). 
Conclude that the parameters are jointly statistically insignificant at significance level 0.05. 

References

Barati, R., (2013). Application of excel solver for parameter estimation of the nonlinear Muskingum models. KSCE Journal of Civil Engineering17(5), pp.1139-1148.

Barreto, H., (2015). Why Excel? The Journal of Economic Education46(3), pp.300-309.

Hilbe, J.M., (2009). Logistic regression models. Chapman and hall/CRC.

Hill, R.C., Griffiths, W.E. and Lim, G.C., (2018). Principles of econometrics. John Wiley & Sons.

Little, T.D., Deboeck, P. and Wu, W., (2015). Longitudinal data analysis. Emerging Trends in the Social and Behavioral Sciences: An Interdisciplinary, Searchable, and Linkable Resource, pp.1-17.

Malash, G.F. and El-Khaiary, M.I., (2010). Piecewise linear regression: A statistical method for the analysis of experimental adsorption data by the intraparticle-diffusion models. Chemical Engineering Journal163(3), pp.256-263.

Reed, D.D., Kaplan, B.A. and Brewer, A.T., (2012). A tutorial on the use of Excel 2010 and Excel for Mac 2011 for conducting delay‐discounting analyses. Journal of applied behavior analysis45(2), pp.375-386.

Wilson, J.H., Keating, B.P. and Beal, M., (2015). Regression analysis: understanding and building business and economic models using Excel. Business Expert Press.