All that is Economics: The dummy variable regression model

THE NATURE OF DUMMY VARIABLES:

In regression analysis, the dependent variable (the regressand) is frequently influenced not only by ratio-scale variables but also by variables that are essentially qualitative, such as nature, sex, colour, qualifications, religion, etc. For example, it is observed in many countries that female workers are paid less than male workers for the same job. If such attributes are left out of the regression, the estimates can be distorted and can mislead us.

One way to quantify qualitative attributes is to construct artificial variables that take on the values 1 or 0, with 1 indicating the presence of the attribute and 0 indicating its absence. Such variables are thus essentially a device to classify data into mutually exclusive categories such as male and female.

ANOVA MODEL

ANOVA models are used to assess the statistical significance of the relationship between a quantitative regressand and qualitative or dummy regressors. They are often used to compute and compare the differences in the mean values of two or more groups or categories, and are therefore more general than the t-test which can be used to compare means of two groups or categories only.

The ANOVA model equation looks like this:

Y_i = β₁ + β₂D_i + u_i

where Y_i = annual starting salary

D_i = 1 if college graduate

D_i = 0 otherwise

Assuming E(u_i) = 0, the mean starting salary of a non-college graduate is

E(Y_i | D_i = 0) = β₁ + β₂(0) = β₁

and the mean starting salary of a college graduate is

E(Y_i | D_i = 1) = β₁ + β₂(1) = β₁ + β₂
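The two conditional means above can be checked numerically. The sketch below is only an illustration: the salary figures are hypothetical and the helper name simple_ols is an assumption, not something from the original post. It fits the ANOVA regression by ordinary least squares and compares the estimates with the plain group means.

```python
from statistics import mean

# Hypothetical starting salaries (in thousands); D = 1 for college graduates.
salary = [21.0, 23.0, 22.0, 30.0, 34.0, 32.0]
graduate = [0, 0, 0, 1, 1, 1]


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b1 + b2 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


beta1, beta2 = simple_ols(graduate, salary)

# Group means computed directly from the data.
mean_non_grad = mean(s for s, d in zip(salary, graduate) if d == 0)
mean_grad = mean(s for s, d in zip(salary, graduate) if d == 1)

print("beta1 =", beta1, "| mean of non-graduates =", mean_non_grad)
print("beta1 + beta2 =", beta1 + beta2, "| mean of graduates =", mean_grad)
```

With a single dummy regressor, the estimated intercept reproduces the mean of the D = 0 group and the dummy coefficient reproduces the difference between the two group means.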

ANCOVA MODEL

ANOVA models of the type discussed above are common in fields such as sociology, psychology, education, and market research. However, they are not that common in economics. Typically, in most economic research, a regression model contains some explanatory variables that are quantitative and some that are qualitative. Regression models containing a mix of quantitative and qualitative variables are called ANCOVA (Analysis of Covariance) models. ANCOVA models are an extension of ANOVA models in that they provide a method of statistically controlling the effects of quantitative regressors, called covariates or control variables, in a model that includes both quantitative and qualitative (dummy) regressors.

Consider an example where both quantitative and qualitative variables are used:

Y_i = β₁ + β₂D_i + β₃X_i + u_i

where Y_i = annual salary of a college teacher

X_i = years of teaching experience

D_i = 1 if male

D_i = 0 if female

Assuming E(u_i) = 0:

Mean salary of a female college teacher: E(Y_i | X_i, D_i = 0) = β₁ + β₃X_i

Mean salary of a male college teacher: E(Y_i | X_i, D_i = 1) = β₁ + β₂ + β₃X_i
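The two mean salary functions are parallel lines that differ only in their intercepts. A minimal sketch of this follows, assuming hypothetical data generated without an error term from β₁ = 30, β₂ = 4 and β₃ = 1.5, so that least squares recovers these values. The helper ols is an illustrative, dependency-free solver of the normal equations, not a library function.

```python
def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * p for v, p in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Hypothetical sample: years of experience and a male dummy (1 = male).
experience = [1, 3, 5, 7, 9, 2, 4, 6, 8, 10]
male = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
# Salaries built from assumed coefficients, with no error term.
salary = [30 + 4 * d + 1.5 * x for x, d in zip(experience, male)]

# Each row of the design matrix is [intercept, D_i, X_i].
rows = [[1, d, x] for d, x in zip(male, experience)]
b1, b2, b3 = ols(rows, salary)


def mean_salary(x, d):
    """Estimated mean salary at experience x for a male (d=1) or female (d=0)."""
    return b1 + b2 * d + b3 * x


print("female, 5 years:", mean_salary(5, 0))
print("male, 5 years:  ", mean_salary(5, 1))
```

At any level of experience, the gap between the male and female mean salaries equals the estimated β₂.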

FEATURES:

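1) We have used only one dummy variable to distinguish two categories, male and female. In general, if a qualitative variable has 'm' categories, only (m-1) dummy variables are needed when the model contains an intercept.

2) The assignment of the values 1 and 0 to the two categories is arbitrary.

3) The category that is assigned the value 0 is often referred to as the benchmark (base or reference) category. In the above example, female is the benchmark category, so the main intercept term β₁ is the intercept of the female salary function.

4) The coefficient β₂ attached to the dummy variable D is called the 'differential intercept coefficient': it tells by how much the intercept of the category that receives the value 1 differs from that of the benchmark category.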
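The (m-1) rule in feature 1 can be illustrated with a short sketch. The region data and the helper name make_dummies are hypothetical. The second half shows what goes wrong if a dummy is also added for the benchmark category: the dummy columns then sum to the intercept column in every row, a perfect collinearity often called the dummy variable trap.

```python
def make_dummies(values, base):
    """Return one 0/1 column per category except `base` (the benchmark)."""
    categories = sorted(set(values) - {base})
    return {c: [1 if v == c else 0 for v in values] for c in categories}


# Hypothetical qualitative variable with m = 3 categories.
region = ["north", "south", "west", "south", "north", "west"]

# m - 1 = 2 dummies; "north" is the benchmark and gets 0 on both.
dummies = make_dummies(region, base="north")
print(dummies)

# Adding a dummy for the benchmark as well: every row of dummies now sums
# to 1, which is exactly the intercept column.
all_three = dict(dummies, north=[1 if v == "north" else 0 for v in region])
row_sums = [sum(col[i] for col in all_three.values())
            for i in range(len(region))]
print(row_sums)
```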

COMPARING TWO REGRESSION MODELS:

Regressions that combine qualitative and quantitative variables are often used to see whether a relationship has changed within a time series, for example before and after a policy change. In such cases, two regression functions need to be compared.

In the case of India, there has been much discussion about the change in consumption and saving patterns after the economic policy changes of the 1990s. To assess the change, data from the pre-reform and post-reform periods need to be analysed separately.

Pre-Reform Period: 1950-1991

Y_t = A₁ + A₂X_t + U_1t (eq-1)

Post-Reform Period: 1992-2008

Y_t = B₁ + B₂X_t + U_2t (eq-2)

Here, Y = savings

X = income

U = error term

A₁ = pre-reform savings intercept

B₁ = post-reform savings intercept

A₂ = pre-reform marginal propensity to save (MPS)

B₂ = post-reform marginal propensity to save (MPS)

Equations (1) and (2) present four possibilities (figs 1 to 4):

(1) A₁ = B₁ and A₂ = B₂, i.e. the two regressions (1) and (2) are identical. This is the case of "coincident regressions".

(2) A₁ ≠ B₁ and A₂ = B₂, i.e. the two regressions differ only in location, that is, in their intercepts. This is the case of "parallel regressions".

(3) A₁ = B₁ and A₂ ≠ B₂, i.e. the two regressions have the same intercept but different slopes. This is the case of "concurrent regressions".

(4) A₁ ≠ B₁ and A₂ ≠ B₂, i.e. the two regressions are totally different. This is the case of "dissimilar regressions".
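The four cases can be written down as a small decision rule. This is only a sketch: the function name classify and the tolerance are assumptions, and it compares the coefficients as if they were known exactly. With estimated coefficients, whether two intercepts or slopes are equal would have to be judged by a statistical test rather than by a direct comparison.

```python
import math


def classify(a1, a2, b1, b2, tol=1e-9):
    """Name the relationship between Y = a1 + a2*X and Y = b1 + b2*X."""
    same_intercept = math.isclose(a1, b1, abs_tol=tol)
    same_slope = math.isclose(a2, b2, abs_tol=tol)
    if same_intercept and same_slope:
        return "coincident"
    if same_slope:
        return "parallel"
    if same_intercept:
        return "concurrent"
    return "dissimilar"


# Hypothetical coefficient values, one call per case.
print(classify(1.0, 0.2, 1.0, 0.2))
print(classify(1.0, 0.2, 3.0, 0.2))
print(classify(1.0, 0.2, 1.0, 0.5))
print(classify(1.0, 0.2, 3.0, 0.5))
```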

REGRESSION MODEL FOR THE TWO COMPARED REGRESSIONS:

Y_t = C₁ + C₂D_t + C₃X_t + C₄(D_tX_t) + U_t

Here, Y = savings, X = income,

D_t = 1 for observations from 1992-2008

D_t = 0 for observations before 1992

Thus E(Y_t | D_t = 0, X_t) = C₁ + C₃X_t (eq-3)

E(Y_t | D_t = 1, X_t) = (C₁ + C₂) + (C₃ + C₄)X_t (eq-4)

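Here, eq-3 corresponds to eq-1 and eq-4 corresponds to eq-2, so that

A₁ = C₁ and A₂ = C₃;

B₁ = (C₁ + C₂) and B₂ = (C₃ + C₄).

The error term U_t drops out of these conditional means because E(U_t) = 0 by assumption.

So, C₂ = differential intercept coefficient

and C₄ = differential slope coefficient, which shows by how much the slope of the post-reform function differs from that of the pre-reform function.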
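The correspondence between the pooled dummy regression and the two separate regressions can be verified numerically. The sketch below uses hypothetical savings and income figures, not actual Indian data, and an illustrative dependency-free helper ols. It fits eq-1 and eq-2 separately, then fits the pooled model with D_t and D_tX_t, and compares the coefficients.

```python
def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * p for v, p in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Hypothetical data for the two periods.
income_pre = [10, 12, 15, 18, 20]
savings_pre = [1.0, 1.3, 1.4, 2.1, 2.2]
income_post = [22, 25, 30, 34, 40]
savings_post = [3.0, 3.9, 4.6, 6.1, 7.2]

# Separate regressions: eq-1 (pre-reform) and eq-2 (post-reform).
a1, a2 = ols([[1, x] for x in income_pre], savings_pre)
b1, b2 = ols([[1, x] for x in income_post], savings_post)

# Pooled regression with rows [1, D_t, X_t, D_t * X_t].
income = income_pre + income_post
savings = savings_pre + savings_post
d = [0] * len(income_pre) + [1] * len(income_post)
c1, c2, c3, c4 = ols([[1, di, xi, di * xi] for di, xi in zip(d, income)],
                     savings)

print("A1, C1:", a1, c1)
print("A2, C3:", a2, c3)
print("B1, C1 + C2:", b1, c1 + c2)
print("B2, C3 + C4:", b2, c3 + c4)
```

Because the pooled model lets both the intercept and the slope differ between the two periods, its coefficients reproduce the separate regressions up to rounding error.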

SUMMARY AND CONCLUSIONS

1) Dummy variables, taking the values 1 and 0, are a means of introducing qualitative regressors into a regression model.

2) Dummy variables are a data-classifying device in that they divide a sample into sub-groups based on qualities or attributes. If the sub-groups differ in how the regressand responds to the regressors, the differences will show up in the intercepts or slope coefficients of the sub-group regressions.

3) Although a versatile tool, the dummy variable technique needs to be handled carefully: (i) if the regression contains a constant term, the number of dummy variables must be one less than the number of classifications of each qualitative variable; (ii) the coefficient attached to a dummy variable must always be interpreted in relation to the base or reference group, i.e. the group that receives the value of 0; (iii) if a model has several qualitative variables with several classes, introducing dummy variables can consume a large number of degrees of freedom. Therefore, they need to be chosen carefully.


Saturday, 22 September 2012
