Saturday, 22 September 2012

The dummy variable regression model

THE NATURE OF DUMMY VARIABLES:

In regression analysis, the dependent variable (regressand) is frequently influenced not only by ratio-scale variables but also by variables that are essentially qualitative in nature, such as sex, colour, qualifications, or religion. For example, it is observed in many countries that female workers are paid less than male workers for the same job. If such attributes are left out of the regression, the estimates can be misleading.

One way to quantify qualitative attributes is to construct artificial variables that take the value 1 or 0: 1 indicates the presence of the attribute and 0 its absence. Such dummy variables are essentially a device for classifying data into mutually exclusive categories, such as male and female.

ANOVA MODEL

ANOVA models are used to assess the statistical significance of the relationship between a quantitative regressand and qualitative or dummy regressors. They are often used to compute and compare the differences in the mean values of two or more groups or categories, and are therefore more general than the t-test which can be used to compare means of two groups or categories only.

The ANOVA model equation looks like this:

$Y_i = \beta_1 + \beta_2 D_i + u_i$

where $Y_i$ = annual starting salary,
$D_i$ = 1 if college graduate,
$D_i$ = 0 otherwise.

Mean starting salary of a non-college graduate:

$E(Y_i \mid D_i = 0) = \beta_1 + \beta_2(0) = \beta_1$

Mean starting salary of a college graduate:

$E(Y_i \mid D_i = 1) = \beta_1 + \beta_2(1) = \beta_1 + \beta_2$
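
The two conditional means above can be checked numerically. The following is a minimal sketch, assuming made-up salary figures (the numbers and the helper name `simple_ols` are illustrative and not from the original post). It fits the one-dummy ANOVA model by ordinary least squares and compares the estimates with the plain group means.

```python
from statistics import mean

# Hypothetical starting salaries (illustrative numbers only).
salaries = [18.0, 20.0, 22.0, 26.0, 28.0, 30.0]
graduate = [0, 0, 0, 1, 1, 1]  # D_i = 1 if college graduate, 0 otherwise


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b1 + b2 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


beta1, beta2 = simple_ols(graduate, salaries)

# Plain group means, computed without any regression.
mean_non_grad = mean(s for s, d in zip(salaries, graduate) if d == 0)
mean_grad = mean(s for s, d in zip(salaries, graduate) if d == 1)

print("beta1 (intercept):", beta1, "| mean of non-graduates:", mean_non_grad)
print("beta1 + beta2:", beta1 + beta2, "| mean of graduates:", mean_grad)
```

The intercept reproduces the mean of the group coded 0, and the intercept plus the dummy coefficient reproduces the mean of the group coded 1, which is why the dummy coefficient is read as a difference in means.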
                 
ANCOVA MODEL

ANOVA models of the type discussed above are common in fields such as sociology, psychology, education, and market research, but they are less common in economics. In most economic research, a regression model contains some explanatory variables that are quantitative and some that are qualitative. Regression models containing such a mix are called ANCOVA (analysis of covariance) models. ANCOVA models extend ANOVA models by providing a method of statistically controlling for the effects of quantitative regressors, called covariates or control variables, in a model that includes both quantitative and qualitative (dummy) regressors.

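Consider an example in which both quantitative and qualitative variables are used:

$Y_i = \beta_1 + \beta_2 D_i + \beta_3 X_i + u_i$

where $Y_i$ = annual salary of a college teacher,
$X_i$ = years of teaching experience,
$D_i$ = 1 if male,
$D_i$ = 0 if female.

Assuming $E(u_i) = 0$:

Mean salary of a female college teacher: $E(Y_i \mid X_i, D_i = 0) = \beta_1 + \beta_3 X_i$
Mean salary of a male college teacher: $E(Y_i \mid X_i, D_i = 1) = (\beta_1 + \beta_2) + \beta_3 X_i$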
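
To see that the two mean-salary lines share a slope and differ only in intercept, here is a small sketch. It assumes noise-free, made-up data generated from the hypothetical coefficients 30, 5 and 2, so the fit can be checked by eye. The helper `ols` is an illustrative normal-equations solver written for this example, not a library routine.

```python
def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Hypothetical data: X = years of experience, D = 1 if male, 0 if female.
experience = [1, 3, 5, 7, 2, 4, 6, 8]
male = [0, 0, 0, 0, 1, 1, 1, 1]
# Salaries generated exactly from Y = 30 + 5*D + 2*X (assumed values, no noise).
salary = [30 + 5 * d + 2 * x for d, x in zip(male, experience)]

# Design matrix columns: intercept, dummy, experience.
design = [[1, d, x] for d, x in zip(male, experience)]
beta1, beta2, beta3 = ols(design, salary)

# Fitted mean salaries at a given experience level.
x0 = 5
female_mean = beta1 + beta3 * x0
male_mean = (beta1 + beta2) + beta3 * x0
print("female:", female_mean, "male:", male_mean, "gap:", male_mean - female_mean)
```

Whatever value of `x0` is chosen, the gap between the two fitted means equals the dummy coefficient, which is the sense in which the two regression lines are parallel.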



FEATURES:

1)      We have used only one dummy variable to distinguish two categories, male and female. In general, if a qualitative variable has m categories and the model includes an intercept, only (m-1) dummy variables should be introduced.
2)      The assignment of the values 1 and 0 to the two categories is arbitrary.
3)      The category assigned the value 0 is often referred to as the benchmark (base or reference) category. In the above example, female is the benchmark category, so the intercept $\beta_1$ relates to female salaries.
4)      The coefficient $\beta_2$ attached to the dummy variable D is called the ‘differential intercept coefficient’: it tells by how much the intercept of the category coded 1 differs from that of the benchmark category.
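
The (m-1) rule in feature 1 can be illustrated with a short sketch. The region labels and the helper `make_dummies` are hypothetical and chosen only for illustration. The example shows that a full set of m dummies always sums to the intercept column, which is the perfect collinearity the rule avoids.

```python
def make_dummies(values, base):
    """Return (names, rows): one 0/1 column per category except `base`."""
    categories = sorted(set(values) - {base})
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Hypothetical qualitative variable with m = 3 categories.
regions = ["north", "south", "west", "south", "north", "west"]

# Correct coding: m - 1 = 2 dummies, with "north" as the benchmark category.
names, dummies = make_dummies(regions, base="north")
print(names)    # the two non-benchmark categories
print(dummies)  # benchmark rows are all zeros

# Coding all m categories: every row of dummies sums to 1, i.e. the dummy
# columns add up to the intercept column, so they are perfectly collinear.
all_categories = sorted(set(regions))
full = [[1 if v == c else 0 for c in all_categories] for v in regions]
row_sums = [sum(row) for row in full]
print(row_sums)
```

Dropping one category removes that exact linear dependence, and the dropped category becomes the benchmark against which the remaining dummy coefficients are read.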

COMPARING TWO REGRESSION MODELS:

Regressions are often compared to see whether a relationship has changed over a given time series, for example before and after a policy change. In such cases, two regression functions need to be compared.

In the case of India, there has been much discussion about changes in consumption and saving patterns after the economic reforms of the 1990s. To assess the change, pre-reform and post-reform data need to be analyzed separately.

Pre-reform period (1950-1991):

$Y_t = A_1 + A_2 X_t + u_{1t}$    (eq-1)

Post-reform period (1991-2008):

$Y_t = B_1 + B_2 X_t + u_{2t}$    (eq-2)

Here, Y = savings,
X = income,
u = error term,
$A_1$ = pre-reform intercept,
$B_1$ = post-reform intercept,
$A_2$ = pre-reform slope (the marginal propensity to save, since Y is savings),
$B_2$ = post-reform slope (the marginal propensity to save).

Equations (1) and (2) present four possibilities:

(1) $A_1 = B_1$ and $A_2 = B_2$: the two regressions are identical. This is the case of “coincident regressions”.

(2) $A_1 \neq B_1$ and $A_2 = B_2$: the two regressions differ only in location, that is, in their intercepts. This is the case of “parallel regressions”.

(3) $A_1 = B_1$ and $A_2 \neq B_2$: the two regressions have the same intercept but different slopes. This is the case of “concurrent regressions”.

(4) $A_1 \neq B_1$ and $A_2 \neq B_2$: the two regressions are entirely different. This is the case of “dissimilar regressions”.

[fig 1 to fig 4: plots of the four cases]

REGRESSION MODEL FOR THE TWO COMPARED REGRESSIONS:

$Y_t = C_1 + C_2 D_t + C_3 X_t + C_4 (D_t X_t) + u_t$

Here, Y = savings, X = income,
$D_t$ = 1 for observations from 1992-2008,
$D_t$ = 0 for observations before 1992.

Thus
$E(Y_t \mid D_t = 0, X_t) = C_1 + C_3 X_t$      (eq-3)
$E(Y_t \mid D_t = 1, X_t) = (C_1 + C_2) + (C_3 + C_4) X_t$      (eq-4)

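Here, eq-3 corresponds to eq-1 and eq-4 corresponds to eq-2, so that

$A_1 = C_1$ and $A_2 = C_3$;

$B_1 = (C_1 + C_2)$ and $B_2 = (C_3 + C_4)$.

The error term drops out of the conditional means because $E(u_t) = 0$ is assumed.

So $C_2$ is the differential intercept coefficient
and $C_4$ is the differential slope coefficient: they measure by how much the post-reform intercept and slope differ from the pre-reform ones.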
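
The link between the pooled dummy-variable regression and the two separate regressions can be checked numerically. This is a sketch under stated assumptions: the (income, savings) pairs are invented for illustration and are not Indian data, and `ols` is a small hand-written normal-equations solver. The pooled model fitted is Y = C1 + C2*D + C3*X + C4*(D*X), with D = 1 for post-reform observations.

```python
def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Hypothetical (income, savings) observations, illustrative only.
pre = [(1.0, 0.3), (2.0, 0.4), (3.0, 0.7), (4.0, 0.8)]
post = [(5.0, 1.6), (6.0, 2.1), (7.0, 2.3), (8.0, 3.0)]

# Separate regressions: eq-1 (pre-reform) and eq-2 (post-reform).
a1, a2 = ols([[1, x] for x, _ in pre], [y for _, y in pre])
b1, b2 = ols([[1, x] for x, _ in post], [y for _, y in post])

# Pooled regression with an intercept dummy and a slope (interaction) dummy.
pooled = [(x, y, 0) for x, y in pre] + [(x, y, 1) for x, y in post]
c1, c2, c3, c4 = ols([[1, d, x, d * x] for x, _, d in pooled],
                     [y for _, y, _ in pooled])

print("pre-reform :", (a1, a2), "vs pooled", (c1, c3))
print("post-reform:", (b1, b2), "vs pooled", (c1 + c2, c3 + c4))
```

Because the pooled model lets both the intercept and the slope shift with the dummy, its coefficients reproduce the two separate fits. The practical advantage of the single pooled regression is that C2 and C4 can be tested directly for statistical significance.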



SUMMARY AND CONCLUSIONS

1)      Dummy variables, taking the values 1 and 0, are a means of introducing qualitative regressors into a regression model.
2)      Dummy variables are a data-classifying device in that they divide a sample into various subgroups based on qualities or attributes. If the subgroups differ, the differences will be reflected in the intercepts or slope coefficients (or both) of the subgroup regressions.
3)      Although a versatile tool, the dummy variable technique needs to be handled carefully: (i) if the regression contains a constant term, the number of dummy variables must be one less than the number of classifications of each qualitative variable; (ii) the coefficient attached to a dummy variable must always be interpreted in relation to the base or reference group, i.e. the group that receives the value 0; (iii) if a model has several qualitative variables with several classes, introducing dummy variables can consume a large number of degrees of freedom, so the dummies need to be chosen carefully.
