Sunday, January 14, 2024

Econometrics: Introduction to Linear Regression (Rigorous & Detailed)

What is Econometrics?


Econometrics is the application of statistical methods to economics, although today it is also used in other social sciences such as history and political science. This is the golden equation of econometrics:

Y = a + bX + e                                        (1)



Here Y is the variable we want to predict and X is the predictor: Y is the dependent variable and X is the independent (explanatory) variable, since Y depends on X. The parameters a (the intercept) and b (the slope) are unknown and must be estimated from data, and e is a random error term. We assume the relationship is linear, i.e. Y = a + bX. In practice the data rarely fit this line perfectly: Y is only approximately equal to a + bX, and the random error term e captures the difference.

This is a simplified explanation of this magical equation of econometrics (magical because it reappears everywhere). Remember this golden equation; its importance can't be emphasised enough.
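To make the golden equation concrete, here is a minimal Python sketch that generates data from Y = a + bX + e. The "true" values a = 2.0 and b = 0.5, the normally distributed error with standard deviation 1, and the helper name simulate are all assumptions chosen for illustration; in real data a and b are unknown.

```python
import random

def simulate(a, b, xs, sigma, seed=0):
    """Draw Y = a + b*X + e for each X, with e ~ Normal(0, sigma) (assumed)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    errors = [rng.gauss(0.0, sigma) for _ in xs]
    ys = [a + b * x + e for x, e in zip(xs, errors)]
    return ys, errors

# Hypothetical "true" parameters; an econometrician would not know these.
a_true, b_true = 2.0, 0.5
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys, errors = simulate(a_true, b_true, xs, sigma=1.0)

for x, y in zip(xs, ys):
    # Each Y lies near the line a + bX, but (almost surely) not exactly on it.
    print(f"X = {x:.0f}   a + bX = {a_true + b_true * x:.2f}   Y = {y:.2f}")
```

The gap between the two printed columns in each row is exactly the random error e for that observation.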

So Y is the variable of interest, and it depends on X through the parameters a and b. Once the parameters have been estimated, the equation is written as:

Ŷ = â + b̂X



This equation gives the estimated (approximate) relationship, where â and b̂ are the estimates of a and b. The hat symbol (^) denotes an estimate: whenever you see '^' in econometrics, it marks an estimated value, not the true value.
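The post has not yet said how â and b̂ are obtained in general. One standard method, assumed here rather than introduced by the post, is ordinary least squares (OLS), which chooses â and b̂ to minimise the sum of squared differences between Y and â + b̂X. A minimal Python sketch; the data points and the helper name ols are hypothetical.

```python
def ols(xs, ys):
    """Ordinary least squares estimates (a_hat, b_hat) for Y = a + bX + e (OLS assumed)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b_hat = s_xy / s_xx            # estimated slope
    a_hat = y_bar - b_hat * x_bar  # estimated intercept
    return a_hat, b_hat

# Hypothetical observations of (X, Y).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.1, 2.4, 3.1, 3.4, 4.0]

a_hat, b_hat = ols(xs, ys)
y_hat = [a_hat + b_hat * x for x in xs]  # fitted values Y_hat: no error term
print(f"a_hat = {a_hat:.2f}, b_hat = {b_hat:.2f}")  # prints a_hat = 2.04, b_hat = 0.48
```

The hats in the variable names mirror the notation: a_hat and b_hat are numbers computed from this sample, and a different sample would give different estimates.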

Example: Do PhD graduates earn a higher salary than non-PhD graduates?
Suppose we have salary data on n1 PhD graduates and n2 non-PhD graduates.



Can we turn this into a regression (econometric) problem? Yes. Define a variable X that equals 1 for a PhD graduate and 0 for a non-PhD graduate (a 0/1 variable like this is called a dummy variable). Y contains the salaries of all graduates, PhD and non-PhD, and X contains the matching 1s and 0s. We can now write the relationship between Y and X using the golden equation of econometrics:

Y = a + bX + e,   where X = 1 for PhD graduates and X = 0 for non-PhD graduates



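We can compute the average salary of PhD graduates (Ȳ1) and of non-PhD graduates (Ȳ2):

Ȳ1 = (sum of the salaries of the n1 PhD graduates) / n1
Ȳ2 = (sum of the salaries of the n2 non-PhD graduates) / n2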
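These averages are easy to compute once the data are coded with the dummy X. A minimal Python sketch; the salary figures (in thousands) and the helper names encode and group_mean are hypothetical, since the post's own data are not reproduced here.

```python
def encode(phd_salaries, non_phd_salaries):
    """Stack both groups into Y and build the dummy X (1 = PhD, 0 = non-PhD)."""
    ys = list(phd_salaries) + list(non_phd_salaries)
    xs = [1] * len(phd_salaries) + [0] * len(non_phd_salaries)
    return xs, ys

def group_mean(xs, ys, group):
    """Average of Y over the observations whose dummy X equals `group`."""
    selected = [y for x, y in zip(xs, ys) if x == group]
    return sum(selected) / len(selected)

# Hypothetical salaries (in thousands): n1 = 4 PhD and n2 = 5 non-PhD graduates.
phd = [82.0, 95.0, 78.0, 89.0]
non_phd = [61.0, 70.0, 58.0, 66.0, 73.0]

xs, ys = encode(phd, non_phd)
y_bar_1 = group_mean(xs, ys, 1)  # average PhD salary, Y-bar 1
y_bar_2 = group_mean(xs, ys, 0)  # average non-PhD salary, Y-bar 2
print(y_bar_1, y_bar_2)  # prints 86.0 65.6
```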




Next, let us look at an operator denoted by E(·). This is the expectation operator which, loosely speaking, gives the average value of a random variable.
What is the expectation of the random error term e in equation (1)? We assume it is zero: the errors push Y above and below a + bX, but on average they cancel out.

E(e) = 0
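Reading E(·) loosely as "the average value", a minimal Python sketch can show what E(e) = 0 means in practice. The normal error distribution with mean 0 and standard deviation 5, and the sample size, are illustrative assumptions, not taken from the post.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is reproducible

# Assumed error distribution: Normal with mean 0 and standard deviation 5.
errors = [rng.gauss(0.0, 5.0) for _ in range(100_000)]

# The sample average approximates E(e): close to 0, but not exactly 0.
print(f"average error = {statistics.fmean(errors):.4f}")
```

Individual errors can be large in either direction; only their average settles near zero as the number of draws grows.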

Using equation (1) and E(e) = 0, we can calculate the expected salary of each group of graduates. Try it yourself before reading on.

E(Y | X=1) = a + b          (PhD graduates)
E(Y | X=0) = a              (Non-PhD graduates)

Note: The true values of the parameters a and b can't be calculated; they can only be estimated. The estimates are written â and b̂, so the estimated expected salaries are â + b̂ for PhD graduates and â for non-PhD graduates.
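For readers who want to check their answer, here is the calculation written out. It assumes, slightly more strongly than E(e) = 0, that the error averages to zero within each group, i.e. E(e | X) = 0:

```latex
\begin{aligned}
E(Y \mid X = 1) &= E(a + b \cdot 1 + e \mid X = 1) = a + b + E(e \mid X = 1) = a + b \\
E(Y \mid X = 0) &= E(a + b \cdot 0 + e \mid X = 0) = a + E(e \mid X = 0) = a
\end{aligned}
```

Replacing the unknown a and b by their estimates â and b̂ gives the estimated expected salaries â + b̂ and â.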

We cannot observe the expected salaries directly, so we estimate each one by the average salary of the corresponding group:

E(Y | X=1) ≈ Ȳ1
E(Y | X=0) ≈ Ȳ2

which gives

Ȳ1 = â + b̂
Ȳ2 = â

We now have two linear equations in two unknowns, â and b̂. The second equation gives â directly, and subtracting it from the first gives b̂:

â = Ȳ2
b̂ = Ȳ1 - Ȳ2

Substituting these values of â and b̂ into Ŷ = â + b̂X, we get:

Ŷ = Ȳ2 + (Ȳ1 - Ȳ2)X

This equation has no random error term because it uses the estimated parameters â and b̂ to produce Ŷ, an estimate of the expected salary E(Y | X) rather than of any individual salary. Since the expected value of the error term is 0, the error term drops out.

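To see what this means, return to the example above. PhD graduates are labelled X = 1, so their fitted salary is Ŷ = Ȳ2 + (Ȳ1 - Ȳ2)(1) = Ȳ1, the average PhD salary. Non-PhD graduates are labelled X = 0, so their fitted salary is Ŷ = Ȳ2, the average non-PhD salary. The estimated slope b̂ = Ȳ1 - Ȳ2 is therefore the difference between the two average salaries: if it is positive, the PhD graduates in our data earn more on average.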


At first, you might not appreciate this econometric framework for estimating values of Y.
It has deep roots in econometrics, and we will apply it to more advanced models later.