Sunday, January 14, 2024

Econometrics: Introduction to Linear Regression (Rigorous & Detailed)

What is Econometrics?


Econometrics is the application of statistics to economics, but these days it is also applied in
other social sciences like history and political science. This is the golden equation of econometrics:

Y = a + bX + e        (1)

Here, Y is the variable we want to predict and X is the predictor. Y is the dependent variable and
X is the independent variable: Y depends on X. a and b are the parameters that we need to
estimate, and e is a random error term. We assume that the model is linear in nature (i.e. Y = a + bX).
In practice, the relationship may not be perfectly linear: Y is approximately, but not exactly, equal to
a + bX, and the random error term e captures the difference.

This is a simplified explanation of this magical equation of econometrics (magical because it reappears everywhere).
You should remember this golden equation, as its importance can’t be emphasised enough.

Now we know that Y is the variable of interest, which depends on X given the parameters a
and b. After the parameters are estimated, the equation is written as:

Ŷ = â + b̂X

The equation above shows the approximate relation, where â and b̂ are the estimated values of a and b.
The hat symbol (^) denotes an estimate. So if you see ‘^’ anywhere in econometrics, you will
now know it’s an estimated value, not the true value.

Example: Do you think PhD graduates get a better salary than non-PhD graduates?
Let us suppose we have salary data on n1 PhD graduates and n2 non-PhD graduates.



So, can we turn this into a regression problem, i.e. an econometric problem?
Yes, we can. To do so, code PhD graduates as X = 1
and non-PhD graduates as X = 0.
This lets us find a relationship between Y and X,
where Y contains the salaries of all graduates (PhD and non-PhD) and X contains the corresponding values (1
or 0). Now, we can find the relation using the golden equation of econometrics:

Y = a + bX + e,  where X = 1 for PhD graduates and X = 0 for non-PhD graduates

We can estimate the average salary of PhD graduates (Ȳ1) as well as of non-PhD graduates (Ȳ2):

Ȳ1 = (sum of the n1 PhD salaries) / n1
Ȳ2 = (sum of the n2 non-PhD salaries) / n2

Let us understand an operator denoted by E(.).
This is the expectation operator, which, loosely speaking, gives the average value of a variable.
Now, can you think about the expectation of the random error term e in equation (1)? By assumption,
the expectation of the error term is zero:

E(e) = 0

Using the equations above, we can calculate the expected salary of each group. Try it yourself once.

E(Y | X=1) = a + b, estimated by â + b̂      (PhD graduates)
E(Y | X=0) = a, estimated by â              (non-PhD graduates)

Note: The true values of the parameters a and b can’t be calculated; they can only be estimated. The estimated parameters are denoted by â and b̂, which are used above.

The estimated expected salary of each group should equal the average salary of that group, i.e.

Ȳ1 = â + b̂
Ȳ2 = â

Now we have 2 linear equations in 2 unknowns (â and b̂). This system of linear equations can be solved
easily:

â = Ȳ2
b̂ = Ȳ1 - Ȳ2

Substituting these values of â and b̂, we get:

Ŷ = Ȳ2 + (Ȳ1 - Ȳ2)X

This equation doesn’t contain a random error term because Ŷ is the estimated expected value of Y given X. It is built from the estimates â and b̂, and since the expected value of the error term is 0, e drops out.

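To understand this better, relate it to the example above. PhD graduates are labelled X = 1, so Ŷ = Ȳ2 + (Ȳ1 - Ȳ2) = Ȳ1, the average PhD salary. Non-PhD graduates are labelled X = 0, so Ŷ = Ȳ2, the average non-PhD salary.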
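To check the derivation numerically, here is a minimal Python sketch. The salary figures are made up purely for illustration; they are not the data from the example. `ols_simple` is a hypothetical helper that applies the standard ordinary least squares (OLS) formulas b̂ = Σ(X - X̄)(Y - Ȳ) / Σ(X - X̄)² and â = Ȳ - b̂X̄. Using OLS here is an assumption: the derivation above works from group averages and doesn't name a fitting method. With a 0/1 dummy for X, â should come out equal to the non-PhD average Ȳ2, and b̂ should equal Ȳ1 - Ȳ2.

```python
from statistics import mean


def ols_simple(x, y):
    """Hypothetical helper: closed-form OLS estimates (a_hat, b_hat) for Y = a + bX + e."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b_hat = s_xy / s_xx
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat


# Made-up salaries for illustration only (not real data)
phd = [90.0, 100.0, 110.0]   # n1 = 3 PhD graduates, coded X = 1
non_phd = [60.0, 70.0]       # n2 = 2 non-PhD graduates, coded X = 0

x = [1] * len(phd) + [0] * len(non_phd)
y = phd + non_phd

a_hat, b_hat = ols_simple(x, y)
y1_bar, y2_bar = mean(phd), mean(non_phd)

print(f"a_hat = {a_hat:.2f} vs Y2_bar = {y2_bar:.2f}")
print(f"b_hat = {b_hat:.2f} vs Y1_bar - Y2_bar = {y1_bar - y2_bar:.2f}")
```

With these numbers, both comparisons should match (65 and 35). This mirrors â = Ȳ2 and b̂ = Ȳ1 - Ȳ2.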


You might not appreciate this econometric framework for estimating Y in the beginning, but
this method has strong roots in econometrics and will be applied to further models.







Wednesday, November 1, 2023

Transitioning to Data Science

Ask yourself questions:
Before you consider switching to data science, ask yourself whether you actually need to, and whether you have the necessary skill set and aptitude. This matters because data science is essentially applied math. It’s not really something cool, technical or extremely high-tech, and if you don't love mathematics, you will hate it. Data science is not a kind of complex computer programming. Programming is a tool used in data science, but at its core, data science is mathematics.


If you don't enjoy programming, at least try to learn the fundamentals. You could try to delegate the programming, but at the very least you need some appreciation for data, and that only comes with some understanding of mathematics. If you think that trigonometry and calculus don’t have any practical applications, you probably aren't meant for data science.

However, suppose you have reached the point where you have the necessary clarity of thought and are certain that you want to switch to data science. In that case, there are two ways to go about it: the traditional method and the unconventional method.


Traditional Method of Transitioning to Data Science:
The traditional route is to enrol in a data science programme. Make sure it is a full-time programme offered by a reputed university, not a course that you would do over the weekend. In the Indian context, the Indian Statistical Institute (ISI), IISc Bangalore and the IITs are great places to look. You must dedicate one or two years to that programme, putting everything else on hold.


Unconventional Method of Transitioning to Data Science:
The unconventional approach is to complete some data science projects and build a portfolio. You then use that portfolio to transition: get recommendations from friends, understand the specifics of a given data science job posting, and work on improving accordingly. If you apply to 10-20 data science positions and prepare for each one in a very specific way over, say, five or six months to a year, you will learn what you need to do to be a successful data scientist.

Now the question arises: how do you build a portfolio, and how do you create data science projects from scratch? To do that, identify problems in your life that you care deeply about and turn them into data science projects. Either solve those problems or, at the very least, find an approach and consider what a potential solution might be. You may be stuck after identifying a problem, or unsure whether you have a genuine data science problem to solve at all. If so, I've made a number of videos explaining how to start data science projects and what sorts of projects fall under the data science umbrella. You may watch those videos.

You can get in touch with me through email, a message on LinkedIn, Instagram, or through visiting my verified Topmate profile: https://topmate.io/ashish_gourav

Friday, October 6, 2023

Use ChatGPT to become a 10x Professional

ChatGPT is an advanced language model developed by OpenAI that has gained widespread recognition and adoption across various industries. It utilizes deep learning techniques to generate human-like responses and engage in interactive conversations with users. The capabilities of ChatGPT extend far beyond simple question-answering, making it an invaluable tool for businesses and individuals alike. 

Let us look into the most efficient usage of the superpowers of ChatGPT in detail.

Learn the basics of any new topic
Suppose that you want to learn about NLP, a topic that is completely new to you. You can simply ask ChatGPT about it and get a detailed overview of NLP.

Search faster
With ChatGPT, you are just one prompt away from information on a topic, and you often get results within seconds.

Code Faster
You can even code faster using ChatGPT. Just write instructions describing the code you need, and ChatGPT will write it in the programming language of your choice.

Pop cultural references
Along with all these benefits, you can also get pop-culture references from ChatGPT. For example, type “Movies for Gen Z” and you will get a list of popular movies that appeal to the Gen Z demographic.

Learn specific things of a difficult topic
You can also learn any particular detail of a difficult topic using ChatGPT.


These applications represent just a glimpse into the diverse possibilities offered by ChatGPT's superpowers. As technology continues to advance, the potential for leveraging ChatGPT in innovative and impactful ways will only expand, benefiting individuals, businesses, and society as a whole.

None of these replaces existing technology or human beings. They are applications that human beings, and other technologies, will use. In this way, ChatGPT will enable both technology and human beings to perform more efficiently.

ChatGPT, or any other technology like it, will not be an impediment or a disruption to your career progress. It will help you become a smarter person, a smarter professional and a more efficient individual. I have maintained this stance for the last 10-15 years of my adult life: technology is always an enabler and a stimulant. It can never replace human beings, and it can never replace old technologies. Some people still use low-tech or old technologies because they find them easy to use. But yes, given time, everything dies, because everything comes with an expiry date.


Artificial Intelligence & Machine Learning's Most Important Concepts for Interviews

Let’s learn the most important concepts of Artificial Intelligence and Machine Learning, one letter of the alphabet at a time!

ANN
An Artificial Neural Network (ANN) is loosely modelled on the human brain’s neural network. The hidden units in its hidden layers can be thought of as neurons. Each one combines its inputs using weights and a bias, then applies a non-linear activation function, and this is how the network introduces non-linearity.
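Below is a minimal plain-Python sketch of what one hidden layer does. The weights, the biases and the choice of ReLU as the activation function are illustrative assumptions, not values from any trained network. Each hidden unit forms a weighted sum of its inputs plus a bias, then applies the non-linear activation.

```python
def relu(z):
    """Non-linear activation: max(0, z)."""
    return max(0.0, z)


def identity(z):
    """Linear activation, used here for the output layer."""
    return z


def dense_layer(inputs, weights, biases, activation):
    """One fully connected layer: each unit computes activation(w . x + b)."""
    outputs = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(activation(z))
    return outputs


# Illustrative (hypothetical) parameters: 2 inputs -> 3 hidden units -> 1 output
W_hidden = [[0.5, -1.0], [1.5, 0.5], [-0.5, 2.0]]
b_hidden = [0.0, -1.0, 0.5]
W_out = [[1.0, -0.5, 0.25]]
b_out = [0.1]

x = [1.0, 2.0]
hidden = dense_layer(x, W_hidden, b_hidden, relu)
output = dense_layer(hidden, W_out, b_out, identity)
print(hidden, output)
```

The first hidden unit's weighted sum is negative, so ReLU switches it off. That switching is the non-linearity. Stacking many such layers is what makes a network "deep".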


Bagging

Bagging (bootstrap aggregating) means creating multiple training sets from one training set by sampling with replacement. You train a separate model on each of them and finally average their predictions, or take a majority vote for classification.
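Below is a minimal bagging sketch in plain Python. It assumes a deliberately simple base model, a least-squares line built by the hypothetical helper `fit_line`, and made-up data. Each model is trained on a bootstrap sample, that is, a sample of the same size drawn with replacement, and the ensemble's prediction is the average of the individual predictions.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Base model: least-squares line; returns a prediction function."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:  # degenerate bootstrap sample (all x equal): predict the mean
        return lambda x: y_bar
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def bagging(xs, ys, n_models=25, seed=0):
    """Train n_models base models on bootstrap samples; average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(m(x) for m in models)


# Made-up, roughly linear data (y is about 2x + 1 plus noise)
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [1.2, 2.8, 5.1, 7.0, 8.9, 11.2, 12.8, 15.1]

predict = bagging(xs, ys)
print(round(predict(4.0), 2))
```

Bagging pays off most with unstable base models such as deep decision trees. A straight line is used here only to keep the sketch short.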


Correlation
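Correlation measures the strength and direction of the relationship between two features or variables. The most common measure, Pearson’s correlation coefficient, captures linear relationships and ranges from -1 to +1.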
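Below is a minimal sketch of Pearson’s correlation coefficient, r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² · Σ(y - ȳ)²), in plain Python. The data is made up, and `pearson_r` is a hypothetical helper name. Python 3.10+ also ships `statistics.correlation`, which computes the same quantity.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 55, 61, 64, 70]   # rises with hours studied -> r close to +1
reversed_hours = [5, 4, 3, 2, 1]    # perfectly decreasing in hours studied -> r = -1

print(round(pearson_r(hours_studied, exam_score), 3))
print(round(pearson_r(hours_studied, reversed_hours), 3))
```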


Deep Learning
The cornerstone of deep learning is the artificial neural network. Whenever you have an artificial neural network with many hidden layers, it’s most likely a deep learning problem that is being solved.


Error
In a supervised machine learning setup, you always have an actual value on which you supervise the model and a predicted (estimated) value. The difference between these two numbers is the error. In supervised machine learning problems, you try to minimise this error.
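Below is a small sketch of the idea with made-up numbers. For each observation, the error (residual) is the actual value minus the predicted value. One common quantity to minimise is the mean squared error (MSE). MSE is just one choice among several error measures; mean absolute error is another.

```python
def errors(actual, predicted):
    """Error for each observation: actual value minus predicted value."""
    return [a - p for a, p in zip(actual, predicted)]


def mean_squared_error(actual, predicted):
    """Average of squared errors; squaring stops positive and negative errors cancelling."""
    errs = errors(actual, predicted)
    return sum(e * e for e in errs) / len(errs)


actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.0, 7.5, 10.0]

print(errors(actual, predicted))              # [0.5, 0.0, -0.5, -1.0]
print(mean_squared_error(actual, predicted))  # 0.375
```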


Feature
Feature, variable or input: all of these mean what you feed into your data science models.

 

Gradient Descent
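Gradient descent is an iterative optimisation method. Starting from an initial guess, it repeatedly moves the parameters a small step in the direction of the negative gradient, which lowers the function’s value, until it settles at a minimum. In general it is only guaranteed to reach a local minimum. For convex functions with a suitable step size, such as the squared-error loss of linear regression, that local minimum is also the global minimum.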
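Below is a minimal sketch that fits y = a + bx by gradient descent on the mean squared error. The `learning_rate` and `n_steps` values are illustrative choices, not tuned recommendations. The data is made up so that the true minimum (a = 1, b = 2) is known in advance. Because this loss is convex, the minimum the iterations settle at is also the global one.

```python
def gradient_descent_line(xs, ys, learning_rate=0.05, n_steps=2000):
    """Fit y = a + b*x by gradient descent on the mean squared error."""
    a, b = 0.0, 0.0  # initial guess
    n = len(xs)
    for _ in range(n_steps):
        residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(residual^2) with respect to a and b
        grad_a = -2.0 / n * sum(residuals)
        grad_b = -2.0 / n * sum(x * r for x, r in zip(xs, residuals))
        # Step against the gradient (downhill)
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    return a, b


xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]  # exactly y = 1 + 2x, so the minimum is at a = 1, b = 2

a, b = gradient_descent_line(xs, ys)
print(round(a, 4), round(b, 4))
```

The learning rate is the main knob. If it is too large, the updates overshoot and diverge instead of converging. If it is too small, convergence is very slow.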


Hypothesis Testing
Using a test statistic computed from the data, you either reject the null hypothesis in favour of the alternative or fail to reject the null.


Intercept
The intercept in linear regression is the parameter that has the same impact for all inputs: it is the predicted value of y when every input is zero. For example, if your intercept is 3, its contribution to y will always be 3, irrespective of x.


Julia
Julia could well be the next breakthrough in data science programming. It was designed for high-performance numerical and scientific computing, which is exactly what quantitative researchers and data scientists need.


KNN
K-Nearest Neighbours (KNN) is a method that can be adapted to both regression and classification. The key idea is that, to make a prediction for a data point, you find the K training points nearest to it and combine their outputs: a majority vote for classification, or an average for regression. Most importantly, KNN is a non-parametric model, and K is its hyperparameter.
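Below is a minimal KNN classifier in plain Python. It assumes Euclidean distance (`math.dist`, Python 3.8+) and a simple majority vote. The data is made up, and `knn_predict` is a hypothetical helper name. For regression, you would average the neighbours’ values instead of voting.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training indices by Euclidean distance to the query point
    order = sorted(range(len(train_points)), key=lambda i: dist(train_points[i], query))
    nearest_labels = [train_labels[i] for i in order[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up 2-D data with two well-separated classes
points = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, (1.5, 1.5)))  # A
print(knn_predict(points, labels, (6.5, 6.0)))  # B
```

There is no training step beyond storing the data, which is why KNN is called non-parametric. Choosing K is left to you, typically by validation.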


Linear Regression
Linear regression is one of the simplest, yet most powerful, predictive ML models. Its components are the intercept and the coefficients of all the features or predictors.
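Below is a minimal sketch of how a fitted linear regression turns inputs into a prediction: the intercept plus each coefficient times its feature. The coefficient values are hypothetical. Note how the intercept contributes the same amount whatever the inputs are, echoing the Intercept example above (an intercept of 3).

```python
def predict(intercept, coefficients, features):
    """Linear regression prediction: y_hat = intercept + sum(coef_j * x_j)."""
    return intercept + sum(c * x for c, x in zip(coefficients, features))


intercept = 3.0
coefficients = [2.0, -0.5]   # hypothetical weights for two features

print(predict(intercept, coefficients, [1.0, 4.0]))  # 3 + 2*1 - 0.5*4 = 3.0
print(predict(intercept, coefficients, [0.0, 0.0]))  # all features zero: just the intercept, 3.0
```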


Model
Model is an abstraction or simplification of reality using mathematics and statistics.


Normal Distribution
People new to data science generally talk about the normal distribution, but it is not the only statistical distribution of importance; there are others as well. To completely identify a normal distribution, you need its mean and variance. A quick heuristic for whether a distribution is normal is that it will be bell-shaped and symmetric about its mean.
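Below is a small sketch using Python’s standard `statistics.NormalDist`. Its constructor takes the mean and the standard deviation (the square root of the variance), which is all a normal distribution needs. The “heights” parameters are illustrative. The checks show the symmetry about the mean and the familiar figure of roughly 68% of the probability lying within one standard deviation.

```python
from statistics import NormalDist

# A normal distribution is fully specified by its mean and standard deviation
# (the square root of the variance). These values are illustrative.
heights = NormalDist(mu=170.0, sigma=10.0)

# Symmetry about the mean: equal density at equal distances on either side
print(heights.pdf(160.0), heights.pdf(180.0))

# Half of the probability lies below the mean
print(heights.cdf(170.0))

# Roughly 68% of values fall within one standard deviation of the mean
print(round(heights.cdf(180.0) - heights.cdf(160.0), 4))
```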


Overfitting
Overfitting happens when a model is too complex. It fits the training data very well, noise included, but fails miserably when presented with new data.


P- Value
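The p-value measures the strength of the evidence for rejecting, or failing to reject, the null hypothesis. In statistical terms, it is the observed level of significance: the probability, assuming the null hypothesis is true, of getting a test statistic at least as extreme as the one observed. The smaller the p-value, the stronger the evidence against the null.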
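Below is a minimal sketch tying the Hypothesis Testing and P-Value entries together. It assumes a one-sample z-test, where the population standard deviation is treated as known, so that only the standard library’s `statistics.NormalDist` is needed. The data, the hypothesised mean of 50, the σ of 2 and the 5% significance level are all illustrative choices. The test statistic measures how many standard errors the sample mean lies from the hypothesised mean. The two-sided p-value is the probability, under the null, of a statistic at least that extreme.

```python
from math import sqrt
from statistics import NormalDist, mean


def one_sample_z_test(sample, mu_0, sigma):
    """Two-sided z-test of H0: population mean == mu_0, with known sigma."""
    z = (mean(sample) - mu_0) / (sigma / sqrt(len(sample)))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


sample = [52.1, 49.8, 53.4, 51.2, 54.0, 50.9, 52.7, 53.1]
z, p = one_sample_z_test(sample, mu_0=50.0, sigma=2.0)

alpha = 0.05  # significance level, chosen before looking at the data
decision = "reject H0" if p < alpha else "fail to reject H0"
print(f"z = {z:.2f}, p = {p:.4f}: {decision}")
```

When σ is unknown and must be estimated from the sample, the t-test (see the T Test entry) is the usual choice instead.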


Q-Q Plot
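A Q-Q (quantile-quantile) plot compares two probability distributions by plotting their quantiles against each other. It is commonly used as a visual check for normality: if the sample is roughly normal, the points lie close to a straight line.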
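Below is a minimal sketch of the numbers behind a normal Q-Q plot. It assumes the sample is compared against a normal distribution with the sample’s own mean and standard deviation. It also uses (i - 0.5)/n plotting positions, which is one common convention, not the only one. Plotting these pairs (for example with matplotlib) and checking whether they hug the line y = x is the visual check; this sketch only computes them. The sample values are made up.

```python
from statistics import NormalDist, mean, stdev


def normal_qq_points(sample):
    """Pairs (theoretical normal quantile, observed sample quantile) for a Q-Q plot."""
    data = sorted(sample)
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))  # normal with the sample's mean and sd
    # Plotting positions (i - 0.5) / n: one common convention among several
    theoretical = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, data))


sample = [4.1, 5.3, 4.8, 6.0, 5.1, 4.6, 5.5, 5.0, 4.4, 5.7]
for t, s in normal_qq_points(sample):
    print(f"{t:6.2f}  {s:6.2f}")
```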


Random Forest
Random forest combines a number of decision trees in a way similar to bagging, with a twist in how features are chosen while building the trees: each split considers only a random subset of the features.


SVD
SVD (singular value decomposition), along with PCA, is a dimensionality reduction technique. If you feed it p features, where p is very large, it gives you a smaller set of transformed features. In PCA terms, these are the most important principal components, which help you reduce the dimensionality of the problem.


T Test
A t-test is a statistical test for comparing means, for example the means of two different groups, when the population variance is unknown and has to be estimated from the data.
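Below is a minimal sketch of the two-sample case. It assumes Welch’s version of the test, which does not require the two groups to have equal variances, and the data is made up. The standard library has no t-distribution CDF, so the sketch stops at the t statistic and the degrees of freedom. Turning these into a p-value needs a t table or a library such as SciPy, whose `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the full Welch test.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    va = variance(sample_a) / n_a  # squared standard error of mean A
    vb = variance(sample_b) / n_b  # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    return t, df


group_a = [23.1, 25.4, 24.8, 26.0, 24.2]
group_b = [21.0, 22.3, 20.8, 22.9, 21.5]
t, df = welch_t(group_a, group_b)
print(f"t = {t:.2f}, df = {df:.1f}")
```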


Underfitting
Underfitting is the opposite of overfitting. An underfit model is too simple: it misses important patterns or features in the data during training, so it performs poorly even on the training data.


Variance
Variance measures how spread out the data is around its mean; it is the average squared deviation from the mean. So, if a data set is 2, 2, 2, 2…, its variance will be as low as possible, which in this case is zero, because every value equals the mean. And if the data is, say, 2, 5, 10, 15, 30, its variance is going to be high and certainly not zero.
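Below, the two examples from the text are checked with Python’s standard `statistics` module. The distinction between population variance (dividing by n) and sample variance (dividing by n - 1) is an addition here; the text doesn’t specify which one it means.

```python
from statistics import pvariance, variance

constant = [2, 2, 2, 2]
spread = [2, 5, 10, 15, 30]

# Population variance: average squared deviation from the mean
print(pvariance(constant))   # every value equals the mean, so the variance is 0
print(pvariance(spread))

# Sample variance divides by n - 1 instead of n (used when estimating from a sample)
print(variance(spread))
```

For the second data set, the mean is 12.4, the population variance is 97.04 and the sample variance is 121.3.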


Web Scraping
Web scraping is a risky way to get data from websites. Hence, before scraping, check for all possible violations, such as breaches of the site’s terms of service or its robots.txt rules.


X (Inputs)
X is what you know.


Y (Outputs)
Y is what you need to know.


Z-Score
A z-score measures how far a data point is from the mean, in units of standard deviation. Just subtract the mean from the data point and divide the result by the standard deviation.
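Below is a minimal sketch that reuses the data set from the Variance entry. It assumes the population standard deviation (`statistics.pstdev`), which is one of two common choices; the sample standard deviation, `statistics.stdev`, is the other. `z_scores` is a hypothetical helper name.

```python
from statistics import mean, pstdev


def z_scores(data):
    """z = (x - mean) / standard deviation, for every point in data."""
    mu, sigma = mean(data), pstdev(data)
    return [(x - mu) / sigma for x in data]


data = [2, 5, 10, 15, 30]
for x, z in zip(data, z_scores(data)):
    print(f"{x:>3}: z = {z:+.2f}")
```

The z-scores of a data set always sum to zero. A point with z = +1.79, like 30 here, lies about 1.8 standard deviations above the mean.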


Thursday, August 10, 2023

IPO Investing Strategy - High Return & Low Risk

In this post, you will learn how one can gain from IPOs without subscribing to them. The strategy I am going to describe could be very helpful to you, and it takes only a few minutes of research. It is a high-probability, low-risk strategy.

I invest my money only when I have the utmost clarity that the risk is low and the reward is high. My approach to stock market investing is purely academic in nature, so my fund size and risk tolerance remain small. When both the money at risk and the risk itself are small, you ultimately cannot become rich. You might have an outstanding return, but you will not become a millionaire overnight. Instead, you will be on a slow and steady path towards learning how stock market investing works.

What is my strategy?



As I described, my strategy does not take much research, but that is because of the experience I have with stock market investing. Essentially, whenever there is buzz around an IPO, I do not subscribe to it. Instead, I wait for the listing day. Just before the listing happens, I check three things:

Grey Market Premium:
Grey Market Premium is the first thing a lot of people check. For clarity, the grey market premium is the premium at which the stock trades in the grey market before listing. The grey market is not a formal market and, as the name suggests, it is not very mainstream in nature, so it works like a mock test for the IPO. If the grey market premium is positive, there is a very high chance that you will get a positive return on that stock on the listing day.


Nature of the Business:
The second signal is the nature of the business. If you feel that the business prospects are good, you have positive signal number two. How you find that out is your skill; it is an art, not a science.


Subscription Status:
The third signal is the subscription status: was the issue subscribed 2, 3 or 5 times? You can have a very high subscription and still not get positive returns on the listing day. Equally, you can have a very tepid subscription and still get positive returns. So, this is a very fuzzy area, and you need an intuition for these things.

Once you have all three positive signals, or at least two of the three, you move on to the second-last stage.

What is the second last stage?
A few minutes before you put your money in, go to your trading platform and check the trade lots: how many people are buying and how many are selling. I use a full-service trading account, so I get this information pretty easily. I don’t know whether discount brokers offer similar facilities showing the trade lots and how many people are buying and selling.

After I gain confidence that more people are buying than selling, my task is almost done. I put my money in and wait for the stock price to go up. Ideally, I do not just wait: I set up an automated sell order, so that whenever the price rises by a certain percentage on the listing day (12-15% in my case), I exit the market.
This puts an upper cap on my investment returns, but I am willing to accept that. In just a few minutes, or sometimes a few hours, I make around 10-12 percent, which is fair enough for me as I am not putting in-depth research into that investment. It is just 5-10 minutes of work for me, and I am sitting on returns of 10-12 percent or more.
If I increased my fund size and my risk tolerance a bit, I could make substantial gains, but at a higher risk. That might not work for me, because the risk you take has consequences; it’s not a lottery or a gamble. It’s just how numbers and probability work.
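The decision rules described above can be written down as a small sketch. Everything here is a hypothetical illustration: the function names, the yes/no encoding of each signal, the buy/sell quantities and the 12% target are assumptions loosely based on the text, which uses 12-15%. The sketch only encodes the rules as stated and has not been back-tested.

```python
def enough_signals(gmp_positive, business_good, subscription_good, required=2):
    """True if at least `required` of the three pre-listing signals are positive."""
    return sum([gmp_positive, business_good, subscription_good]) >= required


def buyers_outnumber_sellers(buy_quantity, sell_quantity):
    """Second-last stage: check that more people are buying than selling."""
    return buy_quantity > sell_quantity


def target_sell_price(buy_price, target_pct=12.0):
    """Price at which the automated sell order should trigger."""
    return buy_price * (1 + target_pct / 100)


signals_ok = enough_signals(gmp_positive=True, business_good=True, subscription_good=False)
demand_ok = buyers_outnumber_sellers(buy_quantity=150_000, sell_quantity=90_000)

if signals_ok and demand_ok:
    buy_price = 250.0  # hypothetical listing-day price
    print(f"Buy at {buy_price:.2f}, set sell order at {target_sell_price(buy_price):.2f}")
else:
    print("Skip this IPO")
```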

Saturday, July 1, 2023

How to Make Money with Stock Market Investing

In this post, you will learn some tips and tricks for investing in the stock market. Assume that, in the worst case, you may lose the entire corpus of money you put into the stock market. If you are ready to take that kind of risk, you will invariably come across a few opportunities that give you disproportionate gains.

So, following are my simple rules for investing in stock market:

  1. Try to invest in two to three companies every six months, and do not pull your money out after suffering initial losses. Try to stay invested in the stock market for a good amount of time.


  2. Set your targets before investing. Let us say you are investing in Company A, and your target is a 30 percent gain, with a loss of 5 percent as the worst you will accept, within a set time frame (say, two years). Suppose that within only two days the stock rises by 30 percent: do not be greedy, just take out that money. Within your time frame, whenever the stock falls below minus 5 percent, you sell, and whenever it rises above 30 percent, you sell.

Therefore, set a lower and an upper cap for every stock you invest in. Once you do that, you will always be a rule-based investor or trader, and it will not be difficult for you to overcome your behavioural tendencies.

Stock market investing is easy; it is the people who invest, and their behaviour, that make it difficult. Part of investing is an art, not a hard science, so you need to learn how to master this art.


  3. Mastering the exit point depends on your behavioural tendencies. So, I would suggest reading a bit of economics, behavioural finance, psychology and investment behaviour to understand how to get over them.


  4. Whenever you invest in a company, first look at its P/E (price-to-earnings) ratio. A company with a higher P/E ratio than its peers might be a high-growth company, or it might be overpriced, or its share price might be manipulated in some way. So, try to avoid companies that have a higher P/E ratio than their peers.


  5. Dividend Yield

Try to look for companies that pay predictable dividends. If a company pays you a predictable dividend, you can earn money without exiting the market, so stay invested in companies that pay consistent dividends. Dividend-paying companies have a downside too, but it is not that great. Because you receive money on a regular basis, you get both the dividend income and any appreciation in the share price.


  6. Current Debt

If a company's current debt, which can be found on its balance sheet, is of the same order as its current cash, the company is in reasonably good health.


  7. EBITDA

If EBITDA and net profit are very different from each other, for example if one is half of the other, you need to find out why EBITDA is so different from net profit.
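The lower and upper caps from the Company A example can be expressed as a simple rule-based check. This is a minimal sketch: the -5% and +30% thresholds come from that example, while the function name `exit_signal` and the prices are hypothetical.

```python
def exit_signal(buy_price, current_price, stop_loss_pct=-5.0, target_pct=30.0):
    """Return 'sell' if the return has hit the lower or upper cap, else 'hold'."""
    return_pct = (current_price - buy_price) / buy_price * 100
    if return_pct <= stop_loss_pct or return_pct >= target_pct:
        return "sell"
    return "hold"


buy_price = 100.0
for price in [97.0, 94.0, 118.0, 131.0]:
    print(price, exit_signal(buy_price, price))  # hold, sell, hold, sell
```

Writing the rule down before investing is the point: the decision no longer depends on how you feel on the day.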


Hence, there are many rules for investing in the stock market. Once you start searching for how to invest, you can go down the rabbit hole of investing. But the key idea is that you need to keep learning, or else investing will keep feeling like gambling.


Tuesday, June 27, 2023

Can Economists become Data Scientists

Broadly speaking, the key role of an economist is to optimise costs in order to maximise payoffs. Data scientists do a relatively similar job: they optimise cost (loss) functions to maximise model fit. The two are highly interlinked job profiles, and economists and data scientists use largely the same tools. Perhaps economists, not engineers, are the best suited to become data scientists.

The claim above, that economists maximise payoffs and minimise costs, is an overt generalisation and might not always be true. But it is the general theme of economics, because economics is based on resource constraints: resources are scarce, so the use of available resources needs to be optimised.

The same happens with data scientists. Data scientists rarely have unlimited data, and their clients and problems don’t have infinite resources either, so those resources need to be managed so that they are used optimally. There are plenty of data science problems in which resources are to be optimised and payoffs maximised.

Generally, economists use data science as one of their tools and techniques, whereas data scientists are all technique. So, if you are a data scientist without domain expertise, you are a kind of technician. But if you have domain expertise, say you are a healthcare data scientist, you are in high demand because of your specific skill set. So, an economist and a domain-specific data scientist have a lot in common.

Tools
In economics, there is something called econometrics, which essentially consists of a lot of regression. Also, if you begin any data science or machine learning course, they will start by teaching you regression, simple and multiple correlation, and what happens when your data deviates from the assumptions of regression. These are exactly the things discussed in detail in econometrics, which is perhaps the most powerful tool of economics. Economics uses statistics, math, econometrics and a lot of other analytical tools, and so does the data science industry. Therefore, economists are well suited to become data scientists.

Are Software Engineers better Data Scientists or Product Managers or MBAs?
Software engineers have an extremely difficult task at hand, and they are quite good at it. But just because data science involves a bit of coding, you cannot say that software engineers are better suited to become data scientists. Software engineers come in a lot of shapes and forms, so there may be a specific kind of software engineer who specialises in artificial intelligence and machine learning, and such engineers are invaluable data scientists. But you cannot say that all software engineers are meant to be data scientists; only a subset of software engineers is connected to data science. Economics, loosely, is all about analytical tools and using models to describe the ideas, theories and philosophies that ultimately govern the trade of goods and services. Therefore, an economist will be a traditional data scientist. An AI/ML software engineer, by contrast, will be more of a maverick who brings cutting-edge innovation to data science. Both of them are needed.

So, if you are still wondering whether studying economics can make you a data analyst or a data scientist, the answer is yes. The only caveat is that if you are just beginning your economics education, don’t treat it as a stepping stone to a data science career. If you want to study data science, there are specialised courses in data science these days. Nowadays, a lot of data science practitioners are not formally trained in data science; they have just become data scientists. But in future, data scientists are going to be formally trained. So, if you really believe that this is what you want to do in your life, you can find plenty of resources for that. But if you want to become an economist, then study economics, because economics is also a pretty interesting field of study with a lot of value. Prediction is just one part of the value that economists provide. Economic models also have a high degree of explanatory power and interpretability, which artificial neural networks and many other ML models lack, because those are largely black-box models. This is the main reason why economists bring a lot of value to data science.

So don’t be scared of switching from economics to data science, because they are really close siblings.