The Statistical Dharma

Wednesday, September 18, 2024

Data Science Insights of a YouTube Channel Manager

When it comes to growing a YouTube channel, understanding and leveraging YouTube analytics is crucial. Here, we'll delve into some key aspects of YouTube analytics.

The Importance of Click Through Rate (CTR):
One of the most critical metrics to monitor is the Click Through Rate (CTR) of your thumbnails. Simply put, CTR is the share of people who clicked on your video after seeing its thumbnail. For example, if 100 people see your video thumbnail and 10 of them click on it, your CTR is 10%. This metric is vital because if people aren't clicking on your videos, they aren't watching them, rendering all other efforts moot. A good CTR can vary: smaller channels might see CTRs as high as 10-14%, while larger channels might aim for 5-7%. It's important not to set unrealistic goals like achieving a 40% CTR, as this is typically unsustainable.
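As a quick illustration of the arithmetic, here is a minimal Python sketch of the CTR calculation. The function name and its arguments are assumptions made for this example; YouTube Analytics reports the figure for you.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Return CTR as a percentage: clicks divided by impressions, times 100."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return 100.0 * clicks / impressions


# The example from the text: 100 people see the thumbnail, 10 click
ctr = click_through_rate(clicks=10, impressions=100)
print(f"CTR: {ctr:.1f}%")
```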

Average View Duration and Retention Graph:
The next important analytical tools are average view duration and the retention graph. These metrics help you understand how long viewers watch your videos and at what points they tend to drop off. Having access to raw data from YouTube analytics allows you to make informed decisions about your content. For channel managers, having manager access means they can optimise titles, descriptions, and tags to improve these metrics.
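To make these two metrics concrete, here is a small Python sketch. All the numbers are made up, and the 30-second sampling interval is an assumption; it only shows how an average view duration and the steepest drop-off point could be read from raw data.

```python
from statistics import mean

# Hypothetical watch times in seconds for five views of one video
watch_seconds = [30, 45, 120, 200, 15]
average_view_duration = mean(watch_seconds)

# Hypothetical retention curve: percent of viewers still watching,
# sampled every 30 seconds from the start of the video
retention_pct = [100, 90, 85, 60, 55, 52, 50]
sample_interval_seconds = 30


def largest_drop(curve):
    """Return (start_index, drop_size) for the biggest fall between consecutive samples."""
    drops = [curve[i] - curve[i + 1] for i in range(len(curve) - 1)]
    start_index = max(range(len(drops)), key=lambda i: drops[i])
    return start_index, drops[start_index]


index, drop = largest_drop(retention_pct)
start = index * sample_interval_seconds
print(f"Average view duration: {average_view_duration} seconds")
print(f"Biggest drop: {drop} points between {start}s and {start + sample_interval_seconds}s")
```

The segment with the biggest drop is where you would look first when reviewing the video.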

Timing and Audience Activity:
Another critical factor is knowing when your audience is most active. Posting videos when your audience is online increases the chances of getting clicks and views. This timing can significantly impact your video's performance and subsequent recognition on the platform.

Long-term Potential: Search-Based vs. Browse-Based Content:
When thinking about the long-term potential of your videos, it's essential to differentiate between search-based and browse-based content. Search-based content is evergreen; it remains relevant over time and is often sought out by viewers. For example, educational videos on specific topics like microeconomics are searched for by people looking to learn about the subject. On the other hand, browse-based content, such as the videos made by popular YouTuber MrBeast, is designed to appear in users' feeds. This content relies on eye-catching thumbnails and compelling titles to attract viewers who aren't necessarily searching for it but are intrigued enough to click when it appears on their screens.

Strategies for Search-Based Content:
To create successful search-based content, focus on topics that people are likely to search for over a long period. This type of content benefits from being educational or informative, as people continuously seek out knowledge. By targeting specific, evergreen topics, you increase the likelihood of your videos being found through searches on both YouTube and Google.

Strategies for Browse-Based Content:
If you aim to target a browse-based audience, your content should appeal to a broad audience with generic but engaging topics. Effective thumbnails and titles are crucial here, as they serve as the primary hook to attract viewers. The goal is to capture the viewer's attention quickly and entice them to click on your video.

Current Trends in Channel Management:
Currently, many channels, especially podcast and entertainment channels, rely heavily on browse-based content. However, educational channels typically follow a search-based approach, as their content is designed to be found by users seeking specific information. Balancing these two types of content can help diversify a channel's reach and maintain steady growth.

Understanding and utilising YouTube analytics is essential for channel growth. By focusing on key metrics like CTR, average view duration, and retention graphs, and by strategically planning content to be either search-based or browse-based, creators can optimise their channels for success.

Sunday, January 14, 2024

Econometrics : Introduction to Linear Regression (Rigorous & Detailed)

What is Econometrics?

Econometrics is the application of statistics to economics, though these days it is also applied in other social sciences such as history and political science. This is the golden equation of econometrics:

Y = a + bX + e    (1)

Here Y is what we want to predict and X is the predictor. Y is the dependent variable and X is the independent variable: Y depends on X. a and b are the parameters that we need to estimate, and e is a random error term. We assume that the model is linear in nature (i.e. Y = a + bX). In practice the relationship may not be perfectly linear, so Y is approximately but not exactly equal to a + bX; the error term e captures that gap.

This is a simplified explanation of this magical equation of econometrics (magical because it reappears everywhere).
You should remember this golden equation, as its importance can't be emphasised enough.

Now, we know that Y is the variable of interest, which depends on X given the parameters a and b. After the parameters are estimated, the equation is written as:

ŷ = â + b̂X

This equation shows the approximate relation, where â and b̂ are the estimated values of a and b. The symbol (^) denotes estimation. So whenever you see '^' in econometrics, you will know it is an estimated value, not the true value.

Example: Do you think PhD graduates get a better salary than non-PhD graduates?

Let us suppose we have data on the salaries of some PhD graduates (n1) and non-PhD graduates (n2).

So, can we construct some kind of regression problem or econometric problem out of this?

Yes, we can. To do so, let PhD graduates be denoted by 1 and non-PhD graduates by 0. This lets us find a relationship between Y and X, where Y contains the salaries of the graduates (PhD and non-PhD) and X contains the values 1 and 0. Now, we can find the relation using the golden equation of econometrics:

Y = a + bX + e

We can estimate the average salary of PhD graduates (Ȳ1) as well as non-PhD graduates (Ȳ2).

Let us understand an operator which is denoted by E(.)

This is the Expectation operator which loosely speaking gives the average value of the variable.

Now, can you think about the expectation of the random error term, e in equation (1)? Ideally,

the expectation of the error term should be zero.

E(e) = 0

We can calculate the expected salary of each group of graduates using the equations above. Try it yourself once.

E(Y | X=1) = â + b̂ (PhD graduates)

E(Y | X=0) = â (non-PhD graduates)

Note: The true values of the parameters a and b can't be calculated; they can only be estimated. The estimated parameters are denoted by â and b̂, as used above.

The expected salary of each group should be equal to the average of its salaries, i.e.

Ȳ1 = â + b̂

Ȳ2 = â

Now, we have two linear equations in two unknowns. This system of linear equations can be solved easily:

â = Ȳ2
b̂ = Ȳ1 - Ȳ2

Substituting the values of â and b̂, we get:

ŷ = Ȳ2 + (Ȳ1 - Ȳ2)X

This equation doesn't contain a random error term because it uses the estimated values of the parameters a and b. An estimated value here is an expected value, and the expected value of the error term is 0.
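The result above can be checked numerically. The sketch below assumes ordinary least squares as the estimation method (the post does not name one) and uses made-up salary figures; the function name ols_fit is also just for illustration. It fits ŷ = â + b̂X on a 0/1 predictor and compares the estimates with the two group averages.

```python
from statistics import mean

# Hypothetical salaries (any currency unit); these numbers are invented for illustration
phd_salaries = [90, 110, 100, 120]       # X = 1
non_phd_salaries = [70, 80, 60, 90, 75]  # X = 0

y = phd_salaries + non_phd_salaries
x = [1] * len(phd_salaries) + [0] * len(non_phd_salaries)


def ols_fit(x, y):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat


a_hat, b_hat = ols_fit(x, y)
y1_bar = mean(phd_salaries)      # average PhD salary
y2_bar = mean(non_phd_salaries)  # average non-PhD salary

print("a_hat:", a_hat, "| non-PhD average:", y2_bar)
print("b_hat:", b_hat, "| difference of averages:", y1_bar - y2_bar)
```

Up to floating-point rounding, the intercept matches the non-PhD average and the slope matches the difference between the two averages, which is what solving the two equations by hand gives.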

To understand this better, relate it to the example given above. PhD graduates have been labelled 1, so for them X = 1 and the estimated salary is ŷ = Ȳ1; for non-PhD graduates X = 0 and ŷ = Ȳ2.

You might not appreciate this econometric framework for estimating values of Y in the beginning.
However, this method has strong roots in econometrics and will be applied to further models.

Wednesday, November 1, 2023

Transitioning to Data Science

Ask yourself questions:
Before you consider switching to data science, you should first ask whether you actually need to do so, and whether you have the necessary skill set and aptitude. This is because data science is purely applied math. It's not really something cool, technical, or extremely high-tech. Data science is not a kind of complex computer programming; programming is used as a tool in data science. At its core, data science is essentially mathematics, and if you don't enjoy math, you won't like data science.

If you don't enjoy programming, at least attempt to learn the fundamentals. You could try to delegate your programming, but at the very least you need to have some appreciation for data, and that can only happen when you have some understanding of mathematics. If you think that trigonometry and calculus don't have any practical applications, you probably aren't meant for data science.

However, if you have reached the point where you feel you have the necessary clarity of thought and are certain that you want to switch to data science, there are two ways to go about it: the traditional method and the unconventional method.

Traditional Method of Transitioning to Data Science:
The traditional route is to enrol in a data science programme. Make sure that it is a full-time programme offered by a reputed university. In the Indian context, the Indian Statistical Institute (ISI), IISc Bangalore, and the IITs are great places to look at. It must be a full-time course, not one that you would do over the weekend. You must dedicate one or two years to that programme, putting everything else on hold.

Unconventional Method of Transitioning to Data Science:
The unconventional approach is to complete some data science projects and build a portfolio, then use that to transition: get recommendations from friends, understand the specifics of a given data science job posting, and work to improve on them. If you apply to 10–20 data science positions and prepare for each one in a very specific way over the course of, say, five to six months or a year, you will know what you need to do to be a successful data scientist.

Now the question arises: how do you build a portfolio, and how do you create data science projects from scratch? To do that, you have to identify problems in your life that you care deeply about and turn those into data science projects. Either solve those problems or, at the very least, find an approach and consider what a potential solution might be. If you have identified a data science problem but are stuck somewhere, or you feel you do not actually have a data science problem to solve, I've made a number of videos explaining how to start data science projects and what sorts of projects fall under the data science umbrella. You may watch those videos.

You can get in touch with me through email, a message on LinkedIn or Instagram, or by visiting my verified Topmate profile: https://topmate.io/ashish_gourav.

Friday, October 6, 2023

Use ChatGPT to become a 10x Professional

ChatGPT is an advanced language model developed by OpenAI that has gained widespread recognition and adoption across various industries. It utilizes deep learning techniques to generate human-like responses and engage in interactive conversations with users. The capabilities of ChatGPT extend far beyond simple question-answering, making it an invaluable tool for businesses and individuals alike.

Let us look into the most efficient usage of the superpowers of ChatGPT in detail.

Learn the basics of any new topic
Suppose that you want to learn about NLP, which is a completely alien topic for you. You can simply enter it in ChatGPT and you will get a detailed article on NLP.

Search faster
You are just a click away from searching a topic. In ChatGPT, you can get results within a fraction of a second.

Code Faster
You can even code faster using ChatGPT. You just have to write the instructions for the code you require, and ChatGPT will write it in the language of your choice.

Pop cultural references
Along with all these benefits, you can also get pop cultural references from ChatGPT. For example, type “Movies for Gen Z” and you will get a list of the most popular movies that appeal to the Gen Z demographic.

Learn specific things of a difficult topic
You can also learn any particular detail of a difficult topic using ChatGPT.

These applications represent just a glimpse into the diverse possibilities offered by ChatGPT's superpowers. As technology continues to advance, the potential for leveraging ChatGPT in innovative and impactful ways will only expand, benefiting individuals, businesses, and society as a whole.

None of these things are replacements for existing technology or human beings. They are applications which human beings and other technologies will use. In this way, ChatGPT will enable both technology and human beings to perform efficiently.

ChatGPT, or any other technology like it, will not be an impediment or a disruption to your career progress. It will help you become a smarter person, a smarter professional, and a more efficient individual. I have maintained this stance over the last 10-15 years of my adult life: technology is always an enabler and a stimulant; it can never replace human beings, and it can never fully replace old technologies. There are people who use low-tech or old technologies because they are easy for them to use. But yes, given time everything dies, because everything comes with an expiry date.

Artificial Intelligence & Machine Learning's Most Important Concepts for Interviews

Let’s learn the most important concepts of Artificial Intelligence and Machine Learning through the alphabet!

ANN
An Artificial Neural Network (ANN) is loosely modelled on the human brain’s neural network. The hidden units of the hidden layers in the network architecture can be thought of as neurons: each takes a weighted sum of its inputs plus a bias and passes it through an activation function, which is what introduces non-linearity.

Bagging
Bagging (bootstrap aggregating) means making multiple training sets from one training set by sampling with replacement, training a separate model on each of those training sets, and finally averaging their predictions.

Correlation
Correlation measures the strength and direction of the linear relationship between two features or variables. It ranges from -1 to 1.
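A minimal sketch of how this can be computed, assuming the Pearson coefficient (the most common choice, though the entry above does not name one). The data and the function name are invented for illustration.

```python
from math import sqrt


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: hours studied and exam score for five students
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 55, 61, 64, 70]
r = pearson_correlation(hours_studied, exam_score)
print(f"Correlation: {r:.3f}")
```

A value close to 1 means the two variables rise together; a value close to -1 means one falls as the other rises.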

Deep Learning
The cornerstone of deep learning is artificial neural networks. Whenever you have an artificial neural network with many hidden layers, it's most likely a deep learning problem that is being solved.

Error
In a supervised machine learning setup, you always have an actual number, on which you supervise the model, and a predicted (estimated) number. The difference between these two numbers is the error. In supervised machine learning problems, you try to minimise this error.

Feature
Feature, variable, or input: all of these mean what you feed into your data science models.

Gradient Descent
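Gradient descent is an iterative way to approach a minimum of a function: at each step, you move a little in the direction opposite to the gradient. It is guaranteed to reach the global minimum only for well-behaved (for example, convex) functions; otherwise it may settle in a local minimum.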
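Here is a minimal sketch of the idea on a one-dimensional function. The function f(x) = (x - 3)^2, the learning rate, and the number of steps are all choices made for this illustration.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    x = start
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x


# f(x) = (x - 3)^2 has gradient 2 * (x - 3) and a single minimum at x = 3
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
print(f"Estimated minimum at x = {minimum:.6f}")
```

Because this function is convex, the iterations converge to its only minimum; on a function with several valleys, the answer would depend on the starting point.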

Hypothesis Testing
Using a test statistic, you either reject the null hypothesis in favour of the alternative or fail to reject the null.

Intercept
Intercept in linear regression is that parameter which has the same impact for all inputs. For example, if your intercept is 3, the impact of intercept on y will always be 3, irrespective of x.

Julia
Julia is the next breakthrough in data science programming, as it is specially designed for quantitative researchers or data scientists.

KNN
K-Nearest Neighbours (KNN) is a method which can be used for both regression and classification. The key idea is that for any data point you find its K nearest points and predict from them: the majority class for classification, or the average value for regression. Most importantly, this is a non-parametric model, and K is the hyper-parameter.
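A minimal classification sketch of this idea, assuming Euclidean distance and a simple majority vote. The points, labels, and function name are invented for illustration.

```python
from collections import Counter
from math import dist


def knn_classify(train_points, train_labels, query, k=3):
    """Predict the majority label among the k training points nearest to `query`."""
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),  # Euclidean distance to the query
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Two hypothetical clusters of points with labels "A" and "B"
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_classify(points, labels, query=(2, 2)))
print(knn_classify(points, labels, query=(6, 5)))
```

There is no training step and no fitted parameters; the data itself is the model, which is why KNN is called non-parametric.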

Linear Regression
Linear Regression is the simplest, yet the most powerful predictive ML Model. Its components are the intercept and coefficients of all the features or predictors.

Model
Model is an abstraction or simplification of reality using mathematics and statistics.

Normal Distribution
People generally talk about the normal distribution when they are new to data science. It is not the only statistical distribution of importance; there are others as well. To completely identify a normal distribution, you need its mean and variance. A heuristic for judging whether a distribution is normal is that it will be symmetric about its mean, with a bell shape.

Overfitting
Overfitting is prone to happen when the model is too complex: it fits the training data well but fails miserably when presented with new data.

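P-Value
The p-value measures the strength of the evidence against the null hypothesis: it is the probability, assuming the null is true, of seeing a test statistic at least as extreme as the one observed. In statistical terms, it is the observed level of significance.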

Q-Q Plot
A Q-Q plot compares the quantiles of two probability distributions. It is commonly used as a visual check for normality.

Random Forest
Random forest combines a number of decision trees, similar to bagging, with a twist on the choice of features: each split in a tree considers only a random subset of the features.

SVD
SVD, along with PCA, is a dimension reduction technique: if you feed it p features, where p is very large, it gives you a smaller set of transformed features. In PCA terms, these are the most important principal components, which help you reduce the dimensionality of the problem.

T Test
A t-test is a statistical test used to compare the means of two different distributions.

Underfitting
Underfitting is the opposite of overfitting. It describes a deficient model which has missed important patterns or features during training.

Variance
Variance measures how spread out the data is about the mean. So, if a data set is 2, 2, 2, 2…, it will have the lowest possible variance, which is zero. And if, let’s say, the data is 2, 5, 10, 15, 30, its variance is going to be high and not zero.
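The two data sets above can be checked with Python's standard library. This sketch assumes the population variance (dividing by n); the sample variance (dividing by n - 1) would give a larger number for the second data set.

```python
from statistics import pvariance

constant_data = [2, 2, 2, 2]
spread_data = [2, 5, 10, 15, 30]

# Population variance: the average squared distance from the mean
print("Variance of constant data:", pvariance(constant_data))
print("Variance of spread data:", pvariance(spread_data))
```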

Web Scraping
Web scraping is a risky way to get data from websites. Hence, read up on all the possible violations (for example, of a website's terms of use) before doing web scraping.

X (Inputs)
X is what you know.

Y (Outputs)
Y is what you need to know.

Z-Score
Z-Score measures how far a data point is from the mean, in terms of standard deviations: just subtract the mean from the data point and divide the result by the standard deviation.
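A short sketch of that calculation, assuming the population standard deviation and a made-up data set:

```python
from statistics import mean, pstdev


def z_score(value, data):
    """Number of standard deviations that `value` lies from the mean of `data`."""
    return (value - mean(data)) / pstdev(data)


# Hypothetical data with mean 5 and population standard deviation 2
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(z_score(9, data))
print(z_score(5, data))
```

A positive z-score means the point lies above the mean, a negative one below, and zero means it sits exactly on the mean.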

Thursday, August 10, 2023

IPO Investing Strategy - High Return & Low Risk

In this post, you will get to know how one can gain from IPOs without subscribing to them. The strategy I am going to describe will be very helpful to you, and it takes only a few minutes of research. It is a high-probability, low-risk strategy.

I invest my money only when I have the utmost clarity of low risk and high reward. My way of investing in stock markets is purely academic in nature, so my fund size and risk tolerance remain small. When both the money at risk and the risk itself are small, you ultimately cannot become rich. You might have an outstanding return, but you will not become a millionaire overnight. You will be on a slow and steady path towards learning how stock market investing works.

What is my strategy?

As I described, my strategy does not take much research, but that comes with the experience I have in stock market investing. Essentially, whenever there is an IPO buzz happening around, I do not subscribe to the IPO. Instead, I wait for the listing day. Just before the listing happens, I check three things:

Grey Market Premium:
Grey Market Premium is the first thing a lot of people check. For the sake of clarity, let me explain it: it is the premium at which the stock is trading in the grey market. The grey market is not a formal market and, as the name suggests, it is not very mainstream in nature. So, it’s like a mock test for the IPO. If the grey market premium is positive, there is a very high chance that on the listing day you will have a positive return on that stock.

Nature of the Business:
The second signal is the nature of the business. If you feel that the business prospects are good, then you have positive signal number two. How you find that out is your skill; it is an art, not a science.

Subscription Status:
The third signal is the subscription status: was the IPO subscribed 2, 3, or 5 times? You can have a very high subscription and still not get positive returns on the listing day, and you can have a very tepid subscription and still get positive returns on the listing day. So, this is a very fuzzy area. You have to have an intuition about these things.

Once you have all three positive signals, or at least two out of the three, you move on to the second last stage.

What is the second last stage?
You go on to a trading platform and, just a few minutes before you put your money in, check the trade lots: how many people are selling and how many people are buying. I use a full-service trading account, so I get this information pretty easily. I don’t know whether discount brokers offer such a facility, showing the trade lots and how many people are buying and selling.

After I gain confidence that more people are buying than selling, my task is almost done. I put my money in and wait for the stock price to go up. Ideally, I do not just wait: I set up an automated sell order, so whenever the price rises past a certain percentage, like 12-15% (in my case), on the listing day, I exit the market.
So, this puts an upper cap on my investment returns, but I am willing to do that, because in just a matter of a few minutes, or sometimes a few hours, I have around 10-12 percent returns, which is fair enough for me as I am not putting in-depth research into that investment. It is just 5-10 minutes of work for me, and I am sitting on more than 10-12 percent of investment returns.
If I increase my fund size and raise my risk tolerance a bit, I will have substantial gains at a higher risk. That might not work for me, because the risk which you take has consequences, and it’s not a lottery or a gamble. It’s just how numbers and probability work.
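The decision rules described in this post can be written down as a small Python sketch. The function names are invented for this illustration, the 12 percent default is taken from the lower end of the range mentioned above, and judging each signal as positive or negative remains a manual call.

```python
def enough_signals(gmp_positive: bool, business_good: bool,
                   subscription_good: bool, required: int = 2) -> bool:
    """True when at least `required` of the three signals are positive."""
    return sum([gmp_positive, business_good, subscription_good]) >= required


def should_exit(buy_price: float, current_price: float,
                target_gain_pct: float = 12.0) -> bool:
    """True once the listing-day gain reaches the target percentage."""
    gain_pct = 100.0 * (current_price - buy_price) / buy_price
    return gain_pct >= target_gain_pct


# Hypothetical listing day: two of three signals positive, bought at 100
print(enough_signals(gmp_positive=True, business_good=False, subscription_good=True))
print(should_exit(buy_price=100.0, current_price=113.0))
```

In practice the exit would be an automated sell order placed with the broker, as described above, rather than a function you call by hand.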

Saturday, July 1, 2023

How to Make Money with Stock Market Investing

In this blog, you will learn some tips and tricks for investing in the stock market. You have to assume that, in the worst case, you may lose the entire corpus of money which you are putting into the stock market. If you are ready to take that kind of risk, you will invariably land on a few opportunities which give you disproportionate gains.

So, the following are my simple rules for investing in the stock market:

Try to invest in two to three companies every six months, and do not take out your money after suffering initial losses. Try to stay invested in the stock market for a good amount of time.

Set your targets before investing. Let us say you are investing in Company A and your target is a gain of 30 percent within a time frame that you fix in advance, with a maximum acceptable loss of 5 percent. Suppose that within only two days, the stock rises by 30 percent: you should not be greedy, just take out that money. Your target was either a loss of 5 percent or a gain of 30 percent. So, within your time frame, whenever the stock falls more than 5 percent below your purchase price, you have to sell, and whenever it rises more than 30 percent above it, you have to sell.

Therefore, try to keep a lower and upper cap for your investment criteria for every stock. Once you do that, you will always be a rule-based investor or trader and then it would not be difficult for you to overcome your behavioural tendencies.
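A rule like this is easy to write down explicitly. The sketch below uses the minus 5 percent and plus 30 percent caps from the example above as defaults; the function name and return strings are made up for illustration.

```python
def rule_based_action(buy_price: float, current_price: float,
                      lower_cap_pct: float = -5.0,
                      upper_cap_pct: float = 30.0) -> str:
    """Return 'sell' when either cap is breached, otherwise 'hold'."""
    change_pct = 100.0 * (current_price - buy_price) / buy_price
    if change_pct <= lower_cap_pct:
        return "sell (lower cap hit)"
    if change_pct >= upper_cap_pct:
        return "sell (upper cap hit)"
    return "hold"


# Hypothetical prices for a stock bought at 100
for price in (94.0, 110.0, 131.0):
    print(price, "->", rule_based_action(buy_price=100.0, current_price=price))
```

Writing the caps down before you invest is the point: the decision is made once, calmly, and not again under the pressure of a moving price.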

Stock market investing is easy; it is the people who invest who make it difficult, and it is a person's behavioural tendencies which make investing hard. The idea here is that part of it is an art, not a very hard science. So, you need to learn how to master this art of investing.

Mastering the exit point comes down to your behavioural tendencies. So, I would suggest that you read a bit of economics, behavioural finance, psychology, and investment behaviour, and then understand how to get over these tendencies.

Whenever you invest in a company, first look at its P/E ratio. A company with a higher P/E ratio than its peers might be a high-growth company, might be overpriced, or might have some kind of manipulation going on in its shares. So, try to avoid companies which have a higher P/E ratio than their peers.
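As a rough sketch of such a peer comparison: the P/E ratio is the share price divided by earnings per share. The company names and figures below are invented, and using the median peer P/E as the benchmark is an assumption of this example, not a rule from the text.

```python
from statistics import median


def pe_ratio(share_price: float, earnings_per_share: float) -> float:
    """Price-to-earnings ratio: share price divided by earnings per share."""
    return share_price / earnings_per_share


# Hypothetical peers in the same sector: (share price, earnings per share)
peers = {
    "Peer A": (200.0, 10.0),
    "Peer B": (150.0, 10.0),
    "Peer C": (300.0, 12.0),
}
peer_pes = [pe_ratio(price, eps) for price, eps in peers.values()]

candidate_pe = pe_ratio(share_price=500.0, earnings_per_share=10.0)
benchmark = median(peer_pes)

print(f"Candidate P/E: {candidate_pe:.1f}, peer median P/E: {benchmark:.1f}")
print("Higher than peers:", candidate_pe > benchmark)
```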

Dividend Yield

Try to look for companies which give predictable dividends. If a company is giving you a predictable dividend, you can earn money without even exiting the market. So, stay invested in a company which is giving you a consistent dividend. There is also a downside to dividend companies, but it is not that great, because you receive regular dividend income on top of any appreciation in the stock price.

Current Debt

If a company's current debt, which can be found in its balance sheet, is of the same order as its current cash, then the company is in reasonable health.

EBITDA

If EBITDA and net profit are very different from each other, for example one is half of the other, then you need to find out why the EBITDA is so different from the net profit.

Hence, there are many rules for investing in the stock market. Once you start searching for how to invest, you can go down the rabbit hole of investing. But the key idea here is that you need to learn continuously, or else investing will keep feeling like gambling.