The Statistical Dharma: Honey of Knowledge


Saturday, January 21, 2023

How to Learn Data Science Smartly in 2023

Data science is widely considered one of the fastest-growing fields today. Many professionals & students are interested in transitioning to this domain. However, learning and moving into a new profession is challenging. It requires structured steps and a solid plan to break into the domain efficiently. So, here we present a detailed roadmap that will help you accomplish your goal.

Step 1: Start with any spreadsheet tool like Excel or Google Sheets. Carry out data manipulation, draw graphs, and try to find insights from any dataset of your choice.

Step 2: Move to any programming language, be it R or Python. The task is to perform the same analysis in R or Python that you did in the spreadsheet tool. You'll come across libraries like dplyr, ggplot2, etc. in R and NumPy, pandas, etc. in Python. These libraries will help you with data analysis.

To understand and master these libraries, search the internet, and you'll find a lot of tutorials. Pick one or at most two resources and start learning & implementing. In this process, you'll learn the programming language as well as data analysis.

The best way to get the most out of the above two steps is to always ask a lot of questions of the data. Then try to discover the answers with the help of Excel, R, or Python. This way, you'll not only learn the tool but also develop analytical thinking.
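As a rough illustration of Step 2, here is a minimal sketch of redoing a spreadsheet-style analysis in Python. The dataset, the column names (region, units, price), and the function names are all made up for this example. It uses only the standard library so that it runs anywhere; with pandas, the same pivot would typically be written as a groupby on the region column.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical sales data (an assumption for this sketch); in practice this
# would be read from the same CSV file you explored in the spreadsheet.
RAW = """region,units,price
North,10,2.5
South,4,3.0
North,6,2.5
South,8,3.0
East,5,4.0
"""


def revenue_by_region(text):
    """Total revenue (units * price) per region, like a pivot table."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(text)):
        totals[row["region"]] += int(row["units"]) * float(row["price"])
    return dict(totals)


def average_units(text):
    """Mean units per order, like =AVERAGE() over the units column."""
    return mean(int(row["units"]) for row in csv.DictReader(io.StringIO(text)))


if __name__ == "__main__":
    # Question asked of the data: which region brings in the most revenue?
    print(revenue_by_region(RAW))
    print(average_units(RAW))
```

Each function answers one question about the data, which mirrors the habit of asking a question first and then finding the tool that answers it.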

Step 3: Now start studying statistics. Focus first on topics like conditional probability and Bayes' theorem. Then move to probability distributions, hypothesis testing, and statistical tests. The trick to mastering statistics is to first grasp the basic ideas of multiple topics and then start solving numerical problems. After that, implement what you have learned (probability distributions, hypothesis testing, and statistical tests) in Excel and in any programming language.
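To make the "solve a numerical problem, then implement it" advice concrete, here is a small sketch of Bayes' theorem applied to a screening test. The numbers (1% prevalence, 90% sensitivity, 5% false-positive rate) and the function name are assumptions chosen only for illustration.

```python
from fractions import Fraction


def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem."""
    true_pos = sensitivity * prior                  # P(positive and condition)
    false_pos = false_positive_rate * (1 - prior)   # P(positive and no condition)
    return true_pos / (true_pos + false_pos)


# Hypothetical inputs: 1% prevalence, 90% sensitivity, 5% false positives.
p = posterior(Fraction(1, 100), Fraction(9, 10), Fraction(5, 100))
print(p, float(p))  # 2/13, roughly 0.154
```

Even with a fairly accurate test, the posterior stays low because the condition is rare. Working the same numbers by hand is a good check on your understanding of conditional probability.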

Congratulations! You have completed 50% of the journey, and you are ahead of most aspiring data scientists.

Step 4: Now comes the machine learning part. You’ll come across jargon like supervised and unsupervised learning, EDA, data preprocessing, and so on. But don't get disheartened so easily. Start exploring why there is such a classification of topics and what steps should be followed. Initially, don't try to understand everything; try to get an idea of the bare essentials. There are plenty of good resources out there for machine learning. Stick to a few of them.

Reaching this stage will take roughly anywhere from 3-4 months to 1 year. Now you are ready to work on quality end-to-end projects. You can apply for internships or even jobs. If you want to study further, you can pursue higher studies in data science at a good institute.

Never get too hung up on completing the topics. Try to understand the why, when, and how of everything you are learning. The reason is that if you try to finish things on a short timeline, sooner or later you'll face issues in understanding the fundamentals and you'll feel the need to revisit the topics. Hence, learn slowly but consistently.

It is said that "Little strokes fell great oaks."

All the best for your journey.

Sunday, January 15, 2023

Which One is Better for Data Science: IIT or ISI | Indian Statistical Institute Vs Indian Institute of Technology

Data Science, which has been termed the sexiest job of the 21st century, has gained a lot of traction and eyeballs in the last few years. More and more people are trying to enter this field. To meet this demand, various institutes have started offering multiple programs related to Data Science, Machine Learning, or Artificial Intelligence. Two such premier institutes are IIT and ISI. These are known for their elite pedigree. But which one should you prefer over the other? There are multiple deciding factors, like the location of the institution, type of crowd, exposure, etc. We have focused on the curriculum/subjects, as it is the most important deciding factor.

Roughly speaking, we can divide the whole of Data Science & Machine Learning work into two parts: Tech-focused and Statistics-centric.

Tech-focused: This part of Data Science involves dealing with large sets of data, finding patterns in data using Machine Learning and AI models, and deploying the models in production. Coding, together with an understanding of technology, is the key skill.

Examples include Advanced Machine Learning and Deep Learning (e.g., Neural Networks).

Statistics-centric: This part of Data Science involves explainable models, where most things depend on the parameters of the model. It involves estimating the parameters that explain the complete model. Statistics & Maths are the key areas.

Examples include linear regression, statistical inference, time series forecasting, etc.
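To show what "parameters that explain the model" means in practice, here is a minimal sketch of simple linear regression fitted by ordinary least squares. The data points and the function name are hypothetical, chosen only so that the slope and intercept are easy to read.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data, made up for this illustration.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 8, 9, 11]
slope, intercept = fit_line(xs, ys)
print(f"y is about {intercept:.1f} + {slope:.1f} * x")
```

The two estimated numbers are the whole model: the slope says how much y changes per unit of x, which is the kind of direct interpretation the Statistics-centric side emphasizes.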

Disclaimer: Depending on the industry and the use case, either approach may be chosen on its own, or both may be used together for a given problem.

Do you enjoy engineering more, or do you want theoretical studies more? That is the one question that should help you decide above anything else.

IITs are known to focus on engineering & technology, whereas ISI is primarily centered on Statistics.

If working in technology and coding appeals to you, then you may prefer IITs for Data Science. IITs are the best place to learn, work on, and implement technology with the brightest minds.

If you have an interest in numbers, and you love mathematics and its application in the real world, then ISI is for you. ISI is the best place to understand the Statistics & Maths behind Machine Learning. You get a chance to understand explainable ML, the analysis part of Data Science, and how it can help the business.

ISI primarily focuses on traditional statistics & maths. Coding, deployment, and application are the added layers. The situation is the other way round in IITs.

It's not that IITians never learn mathematics or statistics, or that ISI students never code. The difference lies in the curriculum, environment, and culture.

Data Science is always a combination of Statistics, Programming, and Maths. Different institutes take different routes on this journey, giving priority to one area over another. In the end, it boils down to your choice & preference.

If you are still confused, remember that you won't go wrong with either institute, as both IITs and ISI are wonderful places to become a Data Scientist.

Friday, June 12, 2020

Economic Modelling & Game Theory of Political Satirists - Content Creator Equilibrium

Journalists/Comedians, and in general any Content Creators these days, can be segmented into one of the following two categories. The first type believes in targeting popular ideas/personalities and dividing people along political cleavages. The second type steers away from such controversial issues and believes in creating 'Neutral Content'.

Are Content Creators out of ideas, or are they just doing something strategically optimal? Just to be loud & clear, by Content Creators I mean Journalists, Op-ed Columnists, Comedians, Singers, etc.

Unsuspecting people might strongly believe that the content produced by Content Creators merely carries a bit of moralistic fervour, and that it is simply good content made by artistic choice rather than the result of some cunning assembly-line thinking about content production.

Can we develop any kind of Mathematical or Economic model to explain this separation of Content Creators along this theme?

As a student of economics, I believe that the subject has the best analytical tools to explain behaviour, rather than to provide predictive insights as most people would like to believe. So, here is my explanation, or rather a simplified model, to show why Journalists & Comedians (Content Creators) target popular ideas/beliefs and extremely popular public personalities.

We don't need to go far. John Nash has given us a versatile structure to address these types of strategic interactions. Let's try to modify the strategic or normal-form game table.

Why do Journalists & Comedians target popular ideas/beliefs/figures? - A game theoretic point of view

Before you try to wrap your head around this intimidating-looking table, I have a video on my YouTube channel which you can watch below:

If you have understood what I was trying to show in my YouTube video, then you don't need to read further. Thanks! But if you need more detail to understand the model, let me start my exposition. (I'll also be attaching my slides at the bottom of this post.)

The Content Creator has two strategies, 'Attack the Popular' and 'Be Neutral'. Society has two strategic sentiment choices: it can choose to 'Ignore the Creator' or 'Give Attention'. The relevant payoff or notional value is given in the table above, in the form (Creator's Value, Society's Value).

What could be the equilibrium in such a situation? We cannot claim this equilibrium to be a Nash Equilibrium in the real sense of the term, as we are dealing with a very different environment than an n-player simultaneous-move game.

In order for you to understand the payoff structure well, assume that society is composed of two types of people: one group is attached to the 'popular idea' being attacked, and the other group is not bothered. So, if the Content Creator attacks the popular idea and society chooses to ignore the Creator and doesn't give him/her enough publicity & marketing, society has a negative value because one group gets a negative payoff and the other group gets nothing. Now, when society responds to the Creator's attack with attention, society gets 0, as half of the people get some value and the remaining half get a negative value. (Oversimplification Alert!)

Similarly, you can figure out the rest of the table.

Let's jump to the substance of this analysis.

Consider the case when h > 2 (as in the diagram above)

This would imply that 600/h < 300, which has an implication for the outcome of the model. Look at the cell containing (300, 0), i.e., the case where society responds to the Creator's attack with attention. You can reason that it is an equilibrium; you could call it a Nash equilibrium, but it would not be technically correct to say so. Given that the Creator is going to attack the 'popular idea', society is better off giving him/her attention, because if society unilaterally moves to ignoring the Creator, it will receive a negative payoff of -300. Similarly, given that society is giving the Creator the attention he/she craves, the Creator gets 300, and if he/she resorts to producing 'Neutral Content' instead, he/she will get only 600/h < 300. So, the Creator also has no profitable unilateral deviation. Hence, this is an equilibrium. This is generally observed when the Creator is up-and-coming and not so famous, or produces relatively substandard content.

Consider the case when h < 2 (as in the diagram above)

This would imply that 600/h > 300. This means that the Creator has an incentive to move away from content which sparks outrage and towards producing 'Neutral Content'. Production of 'Neutral Content' becomes a 'Dominant Strategy' for the Creator. How society reacts to the Creator's 'Neutral Content' now depends on the value of K, which you can see in the image above.

To put it briefly,

- K > 600/h means society remembers that the Creator has a history of producing content which was attacking in nature (attacking the popular idea...) and dislikes the Creator. This gives us an equilibrium outcome in which the Creator is 'Neutral' but still gets 'Ignored'.

- K < 600/h means that society respects & enjoys the content created by the 'Neutral Content Creator', and we get the best type of win-win equilibrium, with payoffs of (600/h, 600/h).
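The cases above can be checked mechanically with a small sketch that enumerates the cells where neither side gains by deviating alone. The payoff table itself is an image not reproduced in the text, so only the values stated in the post are taken as given: (300, 0), society's -300, 600/h, and K (read here as society's payoff from ignoring a 'Neutral' Creator). The Creator's payoffs when ignored (-100 after an attack, 0 when neutral) are assumptions made up for this illustration, picked so that 'Be Neutral' is dominant when h < 2, as the post states.

```python
from itertools import product

CREATOR = ("Attack the Popular", "Be Neutral")
SOCIETY = ("Ignore the Creator", "Give Attention")


def payoff_table(h, k, ignored_attack=-100.0, ignored_neutral=0.0):
    """(Creator's value, Society's value) for each cell.

    Stated in the post: (300, 0), society's -300, 600/h, and K.
    ASSUMED (not in the post): the Creator's payoffs when ignored.
    """
    return {
        ("Attack the Popular", "Ignore the Creator"): (ignored_attack, -300.0),
        ("Attack the Popular", "Give Attention"): (300.0, 0.0),
        ("Be Neutral", "Ignore the Creator"): (ignored_neutral, k),
        ("Be Neutral", "Give Attention"): (600.0 / h, 600.0 / h),
    }


def pure_equilibria(table):
    """Cells where neither side has a profitable unilateral deviation."""
    found = []
    for c, s in product(CREATOR, SOCIETY):
        creator_pay, society_pay = table[(c, s)]
        # Creator compares against switching strategy, society held fixed.
        creator_ok = all(table[(alt, s)][0] <= creator_pay for alt in CREATOR)
        # Society compares against switching response, Creator held fixed.
        society_ok = all(table[(c, alt)][1] <= society_pay for alt in SOCIETY)
        if creator_ok and society_ok:
            found.append((c, s))
    return found


if __name__ == "__main__":
    # Example values of h and K are arbitrary picks for each case.
    for h, k in [(3.0, 100.0), (1.5, 500.0), (1.5, 100.0)]:
        print(f"h={h}, K={k}: {pure_equilibria(payoff_table(h, k))}")
```

With these assumed numbers, h = 3 yields the attack-and-attention cell, while h = 1.5 yields 'Neutral but Ignored' for K = 500 and the win-win cell for K = 100. Different assumptions for the unstated cells could change which cells survive, so treat this as a way to experiment with the model rather than a reproduction of the original table.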

Summary:

This model predicts that new 'Content Creators' who want to establish themselves resort to attacking popular ideas & personalities, and society gives them attention; this is an equilibrium. Also, a Creator who is liked by society has a dominant strategy of producing 'Neutral Content'. This further implies that when a Creator starts gaining some fame, he/she has a 'Dominant Strategy' to quickly pivot from provocative/sensitive content to 'Neutral Content', as persisting with attacks is a suboptimal strategy.


Find the slides here: (starts from the 4th slide)