Monday, August 22, 2022

Easiest Way to Become a Data Scientist

No matter what the so-called experts try to sell you, don't fall for it. No online certification or networking is going to get you a Data Science job if you are in a different industry.
So, you may ask, what is the way out?
Yes, there is a smarter way.



Get yourself into a full-time degree. It doesn't have to be specifically in Data Science; you need to find something related to Statistics, Analytics or Computer Science. It could be an MBA in Business Analytics or a Masters in Statistics.
This will put you on a structured path, so your only effort goes into up-skilling yourself in data tools & techniques.

At the end of the course, you will know which companies are looking for people like you. You narrow down your search, and the companies find it easier to access talent. It's a win-win.

Of course, no one size fits all.
Also, I have talked about this in a YouTube video. Check it out below:

Saturday, March 12, 2022

AI Vs ML Vs Deep Learning Vs Data Science


So, what really is the difference between AI, ML, DL & Data Science?

This is fundamentally a marketing question. A better way to get around it is to define a spectrum of data science approaches.

At one end you have Statistics; the other end is occupied by software engineering. See below.


Let me explain the above diagram. Towards the left, you will see a lot of core data approaches: working with spreadsheets, data cleaning & wrangling, plotting histograms & fitting distributions, etc. This is the area where you will need your Maths/Stats skills, such as hypothesis testing and measures of dispersion. I call this 'Statistical Data Science'.
As you look towards the right, you will find a lot of new-age, software-driven, data-hungry methods, which are oriented towards validation metrics.

So, it's the same kind of solution with varying degrees of software, maths & statistics usage. Towards the left, the proportion of Maths/Statistics is higher, and the right is tilted towards software engineering.

Let's look at the same thing with a Venn diagram:

Also, there are things like Robotics & AGI which get mixed up with Data Science. It helps to understand that Data Science is largely a statistical discipline, whereas Robotics draws more on Electronics/Mechanical Engineering.

Also, I have talked about these things in a YouTube video. Check it out below:

Solving Business Problems with Data Science

There is no one-size-fits-all solution/framework to convert any business issue into a data science solution. The best a data science transformation team can manage is to lay out an explicit plan of the steps they are going to take from problem to solution.
1. Once the business analyst understands the pain points, he/she comes up with the business problem (see the chart above)

2. After finalising the business problem, a team of data scientists or senior data scientists converts it to a tractable data science problem

3. The solution design stage takes into consideration the scope, assumptions & goals of the data science solution

                                 

4. Now, you need to implement it with the help of software engineers, data engineers & junior data scientists

5. To validate the implementation, the senior data scientist does the assessment. If it passes this stage, we deploy the solution to the business user. If it fails, we need to go back to the drawing board and start from the solution design stage again.

As you might have gathered, this is an iterative process.
You can check the video below:




Friday, March 11, 2022

Wordle Hints with Python


I'm pretty sure you have tried this daily vocabulary puzzle at least once. I prefer not to memorise the dictionary, and I also don't enjoy coding. Voila!
I married the two, my inadequate vocabulary & apparent lack of interest in coding. Their union rescues me every day with hints for guessing the next word.

Let's talk about the approach. For the Python code, see the end of the post.
You need to start with a guess. At this point you just know that it's a five-letter word. Not much help.
If you follow 3Blue1Brown, 'SALET' is the best first guess, as it gives the most information.

After that, our Python code does the rest.
We load the NLTK library to get the list of words. From that, we narrow it down to 5-letter words. Further, we need to harmonise the case.

This gives us around 9000 words. 
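
The word-list step could look something like the sketch below. This is not the code from the repository: the helper name `five_letter_words` is made up for illustration, and a tiny inline sample stands in for the NLTK corpus so the snippet runs without any download. With NLTK installed and its `words` corpus downloaded, `nltk.corpus.words.words()` could be passed in instead.

```python
def five_letter_words(raw_words):
    """Return the sorted, de-duplicated, lower-cased five-letter words."""
    # Lower-casing harmonises the case; the set removes duplicates such as
    # "crane" and "CRANE"; isalpha() drops entries with punctuation.
    cleaned = {w.lower() for w in raw_words if len(w) == 5 and w.isalpha()}
    return sorted(cleaned)


# Tiny stand-in sample (an assumption, NOT the real NLTK word list).
sample = ["Salet", "crane", "CRANE", "apple", "pear", "banana", "o'er!"]
candidates = five_letter_words(sample)
print(candidates)
```

Keeping the result as a sorted list makes later filtering steps deterministic, which is handy when you compare runs.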

Then, each guess on Wordle gives you 3 hints:
1. Is the letter present or absent?
2. Is the letter at the correct position?
3. Is the letter used just once or more than once?
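
One way to use these three hints in code is to score every candidate word against your guess and keep only the candidates that would have produced the same colours. The sketch below is my own minimal version, not necessarily what the linked code does; the function names `score` and `filter_candidates` and the `g`/`y`/`b` encoding (green, yellow, grey) are assumptions for illustration.

```python
from collections import Counter


def score(guess, answer):
    """Return Wordle-style feedback: 'g' green, 'y' yellow, 'b' grey."""
    feedback = ["b"] * len(guess)
    remaining = Counter()
    # First pass: mark exact matches and count the unmatched answer letters.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            feedback[i] = "g"
        else:
            remaining[a] += 1
    # Second pass: a letter is yellow only while unmatched copies remain,
    # which is what handles repeated letters correctly.
    for i, g in enumerate(guess):
        if feedback[i] == "b" and remaining[g] > 0:
            feedback[i] = "y"
            remaining[g] -= 1
    return "".join(feedback)


def filter_candidates(candidates, guess, feedback):
    """Keep the words that would have produced this feedback for the guess."""
    return [w for w in candidates if score(guess, w) == feedback]


# Hypothetical mini word list; the real list would have thousands of words.
words = ["crane", "apple", "salet", "abide"]
print(filter_candidates(words, "salet", "bybyb"))
```

Repeating the filter after every guess shrinks the list quickly, and whatever survives is your set of hints for the next word.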

After that, any person with reasonable coding skills can tame the game.
Here is the code: GitHub Link, with the videos below





PS: Sorry for spoiling the fun!

Sunday, February 6, 2022

Interest in Amazon

1. Web Search

2. Amazon Share Price Chart


The above charts show the trends/data from the last 90 days. Automatically Updated Daily!

Interest in Facebook (Meta)

1. Web Search

2. Facebook(Meta) Share Price Chart


The above charts show the trends/data from the last 90 days. Automatically Updated Daily!

Saturday, February 5, 2022

Classification of Data Science Problems - A perspective

We all want to be experts in Data Science. So, how do you become one?
Start with at least a knowledge of the broad categories of Data Science problems. No, I'm not talking about the much-hyped ML/AI discipline. Let's stick with Data Science.

Number One:
Let's say you have historical values of a quantity of interest (Sales/Orders/Temperature/...), and you want to forecast its future value. This, my friend, falls in the class of time series forecasting problems. You could use ARMA/ARIMA/SARIMA models, or go for an LSTM if you like ML models. Of course, the list is not exhaustive; you could use many other approaches for time series forecasting.
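
To give a flavour of this, here is a minimal sketch that fits an AR(1) model, the simplest relative of the ARMA family, by ordinary least squares. It is a toy stand-in, not a full ARIMA implementation; the function names and the sample series are made up for illustration.

```python
def fit_ar1(series):
    """Fit y[t] = c + phi * y[t-1] by least squares; return (c, phi)."""
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    sxy = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    phi = sxy / sxx
    c = mean_curr - phi * mean_prev
    return c, phi


def forecast(series, steps):
    """Roll the fitted AR(1) model forward for the given number of steps."""
    c, phi = fit_ar1(series)
    last = series[-1]
    predictions = []
    for _ in range(steps):
        last = c + phi * last
        predictions.append(last)
    return predictions


# Made-up history of some quantity of interest (illustrative only).
history = [10.0, 7.0, 5.5, 4.75, 4.375]
print(forecast(history, 3))
```

For real data you would normally reach for a proper library model and check the fit on held-out periods; the point here is only the shape of the problem: past values in, future values out.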

Number Two:
Now, let's say you have two non-time-series features (or variables of interest), and you want to find their association or interrelationship; you can resort to correlation analysis.
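
As a quick illustration, the Pearson correlation coefficient can be computed by hand in a few lines. This is a minimal sketch with made-up numbers; the function name `pearson` is my own, and Pearson is only one of several correlation measures you might choose.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up feature values (illustrative only).
temperature = [20.0, 22.0, 25.0, 27.0, 30.0]
ice_cream_sales = [110.0, 125.0, 150.0, 160.0, 190.0]
print(pearson(temperature, ice_cream_sales))
```

A value close to +1 or -1 suggests a strong linear association, while a value near 0 suggests little linear association; it says nothing about which variable drives the other.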

Number Three:
Extending problem category two, let's now find the dependence of a variable on a set of other features/variables (or just one feature); here you can try any of the regression techniques.
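
For the one-feature case, simple linear regression is probably the easiest technique to sketch. The snippet below fits a straight line by least squares; the function name `fit_line` and the data are hypothetical, and real problems usually involve more features and a library implementation.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data: advertising spend vs. sales (illustrative only).
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.1, 4.9, 7.2, 8.8]
intercept, slope = fit_line(ad_spend, sales)
# Predict sales for a spend level that was not in the data.
print(intercept + slope * 5.0)
```

Unlike correlation, the fitted line gives you a number you can act on: how much the dependent variable is expected to change per unit change in the feature.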

Number Four:
Modifying problem three, let's say you are interested in Yes/No, True/False or Present/Absent kinds of answers. You, my friend, need a classification model. You could take the help of Logistic Regression.
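
Here is a minimal sketch of one-feature Logistic Regression trained with plain gradient descent. Everything in it is an assumption for illustration: the function names, the learning rate, the number of epochs and the toy data. A library implementation would be the usual choice in practice.

```python
from math import exp


def sigmoid(z):
    """Squash any real number into the (0, 1) range."""
    return 1.0 / (1.0 + exp(-z))


def fit_logistic(xs, ys, lr=0.5, epochs=1000):
    """Fit P(y=1) = sigmoid(w * (x - centre) + b) by gradient descent."""
    # Centring the feature keeps gradient descent well behaved.
    centre = sum(xs) / len(xs)
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        errors = [sigmoid(w * (x - centre) + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * (x - centre) for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, centre


def predict_proba(x, model):
    """Probability of the 'Yes' class for a single feature value."""
    w, b, centre = model
    return sigmoid(w * (x - centre) + b)


# Made-up data: hours studied vs. passed (1) or failed (0).
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
passed = [0, 0, 0, 1, 1, 1]
model = fit_logistic(hours, passed)
print(predict_proba(2.5, model) >= 0.5, predict_proba(5.5, model) >= 0.5)
```

The model outputs a probability, and the Yes/No answer comes from comparing it with a threshold (0.5 here), which you can move depending on how costly each kind of mistake is.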

Number Five & Beyond:
Now that we have covered all the usual suspects, we need to discuss the cutting-edge ML/AI problems, such as RL (Reinforcement Learning) or Unsupervised/Semi-supervised learning methods. These are fuzzier & less structured than the previous problems.