Saturday, March 12, 2022
AI Vs ML Vs Deep Learning Vs Data Science
This is fundamentally a marketing question. A better way to sidestep it is to define a spectrum of data science approaches.
At one end you will have Statistics, the other end will be occupied by software engineering. See below.
As you look towards the right, you will find many new-age, software-driven, data-hungry methods that are oriented around validation metrics.
So, it's the same kind of solution with varying degrees of software, maths and statistics usage. Towards the left, the proportion of Maths/Statistics is higher, and the right is tilted towards software engineering.
Let's look at the same thing with a Venn diagram:
Also, fields like Robotics & AGI get mixed up with Data Science. It helps to understand that Data Science is largely a statistical discipline, whereas Robotics draws more on Electronics/Mechanical Engineering.
I have also talked about these things in a YouTube video. Check it out below:
Solving Business Problems with Data Science
2. After finalising the business problem, a team of data scientists or senior data scientists converts it into a tractable data science problem.
3. The solution design stage takes into consideration the scope, assumptions & goals of the data science solution.
4. Next, the solution is implemented with the help of software engineers, data engineers & junior data scientists.
5. To validate the implementation, the senior data scientist assesses it. If it passes this stage, we deploy the solution to the business user. If it fails, we go back to the drawing board and start again from the solution design stage.
As you might have understood, this is an iterative process.
You can check the video below:
Friday, March 11, 2022
Wordle Hints with Python
I'm pretty sure you have tried this daily vocabulary puzzle at least once. I prefer not to memorise the dictionary, and I also don't enjoy coding. Voila!
I married the two, my inadequate vocabulary & apparent lack of interest in coding, and the union rescues me every day with hints for guessing the next word.
Let's talk about the approach. For the Python code, see the end of the post.
You need to start with a guess. At this point you only know that it's a five-letter word. Not much help.
If you follow 3Blue1Brown, 'SALET' is the best first guess, as it yields the most information.
After that, our Python code does the rest.
We load the NLTK library to get the list of words. From that, we narrow it down to 5-letter words. We also need to harmonise the case.
This gives us around 9000 words.
Then, each guess on Wordle gives you 3 hints:
1. Is the letter present/absent?
2. Is the letter at the correct position?
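The filtering idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the code behind the GitHub link: the function names (`load_five_letter_words`, `feedback`, `filter_candidates`) and the pattern encoding ('g' for correct position, 'y' for present elsewhere, '-' for absent) are made up for this example. The NLTK loader assumes the `words` corpus has already been downloaded with `nltk.download('words')`.

```python
def load_five_letter_words():
    """Lowercase 5-letter words from the NLTK 'words' corpus (assumed to be downloaded)."""
    from nltk.corpus import words  # lazy import: the filter below works without NLTK
    return sorted({w.lower() for w in words.words() if len(w) == 5 and w.isalpha()})


def feedback(guess, answer):
    """Hint pattern for a guess: 'g' = right spot, 'y' = present elsewhere, '-' = absent."""
    result = ["-"] * len(guess)
    # Count answer letters that were not matched exactly; only these can earn a 'y'.
    remaining = {}
    for g, a in zip(guess, answer):
        if g != a:
            remaining[a] = remaining.get(a, 0) + 1
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "g"
        elif remaining.get(g, 0) > 0:
            result[i] = "y"
            remaining[g] -= 1
    return "".join(result)


def filter_candidates(candidates, guess, pattern):
    """Keep only the words that would have produced the observed hint pattern."""
    return [w for w in candidates if feedback(guess, w) == pattern]


# Tiny hand-made word list so the sketch runs without NLTK.
sample_words = ["later", "water", "crane", "hotel"]
shortlist = filter_candidates(sample_words, "salet", "-gygy")
```

With the real word list, you would call `filter_candidates(load_five_letter_words(), guess, pattern)` after every guess and pick the next word from whatever survives.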
Here is the code: GitHub link, and the videos are below.
PS: Sorry for spoiling the fun!
Sunday, February 6, 2022
Interest in Amazon
The above charts show the trends/data from the last 90 days. Automatically Updated Daily!
Interest in Facebook (Meta)
The above charts show the trends/data from the last 90 days. Automatically Updated Daily!
Saturday, February 5, 2022
Classification of Data Science Problems - A perspective
We all want to be experts in Data Science. So, how do you become one?
Start with at least a knowledge of the broad categories of Data Science problems. No, I'm not talking about the much-hyped ML/AI discipline. Let's stick with Data Science.
Number One:
Let's say you have historical numbers for any quantity of interest (Sales/Orders/Temperature/...), and you want to forecast its future value. This, my friend, falls in the class of time series forecasting problems. You could use ARMA/ARIMA/SARIMA models, or go for LSTM if you like ML models. Of course, the solution set I listed is not exhaustive; you could use many kinds of approaches for time series forecasting.
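To make this concrete, here is a minimal sketch of the simplest member of the ARMA family, an AR(1) model, where each value is a constant plus a multiple of the previous value. It is fitted by ordinary least squares in plain Python. The names `fit_ar1` and `forecast`, and the toy series, are my own for illustration; for real work you would reach for a proper ARIMA/SARIMA implementation.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = const + phi * y[t-1]."""
    prev = series[:-1]
    curr = series[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    sxy = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
    phi = sxy / sxx
    const = mean_curr - phi * mean_prev
    return const, phi


def forecast(series, steps):
    """Roll the fitted AR(1) model forward for the requested number of steps."""
    const, phi = fit_ar1(series)
    last = series[-1]
    predictions = []
    for _ in range(steps):
        last = const + phi * last
        predictions.append(last)
    return predictions


# Toy history, constructed so that y[t] = 1 + 0.5 * y[t-1] holds exactly.
history = [10.0, 6.0, 4.0, 3.0, 2.5, 2.25]
next_two = forecast(history, 2)
```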
Number Two:
Now, let's say you have two non-time-series features (or variables of interest), and you want to find their association or interrelationship; you can resort to correlation analysis.
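As a small illustration, the Pearson correlation coefficient is one common measure of association (it is my choice for this sketch; the post does not single out a specific coefficient). The function name and the toy numbers below are made up.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: orders move in lockstep with ad spend.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
orders = [2.0, 4.0, 6.0, 8.0, 10.0]
r = pearson_r(ad_spend, orders)
```

A value near +1 or -1 signals a strong linear association, while a value near 0 signals little linear association. It says nothing about which variable drives the other.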
Number Three:
Extending problem category two, let's now find the dependence of a variable on a set of other features/variables (or just one feature); here you can try any of the regression techniques.
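For the one-feature case, a minimal sketch is simple linear regression fitted by ordinary least squares. This is just one of many regression techniques, and the names `fit_line` and `predict` and the toy data are assumptions for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predicted value of the dependent variable for a new feature value."""
    return slope * x + intercept


# Toy data lying exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
```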
Number Four:
Modifying problem three, let's say you are interested in Yes/No, True/False or Present/Absent kinds of answers; you, my friend, need a classification model. You could take the help of Logistic Regression.
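Here is a rough sketch of logistic regression with a single feature, trained by plain gradient descent on the log-loss. The learning rate, epoch count, function names and toy data are all assumptions for illustration; in practice a library implementation would be the usual choice.

```python
import math


def sigmoid(z):
    """Squash a real number into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, labels, lr=0.1, epochs=2000):
    """Fit weight w and bias b by batch gradient descent on the average log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, labels):
            error = sigmoid(w * x + b) - y  # predicted probability minus true label
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict_label(x, w, b):
    """Return 1 (Yes) if the predicted probability is at least 0.5, else 0 (No)."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0


# Toy data: negative feature values are labelled 0, positive ones 1.
xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
labels = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, labels)
```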
Number Five & Beyond:
Now that we have covered all the usual suspects, we need to discuss the cutting-edge ML/AI problems known as RL (Reinforcement Learning) or Unsupervised/Semi-supervised learning methods. These are more fuzzy & unstructured than the previous problems.