Wednesday, February 8, 2023

3 Tips for Finding your Data Science Project Idea

Projects are an essential part of learning data science, but deciding on a topic for a project is a tough nut to crack. Here I share my experience and a three-step formula for finding a personal project idea.

If you have decided to work on a project, think like a doctor. What does that mean? 

Think about the things in your day-to-day life that could be improved with the help of data. You'll come across a lot of problems, and one of them can become your data science project.

Think of a problem you are facing that can somehow be connected with data. Now ask yourself: can you solve it with data?

If your answer is no, you can learn and revisit more topics in data science first.

If your answer is yes, you have hit the right target and are thinking in a suitable direction.
If you are still not sure how to solve the problem, i.e. which data-driven approach to follow, you can apply the following three-step technique once you have settled on the problem statement:


1) Approach: Figure out what kind of method is best suited to the problem. Does it require machine learning, mathematical programming, mathematical analysis, or something more advanced?

2) Data: The next step is to find data relevant to your problem. Is it available on the internet, or do you need to scrape it? Is the data structured or unstructured, and how will you clean and preprocess it? These are some of the questions you need to ask yourself to get quality data for your problem statement.

3) Result: Once you have the data and have applied the chosen approach, it's time to present the solution to a general audience. Writing a detailed report of your findings is one of the best ways to present your project. The report also helps stakeholders understand your project without going through each line of code.


This three-step technique works well not only for personal projects but also for professional ones.

Let's walk through an example to see how this three-step approach applies to a given problem statement.

Problem: How to increase subscribers of a YouTube channel?
(Disclaimer: The YouTube algorithm is far more advanced than the solution presented here. The solution is only meant to illustrate the strategy.)

1. Approach: This problem may call for machine learning techniques such as regression or random forests, or more advanced techniques such as neural networks.

2. Data: There are several ways to get data for a YouTube channel. We can use the YouTube API, or we can ask the owner of the channel.

3. Result: The way we present the result is the most important thing. Saying something like "make quality content for your YouTube channel" is unlikely to be an effective answer to the given problem statement.

The result should be specific, actionable, and personalised. Something like the following will be far more effective: "Posting twice a week, replying to all comments, and uploading videos in the evening are the key insights from the data and analysis. These habits have increased engagement and retention in the past, so following them should help in gaining subscribers."
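As a rough sketch of how such an insight might be pulled out of the data, the snippet below compares average subscriber gains across upload habits and fits a one-variable regression line. The weekly records and the field names (`uploads_per_week`, `evening_upload`, `new_subscribers`) are invented for illustration; real figures would come from the YouTube API or the channel owner, and a real analysis would need far more data.

```python
from statistics import mean

# Hypothetical weekly records for one channel (invented for illustration).
weeks = [
    {"uploads_per_week": 1, "evening_upload": False, "new_subscribers": 40},
    {"uploads_per_week": 2, "evening_upload": True, "new_subscribers": 95},
    {"uploads_per_week": 2, "evening_upload": False, "new_subscribers": 70},
    {"uploads_per_week": 1, "evening_upload": True, "new_subscribers": 55},
    {"uploads_per_week": 3, "evening_upload": True, "new_subscribers": 120},
    {"uploads_per_week": 1, "evening_upload": False, "new_subscribers": 35},
]


def average_gain(records, key, value):
    """Mean new subscribers over the weeks where record[key] == value."""
    gains = [r["new_subscribers"] for r in records if r[key] == value]
    return mean(gains) if gains else None


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a single predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


evening_gain = average_gain(weeks, "evening_upload", True)
daytime_gain = average_gain(weeks, "evening_upload", False)
slope, intercept = fit_line(
    [w["uploads_per_week"] for w in weeks],
    [w["new_subscribers"] for w in weeks],
)

print(f"Average weekly gain with evening uploads: {evening_gain:.1f}")
print(f"Average weekly gain with daytime uploads: {daytime_gain:.1f}")
print(f"Estimated extra subscribers per additional weekly upload: {slope:.1f}")
```

Numbers like these are what turn a vague recommendation into a specific one. Keep in mind that a comparison of this kind shows association, not cause, so the wording of the final report should reflect that.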


So, the overall summary for finding a suitable data science project is:
Look for problems - Can the problems be solved with the help of data? - Solve them using the three-step technique (Approach - Data - Result)


Data Engineer vs Data Analyst vs Data Scientist - A Practical Comparison & Perspective!

There are various data roles, such as Data Scientist, Data Engineer, and Data Analyst. Understanding the differences between these roles is important so that you can select the most appropriate path for your journey. Below is a brief description of the three roles and how they differ.

Data Engineer: A Data Engineer is the person responsible for collecting and handling the data. Data engineers are well acquainted with coding and algorithms, which they use for data cleaning and data handling.

Today, data is considered a valuable asset, so data engineers are responsible for maintaining the entire data architecture and data pipeline of an organisation. They handle raw and unstructured data and convert it into a usable format so that it is available to Data Scientists and Data Analysts for further analysis.

Data Engineers are not directly involved in the decision-making of a business. They work as the backend of the entire data team and indirectly support data-driven decisions. Data Engineers use tools like SQL, MongoDB, and Python.
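To make "converting raw data into a usable format" a little more concrete, here is a minimal sketch using only Python's standard library. The raw export, its column names (`user_id`, `signup_date`, `plan`), and the cleaning rules are all assumptions made up for this example; a real pipeline would be driven by the organisation's own schema and tooling.

```python
import csv
import io

# Hypothetical raw export: stray spaces, a missing value, and a duplicate row.
RAW = """user_id, signup_date ,plan
 101,2023-01-05,Pro
102,2023-01-06,
101,2023-01-05,Pro
103, 2023-01-09 ,basic
"""


def clean_rows(raw_text):
    """Trim whitespace, normalise values, and drop duplicate records."""
    reader = csv.reader(io.StringIO(raw_text))
    header = [name.strip() for name in next(reader)]
    seen = set()
    cleaned = []
    for row in reader:
        if not row:  # skip completely empty lines
            continue
        record = dict(zip(header, (value.strip() for value in row)))
        record["user_id"] = int(record["user_id"])
        # Lower-case the plan name and label missing plans explicitly.
        record["plan"] = record["plan"].lower() or "unknown"
        key = (record["user_id"], record["signup_date"])
        if key in seen:  # same user and date already kept
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned


rows = clean_rows(RAW)
for row in rows:
    print(row)
```

The cleaned records are the kind of structured output that analysts and scientists further down the pipeline can work with directly.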

Data Analyst: The Data Analyst comes next in the pipeline of a data science project. Analysts receive data from data engineers and perform exploratory data analysis (EDA) or other elementary analysis. They analyse the structured data to find useful insights.

Data analysts use descriptive and inferential statistics for data analysis. Identifying KPIs and preparing reports are part of their day-to-day work. They study the current situation of an organisation and recommend improvements.

The work of data analysts impacts the business directly. They suggest basic data-driven solutions that might be valuable for an organisation. 

Data analysts use spreadsheet tools like Excel and Google Sheets and dashboarding tools like R Shiny, Power BI, and Tableau. SQL is widely used by Data Analysts, and programming languages like Python are sometimes used as well.
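As a small illustration of the descriptive statistics and KPIs mentioned above, the sketch below summarises a few days of business figures with Python's `statistics` module. The daily numbers and the chosen KPIs (conversion rate, average order value) are hypothetical and only meant to show the idea.

```python
from statistics import mean, median, stdev

# Hypothetical daily figures for a small online shop (invented for illustration).
daily = [
    {"visits": 200, "orders": 10, "revenue": 500.0},
    {"visits": 250, "orders": 15, "revenue": 900.0},
    {"visits": 150, "orders": 6, "revenue": 240.0},
    {"visits": 400, "orders": 19, "revenue": 1140.0},
]


def summarise(days):
    """Compute a few simple KPIs and descriptive statistics."""
    visits = sum(d["visits"] for d in days)
    orders = sum(d["orders"] for d in days)
    revenues = [d["revenue"] for d in days]
    return {
        "conversion_rate": orders / visits,
        "average_order_value": sum(revenues) / orders,
        "mean_daily_revenue": mean(revenues),
        "median_daily_revenue": median(revenues),
        "stdev_daily_revenue": stdev(revenues),
    }


report = summarise(daily)
for name, value in report.items():
    print(f"{name}: {value:.2f}")
```

In practice the same summary would often be built in a spreadsheet, SQL query, or dashboard; the code only makes the calculation explicit.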

Data Scientist: Data scientists are key to the data-related activities of an organisation. They are responsible for model development and deployment, for checking the performance of models in production, and for improving existing models.

Data scientists need a solid understanding of maths and statistics, programming languages, and machine learning, as their work requires all three.

Data scientists handle semi-structured or structured data and perform data preprocessing. They apply various ML models according to the requirements and the problem statement, and they maintain the accuracy and performance of the models that are part of a data science project.
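Keeping an eye on model accuracy can be sketched very simply. The example below computes accuracy for a few batches of predictions and flags the batches that fall below a threshold. The batch names, labels, and the 0.8 threshold are assumptions chosen for illustration, not a recommended standard.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def flag_degraded(batches, threshold=0.8):
    """Return the names of batches whose accuracy is below the threshold."""
    return [
        name
        for name, y_true, y_pred in batches
        if accuracy(y_true, y_pred) < threshold
    ]


# Hypothetical weekly batches: (name, true labels, model predictions).
batches = [
    ("week_1", [1, 0, 1, 1, 0], [1, 0, 1, 1, 0]),
    ("week_2", [1, 0, 1, 1, 0], [1, 0, 0, 1, 0]),
    ("week_3", [1, 1, 0, 0, 1], [0, 1, 1, 0, 0]),
]

degraded = flag_degraded(batches)
print("Batches needing attention:", degraded)
```

A flagged batch would typically prompt a closer look at the data or a retraining of the model.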

The work of data scientists is helpful for businesses in predicting future events. Data scientists are directly involved in business decision making.

Data scientists make extensive use of programming languages like Python, R, or SAS (and, more recently, Julia!).

Data scientists also perform elementary tasks like EDA, which are also done by Data Analysts. 

Sometimes the roles of Data Analyst and Data Scientist overlap. In many firms, a Data Analyst can do the work of a Data Scientist and vice versa. It depends on the company and its requirements.

But this is not the case for Data Engineers. Data Engineers can be thought of as Software Engineers whose tasks are quite different from those of the other two roles.

So that is all about the roles and responsibilities of Data Scientists, Data Analysts, and Data Engineers. I hope you now have a clear understanding of each role.