How To Come Up With Interesting Data Science Projects

Stop using the Titanic and MNIST datasets. Learn a simple approach to come up with interesting side projects.

I always struggled to come up with interesting ML/DS side projects. The problem was that I was doing things backward: I usually started with a solution (an algorithm) and then tried to find a problem (an application) for it.

From what I've seen, this is a very common issue among Data Scientists. See if you recognize yourself here:

  1. You learn about the latest ML/DL algorithm and get all excited about it.
  2. You start looking for places where you can apply it.
  3. You have a hard time coming up with an original application, so you end up doing the same project everyone else is doing or dropping the project altogether.

When you learn something, it's hard to separate it from the context in which you learned it. Instead of finding new applications, we default to applying this new knowledge in a very similar context, if not the same one. It's no wonder so many Data Scientists end up predicting who survived the Titanic or building a digit recognizer using the MNIST dataset.

The opposite approach works better: start with the problem (a question or an application) and then try to find a solution (an algorithm) that would work for it. Even better, start with a question that genuinely interests you. That way, it'll be easier to find prospects, and you increase your chances of coming up with something that interests others (it's rare for a problem to interest only you).

So, how can you do that? Here's a simple way of approaching it:

  1. Choose a question: Come up with a question you could potentially answer with data. Pick a topic that truly interests you. Look at your hobbies, what you follow on social media, or what you read in the news.
  2. Find data: Figure out if there's data you can use to answer that question.
  3. Adjust the question (if necessary): If the data won't help you answer your question, see if a slightly modified question would work.
  4. Choose an algorithm: Choose an algorithm you could use to answer that question.
  5. If unhappy, start over: If you are not happy with the results of step 3 or 4, go back to step 1.
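
To make steps 2 and 3 a bit more concrete, here is a minimal Python sketch of how you might check whether a dataset can answer your question, and what an adjusted question could look like. Everything in it is made up for illustration: the running-log data, the column names, the two questions, and the `missing_fields` helper are all assumptions, not part of any real project.

```python
import csv
import io


def missing_fields(csv_text, required_fields):
    """Return the required fields that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    available = set(reader.fieldnames or [])
    return [field for field in required_fields if field not in available]


# Hypothetical data: a small export from a running log (step 2: find data).
running_log = (
    "date,distance_km,duration_min\n"
    "2023-01-05,5.0,29.5\n"
    "2023-01-12,5.0,28.7\n"
)

# Original question: "Do I run faster on cooler days?"
# It would need a temperature column, which this export does not have.
original_needs = ["date", "distance_km", "duration_min", "temperature_c"]

# Adjusted question (step 3): "Has my pace improved over time?"
# It only needs columns the export already contains.
adjusted_needs = ["date", "distance_km", "duration_min"]

print("Original question is missing:", missing_fields(running_log, original_needs))
print("Adjusted question is missing:", missing_fields(running_log, adjusted_needs))
```

If the first list is not empty, you either look for another data source that fills the gap or reshape the question until the second list comes back empty. Only then does it make sense to move on to step 4 and pick an algorithm.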

That's all. It's simple, yet I've found it to be very effective.

I have taken this approach in my recent projects, and it has worked quite well. When I launched polituits.com, I got over 300 interactions on LinkedIn, 40 stars on GitHub, and hundreds of visits to the website.

I hope you find this useful. Let me know what you think in the comments.

You can follow me here as I continue building data-driven applications.