How a Data Science Company Keeps its Data Scientists Happy

Dataiku Company Lara Khanafer

Good data scientists are hard to find and even harder to keep. A lot of you have been there, I’m sure: finally finding the star data scientist you’ve been seeking, watching this long-awaited hire transform your raw data into business value, and then watching him or her fly off to some other, more appealing venture.

Why does this scenario happen over and over again - especially when it comes to the best data scientists on your team?

At first glance, it seems hard to understand. Data scientists have awesome jobs! They are hired to do advanced math, build models, explore data, and find beautiful new ways to use it for real business applications. Those applications can range from building predictive services for powerful marketing automation or predictive maintenance to building new services that solve problems related to fraud, logistics, high churn rates, and so much more. Plus, the “data scientist” title usually comes with a pretty nice paycheck. Basically, being a data scientist is a dream job… in theory.

So Why Do Data Scientists Leave Their Jobs?

I was curious about the answer to this question, so I decided to ask the protagonists themselves: the data scientists. They told me the problem is:

The gap between that promising job offer and the reality of their day-to-day work.

The truth is that data scientists spend most of their time dealing with dirty data. Cleaning and preparing the data takes a lot of time, often upwards of 80% of an entire project - and it’s not very interesting. The usual data science tools and languages they have to work with aren’t well suited to this kind of work. As a result, getting to data you can actually work with to build predictive solutions ends up requiring hours of work and huge amounts of code.
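To give a feel for what that hand-written cleanup looks like, here is a minimal sketch in plain Python. Everything in it is an assumption made up for illustration: the sample rows, the two date formats, and the helper names `parse_date`, `normalize_name`, and `clean_rows` are hypothetical and do not come from any real project or from Dataiku.

```python
from datetime import datetime
import unicodedata

# Hypothetical raw records, invented for illustration only.
RAW_ROWS = [
    {"customer": "  Alice DUPONT ", "signup": "2015-03-01"},
    {"customer": "bob  martin", "signup": "01/04/2015"},
    {"customer": "Chloé Lefèvre", "signup": "not available"},
]

# Assumed date formats; real data usually hides a few more.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(value):
    """Try each known format in turn; return None if none of them match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue
    return None


def normalize_name(value):
    """Strip accents, collapse repeated whitespace, and apply title case."""
    decomposed = unicodedata.normalize("NFKD", value)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.split()).title()


def clean_rows(rows):
    """Return new rows with a normalized name and a parsed signup date."""
    return [
        {
            "customer": normalize_name(row["customer"]),
            "signup": parse_date(row["signup"]),
        }
        for row in rows
    ]


if __name__ == "__main__":
    for row in clean_rows(RAW_ROWS):
        print(row)
```

That is already around forty lines for two columns and three rows, and it only covers the cases someone thought of in advance. Multiply it by every column and every source, and the hours add up quickly.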

Even before data cleanliness becomes an issue, data scientists have to figure out how to access and bring together all their data from different sources in one place! The newest big data technologies are very powerful, but they’re also difficult to combine and to use efficiently together.

Collaboration and operationalization are easier said than done.

Data scientists also end up feeling a little lonely. Even if they work in a data lab with other data scientists around, they all work separately on different bits of projects, sometimes even in different languages. Let’s just say communication isn’t a central aspect of the projects they work on. Data scientists also need to work with IT departments and business teams in order to connect different data sources and deliver tangible solutions to business problems.

Getting IT, business, and data scientists to work together is a task in and of itself. Some speak Excel, others speak Python, and others speak plain old English. Because of this, the data scientist can often spend time developing a great model that will never get deployed in production and whose business results will never see the light of day. So, the data scientist feels undervalued, slightly bored, often frustrated, and ends up leaving, hoping the next company will be different.


How Dataiku Can Help

In my previous job as a headhunter in the BI and data science world, talent turnover helped me succeed. As you can imagine, talent turnover was also the biggest enemy of most of my clients. However, when I joined the Dataiku team as a business developer in August, I quickly learned that in three years of existence, Dataiku had never lost a data scientist. OK, as a friend of mine said, Dataiku is an awesome product. Sure, that explains some of it. But I felt that I was still missing part of the story.

That’s when I decided to conduct a little internal survey and find out why they all stayed. They all told me it was because Dataiku made the annoying part of the job faster and more efficient and allowed them to work together on projects.

What are the features of Dataiku that make your work as a Dataiku data scientist so appealing?

  1. Training machine learning models quickly
  2. Running several models simultaneously with a couple of clicks and benchmarking them easily with lots of indicators
  3. Super easy data cleaning with preparation scripts (parsing dates, normalizing text, doing mass actions with the visual interface…)
  4. Visualizing projects and rebuilding them easily, manually or automatically, with the DSS Flow
  5. Using lots of different languages and technologies and coding with notebooks for each of them (R, SparkR, SQL/Hive/Impala, Python/PySpark, Pig, Shell languages and so much more)
  6. Direct and easy connection to any database or outside source
  7. Easy import of data with format detection
  8. Web apps!
  9. Pretty charts, quickly!
  10. Manipulating and visualizing geographic data
  11. Simple data partitions
  12. Visual SQL recipes for click and drag data cleaning
  13. Enriching data in the visual interface in just a few clicks
  14. Web log parsing
  15. Code snippets
  16. Custom recipes

Of course, I also put each data scientist’s answers in DSS, and I’m training a predictive model on the data to create a data scientist recommendation engine. I’ll tell you more about that in a future blog post…

I know what you’re thinking. How is it possible to have all of these great features in one tool?!! And you’re probably pretty skeptical because I mentioned I’m a salesperson as well… Good for you, you should never trust anything you read on the Internet! You should go and see for yourself.
