Good data scientists are hard to find, and even harder to keep. I know this for a fact, since finding and hiring them was my job for four years. But at Dataiku, they stay. Why? That’s what this post is all about.
As a headhunter in the BI and data science world, I owed much of my success to talent turnover. As you can imagine, that same turnover was also the biggest enemy of most of my clients.
A lot of you have been there, I’m sure: finally finding the star data scientist you’ve been seeking, watching this much-awaited data scientist transform your raw data into business value, and then watching them fly off to some other, more appealing venture.
Why does this scenario happen over and over again - especially when it comes to the best data scientists on your team?
At first glance, it seems hard to understand. Data scientists have awesome jobs! They are hired to do advanced math, build models, explore data, and find beautiful new ways to use it for real business applications. Those applications range from predictive services for marketing automation or predictive maintenance to new services that solve problems related to fraud, logistics, high churn rates, and so much more. Plus, the “data scientist” title usually comes with a pretty nice paycheck. Basically, being a data scientist is a dream job… in theory.
So what goes wrong? Why do they leave? I was curious so I decided to ask the protagonists themselves: the data scientists.
They told me the problem is the gap between that promising job offer and the reality of their day-to-day work.
The truth is that data scientists spend most of their time dealing with dirty data. Cleaning and preparing the data takes a lot of time, often upwards of 80% of an entire project… And it’s not very interesting. The usual data science tools and languages they have to work with aren’t well suited to this kind of work. As a result, getting to data you can actually use to build predictive solutions ends up requiring hours of work and huge amounts of code. And even before data cleanliness becomes an issue, they have to figure out how to access all their data from different sources and bring it together in one place! The newest big data technologies are very powerful, but they’re also difficult to combine and to use efficiently together.
Data scientists also end up feeling a little lonely. Even if they work in a data lab with other data scientists around, they all work separately on different bits of projects, sometimes even in different languages. Let’s just say communication isn’t a central aspect of the projects they work on. Data scientists also need to work with IT departments and business teams in order to connect different data sources and deliver tangible solutions to business problems.
Getting IT, business, and data scientists to work together is a task in and of itself. Some speak Excel, others speak Python, and others speak plain old English. Because of this, the data scientist can often spend time developing a great model that will never get deployed in production and whose business results will never see the light of day. So, the data scientist feels undervalued, slightly bored, often frustrated, and ends up leaving, hoping the next company will be different.
I kept hearing these stories at work. A few months ago, as I was talking to one of my clients about this problem, he stopped me and said: “Lara, this isn’t good for your job, but I’ve got to admit that I’ve finally found the solution to our data scientist turnover problem. It’s called Data Science Studio.”
Needless to say, I was intrigued. When I got home, I decided to download Data Science Studio (DSS) - just to check it out. Of course, I thought it was brilliant (which is why I soon applied to work for the company behind the solution, Dataiku). Even I could understand it and see how useful it was, and I didn’t know the first thing about data wrangling.
All in all, my first experience with DSS was very positive. I won’t lie - I didn’t build anything fancy. But I definitely recognized the ease of use and intuitive aspects of the UI.
In August, when I joined the Dataiku team as a business developer, I quickly learned that in three years of existence, Dataiku had never lost a data scientist. OK, like my client said, DSS is awesome. Sure, that explains some of it. But I felt that I was still missing a part of the story.
That’s when I decided to conduct a little internal survey and find out why they all stayed. They all told me it was because DSS made the annoying part of the job faster and more efficient, and allowed them to work together on projects. They also mentioned the fact that the kitchen was always stocked with goodies aplenty for them to munch on. But that’s beside the point.
Here is the question I asked: What are the features of DSS that make your work as a Dataiku data scientist so appealing?
And here, ranked, are the answers I got.
The top 16 Data Science Studio features, ranked
1. Training machine learning models quickly
2. Running several models simultaneously with a couple of clicks and benchmarking them easily with lots of indicators
3. Super easy data cleaning with preparation scripts (parsing dates, normalizing text, doing mass actions with the visual interface…)
4. Visualizing projects and rebuilding them easily, manually or automatically, with the DSS Flow
5. Using lots of different languages and technologies and coding with notebooks for each of them (R, SparkR, SQL/Hive/Impala, Python/PySpark, Pig, Shell languages and so much more)
6. Direct and easy connection to any database or outside source
7. Easy import of data with format detection
8. Web apps!
9. Pretty charts, quickly!
10. Manipulating and visualizing geographic data
11. Simple data partitions
12. Visual SQL recipes for click and drag data cleaning
13. Enriching data in the visual interface in just a few clicks
14. Web log parsing
15. Code snippets
16. Custom recipes
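For anyone who has never had to do it by hand, here is a rough idea of what two of the chores named in item 3 (parsing dates and normalizing text) look like in plain Python. To be clear, this is not DSS code, just a minimal sketch of the manual approach: the function names, the list of date formats, and the sample rows are all made up for illustration.

```python
from datetime import datetime
import unicodedata

# Hypothetical list of date formats seen in the raw data (an assumption).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")


def parse_date(raw):
    """Try each known format in turn; return an ISO date string, or None."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # this format did not match, try the next one
    return None


def normalize_text(raw):
    """Strip accents, lowercase, and collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", raw)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


# Made-up sample rows standing in for a messy export.
rows = [
    {"signup": "2015-03-14", "city": "  PARIS "},
    {"signup": "14/03/2015", "city": "Orléans"},
    {"signup": "20150314", "city": "new   york"},
    {"signup": "not a date", "city": "Zürich"},
]

cleaned = [
    {"signup": parse_date(row["signup"]), "city": normalize_text(row["city"])}
    for row in rows
]

for row in cleaned:
    print(row)
```

Even this toy version needs a format list, a fallback for values that cannot be parsed, and accent handling, and a real project repeats that for every column. That is the kind of work the data scientists said they were happy to hand over to a visual preparation step.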
Of course, I also put each data scientist’s answers in DSS, and I’m training a predictive model on the data to create a data scientist recommendation engine. I’ll tell you more about that in a future blog post…
I know what you’re thinking. How is it possible to have all of these great features in one tool?!! And you’re probably pretty skeptical because I mentioned I’m a salesperson as well… Good for you, you should never trust anything you read on the Internet! You should go and see for yourself :)