Dataiku has received quite a bit of press recently, including an interesting article entitled “The Photoshop for Data Scientists.” This blog post explains why we believe the metaphor fits… and why it doesn’t.
For most designers and photographers, the go-to image editing software is Photoshop (along with Illustrator), because it brings together the many tools needed to turn a basic snapshot into publishable photography.
Our vision for Data Science Studio (DSS) is similar, but for data scientists and analysts. We work hard to meet the daily needs of data scientists in a single tool. In fact, we are striving for DSS to become the go-to tool for every analyst, beginner or advanced, for getting data-driven projects done and deployed quickly and with high quality.
Data science is also a relatively new field, not yet fully understood (or even known) by most people. The Photoshop metaphor is useful because it uses a widely understood concept to explain one that few people really grasp today.
Building something relevant with data is... painful!
First of all, finding a data scientist is hard, because data science remains a skill reserved for experts.
Second, once a company has found one (usually a very smart PhD in machine learning who commands a high salary):
- the data scientist will spend approximately 80% of their time as a data cleaner (i.e., cleaning, contextualizing, and enriching data);
- and only 20% of their time putting their genius to good use, i.e., building machine learning algorithms and models that uncover industrial and commercial value.
Too many tools, not enough productivity
The tools (yes, plural) that the expert data scientist uses are not always compatible with one another. This causes process flow problems, forces the repetition of tasks that could otherwise be automated, and leads to miscommunication between business, IT, and engineering departments. In short, the whole process is counterproductive (even for experts), time-consuming, and therefore expensive for the companies trying to put data science in place.
Similarly, before software suites like Photoshop came out, photographers used multiple tools to make “raw” images look spick-and-span, especially in commercial industries such as fashion. Then came the Creative Suite (and other image editing software), which merged the necessary image editing steps into one tool. This was a huge step toward making the whole process, from raw image to this-is-going-to-be-a-magazine-cover picture, less tedious and much faster.
One tool for experts... and beginners
Like these all-in-one image editing suites, Dataiku aims to make its users that much more productive day to day. Imagine the possibilities if you could reverse the existing 80-20 rule by increasing every data analyst’s productivity, whether they work in business, hard-core data science, or programming! That’s a whole lot of brainpower put to good use, and a whole lot of value created in much less time.
Furthermore, the Creative Suite is not only image editing software for experts; it is also a great learning tool for amateurs. Professionals can spend hours editing a picture pixel by pixel, whereas beginners can use it to make slight changes with filters or other click-and-go features. Granted, it is not the easiest software to use, but it is definitely accessible, and the more people use it, the better they get at it.
At Dataiku, we believe that the shortage of superpowered data scientists should not be a barrier to production. In fact, we believe that with the appropriate tools, there should be no such thing as a shortage of data scientists at all.
That’s why DSS is also accessible to beginners and non-professionals. Thanks to the DSS tutorials, videos, and demo missions in the free Community Edition, anyone with even a slight interest in the subject can learn, apply, and improve his or her skills to become the data scientist that companies badly need today.
But, metaphors aren't perfect...
The Photoshop metaphor has its shortcomings as well. First, Photoshop is not the most user-friendly tool for more in-depth image editing tasks. Moreover, the final product of working in Photoshop is a static, finished image. With DSS, the final product is a continuously running data application: a production-oriented workflow and dataflow. There is nothing static about it.
Oh well, we’ll just remember that a metaphor merely suggests a resemblance; it is not meant to be taken as the actual thing.
Have more questions about Dataiku Data Science Studio?