Is Dataiku for Data Scientists?

Dataiku Product Marie Merveilleux du Vignaux

In one word, yes! Since its founding in 2013, Dataiku has been built by data scientists, for data scientists. While Dataiku is also a tool for analysts and those who prefer visual interfaces, the platform offers many features and capabilities aimed squarely at data scientists. During the 2021 Product Days, Conor Jensen, Dataiku’s VP of Data Science, Americas, detailed some of these features and how they benefit data scientists.

Benefit 1: Coding in Dataiku

In Dataiku, you can use your favorite IDE, such as PyCharm, Sublime, RStudio, or VSCode, for code development while keeping your work synced with the overall project and the Dataiku platform. You can code just about anything using the packages and libraries you prefer, and nearly all data transformation and pipelining tasks can be performed programmatically, without touching the visual tools if you don't want to. You may find that you want to, though, since the visual tools make certain tasks a lot easier and faster.

You can also choose to use Dataiku to complement your coding skills. You can code the pieces that you need to and let the platform handle the rest.
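As a rough illustration of coding one piece and leaving the rest to the platform, here is a minimal sketch of a Python code recipe. It assumes it runs inside Dataiku, where the `dataiku` package is available, and that pandas is installed. The dataset names `orders` and `orders_enriched`, the column names, and the helper `add_order_value` are hypothetical and chosen only for the example.

```python
def add_order_value(rows):
    """Return new rows with an order_value field (quantity * unit_price).

    Plain Python on a list of dicts, so it can be developed and tested
    in any IDE without a Dataiku connection.
    """
    return [
        {**row, "order_value": row["quantity"] * row["unit_price"]}
        for row in rows
    ]


def run_recipe():
    """Read the input dataset, apply the transformation, write the output.

    Assumption: this function is called from a Dataiku Python recipe.
    The imports are local so the helper above stays usable elsewhere.
    """
    import dataiku
    import pandas as pd

    # Hypothetical dataset names; replace with the ones in your Flow.
    df = dataiku.Dataset("orders").get_dataframe()
    enriched = pd.DataFrame(add_order_value(df.to_dict("records")))
    dataiku.Dataset("orders_enriched").write_with_schema(enriched)
```

Keeping the transformation in a plain function separates the logic you want to own from the dataset reads and writes that the platform handles.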

Benefit 2: Collaboration and Communication

As a data scientist working in an enterprise, you don't know what you don't know, but other people in your organization have knowledge and skills that could enhance your projects. You need ways to collaborate with people across the business to build successful data science projects. Even better, you could get those people to do part of the work on the project itself!

For example, you could have the business team merge and clean data for you. They have the knowledge of what the data should look like, they have context on what the different fields and data sources represent, and they understand what transformations are typically needed to make the data ready for business insights. You can create a dialogue with the people in your organization about the project and have them be hands-on in the project, making it more likely that your project will be successful.

Benefit 3: Easy Deployment of Data Science Projects

Dataiku makes it easy to deploy your data science projects. The architecture on which you build your project in Dataiku is the same framework that you use to deploy that project. This makes your project much easier to deploy in batch or in real time. It also means the project is less likely to break, because you're not migrating between platforms or programming languages. And if it does break, the IT team that owns the production project is a lot more likely to be able to fix it in a timely manner, because they have full visibility into the development pipeline, logic, and project history.

Benefit 4: Access to Compute Resources

Dataiku gives you access to the compute resources that you need for data processing and model training, wherever your data lives and whatever your architecture is. You can run analytic workloads on premises or in the cloud, using distributed engines like Spark or containerization technologies like Kubernetes. Dataiku has built-in push-down capabilities, which means that when you have big jobs that can't (or shouldn’t) run in memory, you can push them down to remote infrastructure for processing, whether that is the SQL database where the data lives or an elastic Kubernetes cluster in the cloud. This allows you to take full advantage of your existing infrastructure, run jobs faster, and scale to bigger data.
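To make the push-down idea concrete, here is a small, generic sketch that does not use Dataiku's API at all. It uses Python's built-in `sqlite3` module as a stand-in for "the SQL database where the data lives" and compares pulling every row into memory with letting the database do the aggregation. The table, columns, and function names are made up for the example.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database standing in for a remote SQL store."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 10.0), ("west", 5.0), ("east", 2.5)],
    )
    conn.commit()
    return conn


def totals_in_memory(conn):
    """Fetch every row, then aggregate in the Python process."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_pushed_down(conn):
    """Let the database aggregate; only one row per region comes back."""
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    return dict(conn.execute(query).fetchall())
```

Both functions return the same totals, but the pushed-down version moves only the aggregated result across the connection, which is what matters once the table no longer fits in memory.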

Benefit 5: Safeguards and Alerts

Data scientists want to be called by IT when there is something they need to be involved in. Dataiku allows you to set guardrails for your projects so that they raise the alerts you care about. That way, you're informed when a project you were involved in has issues, and when an issue does come to you, the built-in metrics and checks tell you what's going on. This saves you a lot of time and makes you more productive, and happier, as a data scientist.
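The pattern behind metrics and checks can be sketched in a few lines of plain Python. This is an illustration of the concept only, not Dataiku's implementation or API: a metric is a number computed from the data, and a check compares that number with thresholds you choose. All names and thresholds here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    status: str  # "OK", "WARNING", or "ERROR"
    value: float


def missing_ratio(rows, column):
    """Metric: share of rows where the column is None or an empty string."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) in (None, ""))
    return missing / len(rows)


def check_max(name, value, warn_above, error_above):
    """Check: compare a metric value with warning and error thresholds."""
    if value > error_above:
        status = "ERROR"
    elif value > warn_above:
        status = "WARNING"
    else:
        status = "OK"
    return CheckResult(name, status, value)


def alerts(results):
    """Keep only the checks that someone should be told about."""
    return [result for result in results if result.status != "OK"]
```

For instance, `check_max("missing_email", missing_ratio(rows, "email"), warn_above=0.1, error_above=0.4)` would flag a dataset in which too many email addresses are missing, and `alerts` would pass only the non-OK results on to whoever needs to act.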

The benefits outlined here only scratch the surface of the many capabilities Dataiku offers specifically for data scientists and other technical experts, so be sure to give Dataiku a try to uncover more!

