Powering Efficient Data Prep & Effective MLOps: Dataiku + Snowflake

Scaling AI Joy Looney

How can your organization deploy innovative technology practically and pervasively while still maintaining control? When you break it down, the answer is fairly simple: build topic literacy and choose the right platform. Enhancing your technology stack while tapping into your existing knowledge base and freeing expert talent to work on high-value projects will give your organization the flexibility and competitive edge it needs to stay ahead. First, though, scaling AI is essentially impossible without establishing a foundation and clearly articulating the concepts that make data and machine learning (ML) processes responsible and replicable.

Dataiku is designed with key data preparation and MLOps capabilities dedicated to these critical concepts and built to integrate seamlessly with other vital players like Snowflake. Your ML initiatives can be launched, maintained, and grown more easily than ever before.

Elastically Scale Your Data Prep and Model Inference With Snowflake 

Kyle Berry, Solutions Engineer at Dataiku, was joined in this Everyday AI Chicago conference session by Jennifer Wylie, Senior Sales Engineer at Snowflake. In the session, they explained how Snowflake’s innovative cloud architecture can be used to execute SQL and Python data transformations with a performance engine operating out of Dataiku’s single environment. 

“66% of data scientists spend their time preparing their data.” - Jennifer Wylie

Snowflake’s goal is to minimize time wasted on data wrangling and cleaning and turn data preparation into an easy and fast mechanism, freeing data scientists to work on more value-add projects. So, how is Snowflake going after this goal alongside Dataiku? 

In Wylie’s words, “Snowflake makes Dataiku sing.” Dataiku and Snowflake are complementary solutions that combine Snowflake’s highly scalable computational power and processing flexibility with Dataiku’s ML and model management capabilities. 

You’re able to kick-start your projects by bringing in any type of data (structured, semi-structured, or unstructured) via Dataiku’s graphical interface, which gives users an intuitive environment for loading data from AWS and Azure into Snowflake. The combination of Snowflake and Dataiku helps you clean your data quickly and get it where it needs to be, in the format that works best for you. Then, teams can take full advantage of the scalability of Snowflake, which was built on the cloud for cloud computing, by easily scaling compute clusters.


Now that you know you can tackle and scale data preparation efficiently with Snowflake and Dataiku, let’s look deeper into the powerful MLOps capabilities of Dataiku. 

What's Your MLOps Strategy? 

Catalina Herrera, Principal Solutions Engineer at Dataiku, reminded us in this Everyday AI Miami conference session that it’s still undeniably difficult to connect people, processes, and technology (especially if you’re working with the wrong platform). This is where MLOps enters the spotlight. However, a poorly managed MLOps strategy leaves opportunities on the table and needlessly drains resources.

By leveraging Dataiku’s robust suite of MLOps features to manage your live models — monitor performance, detect and analyze drift, perform model comparisons, and more — you can mitigate this problem, removing friction from your MLOps strategy to make the most of the resources already at your fingertips. 

Build for a Business Use Case 

ML is a pipeline, and each artifact along the pipeline needs to connect to the root — the business use case you’re solving for. To do this you need to empower domain experts within business teams to work alongside data experts within data science teams in an efficient manner from start to finish. 

You’ve probably heard the old adage, “garbage in, garbage out,” and it’s extremely applicable to ML. If you’re not able to understand and clearly explain the true business impact and importance of the model, that model is ultimately of little value to an organization. Designing models with KPIs in mind from the start and ensuring proper model validation along the way is critical for long-term success and overall support for ML initiatives.

Connect Your People 

Realistically, the talent pool within an organization is composed of individuals with many different skill sets and knowledge backgrounds. This diversity of opinion and expertise enriches decision making but also creates the added challenge of finding common ground for a unified way of working. A simple but important example is data experts who prefer different programming languages. Without a thread connecting languages and disparate data sources, harmful silos form. You’ll notice that these gaps become wider and harder to bridge as you scale. Dataiku acts as this thread.

With Dataiku’s collaborative platform, you are able to consolidate disparate sources, merge work from different languages, and ultimately turn out insights in a centralized, predominantly visual interface, a springboard of sorts that makes data stories easy to consume as well as share.

Deploy to Production 

Once one model has proven successful, teams are asked for more and more models. To meet that demand, artifacts must be shared and communication must remain consistent and reliable as you scale. Poor communication creates risks that turn into costs you don’t want to bear. It’s best to nip the problem in the bud with enhanced explainability and transparency at every stage leading up to deployment, supporting the successful operationalization of your growing number of models. A well-thought-out MLOps strategy puts immense attention on a feedback loop with buy-in from a web of stakeholders, including subject matter experts and technical evangelists.

Dataiku provides collaborative tools that support increased visibility at every level. This way, when you’re ready to deploy, everyone already knows the intended end goal, and there should be no resistance to overcome or surprises to scramble to address.

A Recipe for Success 

But, as you know, it’s not just about deployment. The reality is that no ML model stays accurate forever. New influential data will trickle (or rush) in, and models will need to be updated accordingly. Additionally, your infrastructure is going to continue to evolve along with your business goals, so having a resilient platform to build on is the best assurance you can have going forward.
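
To make the idea of data shifting after deployment more concrete, here is a minimal sketch of one common drift check, the population stability index (PSI), which compares how a single numeric feature was distributed at training time with how it is distributed in recent data. This is an illustration only and not how Dataiku or Snowflake implement drift detection; the function name, the bin count, and the sample values are all assumptions made for the example.

```python
import math


def population_stability_index(expected, actual, bins=10):
    """PSI between a baseline sample and a recent sample of one numeric feature.

    Hypothetical helper for illustration; bins are equal-width over the
    baseline's range, and recent values outside that range are clamped
    into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def bucket_shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values
            counts[idx] += 1
        # A small floor keeps log() defined when a bucket is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    e = bucket_shares(expected)
    a = bucket_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Made-up data: the "recent" sample is the baseline shifted upward.
baseline = [float(v) for v in range(100)]
recent = [v + 50.0 for v in baseline]

psi_same = population_stability_index(baseline, baseline)
psi_shifted = population_stability_index(baseline, recent)
```

A PSI of zero means the two samples fall into the bins in identical proportions, and larger values mean a bigger shift. A commonly cited rule of thumb treats values above roughly 0.2 as worth investigating, though the right threshold depends on the use case.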

With Dataiku you can make ML consumable, make it reusable, and make it transparent. You can work with the skills you have today knowing that you will be able to reuse that same work with the technology and skills that you could adopt tomorrow. In this way, the platform acts as the future-proofing glue between all of your data and ML processes. Additionally, Dataiku Govern allows you to transition into predictive practice, letting the tech talk to create a core MLOps feedback loop that is entirely infused into an end-to-end framework. 

In terms of enhancing your tech stack, empowering individuals, streamlining communication, and all-in-all optimizing data and ML processes, Dataiku is both the cherry on top and the core ingredient. 
