Demystifying the Modern Data Stack

By Lynn Heidmann

If you’ve looked into cloud-native data solutions over the past few years, you’ve no doubt heard of the modern data stack. So what is a modern data stack exactly? Put simply, it’s a suite of tools that makes for easier collection, operationalization, and analysis of data. Here are key things you need to understand about the state of the modern data stack today before diving in to build or implement one at your organization. 

Challenges the Modern Data Stack Can Solve

Many companies are still trying to solve the same data challenges they have faced for years, and those challenges remain a major issue for large organizations today:

  • The need to break down silos.
  • The need to make data available to a larger set of users. 

While the problems haven't changed, what has changed recently (and what the modern data stack addresses) is the nature of upstream data. Five years ago, it mostly came from in-house databases such as PostgreSQL. Today, data comes largely from software as a service (SaaS) products, whether that’s customer relationship management (CRM) platforms like Salesforce, marketing platforms like HubSpot, or any number of other products used across various teams at the company. On top of that, third-party data from outside the company has become more critical than ever for business success. OK, but why does this shift matter?

For one, it means that the incredible increase in the volume (and unpredictability) of data can cause service interruptions or slowdowns with traditional infrastructure, requiring new types of elasticity. Also, traditional extract, transform, load (ETL) tools that plug into classic data sources struggle to connect to these new and emerging data sources. Tools like Fivetran address this challenge by taking data from SaaS tools and putting it into cloud data warehouses (e.g., Snowflake) so that people can actually use that data to run analyses. And speaking of tools … 

Ease on the IT Side Is Fundamental for the Modern Data Stack

Some of the key buzzwords associated with the modern data stack are managed, serverless, and low technical expertise required. In a traditional data warehouse or data lake setup, storage and compute were coupled: every time you wanted to increase your storage, you had to increase your compute. It was therefore important to do any data transformation (i.e., the “T” in ETL) before loading the data, so as not to increase storage costs, which meant hiring data engineers to build complex pipelines and using data build tools.

Because storage and compute are independent in the modern data stack (and because cloud data warehouses can store massive amounts of data for cheap), data transformation can be done more on-demand, which places less of a burden on IT.
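To make the load-first, transform-later pattern concrete, here is a minimal sketch. It assumes an in-memory SQLite database as a stand-in for a cloud data warehouse, and the table, column names, and sample records are hypothetical. Raw records are stored untransformed, and the transformation only runs when someone asks for the result.

```python
import sqlite3

# Hypothetical raw records, as a connector might land them in the warehouse.
RAW_DEALS = [
    ("d1", "Acme", "1200.50", "won"),
    ("d2", "Globex", "300", "lost"),
    ("d3", "Acme", "799.50", "won"),
]


def load_raw(conn, rows):
    """Load step: store records untransformed (amounts stay as text)."""
    conn.execute(
        "CREATE TABLE raw_deals (id TEXT, account TEXT, amount TEXT, stage TEXT)"
    )
    conn.executemany("INSERT INTO raw_deals VALUES (?, ?, ?, ?)", rows)


def won_revenue_by_account(conn):
    """Transform step, run on demand inside the warehouse at query time."""
    query = """
        SELECT account, SUM(CAST(amount AS REAL)) AS revenue
        FROM raw_deals
        WHERE stage = 'won'
        GROUP BY account
        ORDER BY account
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")  # stand-in for a cloud data warehouse
load_raw(conn, RAW_DEALS)
result = won_revenue_by_account(conn)
print(result)
```

Because the raw table is kept as loaded, a different team could later write a different query against the same data without anyone rebuilding the pipeline.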

On the other hand, it does bring its own set of challenges. Namely, the question of governance — what does it look like in the context of the modern data stack? If data transformation falls increasingly on analysts or even business users, what does that process look like and how can you be sure it doesn’t cause chaos (or inefficiencies, with people transforming the same data over and over)?
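One possible way to spot duplicated work, sketched here purely as an illustration and not as something any particular tool does, is to fingerprint each transformation and check it against a shared catalog. The `TransformRegistry` class and its methods are hypothetical names invented for this example.

```python
import hashlib


def fingerprint(sql: str) -> str:
    """Hash a query after collapsing whitespace and case, so trivially
    different copies of the same transformation share a fingerprint.
    Limitation: lowercasing also affects string literals in the query."""
    normalized = " ".join(sql.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class TransformRegistry:
    """Hypothetical catalog mapping transformation fingerprints to owners."""

    def __init__(self):
        self._owners = {}

    def register(self, owner: str, sql: str):
        """Record the transformation. Return the existing owner if the same
        transformation was already registered, otherwise None."""
        key = fingerprint(sql)
        existing = self._owners.get(key)
        if existing is None:
            self._owners[key] = owner
        return existing


registry = TransformRegistry()
first = registry.register(
    "marketing", "SELECT account, SUM(amount) FROM deals GROUP BY account"
)
second = registry.register(
    "sales", "select account,  sum(amount)\nfrom deals group by account"
)
print(first, second)
```

Here the second registration returns the first team's name, signaling that the sales team is about to repeat a transformation marketing already built. A text fingerprint only catches near-identical queries; logically equivalent queries written differently would need a more careful comparison.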

An example of tools used in the modern data stack 

The Way Different People Want to Consume Data Has Become More Complex

So when it comes to the modern data stack, the challenges are the same, the data itself is different, and ease is paramount. This all builds up into how people actually consume data day-to-day. 

Say your marketing team needs to analyze data coming from both Salesforce and HubSpot. The company has invested in Fivetran, so without having to hire data engineers to do all the ETL and maintain data pipelines, data from both tools is being successfully extracted to Snowflake, where the data can then be used. But how? 

Business users can leverage classic business intelligence (BI) tools for analysis, but what happens when they (inevitably) want to take that analysis a step further, for example, applying machine learning (ML)? Or, more commonly, what happens when the marketing team just wants to keep using the tools they already know — i.e., Salesforce and HubSpot?

Today’s organizations need a scalable way to:

1. Allow data experts to do advanced data science on top of cloud data warehouses (including pushing down data processing tasks but also having the ability to operationalize data science projects quickly, to be leveraged by consumers on the business side) and

2. Allow domain experts (like analysts) to do advanced data work and

3. Push the results of multi-tool analysis back to the SaaS tools business users are leveraging.

In other words, the modern data stack is about providing a seamless experience for all users, no matter what their data needs are.
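The three needs above can be sketched end to end under some loud assumptions: an in-memory SQLite database stands in for the cloud data warehouse, the tables and sample rows are hypothetical, a fixed threshold rule stands in for a real ML model, and the update payload is an invented layout rather than a real Salesforce or HubSpot API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a cloud data warehouse
conn.executescript("""
    CREATE TABLE salesforce_accounts (email TEXT, account TEXT, revenue REAL);
    CREATE TABLE hubspot_contacts (email TEXT, email_opens INTEGER);
    INSERT INTO salesforce_accounts VALUES
        ('ana@acme.test', 'Acme', 2000.0),
        ('bo@globex.test', 'Globex', 150.0);
    INSERT INTO hubspot_contacts VALUES
        ('ana@acme.test', 12),
        ('bo@globex.test', 1);
""")


def score_contacts(conn):
    """Join both sources in the warehouse, then derive an engagement tier.
    The threshold rule is a placeholder for a trained model."""
    query = """
        SELECT s.email, s.account, s.revenue, h.email_opens
        FROM salesforce_accounts AS s
        JOIN hubspot_contacts AS h ON h.email = s.email
        ORDER BY s.email
    """
    scored = []
    for email, account, revenue, opens in conn.execute(query):
        tier = "high" if revenue >= 1000 and opens >= 5 else "low"
        scored.append({"email": email, "account": account, "tier": tier})
    return scored


def to_crm_updates(scored):
    """Reverse ETL step: shape results as field updates for a SaaS tool.
    The payload layout is hypothetical, not a real vendor API."""
    return [
        {"match_on": row["email"], "fields": {"engagement_tier": row["tier"]}}
        for row in scored
    ]


updates = to_crm_updates(score_contacts(conn))
print(updates)
```

The join runs inside the warehouse (the pushdown in point 1), the scoring is the advanced work in points 1 and 2, and `to_crm_updates` prepares the results for the tools business users already know (point 3).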

Dataiku for the Modern Data Stack

Dataiku can help create that seamless experience by making it easier for everyone, from analysts to data experts to domain experts, to collaborate. Dataiku connects to your data and includes easy-to-use visual data preparation, automated machine learning (AutoML), and integrated reporting and visualization dashboards for your business, all in one place.

An example of a visual Dataiku flow tagged by contributor type.

But it doesn’t stop there. Dataiku can facilitate building reverse ETL components, feeding data back into operational tools. It can also bridge the gap from data scientists and other data experts to data consumers, allowing for the operationalization of AI projects and applications for use among a wider audience (more on this in the next section).

What’s more, because all of these components (from data science to BI and visualization to reverse ETL) live in one tool, the pieces play nicely together.

The Modern Data Stack for Everyone

If agility is the name of the game when it comes to the modern data stack, Dataiku is the perfect fit because it’s a tool for everyone (literally). From full-code for data scientists, engineers, architects, and more to no- or low-code for analysts and business experts, Dataiku is the central tool for all data efforts.

Since ease on the IT side is also fundamental, it is critical to have one place where everyone on the team or in the organization can work, instead of investing in different tools for different people and then figuring out how to make them all work together.

Fully Managed Option With Dataiku Cloud

Dataiku can be leveraged as a software layer in your own cloud. But for teams or organizations looking for a fully managed option, Dataiku Cloud is built for the modern data stack — managed, serverless, and low technical expertise required when it comes to maintenance (administration and upgrades). 

The bottom line when it comes to the modern data stack, and to Dataiku, is flexibility. Organizations need to invest in architecture that will work today and with whatever technology is popular five years from now, or even further in the future. 

When thinking about the future, scale matters, and Dataiku is ready to scale with the success of your company. Tried, tested, and adopted by organizations ranging from startups to large, multinational enterprises, the platform has more than 45,000 users worldwide and is a recognized leader in the space.
