Demystifying the Modern Data Stack

Data Basics Lynn Heidmann

If you’re looking to leverage data at a small or midsize business (or even in a smaller business unit of a larger enterprise), you’ve no doubt heard of the modern data stack — a suite of tools, or a pipeline, that makes it easier to collect, operationalize, and analyze data. Here are four things you need to understand about the state of the modern data stack today before diving in to build or implement one at your organization.


The Data Ecosystem Has Fundamentally Changed (But the Challenges Haven’t)

The challenges many SMBs are trying to solve with the modern data stack are the same data challenges we’ve seen for years (they are also, notably, some of the challenges still plaguing large organizations to this day):

  • The need to break down silos.
  • The need to make data available to a larger set of users. 

While the problems haven't changed, what has changed recently (and what the modern data stack addresses) is the nature of upstream data: five years ago, most of it lived in operational databases like PostgreSQL. Today, data comes largely from software as a service (SaaS) products — whether that’s customer relationship management (CRM) platforms like Salesforce, marketing platforms like HubSpot, or any number of other products used across various teams at the company. OK, but why does this shift matter?

For one, it means that traditional extract, transform, load (ETL) tools that plug into classic data sources struggle to connect to these new and emerging data sources. Tools like Fivetran address this challenge by taking data from SaaS tools and putting it into cloud data warehouses (e.g., Snowflake) so that people can actually use that data to run analyses. And speaking of tools … 
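The extract-and-load pattern described above — pull records from a SaaS API as-is and land them in a warehouse for later analysis — can be sketched in miniature. This is a toy illustration, not Fivetran's actual implementation: `fetch_contacts()` is a hypothetical stand-in for a real SaaS API call, and `sqlite3` stands in for a cloud warehouse like Snowflake.

```python
import sqlite3

def fetch_contacts():
    # Hypothetical placeholder for a call to a SaaS API
    # (e.g., a CRM's REST endpoint returning JSON records).
    return [
        {"id": 1, "email": "a@example.com", "source": "salesforce"},
        {"id": 2, "email": "b@example.com", "source": "hubspot"},
    ]

def load_raw(conn, records):
    # Land the records unmodified — no transformation before loading.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_contacts (id INTEGER, email TEXT, source TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_contacts VALUES (:id, :email, :source)", records
    )
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for the cloud warehouse
load_raw(conn, fetch_contacts())
print(conn.execute("SELECT COUNT(*) FROM raw_contacts").fetchone()[0])  # prints 2
```

The point of tools in this category is that the fetch, schema handling, and incremental syncing above happen automatically, per connector, so no one has to write or maintain this plumbing.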

Ease on the IT Side Is Fundamental

Some of the key buzzwords associated with the modern data stack are managed, serverless, and low-technical expertise required. In a traditional data warehouse or data lake setup, every time you wanted to increase your storage, you had to increase your compute as well. It was therefore important to do any data transformation (i.e., the “T” in ETL) before loading so as not to inflate storage costs, which meant hiring data engineers to build and maintain complex pipelines.

Because storage and compute are independent in the modern data stack (and because cloud data warehouses can store massive amounts of data cheaply), data transformation can be done on demand after loading — the ELT pattern, often managed with tools like dbt (data build tool) — which places less of a burden on IT.
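The ELT idea — land raw data first, transform later with SQL inside the warehouse — can be sketched as follows. Again this is a minimal illustration using `sqlite3` as a stand-in warehouse; the table and column names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a cloud warehouse

# The "EL" step: raw data is landed untransformed, storage being cheap.
conn.execute("CREATE TABLE raw_orders (id INTEGER, amount_cents INTEGER)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", [(1, 1250), (2, 399)])

# The "T" step happens later, on demand, as SQL running in the warehouse.
# This select-based model is the style of transformation a tool like dbt manages.
conn.execute(
    "CREATE VIEW orders AS "
    "SELECT id, amount_cents / 100.0 AS amount_usd FROM raw_orders"
)
print(conn.execute("SELECT amount_usd FROM orders WHERE id = 1").fetchone()[0])  # prints 12.5
```

Because the transformation is just a view over the raw table, it costs nothing until someone queries it — which is exactly why decoupled storage and compute makes transform-after-load economical.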

On the other hand, it does bring its own set of challenges. Namely, the question of governance — what does it look like in the context of the modern data stack? If data transformation falls increasingly on analysts or even business users, what does that process look like and how can you be sure it doesn’t cause chaos (or inefficiencies, with people transforming the same data over and over)?


The Way Different People Want to Consume Data Has Become More Complex

So when it comes to the modern data stack, the challenges are the same, the data itself is different, and ease is paramount. This all builds up into how people actually consume data day-to-day.

Say your marketing team needs to analyze data coming from both Salesforce and HubSpot. The company has invested in Fivetran, so without having to hire data engineers to do all the ETL and maintain data pipelines, data from both tools is being successfully extracted to Snowflake, where it can then be used. But how?
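One answer to "but how?" is that once both sources live in the same warehouse, a single SQL query can span them — which no amount of clicking inside Salesforce or HubSpot alone can do. A toy sketch, with invented table names and `sqlite3` standing in for Snowflake:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the warehouse

# Hypothetical tables, as an EL tool might land them from each SaaS source.
conn.execute("CREATE TABLE salesforce_accounts (email TEXT, stage TEXT)")
conn.execute("CREATE TABLE hubspot_contacts (email TEXT, campaign TEXT)")
conn.execute("INSERT INTO salesforce_accounts VALUES ('a@example.com', 'won')")
conn.execute("INSERT INTO hubspot_contacts VALUES ('a@example.com', 'spring-promo')")

# Cross-source analysis: which marketing campaigns touched deals that closed?
row = conn.execute(
    """SELECT s.stage, h.campaign
       FROM salesforce_accounts s
       JOIN hubspot_contacts h ON s.email = h.email"""
).fetchone()
print(row)  # ('won', 'spring-promo')
```

Joining CRM and marketing data on a shared key like email is exactly the kind of question that motivates centralizing SaaS data in the first place.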

Business users can leverage classic business intelligence (BI) tools for analysis, but what happens when they (inevitably) want to take that analysis a step further — for example, by applying machine learning? Or, more commonly, what happens when the marketing team just wants to keep using the tools they already know — i.e., Salesforce and HubSpot?

Today’s organizations need a scalable way to:

  1. Allow coders to do advanced data science on top of cloud data warehouses (including pushing down data processing tasks but also having the ability to operationalize data science projects quickly, to be leveraged by consumers on the business side) and
  2. Allow non-coders (like analysts) to do advanced data work and
  3. Push the results of multi-tool analysis back to the SaaS tools business users are leveraging.

In other words, the modern data stack is about providing a seamless experience for all users, no matter what their data needs are.

Flexible Pricing Is Appealing, but Can Also Become a Challenge

One of the most appealing features of the modern data stack is its flexible pricing: between usage-based pricing and free trials of SaaS products, experimenting and building out use cases is easy from both a setup and a cost standpoint.

That said, usage-based pricing works well only if you can connect that usage to value. If organizations struggle to quantify the value of their data initiatives overall, or of specific data projects, they might see their bills grow faster than the value they’re getting from the products and services. In this situation, teams might need to reevaluate how they’re using the modern data stack and what kinds of advanced analytics use cases it’s powering (including whether those are the right use cases).

The Bottom Line

The modern data stack is a sustainable way for small companies and teams to become data-driven. One thing we always say at Dataiku, though, is that technology isn’t a magic bullet — even the modern data stack won’t solve every problem and magically allow organizations to become more intelligent through data. There are still certainly some inherent challenges, and it also takes the right mix of people and processes to make the difference (though the right technology definitely helps!).
