The collection, analysis, and operationalization of data have never been easier or faster thanks to the modern data stack. Today, it’s widely considered the sustainable way for small to midsize companies and teams to become data-driven. However, that doesn’t mean it’s completely without obstacles. Here are three challenges that arise when implementing a modern data stack (and how they can be addressed).
1. Making Data Actionable for Everyone
For small, agile organizations leveraging the modern data stack, it’s not only critical for more traditional roles like data scientists and analysts to be able to interact with data — business people (whether marketing, sales, finance, operations, etc.) must be part of the picture as well.
Many companies successfully build a stack that takes them from data sources to extract/load to a cloud data warehouse, but then the question is: how do people with different skill sets take data from that warehouse and actually use it? One option is to add more tools to the stack for data transformation, data science, business intelligence (BI), and reverse ETL (with the goal of pushing data back into the SaaS tools the business is already using, like HubSpot, Salesforce, Gainsight, etc.).
However, this approach adds a lot of additional tools (read: cost, complexity) to the mix. The alternative is one end-to-end tool that covers all these functions so that all the pieces being built and/or consumed by different users work well together as part of the larger ecosystem. For example, Dataiku Online connects to your data and includes easy-to-use visual data preparation, AutoML, integrated reporting, and so much more — all in one place.
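To make the reverse ETL step mentioned above a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: an in-memory SQLite database stands in for the cloud data warehouse, the `account_scores` table and `churn_risk` field are made up, and `push_to_crm` is a hypothetical helper rather than any vendor's actual API.

```python
import sqlite3

# In-memory SQLite stands in for the cloud data warehouse (assumption).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE account_scores (account_id TEXT, churn_risk REAL)")
warehouse.executemany(
    "INSERT INTO account_scores VALUES (?, ?)",
    [("acme", 0.82), ("globex", 0.15), ("initech", 0.67)],
)


def build_crm_updates(conn, threshold=0.5):
    """Select at-risk accounts and shape them as CRM field updates."""
    rows = conn.execute(
        "SELECT account_id, churn_risk FROM account_scores "
        "WHERE churn_risk >= ? ORDER BY churn_risk DESC",
        (threshold,),
    ).fetchall()
    return [{"account_id": a, "fields": {"churn_risk": r}} for a, r in rows]


def push_to_crm(updates, send):
    """Hypothetical push step: `send` is whatever client call your CRM offers."""
    for update in updates:
        send(update)
    return len(updates)


sent = []  # stand-in for the CRM; a real run would call the vendor's API here
count = push_to_crm(build_crm_updates(warehouse), sent.append)
print(count, [u["account_id"] for u in sent])
```

The point of the sketch is only the direction of flow: data that was modeled in the warehouse goes back out to the tools business teams already work in.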
2. Governance
I know, I know: governance is a big, ugly, corporate word that doesn’t often sit well with SMBs or with the idea of the modern data stack, a main advantage of which is how little maintenance or expertise it requires on the IT side. What we really mean by governance is not needless process or oversight, but having a centralized place to gather all data-related assets. This, in turn, makes maintenance, resilience (backup), and any required oversight easier.
The reality is that SMBs need governance to properly scale operations. As data accumulates and various teams and people across the organization create transformations, pipelines, machine learning applications, and more, it becomes critical to know who is working with which data, and how, when, and where.
For example, when going from data source to extract/load to cloud data warehouse, the “transform” of extract, transform, load (ETL) suddenly happens on demand, at the moment the data is being used. Without best practices or technology to facilitate transformation across the many users interacting with data, there can be chaos. With technology like Dataiku connecting all the pieces of the modern data stack end-to-end, people can create datastores and marketplaces with a complete ecosystem of data tools.
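As a rough illustration of that on-demand “transform” step, the sketch below loads raw records first and then defines the transformation once, as a shared view that runs whenever someone queries it. It assumes an in-memory SQLite database as a stand-in for the warehouse; the `raw_orders` table, the `paid_orders` view, and the dollar conversion are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Extract/load: raw records land in the warehouse untransformed.
conn.execute(
    "CREATE TABLE raw_orders (order_id INTEGER, amount_cents INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, 1250, "paid"), (2, 4000, "refunded"), (3, 999, "paid")],
)

# Transform: one shared, named definition that runs when the data is queried.
conn.execute(
    """
    CREATE VIEW paid_orders AS
    SELECT order_id, amount_cents / 100.0 AS amount_usd
    FROM raw_orders
    WHERE status = 'paid'
    """
)

# Different users query the same view instead of each re-deriving "paid revenue".
order_count, revenue = conn.execute(
    "SELECT COUNT(*), ROUND(SUM(amount_usd), 2) FROM paid_orders"
).fetchone()
print(order_count, revenue)
```

Keeping the definition in one place is what prevents the chaos described above: if marketing and finance each wrote their own version of this filter, their numbers could quietly diverge.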
While governance is not necessarily critical for every type of data project, for the cornerstone projects that drive the most value for the company, teams will want to start thinking about adding expertise and control over the process back into the mix.
That’s not to say organizations need to take away the agility so critical to the modern data stack, but just that it’s important to select tools that can provide frameworks for centralization and maintenance (including MLOps) when and where it makes sense.
3. Getting Real Value
With the modern data stack making data simpler (and cheaper) to store and more accessible than ever, more and more people can leverage data in their day-to-day work. But does that necessarily equate to value generated for the business?
At Dataiku, we’d argue that just building a modern data stack doesn’t generate value in and of itself. It’s still important to create and measure the impact at the business level, getting data back into operational tools that business people are using every single day and operationalizing data projects that can provide long-term impact. For companies that have solved the initial problems of data ingestion and have broken down silos between their data, technology like Dataiku allows them to actually leverage that data and go further in their journey with machine learning.