If there’s one thing we’ve learned at Dataiku after talking to thousands of prospects and customers about their data architecture, it’s that architecture frameworks tend to be more aspirational than realistic. That's because, at the enterprise level, data architecture is both complex and constantly changing. Nonetheless, the importance of data architecture to the success of an organization's AI initiatives cannot be overstated.
From a data architecture perspective, scaling AI across the enterprise — or enabling more and more people to leverage data in their day-to-day work — requires three main things:
- Rethinking efforts to continually centralize.
- Rethinking the role of IT itself more broadly.
- Letting business objectives inform architecture, not the other way around.
This blog post will unpack the first concept in particular and the importance of data architecture that meets the needs of the business.
Benefits of Data Architecture Decentralization
When it comes to data architecture, for the past five to 10 years, centralization has been the name of the game — potentially to a fault. In fact, the overwhelming majority of IT teams and leaders today have probably tried to centralize too much and too many times.
Here at Dataiku, we talk to a lot of teams (generally within large multinational enterprises across a range of industries) about their data architecture. Usually, when the question “how many single source-of-truth data warehouses do you have?” comes up, the answer is not one. It’s two, three, four, and sometimes even more. Sound familiar?
When efforts to centralize have failed over and over (and over and over) again, the answer isn’t to double down on centralization, yet that’s often exactly what happens.
Enterprise Data Architecture Pitfalls
Today’s efforts to centralize, especially in the context of data, analytics, and AI initiatives, may sound something like:
- “We need to get governance sorted out before we can start getting value from data and AI initiatives,” or
- “We need to solve data quality before we can start investing in tools to start doing any serious data science, machine learning, or AI projects,” or
- “We just need to finish our cloud migration, then we can start getting return on investment from data.”
Governance, data quality, and the move to the cloud are undoubtedly critical topics. But while it is irresistibly tempting to undertake such projects from an IT perspective, more centralization without a larger, use case-based goal or purpose doesn’t generate any business value on its own, and efforts in these areas often fall flat as a result.
How to Build a Data Architecture Fit for AI
It’s nearly impossible to have a conversation about centralized vs. decentralized data architecture without mentioning today’s trendiest term: the data mesh. The concept of the data mesh is less about architecture in the technical sense and more about data architecture from an organizational point of view. (There is certainly something to be said about the tooling possibilities for data mesh architecture, but that is outside the scope of this blog post.)
In a nutshell, the data mesh is about decentralization and business ownership of business data assets. That means instead of central teams like IT controlling the source of truth for data across business lines, that responsibility falls on the business itself.
The advantage of this approach is that it puts the onus on the business to maintain, use, and create value from its own data. After all, if IT owns the source of truth, but no one agrees with it, what use is that centralization? Having every department, team, or even individual employee create (and re-create) their own “single view” of the customer is inefficient, and it undermines the work IT puts into centralizing in the first place.
The disadvantage, of course, is that the data mesh approach is extraordinarily challenging to achieve in practice. AI platforms (like Dataiku) can lower the barrier to making this switch and put more ownership in the hands of lines of business. But as always, technology isn’t a magic bullet — the shift is also largely cultural and will take some serious change management.
Data Architecture: Finding the Balance
There is a natural, underlying tension between IT and the business, and between the desire to centralize and the desire to decentralize. Technology alone can’t resolve this tension. What does resolve it is aligning business needs as closely as possible with the owners of the data. That means getting domain experts to decide what the data means, who should use it, how it should be used, and more.