In the second installment of our “Building the Next-Gen CoE for the Age of AI Agents” webinar series, Jon Tudor, Director of Business Architecture, explored the foundational data architecture and management strategies needed to scale AI agents and self-service across the enterprise. If your organization is building or refining its CoE, this session delivered a wealth of insights.
The Core Question: How Do You Enable AI & Self-Service at Scale?
The session centered on a challenge many enterprises face: how to simplify access to data and agents, reduce friction, and empower individuals across the business to work with data more easily — and just as critically, to enable agents to do the same. According to Jon, the answer begins with strong data architecture, effective management, and a clear understanding of what’s being built and how it's being used.
To address that, organizations must first evaluate the foundations of their data infrastructure.
1. Building the Foundation: Enterprise Data Architecture
Scalable self-service requires thoughtful design at the architecture level. Jon laid out three key questions every enterprise should be asking:
- Enterprise Source of Truth: Do we know where the data is? If we can’t locate it, we can’t get to it.
- Access & Writeback: Can people and agents access the data — and can they also publish their results back into the system so others can find and reuse them?
- Compute at Scale: Can we compute at scale using cloud or distributed resources, or are we still pushing everything through a central server that becomes a bottleneck?
To meet these needs, organizations often adopt one of two models:
- Data Lake/Lakehouse: Physically store data in one location and allow work to happen on top of it.
- Data Mesh: Virtually connect datasets across domains for a unified enterprise view.
Regardless of which model is used, observability is key: "We need to be able to see all this activity."
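As a purely illustrative sketch of what "seeing the activity" can mean in practice, the snippet below records dataset actions by people and agents in one shared log; the actor names and the in-memory store are assumptions for illustration, not part of any specific platform API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AccessEvent:
    """One observable action against a dataset, by a person or an agent."""
    actor: str        # e.g., "jane.doe" or "forecast-agent-v2" (hypothetical names)
    actor_type: str   # "user" or "agent"
    dataset: str
    action: str       # "read", "write", "publish"
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

class ActivityLog:
    """Minimal in-memory stand-in for a central audit/observability store."""
    def __init__(self) -> None:
        self.events: list[AccessEvent] = []

    def record(self, event: AccessEvent) -> None:
        self.events.append(event)

    def activity_by_dataset(self) -> dict[str, int]:
        counts: dict[str, int] = {}
        for e in self.events:
            counts[e.dataset] = counts.get(e.dataset, 0) + 1
        return counts

# Usage: a user and an agent touch the same dataset, and both show up in one view.
log = ActivityLog()
log.record(AccessEvent("jane.doe", "user", "sales_orders", "read"))
log.record(AccessEvent("forecast-agent-v2", "agent", "sales_orders", "read"))
print(log.activity_by_dataset())  # {'sales_orders': 2}
```

Once this foundation is in place, organizations can turn to what they build on top of it: data products.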
2. Defining Data Products
A shared definition of data products is essential to cross-functional alignment. “If you ask 10 people what a data product is, you’ll get 10 different answers.”
In this context, a data product is a group of related artifacts — datasets, models, agents — built to achieve a particular business outcome. To avoid fragmentation, organizations must align on what qualifies as a data product, define success criteria, and establish ownership from the start.
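One way to make that definition concrete is a small record that bundles artifacts with an owner and explicit success criteria. The field and artifact names below are assumptions for illustration, not a Dataiku or catalog API:

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    """A group of related artifacts built to achieve a specific business outcome."""
    name: str
    business_outcome: str
    owner: str                                            # accountable person or team
    artifacts: list[str] = field(default_factory=list)    # datasets, models, agents
    success_criteria: list[str] = field(default_factory=list)

# Hypothetical example: a churn-reduction product bundling a dataset, a model, and an agent.
churn_product = DataProduct(
    name="customer_churn_insights",
    business_outcome="Reduce quarterly churn by flagging at-risk accounts early",
    owner="customer-analytics-team",
    artifacts=["crm_accounts_dataset", "churn_score_model", "retention_outreach_agent"],
    success_criteria=["Owner assigned", "Documented outcome", "Adoption tracked"],
)
```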
With definitions aligned, the next step is operational: how do you support data products as they grow and evolve?
3. Maturing & Supporting Data Products
To scale responsibly, organizations can classify data products based on maturity, with a quick tiering sketch after the list:
- Proof-of-Concept: Under five users, often exploratory.
- Pilots: Between five and forty-nine users, where automation may begin.
- Products: Fifty or more users, requiring formal IT support.
- Critical Products: Mission-critical assets that receive executive-level attention.
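Those thresholds translate naturally into a small tiering helper; the sketch below assumes active user count plus a mission-critical flag are the only inputs, which simplifies the framework Jon described:

```python
def maturity_tier(active_users: int, mission_critical: bool = False) -> str:
    """Classify a data product by its user base, per the thresholds above."""
    if mission_critical:
        return "critical-product"   # executive-level attention
    if active_users >= 50:
        return "product"            # formal IT support
    if active_users >= 5:
        return "pilot"              # automation may begin
    return "proof-of-concept"       # exploratory, under five users

print(maturity_tier(3))                           # proof-of-concept
print(maturity_tier(120))                         # product
print(maturity_tier(120, mission_critical=True))  # critical-product
```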
Jon emphasized that ownership is essential: "If there's not an owner, it's likely not going to have long-term viability."
Support models and RACI frameworks help clarify who is responsible for maintaining, evolving, and retiring products. Organizations should define whether IT manages only the platform or also individual models — and establish a cadence for recertification and review. With products maturing, secure and scalable access becomes the next consideration.
4. Balancing Data Access & Security
Balancing agility with governance is key to enabling broad, safe access. Jon outlined three types of access control:
- Discretionary Access Control (DAC): Request-based access, flexible but can create bottlenecks.
- Role-Based Access Control (RBAC): Tied to job roles, effective for standardizing access.
- Attribute-Based Access Control (ABAC): Driven by metadata, enabling dynamic policies.
Each has trade-offs. "RBAC- and ABAC-only models tend to reduce innovation... DAC-only leads to too many requests."
The solution? A blended model: structured enough to maintain trust, flexible enough to avoid friction.
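One rough sketch of that blend, with hypothetical role names and attributes rather than a reference implementation, is a check that tries an RBAC role first, falls back to an ABAC attribute policy, and only then routes to a DAC-style access request:

```python
def can_access(user_roles: set[str], user_attrs: dict, dataset_attrs: dict,
               required_role: str) -> str:
    """Blended access decision: RBAC grant first, then ABAC policy, then DAC request."""
    # RBAC: a standing role grant is the fastest, most standardized path.
    if required_role in user_roles:
        return "allow (RBAC)"
    # ABAC: metadata-driven policy, e.g., matching region and low sensitivity.
    if (user_attrs.get("region") == dataset_attrs.get("region")
            and dataset_attrs.get("sensitivity") == "low"):
        return "allow (ABAC)"
    # DAC: no automatic grant, so fall back to a human-reviewed access request.
    return "route to access request (DAC)"

# Hypothetical usage: the role check fails, but the attribute policy allows access.
print(can_access({"analyst"}, {"region": "EU"},
                 {"region": "EU", "sensitivity": "low"}, required_role="finance_reader"))
```

Still, even the best access controls won’t matter if users and agents can’t find the right data.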
5. Metadata & Cataloging: Making Data Findable
Metadata enables discoverability. Jon emphasized that collecting and managing metadata must be intentional. Organizations can:
- Automate collection by scanning environments and capturing lineage
- Enforce governance by requiring metadata before publishing
- Assign dedicated roles to steward data assets
- Gamify contributions to incentivize adoption
Useful metadata includes business-recognizable names, sensitivity classification, lineage and quality, and consumption eligibility (such as whether a dataset is for internal transformation or trusted, published use).
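One lightweight way to enforce the metadata-before-publishing rule, sketched here with assumed field names rather than any specific catalog schema, is to block publication while required fields are missing:

```python
REQUIRED_METADATA = ["business_name", "sensitivity", "lineage", "quality_score",
                     "consumption_eligibility", "owner"]

def ready_to_publish(metadata: dict) -> tuple[bool, list[str]]:
    """Return whether a dataset's metadata is complete, plus any missing fields."""
    missing = [f for f in REQUIRED_METADATA if not metadata.get(f)]
    return (len(missing) == 0, missing)

# Hypothetical catalog entry: everything is filled in except the owner.
candidate = {
    "business_name": "Monthly Recurring Revenue",
    "sensitivity": "internal",
    "lineage": ["billing_raw", "mrr_transform"],
    "quality_score": 0.97,
    "consumption_eligibility": "published",   # vs. "internal transformation"
}

ok, missing = ready_to_publish(candidate)
print(ok, missing)  # False ['owner']
```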
Well-maintained metadata supports discoverability — and sets the stage for responsible agent behavior.
6. Classifying Data for AI Use
To ensure agents (and people) use data appropriately, Jon recommended classifying datasets by function, as sketched in code after the list:
- Base Layer: Replicated from source systems
- Transform: Temporary, task-specific data
- Consumption: Outputs tailored to specific use cases
- Published: Trusted, high-quality, reusable data
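As a hedged illustration of how those layers could steer agent behavior, the sketch below keeps the layer names above but assumes the enforcement rule (agents are pointed only at consumption and published data) for the sake of example:

```python
from enum import Enum

class Layer(Enum):
    BASE = "base"                 # replicated from source systems
    TRANSFORM = "transform"       # temporary, task-specific
    CONSUMPTION = "consumption"   # tailored to specific use cases
    PUBLISHED = "published"       # trusted, high-quality, reusable

AGENT_READABLE = {Layer.PUBLISHED, Layer.CONSUMPTION}

def datasets_for_agents(catalog: dict[str, Layer]) -> list[str]:
    """Return only the datasets an agent should be pointed at."""
    return [name for name, layer in catalog.items() if layer in AGENT_READABLE]

# Hypothetical catalog entries, one per layer.
catalog = {
    "erp_orders_raw": Layer.BASE,
    "orders_dedup_tmp": Layer.TRANSFORM,
    "orders_kpi_view": Layer.CONSUMPTION,
    "orders_published": Layer.PUBLISHED,
}
print(datasets_for_agents(catalog))  # ['orders_kpi_view', 'orders_published']
```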
"Published datasets can help you define the right use data within the enterprise source of truth," he noted. Clear classification enhances trust and sets the foundation for broad reuse.
7. Driving Reuse Across the Enterprise
Promoting reuse starts with visibility: highlighting frequently used assets in wikis, catalogs, or Dataiku homepages. Forming a published dataset council, with cross-departmental representation, helps align standards and increase adoption.
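That visibility can start with something as simple as ranking assets by how often they are reused and featuring the top few; the sketch below is a generic illustration with made-up asset names, not a Dataiku feature:

```python
from collections import Counter

def top_reused_assets(usage_events: list[str], n: int = 3) -> list[tuple[str, int]]:
    """Rank assets by usage count so the most-reused ones can be featured."""
    return Counter(usage_events).most_common(n)

# Hypothetical usage log: each entry is one reuse of a flow, model, or dashboard.
events = ["churn_model", "revenue_flow", "churn_model", "ops_dashboard", "churn_model"]
print(top_reused_assets(events))
# [('churn_model', 3), ('revenue_flow', 1), ('ops_dashboard', 1)]
```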
Reusable assets include Dataiku flows, agents, models (in Python or R), dashboards, queries, and pipelines. Reuse reduces redundant work and improves consistency — but only if paired with active governance.
8. Retention & Ownership
Sustainability at scale requires discipline around ownership and retention. "Make sure you're not running things that aren't really driving business value."
Jon recommended:
- Flagging stale assets (e.g., six months of inactivity; see the sketch after this list)
- Updating ownership regularly, especially after role changes
- Decommissioning orphaned artifacts to reduce operational risk
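A hedged sketch of that flagging step, assuming each asset records an owner and a last-activity timestamp, might surface anything untouched for roughly six months or left without an owner:

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=182)  # roughly six months of inactivity

def flag_stale_assets(assets: list[dict], now: datetime | None = None) -> list[str]:
    """Return assets that are inactive past the threshold or have no owner."""
    now = now or datetime.now(timezone.utc)
    flagged = []
    for asset in assets:
        inactive = now - asset["last_used"] > STALE_AFTER
        orphaned = not asset.get("owner")
        if inactive or orphaned:
            flagged.append(asset["name"])
    return flagged

# Hypothetical inventory: one orphaned legacy report, one actively owned flow.
assets = [
    {"name": "legacy_margin_report", "owner": "",
     "last_used": datetime(2024, 1, 5, tzinfo=timezone.utc)},
    {"name": "daily_sales_flow", "owner": "sales-ops",
     "last_used": datetime.now(timezone.utc) - timedelta(days=3)},
]
print(flag_stale_assets(assets))  # ['legacy_margin_report']
```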
Automated syncs with HR systems and scheduled recertification can help keep everything up to date. The result: cleaner, more reliable systems.
9. Executive Value & Risk Reduction
For senior leaders, weak data architecture and management foundations carry strategic implications. "If you don't have the right things to get the right answer, you're not going to get there."
The bottom line: without strong architecture, metadata, and access controls, agents can’t function safely — and decisions lose their foundation. Scaling AI requires technical and strategic alignment. And that alignment starts with infrastructure.

Final Thoughts
Wrapping up the session, the message was clear: As AI agents proliferate, success depends on scalable architecture, reusable data products, data management, and organizational alignment.
"The business understands the context of the problems they're trying to solve the most." Your systems need to empower them — securely, scalably, and smartly.