Dataiku, as a proud Microsoft partner, can now work with data in OneLake through Microsoft Fabric. Microsoft Fabric is a new player in the analytics and AI space. It brings together many existing Microsoft capabilities (Azure Data Factory, Azure Synapse Analytics, Power BI, and others) into a single centralized interface backed by Microsoft OneLake. OneLake is intended to be a centralized storage layer for all of an organization’s data, with compute supplied by the Fabric services that sit on top of it. Once data is inside OneLake, the many different Microsoft products that are components of Fabric can use it to create insights and data products.
So how does it all work? In this blog post, we’ll walk through a use case and see how Dataiku, connected to OneLake data through a Fabric Data Warehouse, can speed up the work of your analytics teams. For setup details, see the Dataiku user documentation.
Let's See It in Practice
In this example, we are members of a financial analyst team tasked with creating models to forecast new investments in sales and marketing for a multinational enterprise.
First, we need to establish a connection to a Fabric Data Warehouse in Dataiku. This will allow us to read and write datasets in Fabric from Dataiku’s intuitive user interface, while leveraging Fabric compute. The underlying data will reside in OneLake.
Data Connection: An admin can establish a connection to a Fabric Data Warehouse through Dataiku’s data connections. This typically involves configuring access credentials and selecting the appropriate data storage options. Because these connections will use passthrough Microsoft credentials, all data access restrictions are preserved, and data stays in OneLake.
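For readers curious what this kind of connection looks like outside the Dataiku interface, here is a minimal sketch of an ODBC-style connection string for a Fabric warehouse SQL endpoint. This is not Dataiku's configuration format. The server and database names are hypothetical placeholders, and the choice of ODBC Driver 18 with interactive Microsoft Entra sign-in is an assumption made for illustration.

```python
def build_fabric_odbc_string(server: str, database: str,
                             auth: str = "ActiveDirectoryInteractive") -> str:
    """Assemble an ODBC connection string from its key/value parts.

    Assumptions (not from the original post): ODBC Driver 18 for SQL Server
    is installed, and each user signs in with their own Microsoft Entra
    identity, so the warehouse applies that user's access restrictions.
    """
    parts = {
        "Driver": "{ODBC Driver 18 for SQL Server}",
        "Server": f"{server},1433",   # SQL endpoints listen on TCP 1433
        "Database": database,
        "Encrypt": "yes",
        "Authentication": auth,       # per-user sign-in, no shared password
    }
    return ";".join(f"{key}={value}" for key, value in parts.items())


# Hypothetical endpoint and warehouse names, for illustration only.
conn_str = build_fabric_odbc_string(
    "example-workspace.datawarehouse.fabric.microsoft.com",
    "finance_wh",
)
print(conn_str)
```

Because no password appears in the string, access is tied to whoever signs in, which is the idea behind passthrough credentials.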
Once a connection has been established, users can begin working with Fabric.
Data Access and Management: With the connection in place, users can access datasets stored in OneLake and explore, transform, and visualize them in Dataiku’s interface.
To begin, a data engineer wants to load some financial data stored in Amazon S3 into OneLake for processing. This can be done without code by using a sync recipe in the Dataiku flow. Whatever the size of the data, Dataiku automatically sets up a fast path for the transfer.
Once the necessary data is in OneLake, users can begin using Dataiku visual recipes to join and prepare it for building a machine learning (ML) model. Here, we can see that we’ve synced two datasets and joined them. Because the data was synced into a Fabric Data Warehouse, the join runs on Fabric compute, regardless of the size or original storage location of the data.
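To make the idea of running a join on warehouse compute concrete, here is a small sketch of an in-database join. It uses Python's built-in sqlite3 module as a stand-in for the warehouse, so it is not Fabric and not the SQL that Dataiku generates. The table and column names are hypothetical.

```python
import sqlite3


def join_in_database(conn: sqlite3.Connection) -> int:
    """Join two tables inside the database and store the result there.

    Only the row count travels back to Python; the joined rows never
    leave the database engine. That is the essence of pushdown.
    """
    conn.execute("DROP TABLE IF EXISTS spend_with_revenue")
    conn.execute(
        """
        CREATE TABLE spend_with_revenue AS
        SELECT s.region, s.quarter, s.marketing_spend, r.revenue
        FROM spend AS s
        JOIN revenue AS r
          ON s.region = r.region AND s.quarter = r.quarter
        """
    )
    return conn.execute("SELECT COUNT(*) FROM spend_with_revenue").fetchone()[0]


def demo() -> int:
    """Load two tiny hypothetical tables and join them in the database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE spend (region TEXT, quarter TEXT, marketing_spend REAL)")
    conn.execute("CREATE TABLE revenue (region TEXT, quarter TEXT, revenue REAL)")
    conn.executemany(
        "INSERT INTO spend VALUES (?, ?, ?)",
        [("EMEA", "2024Q1", 120.0), ("EMEA", "2024Q2", 135.0), ("APAC", "2024Q1", 90.0)],
    )
    conn.executemany(
        "INSERT INTO revenue VALUES (?, ?, ?)",
        [("EMEA", "2024Q1", 900.0), ("APAC", "2024Q1", 610.0)],
    )
    return join_in_database(conn)


print(demo())
```

The design point is that the client sends one statement and receives a small result, so the size of the underlying tables matters to the warehouse rather than to the machine running the client.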
With the data moved over, the engineer passes it to an analyst for cleanup. The analyst cleans the data in a prepare recipe with help from Dataiku’s built-in AI assistant, AI Prepare. Once again, the computation is pushed down to Fabric and the datasets stay in OneLake.
Now that the analyst has prepared the data, the flow looks like this:
The analyst then passes it to the data scientist, who builds a new forecasting model using Dataiku’s visual ML features. He trains, tests, and evaluates this model in Dataiku without writing any code.
After several iterations, the data scientist arrives at a model that performs well and deploys it:
Next, he runs the model against new data to create predictions for the next eight quarters. He also creates a scenario so that these predictions are refreshed as new data comes in.
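As a small illustration of what an eight-quarter horizon means each time the predictions are refreshed, here is a sketch that lists the quarters to score, given the last quarter with actual data. The function name and the quarter label format are hypothetical and not part of Dataiku.

```python
def next_quarters(year: int, quarter: int, horizon: int = 8) -> list[str]:
    """Return labels for the `horizon` quarters after (year, quarter)."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3, or 4")
    labels = []
    for _ in range(horizon):
        quarter += 1
        if quarter > 4:          # roll over into the next year
            quarter = 1
            year += 1
        labels.append(f"{year}Q{quarter}")
    return labels


# If actuals run through Q4 2024, the forecast covers 2025Q1 to 2026Q4.
print(next_quarters(2024, 4))
```

When a new quarter of actuals arrives, calling the function again with the later starting point shifts the whole window forward by one quarter, which is the behavior a scheduled refresh is meant to provide.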
Finally, a GenAI team has been tasked with making these predictions available to a wider audience. Using Dataiku Answers and the new Fabric integration, they can use the same no-code interface to create a chatbot backed by any Fabric dataset (in this case, the data scientist’s predictions) and an Azure OpenAI GPT-4o model. The chatbot can be queried in plain English:
Putting It All Together
Together, Dataiku and Microsoft Fabric let organizations put their data to work for advanced analytics and AI. By integrating the two, teams can forecast investments efficiently, collaborate through a no-code approach, and speed up the analytics workflow.
From data connection to model deployment, each step is streamlined, allowing users to focus on insights rather than technical complexities. Dataiku’s integration with Fabric and OneLake is available for all customers. Check out our free trial offering today!