The need to be understood is not only a core human trait; it is also an important part of a data scientist’s responsibilities. Models do not exist in a vacuum, and “analytics products” have no intrinsic value on their own: their purpose and potential are only fulfilled when they are consumed by people and applied in organizations.
It can often be challenging to reach the adoption stage, for both technical and non-technical reasons: MLOps tools and practices have flourished in recent years to tackle the former, while explainability and Responsible AI have risen to prominence to address the latter. Fortunately, Dataiku offers data scientists a multitude of options to enhance understanding, disseminate outputs, and realize value quickly and easily by fostering adoption across their organizations.
There are many ways to define, break down, and categorize what constitutes “consumable AI” but, for the sake of this discussion, let’s consider consumption from the perspective of three archetypes that likely exist in your organization:

- The team responsible for developing, validating, and signing off on your work.
- The intended end users or “customers,” internal or external, who are the primary beneficiaries of your work.
- The secondary (perhaps even tertiary!) stakeholders who benefit by reusing assets or building more advanced applications or services on top of your core efforts.
Consumption by the Development and Validation Teams
Trust in and understanding of modeling outputs are critical to any form of adoption in the enterprise, so naturally those outputs must first be consumed by the teams developing, validating, and governing the AI initiatives themselves. At the most basic level, these teams can gain a sense of a Dataiku project’s input datasets, processing steps, and outputs by exploring the Flow and the documentation and metadata associated with each object.
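This first pass can also be done programmatically. Below is a minimal sketch using the dataikuapi client to enumerate the objects in a project’s Flow; the host URL, API key, and project key are placeholders, not values from this article.

```python
import dataikuapi

# Connect to the DSS design node (URL, API key, and project key are
# placeholders for illustration).
client = dataikuapi.DSSClient("https://dss.example.com:11200", "YOUR_API_KEY")
project = client.get_project("CHURN_ANALYSIS")

# Walk the Flow: list the datasets and recipes to get a quick sense of
# the inputs, processing steps, and outputs of the project.
for dataset in project.list_datasets():
    print("dataset:", dataset["name"])

for recipe in project.list_recipes():
    print("recipe:", recipe["name"])
```

From there, each object’s description, tags, and settings can be inspected in more depth through the same client.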
Those interested in validating modeling artifacts developed with Visual ML can take advantage of the model-specific interpretations: partial dependence plots, individual prediction explanations, and subpopulation, model fairness, and model error analyses. A key added benefit of developing with Visual ML is that all of these interpretability and explainability capabilities (which describe how the data was prepared, how the model was trained, and how it is performing) can be exported via the model document generator as an instant, up-to-date, and comprehensive report for offline reading.
Consumption by the End User
Once there’s consensus and approval around the work that’s been done, the focus can shift to batch, real-time, and interactive consumption. For projects that are point-in-time in nature (i.e., analyses conducted to inform specific business decisions), models, predictions, and their explanations can be exposed and interacted with via Dashboards. This provides a “What-if?” capability, allowing consumers to interrogate results and tweak assumptions to see how circumstances could unfold under different scenarios. For more scheduled, low-touch, operational objectives, Dataiku’s scenario orchestration can be employed to execute pipelines at predetermined cadences.
Scenarios can be endlessly customized to deliver predictions in batches, with easily configurable status reporting to the most popular communication channels (email, Slack, Microsoft Teams, etc.) and with metrics and checks so that, if there’s any issue, the relevant teams are informed and able to debug immediately. Conversely, for real-time, event-driven use cases, models, custom predictions, and other functions can be easily surfaced, deployed, and managed as endpoints with the API Deployer, which is ideal for integrating and consuming your Dataiku projects from third-party applications.
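Scenarios run on their own triggers, but they can also be fired on demand from outside the platform. Here is a minimal sketch, assuming a recent version of the dataikuapi client; the host URL, API key, project key, and scenario id are placeholders.

```python
import dataikuapi

# Placeholders: point these at your own design node, project, and scenario.
client = dataikuapi.DSSClient("https://dss.example.com:11200", "YOUR_API_KEY")
scenario = client.get_project("CHURN_ANALYSIS").get_scenario("score_customers")

# Fire the scenario outside its usual schedule, wait for it to finish,
# and inspect the result (attribute names per recent dataikuapi versions).
run = scenario.run_and_wait()
print(run.outcome)  # e.g., "SUCCESS"
```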
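On the real-time side, once a model is deployed as an endpoint, any application that can make HTTP requests can consume it; in Python, the APINodeClient keeps this to a few lines. The node URL, service id, endpoint id, and features below are all hypothetical.

```python
import dataikuapi

# Query a prediction endpoint deployed via the API Deployer (all names
# and feature values here are made up for illustration).
client = dataikuapi.APINodeClient("https://apinode.example.com:12000", "churn")
prediction = client.predict_record(
    "churn_model",  # endpoint id within the "churn" service
    {"tenure_months": 34, "plan": "premium", "monthly_spend": 61.5},
)
print(prediction["result"])  # prediction, probabilities, and, if enabled, explanations
```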
Consumption at Large
Self-service, reuse, and extension are widely seen as the pinnacle of a truly successful AI initiative, because they give it longevity and widespread adoption. Again, Dataiku users are fortunate to have Dataiku Applications, Webapps, and collaboration capabilities on hand to support them on this journey. Dataiku Applications offer a light-touch way for data scientists to package their work for customizable reuse, without end users having to write a line of code. They are a great fit when analytics teams field repeated requests for the same analyses and dashboards, because business users become self-sufficient and can fulfill their own requests on demand.
While Dataiku Applications cover a wide range of common usage patterns, the sky’s the limit when embedding your AI into Shiny, Bokeh, or Dash Webapps. These frameworks demand more development expertise and carry more maintenance overhead, but they enable complex use cases that mix dashboarding, batch and real-time outputs, and third-party applications in a single window (see the Dash sketch below). Finally, we mustn’t forget a core strength of the Dataiku platform: collaboration. With shareable projects, datasets, wikis, and the data catalog, it’s straightforward to empower other data scientists, developers, and business users across the organization to benefit from your work. A common example we see is the reuse of carefully processed data as a feature store: by sharing curated datasets and flagging them as such in the catalog, colleagues can discover features of interest and take advantage of them for offline enrichment or even online serving!
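To make the webapp route concrete, here is a minimal Dash sketch of the kind that could be hosted as a Dataiku webapp. Inside Dataiku’s Dash webapp editor the app object is provided by the platform, and the dataset and column names here (scored_customers, segment, churn_proba) are hypothetical.

```python
import dataiku
import dash
from dash import dcc, html, Input, Output

# In a Dataiku Dash webapp, `app` is pre-created by the platform; creating
# it manually is only needed to run this sketch standalone.
app = dash.Dash(__name__)

# Hypothetical dataset of scored customers produced upstream in the Flow.
df = dataiku.Dataset("scored_customers").get_dataframe()

app.layout = html.Div([
    html.H4("Churn risk by segment"),
    dcc.Dropdown(id="segment", options=sorted(df["segment"].unique())),
    html.Div(id="summary"),
])

@app.callback(Output("summary", "children"), Input("segment", "value"))
def summarize(segment):
    # Filter to the selected segment (or keep everything if none chosen).
    subset = df if segment is None else df[df["segment"] == segment]
    return f"{len(subset)} customers, mean churn probability {subset['churn_proba'].mean():.2f}"

if __name__ == "__main__":
    app.run(debug=True)  # Dataiku handles serving when hosted as a webapp
```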
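As for the feature store pattern: once a curated dataset is shared by its owning project, any consuming project can reference it with the PROJECTKEY.dataset_name syntax inside a recipe or notebook. The project key, dataset names, and join key below are hypothetical.

```python
import dataiku

# A curated feature table shared from a central feature project (names
# are made up for illustration).
features = dataiku.Dataset("CUSTOMER_FEATURES.curated_customer_features").get_dataframe()
transactions = dataiku.Dataset("transactions").get_dataframe()

# Offline enrichment: join the shared features onto local transactions.
enriched = transactions.merge(features, on="customer_id", how="left")

# Write the enriched result to an output dataset in the Flow.
dataiku.Dataset("transactions_enriched").write_with_schema(enriched)
```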
There are undoubtedly other avenues for consumption that I’ve missed, but hopefully you’ve taken away why making data science work accessible is important, the usual suspects to consider when thinking about the consumption of your work, and how easy it all can be when working with Dataiku. Why not try one of the approaches you’ve learned about today to expand the reach and impact of a recent project?