Responsible and Explainable AI Along the ML Pipeline With Dataiku

Dataiku Product | Marie Merveilleux du Vignaux

A key challenge on the journey to Enterprise AI is transitioning to more responsible and trustworthy AI systems. During the 2021 Dataiku Product Days, Triveni Gandhi, Dataiku senior industry data scientist, and Du Phan, Dataiku research scientist, shared some key features of Dataiku that users can lean on to successfully make the shift.

Why Is Responsible and Explainable AI Important?

Think about a data science model in the wild. There are numerous people who are going to interact with it, from data scientists to business leaders to actual users and people who are directly or indirectly affected by AI systems. With so many different points of engagement, how can builders and users of AI trust these systems? As more cases of biased and unfair models come to light, interpretability and transparency are necessary to build trust in the models and pipelines that affect millions of lives.

The central idea of Responsible AI is all about establishing trust amongst all parties.

  • The data scientists who build the models want to make sure that the models conform to their behavior expectations.
  • The stakeholders of the company who use the model to make decisions want to be assured that the model produces correct outputs, without needing to understand the deep mathematical details behind them.
  • The public, or those who are affected directly or indirectly by the models’ outcomes, want to be assured that they are not being treated unfairly.

To achieve this level of trustworthiness, machine learning systems need a high degree of interpretability. A robust machine learning pipeline today needs to provide not only predictions, but also enough context and information for humans to understand the model and, from there, build trust in the system.


What Interpretability Means at Different Stages of a Pipeline and How Dataiku Can Help

Let’s go over each phase of the machine learning pipeline, associated questions, and how Dataiku can help answer these questions.

1. Data Wrangling and Processing

Are there any biases in my input dataset? Dataiku offers interactive statistics that let you examine bias in the raw data.
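
Dataiku surfaces these checks through its visual interactive statistics, but the underlying idea can be sketched in a few lines of plain Python. The file name, column names, and "approved" outcome below are hypothetical, purely for illustration:

    import pandas as pd

    # Hypothetical loan-applications dataset; the file and column names are assumptions
    df = pd.read_csv("applications.csv")

    # How is the sensitive attribute distributed in the raw data?
    print(df["gender"].value_counts(normalize=True))

    # Does the positive outcome occur at very different rates across groups?
    approval_rate = df.groupby("gender")["approved"].mean()
    print(approval_rate)
    print("largest gap between groups:", approval_rate.max() - approval_rate.min())

A large gap between groups at this stage is a signal to investigate the data collection process before any model is trained.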


2. Model Building and Training

In which situations does the model perform poorly? The model error analysis features introduced in Dataiku 9 let you see where the errors are coming from.
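
One common way to approach this kind of analysis, sketched below in scikit-learn rather than Dataiku's own implementation, is to fit a shallow secondary tree that predicts where the primary model fails; the variables model, X_test, and y_test are assumed to come from an earlier training step:

    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Flag each test row as correctly or incorrectly predicted by the primary model
    is_error = (model.predict(X_test) != y_test).astype(int)

    # A shallow "error tree" whose leaves point at segments with concentrated errors
    error_tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=50)
    error_tree.fit(X_test, is_error)

    print(export_text(error_tree, feature_names=list(X_test.columns)))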

Does the behavior of the model conform with domain knowledge? Machine learning (ML) assertions have been added so users can make sure the model conforms to the expected behavior based on domain knowledge. ML assertions are checks that help you systematically verify whether your model's predictions align with the experience of your domain experts.
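
Conceptually, an assertion is a rule checked against a batch of predictions. Here is a minimal sketch of that idea (not Dataiku's assertion syntax); the feature names, expected class, and 90% threshold are hypothetical:

    # Domain rule (hypothetical): applicants with income above 100k and no prior
    # defaults should almost always be approved by the model
    subset = X_test[(X_test["income"] > 100_000) & (X_test["prior_defaults"] == 0)]
    preds = model.predict(subset)

    expected_class = 1  # "approved"
    valid_ratio = (preds == expected_class).mean()

    # The assertion passes if at least 90% of rows in the subset get the expected class
    assert valid_ratio >= 0.90, f"Assertion failed: only {valid_ratio:.0%} predicted as approved"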

Why has a particular prediction been made? Individual prediction explanations let you take a closer look at a single prediction and understand why the model made that decision. During scoring, in both batch and real time, prediction explanations can be returned as part of the response, which fulfills the need for reason codes in regulated industries and provides additional information for analysis.
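
To make the idea concrete, here is a crude perturbation-based sketch of a single-prediction explanation. This is not how Dataiku computes its explanations (which rely on established methods such as Shapley values or ICE); model, X_train, and X_test are assumed to exist from earlier steps:

    import pandas as pd

    row = X_test.iloc[[0]]                      # the single prediction to explain
    baseline = X_train.mean(numeric_only=True)  # reference value for each feature
    p_actual = model.predict_proba(row)[0, 1]

    contributions = {}
    for feature in baseline.index:
        perturbed = row.copy()
        perturbed[feature] = baseline[feature]           # neutralize one feature at a time
        p_perturbed = model.predict_proba(perturbed)[0, 1]
        contributions[feature] = p_actual - p_perturbed  # how much that feature moved the score

    # The largest contributions, positive or negative, act as reason codes for this row
    print(pd.Series(contributions).sort_values(key=abs, ascending=False).head(5))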

Are the predictions made by the model fair towards all populations? Model fairness reports give users a view of potential bias in their model. The Model Fairness Plugin provides a dashboard of key model fairness metrics so you can compare how models treat members of different groups, and identify problem areas to rectify.
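
Two of the metrics typically used for this kind of report, demographic parity and equality of opportunity, are easy to sketch by hand. The sensitive attribute, column names, and group labels below are assumptions for illustration, not Dataiku's implementation:

    import pandas as pd

    # model, X_test, and y_test are assumed to exist; "gender" is a hypothetical sensitive attribute
    results = pd.DataFrame({
        "group": X_test["gender"].values,
        "y_true": pd.Series(y_test).values,
        "y_pred": model.predict(X_test),
    })

    # Demographic parity: positive-prediction rate per group
    positive_rate = results.groupby("group")["y_pred"].mean()

    # Equality of opportunity: true positive rate per group
    tpr = results[results["y_true"] == 1].groupby("group")["y_pred"].mean()

    print("positive prediction rate per group:\n", positive_rate)
    print("true positive rate per group:\n", tpr)
    print("demographic parity gap:", positive_rate.max() - positive_rate.min())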

3. Model Deployment and Monitoring

How do changes to the inputs actually affect the predictions later on? Dataiku's what-if analysis allows data scientists and analysts to check different input scenarios and publish the what-if analysis for business users with interactive scoring. With what-if analysis accessible to business users, they can build trust in predictive models by seeing the results generated for common scenarios and testing new ones.
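
In Dataiku this is an interactive experience, but the core mechanic is simply re-scoring a row while varying one input. A minimal sketch, where the starting row, the "income" feature, and the value grid are all hypothetical:

    import pandas as pd

    # Start from a real row and vary one input to see how the prediction moves
    base_row = X_test.iloc[[0]].copy()

    scenarios = []
    for income in [30_000, 60_000, 90_000, 120_000]:
        row = base_row.copy()
        row["income"] = income
        proba = model.predict_proba(row)[0, 1]
        scenarios.append({"income": income, "p_approved": proba})

    print(pd.DataFrame(scenarios))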

Where is information on the entire process stored? Dataiku's model document generator supports a transparent reporting process. It generates a Microsoft Word file that provides information about:

  • What the model does
  • How the model was built (algorithms, features, processing, …)
  • How the model was tuned
  • How the model performs

Beyond documentation, this helps you show that you followed industry best practices to build your model.

Conclusion

These are only some of the many critical capabilities for explainable AI that Dataiku provides. Together, these techniques can help explain how a model makes decisions and enable data scientists and key stakeholders to understand the factors influencing model predictions.

It’s important to remember that there is no silver bullet and that this work toward more responsible systems needs constant evaluation. However, Dataiku is here to help you speed up this process every step of the way.
