In the ongoing conversation around Responsible AI in practice, you may have heard about shifting away from opaque or black box models to transparent or white box models. Some key concerns about black box models include:
- They do not provide global variable importance, so data scientists cannot assess how potentially sensitive variables impact the model’s predictions
- Low interpretability means that end users of the model must take the outputs as a given, without an opportunity to interrogate the algorithm
- The results of black box models are often presented without an opportunity to appeal or react to the algorithm’s decision, making the experience one-directional for those most affected by the model
When confronting these concerns, some practitioners elect to move away from deep learning and neural network models in favor of more explainable algorithms such as random forests and decision trees. But what if your use case involves classifying unstructured data, or a neural network predicts cases of fraud more accurately than a decision tree? Can you still make use of these algorithms and build responsible pipelines?
I would argue that the answer is yes. Responsible AI is a methodology that extends beyond any one algorithm and is a practice that can (and should!) be implemented when designing any AI product. While black box models lack inherent explainability, they can still be held to the same rigor and standards as any other model, especially with the robust set of Responsible AI tools that Dataiku incorporates across the board.
Start With the Data
As in any use case, good-quality, well-sampled data is the foundation of a responsible and governed AI pipeline. When preparing unstructured data for a new deep learning project, be sure to measure how well the sample represents the population that the model will be used on. For instance, if you are building a model to understand and classify unstructured text in support tickets, you can use Dataiku’s native statistics tools to assess overall distributions, look for relationships between variables of interest, and check for proxies that might skew your final model results. Even with unstructured data such as images or audio files, it is imperative to make sure the training data is relevant for your use case and reflects the real-world inputs the final model will encounter once deployed.
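As a concrete illustration, here is a minimal sketch of this kind of pre-modeling audit in Python with pandas. The file name, the sensitive attribute (`region`), the candidate proxy (`zip_code_prefix`), and the population shares are all hypothetical placeholders for whatever applies to your own data.

```python
import pandas as pd

# Hypothetical support-ticket dataset with a sensitive attribute ("region")
# and a candidate proxy variable ("zip_code_prefix").
df = pd.read_csv("support_tickets_sample.csv")  # assumed file name

# Compare the sample distribution of the sensitive attribute against
# known population proportions (assumed values for illustration).
population_share = {"north": 0.40, "south": 0.35, "west": 0.25}
sample_share = df["region"].value_counts(normalize=True)
for group, expected in population_share.items():
    observed = sample_share.get(group, 0.0)
    print(f"{group}: sample={observed:.2%}, population={expected:.2%}, gap={observed - expected:+.2%}")

# Check whether the candidate proxy is strongly associated with the sensitive
# attribute using a normalized contingency table.
proxy_table = pd.crosstab(df["zip_code_prefix"], df["region"], normalize="index")
print(proxy_table.round(2))
```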
Next Up: Modeling
It’s true that black box models are not inherently explainable, meaning it is difficult to understand how the variables themselves impact the model outputs. However, that doesn’t mean responsible outcomes are impossible to achieve, especially in the model design and testing phase. When building and experimenting with a classification model, you will want to maximize accuracy and robustness, which diagnostic tools and model assertions can help you optimize. However, these should not be the only performance indicators used to build a responsible black box model. It is also imperative to evaluate the model with custom metrics, such as key fairness criteria chosen according to the potential outcomes and areas of concern.
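As one example of such a custom metric, the sketch below computes a demographic parity difference for a binary sensitive attribute and asserts that it stays under a chosen threshold. The function name, threshold, and toy arrays are illustrative assumptions, not a prescribed implementation.

```python
import numpy as np

def demographic_parity_difference(y_pred, sensitive):
    """Difference in positive-prediction rates between the two groups
    of a binary sensitive attribute (0 means no disparity)."""
    y_pred = np.asarray(y_pred)
    sensitive = np.asarray(sensitive)
    rate_a = y_pred[sensitive == 0].mean()
    rate_b = y_pred[sensitive == 1].mean()
    return abs(rate_a - rate_b)

# Toy predictions and group labels; the threshold is a hypothetical choice.
y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
threshold = 0.6

gap = demographic_parity_difference(y_pred, group)
assert gap <= threshold, f"Demographic parity gap {gap:.2f} exceeds {threshold:.2f}"
print(f"Demographic parity difference: {gap:.2f}")
```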
When determining which model to use in production, the visual ML review in Dataiku also provides opportunities to measure how well the model performs across subgroups of interest, partial dependence plots to gauge variable influence, and neighborhood explorations that give insight into how changes in model inputs affect predictions. This, in turn, lets you run what-if analyses on even opaque neural networks.
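To make the idea concrete outside of a visual interface, here is a rough sketch using scikit-learn on synthetic data: a small neural network stands in for the black box, partial dependence summarizes a feature’s average influence, and a simple perturbation loop mimics a what-if exploration. The model, data, and feature choices are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.inspection import partial_dependence
from sklearn.neural_network import MLPClassifier

# Train a small neural network on synthetic data (stand-in for a black box model).
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0).fit(X, y)

# Partial dependence: average predicted probability as feature 0 varies over a grid.
pd_result = partial_dependence(model, X, features=[0], kind="average")
print(pd_result["average"][0])

# Simple what-if exploration: perturb one feature of a single row and watch
# how the predicted probability shifts.
row = X[0].copy()
for delta in (-1.0, 0.0, 1.0):
    perturbed = row.copy()
    perturbed[0] += delta
    prob = model.predict_proba(perturbed.reshape(1, -1))[0, 1]
    print(f"feature_0 {delta:+.1f}: P(class=1) = {prob:.3f}")
```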
Monitoring and Feedback
Finally, when building a robust and fair model, you want to make sure it remains that way over time. Even if a model cannot provide global explanations of how it is built, Dataiku’s visual ML provides insight through row-level explanations using Shapley values or ICE (individual conditional expectation). These can help your end users make sense of why an algorithm made a certain decision in a specific case and point to potential areas of remediation for changing the model outcome.
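For readers who want to see what a row-level Shapley explanation looks like in code, the sketch below uses the open source shap library with a model-agnostic KernelExplainer on synthetic data. It illustrates the general technique, not Dataiku’s internal implementation, and all data and model choices are assumptions.

```python
import shap
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Fit a small neural network as a stand-in for an opaque model.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0).fit(X, y)

# Model-agnostic Shapley explanation for a single prediction: the background
# sample keeps the explainer fast, and the resulting values show which features
# pushed this row's predicted probability up or down.
background = shap.sample(X, 50, random_state=0)
explainer = shap.KernelExplainer(lambda data: model.predict_proba(data)[:, 1], background)
shap_values = explainer.shap_values(X[0:1])
print(shap_values)
```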
Speaking of changing models, you can use metrics and checks to ensure that your model stays fair and equitable in its performance even as it is deployed in real time. Custom monitoring tools and stress tests help ensure your model does not lose performance on the Responsible AI metrics that matter most in your use case. Additionally, individual-level explanations can be output via the Dataiku API, so even real-time scoring can benefit from transparency into how a model made a specific decision.
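A monitoring check of this kind can be as simple as a function that recomputes a fairness metric on each freshly scored batch and fails loudly when a threshold is breached. The column names, threshold, and toy batch below are hypothetical; in practice the same logic would run as a custom metric or check in your pipeline.

```python
import pandas as pd

def fairness_check(scored_batch: pd.DataFrame, threshold: float = 0.10) -> str:
    """Recompute a fairness metric on a freshly scored batch and return a
    check status, mimicking a metrics-and-checks step in a pipeline."""
    rates = scored_batch.groupby("sensitive_group")["prediction"].mean()
    gap = rates.max() - rates.min()
    if gap > threshold:
        return f"ERROR: positive-rate gap {gap:.2f} exceeds {threshold:.2f}"
    return f"OK: positive-rate gap {gap:.2f}"

# Example run on an assumed batch of recently scored rows.
batch = pd.DataFrame({
    "sensitive_group": ["a", "a", "b", "b", "b"],
    "prediction":      [1,   0,   1,   1,   0],
})
print(fairness_check(batch))
```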
A critical component of any Responsible AI pipeline is ensuring that end users have a chance to react to model predictions and provide feedback on overall performance. For example, when outputting product recommendations to clients, a simple question like “Are these recommendations relevant to you?” gives the data team a great deal of information on how well the model is doing in the real world. In other cases, you may want to give end users the opportunity to write long-form responses to a model decision, or clearly list how they can appeal the result of a decision in a transparent manner. Even if the model used in the pipeline is opaque or black box, the way in which it is deployed and presented to users doesn’t have to be!
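One lightweight way to close that loop is to log each piece of user feedback alongside the prediction it refers to, so the data team can later join it back to model outputs. The schema, file name, and identifiers below are purely hypothetical placeholders.

```python
import csv
from datetime import datetime, timezone

def record_feedback(prediction_id: str, relevant: bool, comment: str = "") -> None:
    """Append one end-user feedback record so the data team can compare
    model outputs against real-world reactions (hypothetical schema)."""
    with open("model_feedback.csv", "a", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([datetime.now(timezone.utc).isoformat(), prediction_id, relevant, comment])

# Example: a client marks a recommendation as not relevant and leaves a note.
record_feedback("rec-20240101-0042", relevant=False, comment="Already own this product")
```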
While these steps are important in creating a Responsible AI system, they are not the only requirements for safely scaling Everyday AI. An established governance protocol (like with Govern), as well as the use of best practices in MLOps, can help you ensure that even the most opaque black box models can be regulated and built in a reliable manner.