Achieving fairness in one’s Everyday AI practice is easier said than done. The challenge begins with wrangling the term itself: what is fairness? Many of us might be inclined to say that it’s hard to define, but that we know it when we see it.
We at Dataiku like to be as precise as possible, and the same is true for Luke Vilain, Data Ethics Senior Manager at Lloyds Banking Group. At our 2022 Everyday AI Conference in London, Vilain spoke at length about what fairness is and how it can be achieved. You might say that applying fairness in practice boils down to a comprehensive Responsible AI strategy. In Vilain’s own terms, “fairness is a socio-technological problem” with many different and equally essential aspects.
In this blog, we’ll cover the highlights from Vilain’s talk in London and, in the process, develop a nuanced picture of the shades of fairness in AI, the obstacles to achieving it, and the mindset that makes it possible.
What is Fairness, Anyway?
At the most basic level, fairness is about ensuring that one person or group of people isn’t disadvantaged more than another for reasons having to do with who that person (or group) is. In other words, it’s about stamping out all forms of discrimination. As Vilain puts it, “we can define things as discriminatory when [outcomes] are different for similar groups of people based on unrelated criteria.”
What criteria does he mean, exactly? At Lloyds, Vilain’s team has developed a tool to help teams assess fairness and explainability. The tool tracks thirteen categories on which it is illegal to discriminate between people or groups, including gender, sexuality, age, race, marital status, and whether someone has had a baby.
Criteria are considered “unrelated” when they have no relevance to the decision being made by the algorithm or analyst. For example, Vilain said, “if you had one group of people given a loan and another group of people not given a loan, and the only difference between them was a category like race, that is unrelated to why one should get a loan, then you can say that that system is biased or discriminatory.”
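Vilain’s definition lends itself to a simple check. The sketch below is purely illustrative (not a tool Lloyds or Dataiku uses): it compares positive-outcome rates, such as loan approvals, between two groups that differ only on an unrelated attribute, and computes their ratio. The 0.8 cutoff in the comment is the “four-fifths” rule of thumb sometimes used as a first-pass screen for disparate impact.

```python
def approval_rate(decisions):
    """Fraction of positive outcomes in a list of 0/1 decisions."""
    return sum(decisions) / len(decisions)

def disparate_impact_ratio(group_a, group_b):
    """Ratio of the lower approval rate to the higher one.
    Values well below 1.0 suggest the system treats the groups differently."""
    rate_a, rate_b = approval_rate(group_a), approval_rate(group_b)
    return min(rate_a, rate_b) / max(rate_a, rate_b)

# Two hypothetical groups that differ only on an unrelated attribute:
group_a = [1, 1, 1, 0, 1, 1, 0, 1]  # 75% approved
group_b = [1, 0, 0, 0, 1, 0, 0, 1]  # 37.5% approved

ratio = disparate_impact_ratio(group_a, group_b)
print(f"Disparate impact ratio: {ratio:.2f}")  # 0.50, below the 0.8 rule of thumb
```

A real assessment would of course control for legitimately relevant differences between the groups; the point here is only to make “different outcomes for similar groups” measurable.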
Vilain was careful to clarify that by “bias” he meant, specifically, discriminatory bias. As many data scientists, engineers, and analysts know, all machine learning models are built on bias—otherwise, they don’t work. Fairness is about ensuring that the biases in play are just, and therefore not discriminatory.
Fairness vs. Ethics
Vilain made the important point that fairness and ethics are not necessarily the same thing. It’s nice when they align, but it’s essential to see the difference between them to handle the many situations in which they do not. Fairness, in brief, is about balance, whereas ethics is about morality and notions of the Good.
Let’s take an example to make this more concrete. Imagine that someone stole your apple. You now have the opportunity to steal their orange. Should you? We might say that it would be unethical to steal that person’s orange; two wrongs, after all, don’t make a right. But stealing the orange would nevertheless be the fairer outcome. Now, no one is advocating theft—least of all Vilain—but the point is exaggerated to mark the distinction between the two.
To focus on fairness, rather than ethics, is to focus on problems that can be effectively and tangibly solved at present — that is, on solutions within one’s power to achieve in the short term.
Developing a Fairness-Centered Perspective
Introducing robust fairness practices into your data and AI processes begins with a shift in perspective. Vilain identified a few key areas toward which fairness-minded managers and analysts should focus their attention.
The samples that we use to train our models need to reflect the needs of the people who are ultimately served by those models. This means ensuring that no unfair biases are introduced into the samples. Take the case of facial recognition algorithms: though they can be highly accurate for white people, research has found conclusively that they perform poorly for people of color. One reason is that the models were trained disproportionately on data from the people they ended up serving best, disadvantaging everyone else.
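One way to catch this kind of sample bias early is to compare each group’s share of the training sample against its share of the population the model will serve. A minimal sketch, with invented numbers:

```python
from collections import Counter

def representation_gap(sample_groups, population_shares):
    """For each group, return (share of training sample) minus
    (share of the population the model will serve).
    Large positive values mean over-representation; negative, under-representation."""
    counts = Counter(sample_groups)
    total = len(sample_groups)
    return {group: counts.get(group, 0) / total - share
            for group, share in population_shares.items()}

# Hypothetical: a sample that heavily over-represents one group.
sample = ["A"] * 80 + ["B"] * 20
population = {"A": 0.5, "B": 0.5}
print(representation_gap(sample, population))  # {'A': 0.3, 'B': -0.3}
```

This doesn’t guarantee fair performance per group, but a large gap is a cheap early warning that the sample won’t reflect everyone the model serves.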
A central component of training models is the labeling process: observing the model’s inputs and manually labeling the first N of them to help improve the model’s accuracy.
As with sample bias, if the team training the model has a homogeneous worldview on the categories it is using, it may miss important, diverging perspectives.
To take a simple example, one team training a model on whether something is “soup” or “not soup” might not think to differentiate stews from soups, whereas a team with a more diversified perspective might. The second team would, as a consequence, have a more nuanced and discerning labeling system.
“How you design features, how you treat null values, how you treat averages, how you treat outliers,” Vilain pointed out, are all steps in the pipeline to building your model. At every stage, there are biases in the brain of the designer that will leave their mark on the process. The outcome therefore always has a bias, even if it’s not a historical bias. Understanding this is the first step to determining all the stages at which bias is introduced, and to intervening where possible and necessary to ensure that the bias that emerges at the end is not an unfair one.
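To see how just one of those choices leaves its mark, consider null values: filling them with the mean versus the median of the observed data gives very different results when an outlier is present. The numbers below are invented purely for illustration.

```python
import statistics

# Hypothetical incomes with missing values and one extreme outlier.
# How you "treat null values, averages, outliers" changes what the model sees.
incomes = [28_000, 31_000, 35_000, None, 1_200_000, None]
observed = [x for x in incomes if x is not None]

mean_fill = statistics.mean(observed)      # pulled far upward by the outlier
median_fill = statistics.median(observed)  # robust to the outlier

print(f"mean imputation:   {mean_fill:,.0f}")    # 323,500
print(f"median imputation: {median_fill:,.0f}")  # 33,000
```

Neither choice is “neutral”: whichever value fills the gaps becomes the model’s picture of a typical missing record, and that picture reflects a decision the designer made.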
The major lesson underlying all of the above is clear: the more diverse you can make the team that designs your algorithms and pipelines, the better equipped you’ll be to head off unfair biases when it comes to sampling, labeling, and developing your pipeline. “It's important to try and get something that will work better for a wider range of people,” Vilain said. And diversity is the cornerstone of that effort.
Fairness as Practical Application
Developing the right mindset around fairness is only half the battle. The other challenge, just as important, is applying fairness practically to your teams and processes. “You can't apply fairness unless you understand what things are gonna get in your way,” Vilain said.
Fairness in practice has to navigate several kinds of constraints, not the least of which are legal restrictions on what kind of data can be gathered and fed to models. Proxies are often used for this reason. But how to figure out what kind of proxy will work?
Vilain pointed out that, in the EU, it has been illegal for automobile insurers to base insurance quotes on the applicant’s gender. This presented a problem for insurers, who knew from their data that, given accident rates, men should be issued higher premiums than women. Without this differentiator, insurers had to resort to some far-fetched proxy data points, such as the color of the driver’s car. Similarly, a bank, which might not be allowed to collect data about your marital status, can easily develop proxy metrics to determine whether or not a client is married.
Finding the right proxies is an important part of the equation. But it also presents a risk: if AI teams aren’t careful, they’ll lose track of what, exactly, their proxies are standing in for, and this makes their models difficult to explain and can potentially lead to undesirable or unfair outcomes.
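A first-pass way to keep track of what a proxy is standing in for is to measure how strongly a candidate feature correlates with the protected attribute it might be leaking. The sketch below is a simplified, hypothetical check (a point-biserial-style correlation between a numeric feature and a binary group indicator), not a complete proxy audit.

```python
import statistics

def proxy_correlation(feature, group):
    """Correlation between a numeric feature and a 0/1 group indicator.
    Values near +/-1 mean the feature is an almost perfect stand-in
    for the protected attribute."""
    n = len(feature)
    mf, mg = statistics.mean(feature), statistics.mean(group)
    cov = sum((f - mf) * (g - mg) for f, g in zip(feature, group)) / n
    return cov / (statistics.pstdev(feature) * statistics.pstdev(group))

# Hypothetical: does a "car colour code" feature leak gender?
car_colour = [3, 3, 1, 3, 1, 1, 3, 1]
gender =     [1, 1, 0, 1, 0, 0, 1, 0]
r = proxy_correlation(car_colour, gender)
print(f"correlation: {r:.2f}")  # 1.00 in this toy data: a perfect proxy
```

A routine check like this makes it harder to “lose track” of a proxy: anything highly correlated with a protected attribute deserves a documented justification before it goes into a model.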
Making Fairness the Norm
Once you know what the conceptual and practical dimensions of fairness are, you’ll want to know how to turn fairness into an everyday habit for your data and AI teams. As Vilain laid out toward the end of his talk, there are four approaches to this.
1. Introducing Standards
Setting standards is all about setting expectations. Find what’s already in place and emphasize how it can be iterated on, expanded, and made better.
2. Introducing Processes
You want your processes to be clear, simple, and clean. Make sure that team members and managers can grasp them easily and intuitively — you don’t want it to take three days of training simply to learn a new process. A streamlined system will ensure adoption.
3. Introducing Oversight
“It is important to have an appropriate level of risk oversight,” Vilain said, “but I think it's also important to recognize” the limits of asking data science teams a hundred questions about fairness when the oversight teams don’t necessarily know what to do with the answers.
He was articulating a problem common to teams that lack proper AI Governance structures. Platforms like the one offered by Dataiku make it easy for teams to keep close track of processes without having to babysit them; automated alerts, set up according to the specifications required by the team, keep a close watch on complex projects as they are developed across several teams. This allows managers and directors to keep track of developing biases and to ensure that they don’t tip into unfairness.
Governance is not only about developing processes, but also about maintaining good outputs once they’ve gone live. “Once you're actually in operation,” Vilain pointed out, “you need to have a mechanism which keeps an established look over things.” That way, you avoid feeding bad outputs back into the system, and can correct for errors quickly and efficiently.
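A mechanism that keeps “an established look over things” can be as simple as comparing live outcome rates per group against the rates established at launch, and raising an alert when they drift. The sketch below is a hypothetical illustration, not Dataiku’s or Lloyds’ implementation, and the 0.05 threshold is an arbitrary example value.

```python
def outcome_drift(baseline_rates, live_decisions, threshold=0.05):
    """Flag groups whose live positive-outcome rate has drifted from
    the baseline rate established at launch by more than `threshold`.
    Returns {group: drift} for flagged groups only."""
    alerts = {}
    for group, decisions in live_decisions.items():
        live_rate = sum(decisions) / len(decisions)
        drift = live_rate - baseline_rates[group]
        if abs(drift) > threshold:
            alerts[group] = round(drift, 3)
    return alerts

# Hypothetical baseline approval rates and a batch of live decisions:
baseline = {"A": 0.70, "B": 0.68}
live = {"A": [1, 1, 0, 1, 1, 1, 0, 1, 1, 1],   # 80% approved
        "B": [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]}   # 40% approved

print(outcome_drift(baseline, live))  # {'A': 0.1, 'B': -0.28}
```

Run on a schedule, a check like this is what lets a team correct errors quickly rather than feeding bad outputs back into the system.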
4. Uplifting Capability
Companies need to invest in fairness and explainability, plain and simple, Vilain said. It requires both analysts and technical engineers to work together toward a common goal. “It's really important that companies build a center of excellence.”
Fairness is Bold
Fairness is an effort that requires not only persistence but boldness. That’s because the solutions that help you in the short term are not always those that will help you in the long term.
If you know that your training data isn’t rich enough to train the model in such a way as to produce the desired results in the long term, then you might need to take the bold step of altering that data to force that training to occur. In many cases, this will be met with hostility by your peers and colleagues. But, equipped with the right framework and value propositions, you can make them see what you see: that fairness is for the future.