In my previous post, I discussed realistic, proven, and plausible use cases for generative AI in the banking and financial services industry. In this post, I will outline an important non-technical constraint on many of those use cases: human oversight, and how it shapes investment in GenAI within financial services as of January 2025.
As a starting point, we can use the framework from Bonnie Docherty, “Losing Humanity: The Case Against Killer Robots” (Human Rights Watch, 2012). We will modify the concepts somewhat for our context.
| Oversight level | Definition | In practice |
| --- | --- | --- |
| human-in-the-loop (HITL) | a human must instigate the action | human augmentation |
| human-on-the-loop (HOTL) | a human may cancel an action | machine maker, human checker |
| human-out-of-the-loop (HOOTL) | no human action is involved | fully autonomous |
Let’s apply these delineations to a real-world financial services example. Here, we will assume that GenAI augmentation and AI agents are competent and effective but not infallible. The validity of these assumptions will be examined in future posts.
The Example
Imagine a call center that costs $110 million annually to operate and runs at full capacity while providing acceptable service to clients. Ten million dollars of that total is spent specifically on remediating agent errors. This error rate, and the mechanisms you have in place to control it, are considered acceptable by your regulators and investors. Let’s apply our three scenarios and observe the consequences:
HITL
You provide each call center agent with a GenAI assistant, which helps them rapidly find and validate information, suggests actions to take, and otherwise improves their efficiency. In all cases, the human agent takes the action, and the GenAI agent never interacts directly with the client. Let’s assume this provides a 25% efficiency boost with error costs unchanged.
Break-even ROI: $25M
HOTL
You create a GenAI agent that attempts to assist all clients directly. The GenAI agent communicates directly with the client and acts on your internal systems to solve problems. Every message or action proposed by the agent is first reviewed by a human agent in real time and signed off on before execution. Because of this level of oversight, efficiency improves by a moderate 35%.
Occasionally, a GenAI agent proposes a response to a client that damages your brand credibility, makes an offer that violates fair practice laws, or takes an action in your systems that is particularly messy to clean up, and the human agent does not intervene correctly to prevent it. Though these errors are not fundamentally worse than those created by human agents, client and regulatory response to these errors is harder to predict and tends to be more severe due to the presence of AI. You have made $35M in efficiency savings. You also face previously unknown risks relating to GenAI agents acting erratically with clients or in your systems, in ways different from human agents and thus harder to forecast.
Break-even ROI: $35M + indirect cost of increased risk variance and regulatory scrutiny
HOOTL
You hand most client interactions over to GenAI agents, with oversight performed in batches each night by a human audit team. The result is a 90% efficiency gain. Occasionally, a GenAI agent responds to a client in a way that damages your brand credibility, makes an offer that violates fair practice laws, or takes an action in your systems that is extremely messy to clean up.
On one occasion, a very large error occurs. The error would also have been technically possible under your original, human-only operation, but because it has occurred under an ‘unsupervised AI’ system, it faces extreme public and regulatory scrutiny. Since your firm is at the vanguard of such work, there is no precedent for what ‘reasonable controls’ or ‘acceptable mistakes’ look like. You have made $90M in efficiency savings and have moved your firm into unknown territory.
Final result for your investment: $90M savings + catastrophic direct and indirect remediation costs + you have been fired.
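To make the arithmetic behind the break-even figures explicit, here is a minimal back-of-the-envelope sketch. It assumes the efficiency gains apply only to the $100M of non-error operating cost, with the $10M of error remediation held constant; the script and its variable names are purely illustrative.

```python
# Back-of-the-envelope break-even savings for each oversight model.
# Assumption: efficiency gains apply only to the $100M of non-error
# operating cost; the $10M error-remediation cost is held constant.

NON_ERROR_COST_M = 100   # annual operating cost excluding error remediation, in $M
ERROR_COST_M = 10        # annual cost of remediating agent errors, in $M

efficiency_gain = {
    "HITL":  0.25,   # human augmentation
    "HOTL":  0.35,   # machine maker, human checker
    "HOOTL": 0.90,   # fully autonomous, audited after the fact
}

for model, gain in efficiency_gain.items():
    savings = NON_ERROR_COST_M * gain
    print(f"{model}: break-even at ${savings:.0f}M in annual savings")

# HITL: break-even at $25M in annual savings
# HOTL: break-even at $35M in annual savings
# HOOTL: break-even at $90M in annual savings
```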
The Result
The safe path, which brings savings without incremental risk, is to adopt human-in-the-loop or human-on-the-loop AI augmentation, often as a two-step process. Other paths are viable, but the choice between them is governed by the firm’s risk tolerance, not by the direct savings of the technology itself.
We can represent this shift visually. First, consider a generic representation of the reward-risk tradeoff.
Then, consider a representation which reflects the reality of the industry. Note that, regardless of the upside of full automation, if the maximum tolerated risk level is below the risk posed by the new system, then the investment is not viable.
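The constraint can be sketched in a few lines of code. This is only an illustration of the logic above, with hypothetical function and parameter names: upside cannot rescue an investment whose long-tail risk exceeds the firm’s tolerance.

```python
def investment_viable(expected_savings_m: float,
                      tail_risk_cost_m: float,
                      max_tolerated_risk_m: float) -> bool:
    """An investment clears the hurdle only if its long-tail risk
    fits under the firm's risk tolerance; upside alone is not enough."""
    if tail_risk_cost_m > max_tolerated_risk_m:
        return False  # the risk constraint binds; savings are irrelevant
    return expected_savings_m > 0

# Illustrative: large upside, but tail risk above tolerance -> not viable.
print(investment_viable(expected_savings_m=90,
                        tail_risk_cost_m=500,
                        max_tolerated_risk_m=50))   # False
```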
The Future
Of course, these risk levels are not permanently fixed. One risk reduction path is technological: GenAI agents become dramatically less risky than human agents and are controlled such that the maximum long-tail risk cost can be fixed at a level acceptable to firms. The other path is regulatory and social: the use of GenAI agents and the acceptability of the errors they generate become clearly understood by firms, and the controls for their ‘acceptable’ use are clearly defined. In reality, both will occur concurrently.
But today we exist in a world in which neither holds, and thus human-in-the-loop or human-on-the-loop is the most reasonable path. The good news is that investments made in GenAI HITL and HOTL processes generate real ROI today and will remain valuable for future evolutions into HOTL and HOOTL systems.