Why we need to change how we think about IRRBB data

The Bottom Line

IRRBB practice has long treated data as transactions, positions, market rates, behavioural assumptions, and risk outputs. That definition worked when the end consumer was a human analyst who could stitch institutional context together from methodology papers, policy notes, committee minutes, and personal memory before signing a number off. AI makes the limits of that definition visible, but it is not the cause: every reporting cycle, someone already has to mentally assemble methodology, policy interpretation, active controls, data-quality state and known limitations to form a judgement on whether the number can stand. The fix is to widen the definition of data: regulation is data, policy interpretation is data, methodology is data, procedure is data, model limitations are data, active controls are data. All of it should be versioned, traceable, and available on demand to any legitimate consumer, human or machine.

In 2026, the definition of data most banks have used for twenty years is no longer wide enough.

In IRRBB, we have long treated data as transactions, positions, market rates, behavioural assumptions, and risk outputs: the things our systems consume and the things they produce. That definition was not wrong, but it was built for the human consumer.

Treasury functions are being asked to work out where AI fits in the process. Most will struggle to do it well, not because the models are incapable, but because the underlying institutional context is not available in usable form.

This is an argument for widening the definition. Not just because “AI” is on everyone’s horizon, but because the data foundations in treasury risk have already become too narrow for the way the function actually works.

Why the old definition worked

The old definition worked because treasury data was read by a human analyst.

A good ALM analyst never relied only on the number in front of them. Before signing off an EVE or NII result, they also drew on the methodology behind it, the regulatory expectation it was calibrated to, the internal policy, the known limitations of the system, and the current state of the data feeding it.

Most of that context did not live in ALM systems. It lived in people’s heads and in fragmented documents: methodology papers, policy notes, committee minutes, Excel trackers, runbooks, email chains. The narrow definition of data worked because the human analyst could stitch all of that together, every time. Institutional memory was the compensating control.

What analysts use

If you watch a senior analyst sign their name to a number they may later have to defend, you can see the full stack they are drawing on. In practice, it is much wider than what treasury functions usually call “data.”

It includes at least these layers:

Regulation: the actual rule text, such as Basel IRRBB standards, PRA expectations, or EBA guidance, with changes tracked over time.
Interpretation: how the bank has chosen to read ambiguous language and turn it into implementation choices.
Internal policy: risk appetite, governance rules, scenario selections, review thresholds, approval requirements, and override boundaries.
Methodology: curve construction, behavioural modelling approaches, discounting choices, treatment of floors, optionality, and scope decisions.
Procedures: how the process is actually run, what checks are applied, and what happens when something fails.
Source data: positions, cashflows, rates, product attributes, and behavioural history.
Metrics: intermediate artefacts such as repricing gaps, adjusted cashflow ladders, and sensitivity components.
Outputs: EVE, NII, CSRBB, and the reported sensitivities by entity, currency, and scenario.
External context: rate environment, curve moves, positioning relative to peers.
Model and data limitations: known weaknesses, calibration boundaries, approximations, and what the framework does not capture.

That full stack is what professional judgement looks like in treasury risk. It is the minimum an experienced analyst weighs before they attest to a number leaving the building.
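As a rough illustration of the layers listed above, the stack can be written down as an explicit enumeration, so that "which layers do we actually hold in queryable form?" becomes a question a system can answer. This is a minimal sketch only: the names ContextLayer and missing_layers are hypothetical, and the choice of which layers count as the "narrow" definition is an assumption made for the example.

```python
from enum import Enum


class ContextLayer(Enum):
    """The layers an analyst draws on, in the order listed in the text."""
    REGULATION = "regulation"
    INTERPRETATION = "interpretation"
    INTERNAL_POLICY = "internal policy"
    METHODOLOGY = "methodology"
    PROCEDURES = "procedures"
    SOURCE_DATA = "source data"
    METRICS = "metrics"
    OUTPUTS = "outputs"
    EXTERNAL_CONTEXT = "external context"
    LIMITATIONS = "model and data limitations"


def missing_layers(available: set) -> list:
    """Return the layers not held in queryable form, in stack order."""
    return [layer for layer in ContextLayer if layer not in available]


# Assumption for illustration: the narrow definition covers only these layers.
narrow_definition = {
    ContextLayer.SOURCE_DATA,
    ContextLayer.METRICS,
    ContextLayer.OUTPUTS,
}

gaps = missing_layers(narrow_definition)
print([layer.value for layer in gaps])
```

Everything the function returns is context that, under the narrow definition, still lives in documents and people's heads.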

Even “data” is incomplete

Even the layers we already call data are not solved simply because they are structured.

Anyone who has run an IRRBB measurement process knows the standing inventory of issues: source-system misalignments, gaps caused by migrations or product changes, reconciliation breaks to finance, inconsistent market-rate sourcing, manual overlays, proxy treatments, and month-end position mismatches. Structured does not always mean reliable.

Around those issues sits another layer that is rarely captured in queryable form: quality thresholds, materiality judgements, compensating controls, remediation status, and override governance. That information lives in committee minutes, issue logs, change tickets, override registers, and team memory, outside any system that can be queried or inherited when someone leaves. A competent analyst does not just weigh the source data; they weigh the current quality state of that data and the controls wrapped around it.

This is bigger than AI

It is tempting to frame this gap only as an AI problem, but that understates it.

Every reporting cycle, someone in ALM or treasury risk signs off numbers that go to senior management, the board, the regulator, or an external disclosure. That act of attestation is where the missing context becomes operationally critical. Is the methodology current? Has the assumption been reviewed recently enough? Are there open data-quality issues against the feeds that undermine reliability? Which compensating controls are active? What known limitation would matter if the rate environment changed?

Most functions can answer those questions. The problem is the cost: the answers are assembled by hand, under deadline, every cycle. Supervisory review does not create that burden; it exposes it. So this is a supervision, resilience and key-person problem long before it is an AI problem. Regulators already expect robust systems, governance, assumptions, and measurement processes around IRRBB, which is exactly why relying on scattered institutional memory is becoming harder to defend.
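If the context behind a number were held as data, the sign-off questions above could be evaluated mechanically and handed to the analyst as a list of findings, instead of being reassembled by hand. The sketch below is illustrative only: SignOffContext, its fields, the issue and control identifiers, and the 365-day review threshold are all assumptions, not a description of any bank's actual process.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class SignOffContext:
    """Hypothetical bundle of context attached to one reported number."""
    methodology_version: str           # version used to produce the number
    approved_methodology_version: str  # version currently approved by governance
    assumption_last_reviewed: date
    # Open data-quality issue id -> compensating control id (None if uncontrolled)
    open_dq_issues: Dict[str, Optional[str]] = field(default_factory=dict)
    known_limitations: List[str] = field(default_factory=list)


def sign_off_findings(ctx: SignOffContext, as_of: date,
                      max_review_age_days: int = 365) -> List[str]:
    """List the points a reviewer should look at before attesting."""
    findings = []
    if ctx.methodology_version != ctx.approved_methodology_version:
        findings.append("methodology version is not the approved one")
    review_age = (as_of - ctx.assumption_last_reviewed).days
    if review_age > max_review_age_days:
        findings.append(f"assumption last reviewed {review_age} days ago")
    for issue_id, control_id in ctx.open_dq_issues.items():
        if control_id is None:
            findings.append(f"open data-quality issue {issue_id} has no compensating control")
    for limitation in ctx.known_limitations:
        findings.append(f"known limitation to consider: {limitation}")
    return findings


# Invented example values, for illustration only.
ctx = SignOffContext(
    methodology_version="v3",
    approved_methodology_version="v4",
    assumption_last_reviewed=date(2025, 1, 31),
    open_dq_issues={"DQ-101": "CTRL-7", "DQ-102": None},
    known_limitations=["example limitation recorded by the model owner"],
)
findings = sign_off_findings(ctx, as_of=date(2026, 3, 31))
for finding in findings:
    print(finding)
```

The function only surfaces findings; it does not decide whether the number can stand. That judgement stays with the person signing.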

This is why the definition of data has to widen. Regulation is data. Methodology is data. Active controls are data. So is policy interpretation, so is procedure, so are model limitations and current data-quality state. Anything the institution needs in order to reason responsibly about a treasury risk number belongs in the data foundation: versioned, traceable, linked to the rules it implements, linked to the outputs it governs, and available on demand to any legitimate consumer, human or machine.
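One way to picture "versioned, traceable, linked" is a small record type that carries a version, an effective date, and explicit links to the rules it implements and the outputs it governs, plus a lookup that answers "which version applied on this date?". This is a sketch under stated assumptions: the Artefact type, the as_of function, and every identifier in the example are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Artefact:
    """One versioned piece of institutional context (hypothetical schema)."""
    artefact_id: str
    layer: str                        # e.g. "methodology", "internal policy"
    version: int
    effective_from: date
    implements: Tuple[str, ...] = ()  # ids of rules or policies it implements
    governs: Tuple[str, ...] = ()     # ids of outputs it governs


def as_of(artefacts: Iterable[Artefact], artefact_id: str,
          on: date) -> Optional[Artefact]:
    """Return the version of an artefact in force on a given date, if any."""
    candidates = [
        a for a in artefacts
        if a.artefact_id == artefact_id and a.effective_from <= on
    ]
    return max(candidates, key=lambda a: a.effective_from, default=None)


# Invented catalogue entries, for illustration only.
catalogue = [
    Artefact("meth:nmd-behavioural", "methodology", 1, date(2024, 1, 1),
             implements=("policy:irrbb-risk-appetite",),
             governs=("output:eve", "output:nii")),
    Artefact("meth:nmd-behavioural", "methodology", 2, date(2025, 7, 1),
             implements=("policy:irrbb-risk-appetite",),
             governs=("output:eve", "output:nii")),
]

in_force = as_of(catalogue, "meth:nmd-behavioural", date(2025, 3, 31))
print(in_force.version, in_force.governs)
```

Because each record is immutable and dated, a number reported for a past period can be traced back to the methodology version that was actually in force when it was produced.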

Where AI fits

AI hasn’t created this gap; it has made one that was already there hard to ignore. The well-known issues in regulated environments (hallucination, context loss, auditability, traceability) remain, and better context removes none of the need for human oversight, controls, or audit trails. But if a model can draw on methodology, policy interpretation, active controls, model limitations, and current data-quality state alongside the output itself, it has less need to infer what a number means and how far it can be trusted. That removes one major source of weak machine reasoning: unsupported guesswork dressed up in a plausible answer.

The same foundation matters well beyond generated commentary. Agentic workflows (drafting variance commentary, preparing a committee section, assembling an inspection response, producing a first-cut methodology change paper) all depend on having institutional content to work from. So does any internal model fine-tuned on bank-specific content, so that its outputs reflect this bank’s methodology, this bank’s policy interpretation, and this bank’s framing of disclosure rather than a generic one. Different consumers, same dependency. The claim is narrow: better context doesn’t make AI safe by default, but it makes the output more grounded and much easier for a human reviewer to challenge properly.

Where the real work is

The benefits don’t wait for AI. Sign-off cycles get shorter. Governance and committee packs stop being assembled from scratch and start being drawn from a single source of current, versioned content. Inspections get easier. New analysts reach productive depth faster. Key-person dependency falls. And people across the process stop repackaging the same institutional knowledge by hand, in slightly different forms, for every audience that asks for it.

So the most important work in front of treasury risk functions may not be adopting AI at all. It is building the data foundations that make governance visible, risk outputs defensible, and both human and machine judgement portable. What that buys back is something the function has quietly been short of for years: attention spent on insight, rather than on whether a number can be trusted or on grinding out the process that produced it. The AI conversation is downstream of that.

For a related view on why operational foundations deserve the weight currently given to modelling, see IRRBB Regulation: All Model, No Plumbing. For the practical limits of AI in treasury risk work, see LLMs: A Practical Guide for Banking Professionals.