Adversarial attacks against machine-learning systems in financial services
Financial institutions are deploying ML systems into the parts of their stack that matter most: fraud detection, anti-money-laundering screening, credit decisioning, transaction monitoring, and customer-service LLMs. These systems carry an attack surface that traditional cyber-security controls do not address. This paper defines six adversarial attack classes specific to financial-sector ML, maps them to malicious-actor goals, and proposes a starting threat model for the institutions we work with.
- Published: 2026-04-29
- Status: PUBLISHED
- Version: 1.0
- Author: IMF Research — Detection engineering practice
- Reading time: 11 min
1. Introducing adversarial attacks against financial-services ML
Machine-learning systems are now embedded in the parts of a financial institution’s stack where decisions are most consequential: real-time fraud screening, AML and sanctions match scoring, credit and pricing models, transaction-monitoring rule augmentation, and an expanding surface of generative-AI assistants used for customer service, document review, and policy interpretation.
Each of these systems carries an attack surface that is not addressed by the institution’s existing cyber-security controls. The model itself, the data that trained it, the pipeline that deployed it, and the inputs it processes are all targets for a deliberate adversary with goals that are specifically financial — direct fraud loss, AML evasion, market manipulation, or extraction of customer data.
This paper does three things:
- It defines six classes of adversarial machine-learning attack with concrete relevance to financial-sector deployments, drawing on the NCSC and NIST taxonomies and adapting them to the threat model our clients actually face.
- It maps each class to the malicious-actor goals we observe in the field, so that risk owners and detection engineers can prioritise which classes matter for their specific deployments.
- It points to the public guidance and research that institutions should fold into their threat-modelling practice, and identifies the gaps where defensive research is currently thin.
This paper does not attempt to enumerate defences against each attack class in detail. Defensive technique is heavily product- and deployment-dependent, and we treat it as engagement work rather than publication work.
1.1 Aims and audience
This paper is written for the people who own the question of “how might an adversary attack our model” inside a regulated financial institution. That includes detection engineers, model-risk-management teams, second-line model validators, and the security architects who sign off on ML system deployments. It is not written for ML researchers; the technical depth is calibrated to a security practitioner who is broadly literate about ML but does not work on model internals daily.
The paper assumes baseline knowledge of supervised-learning concepts (training data, model weights, inference) and of standard cyber-security controls (network segmentation, secrets management, SDLC). Where domain-specific terms are used we define them in the glossary in §1.3.
1.2 Why financial ML is a distinct attack surface
Three properties of financial-services ML systems set them apart from general-purpose ML as a target:
- The decision is the value. Unlike content moderation or search ranking, the output of a fraud-detection or credit-decisioning model directly governs whether money moves. A successful attack on the model is a successful attack on the institution’s losses or its regulatory posture, with no further exploitation required.
- The training data is sensitive in itself. Model inversion against an AML model that was trained on suspicious-activity reports may leak the existence of those reports — which carries regulatory and national-security implications independent of any direct financial loss.
- The adversary iterates in production. Fraud rings test detection models in real time, observing which patterns get challenged. The adversarial loop is much tighter than in most ML deployments and rewards adaptive evasion.
1.3 Glossary
For brevity throughout the body of the paper, we use the following terms with the meanings stated.
- Adversarial machine learning (AML) attack — an attack that exploits a vulnerability of an ML system, distinct from a classical cyber-attack on the supporting IT infrastructure. (Note: in the rest of the document, “AML” used unqualified means anti-money-laundering, which is the more common financial-sector usage. Where we mean “adversarial machine learning” we say so explicitly.)
- Model — the trained mathematical artefact that produces predictions or scores. Includes weights, architecture, and any pre- or post-processing transforms shipped with it.
- Inference — the process of producing a prediction from an input. The endpoint at which inputs are accepted and outputs are returned is the inference surface.
- Surrogate model — a model trained by the adversary to approximate the behaviour of the target model, typically for use in attack-planning.
- Sanctions/AML model — any production ML system whose primary purpose is to screen transactions or relationships against regulatory or financial-crime risk.
- Transaction monitoring — the rule- or model-driven layer that flags individual or aggregate financial movement for human review.
2. Defining adversarial machine-learning attack classes for financial ML
We define six attack classes below. They are not strictly disjoint; some real attacks combine techniques from more than one. The classes exist to give threat-modelling conversations a shared vocabulary, not to be a clean partition of the space.
2.1 Model characterisation against fraud-detection systems
The adversary submits inputs to the deployed fraud-detection model and observes the outcomes (challenged, allowed, escalated). Over many queries — sometimes through a fraudulent merchant account that the adversary controls, sometimes through cards-on-file at compromised retailers — the adversary builds a surrogate model that approximates the institution’s decision boundary.
This is reconnaissance rather than an attack in itself, but it is the precondition for most of the attacks below. We assess with high confidence that characterisation is currently underway against most major fraud platforms. What makes a model hard to characterise is how quickly it shifts its decision boundary relative to how quickly the adversary can re-characterise it.
The defensive question is not “can we prevent characterisation” (usually no) but “can we shift the decision boundary faster than it can be re-characterised” (sometimes yes).
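One signal that might indicate characterisation is a caller whose queries sit unusually close to the decision threshold. The sketch below is illustrative only: it assumes the institution logs `(caller_id, score)` pairs at the inference surface, and the function names, the `band` width, and the `max_share` cut-off are hypothetical choices rather than recommended values.

```python
from collections import defaultdict


def near_boundary_share(events, threshold, band=0.05, min_queries=20):
    """Return {caller_id: share of that caller's scores within `band` of `threshold`}.

    `events` is an iterable of (caller_id, score) pairs. Callers with fewer
    than `min_queries` events are skipped because their share is too noisy.
    """
    totals = defaultdict(int)
    near = defaultdict(int)
    for caller_id, score in events:
        totals[caller_id] += 1
        if abs(score - threshold) <= band:
            near[caller_id] += 1
    return {
        caller: near[caller] / count
        for caller, count in totals.items()
        if count >= min_queries
    }


def flag_probable_probers(shares, max_share=0.5):
    """Return the sorted caller ids whose near-boundary share exceeds `max_share`."""
    return sorted(caller for caller, share in shares.items() if share > max_share)
```

A caller flagged this way is a lead for analyst review, not proof of characterisation; legitimate merchants with borderline traffic would also appear.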
2.2 Training-data poisoning of AML/KYC pipelines
AML and KYC systems often consume training labels generated by analyst review — a transaction that was flagged, reviewed, and cleared becomes a “negative” example in subsequent retraining. Adversaries who can either compromise the analyst-review queue or generate enough synthetic legitimate-looking activity can corrupt the training set so that future models learn to ignore the patterns the adversary cares about.
The form we observe most often is not deliberate label-poisoning by an insider, but rather a slow drift in which a fraud ring’s behaviour generates many cleared cases over a period of months — because each individual case is just below the threshold that triggers a SAR — and the model that retrains on those cases learns to be more permissive of the underlying pattern.
The boundary between “adversarial poisoning” and “concept drift the adversary is exploiting” is fuzzy, and the operational fix is similar in both cases: retraining sets must be sampled with a sense of what the drift could look like, not just what the historical labels say.
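One possible reading of "sampled with a sense of what the drift could look like" is to stop any single behavioural cluster from dominating the cleared (negative) examples in a retraining set. The sketch below assumes each case already carries a hypothetical `cluster_id` (for example from counterparty-graph clustering) and a `label` field; both names, and the capping approach itself, are illustrative assumptions rather than a method described in this paper.

```python
import random
from collections import defaultdict


def cap_cleared_per_cluster(cases, max_per_cluster, seed=0):
    """Downsample cleared cases so no single cluster dominates the negatives.

    `cases` is a list of dicts with keys "cluster_id" and "label"
    (1 = confirmed suspicious, 0 = cleared by analyst review).
    Positives are always kept; cleared cases are capped per cluster.
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    cleared_by_cluster = defaultdict(list)
    kept = []
    for case in cases:
        if case["label"] == 1:
            kept.append(case)
        else:
            cleared_by_cluster[case["cluster_id"]].append(case)
    for cluster_id in sorted(cleared_by_cluster):
        cleared = cleared_by_cluster[cluster_id]
        if len(cleared) > max_per_cluster:
            cleared = rng.sample(cleared, max_per_cluster)
        kept.extend(cleared)
    return kept
```

The cap limits how far months of just-below-threshold cleared cases from one ring can pull the retrained model, at the cost of discarding some genuine negatives.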
2.3 Adversarial inputs against transaction monitoring
The transaction-monitoring layer at most institutions is a hybrid: explicit rules supplemented by a model that scores aggregate behaviour across windows. The adversary’s input-manipulation surface here is the structured-transaction surface itself — amounts, counterparties, timing, geography, channel.
Concrete attack patterns we have observed in client engagements:
- Threshold-aware structuring. Transactions arranged to land just below the per-day, per-week, and per-month thresholds the rules-layer is known to apply. The model layer then sees only individually-unremarkable activity.
- Counterparty-pattern minimisation. Splitting flows across many counterparty accounts in a way that defeats the graph-based features the monitoring model relies on for clustering.
- Adversarial perturbation of document-verification inputs. For ML-driven document-verification (driver’s licence, passport, utility bill), small adversarial perturbations to the image, imperceptible to a human reviewer, can flip the verifier’s decision. We have observed this in production against at least two cloud-hosted KYC providers.
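The threshold-aware structuring pattern in the list above can be illustrated with a simple aggregate check: count, per account, the transactions that land in a narrow band just below a known threshold. This is a minimal sketch; the `(account_id, amount)` input shape, the 10% `margin`, and the `min_hits` cut-off are assumptions for illustration, not values drawn from any client deployment.

```python
from collections import Counter


def just_below_counts(transactions, threshold, margin=0.10):
    """Count, per account, transactions with lower <= amount < threshold,
    where lower = threshold * (1 - margin)."""
    lower = threshold * (1 - margin)
    counts = Counter()
    for account_id, amount in transactions:
        if lower <= amount < threshold:
            counts[account_id] += 1
    return counts


def structuring_candidates(transactions, threshold, margin=0.10, min_hits=3):
    """Return sorted account ids with at least `min_hits` just-below-threshold transactions."""
    counts = just_below_counts(transactions, threshold, margin)
    return sorted(account for account, hits in counts.items() if hits >= min_hits)
```

The check looks at the aggregate rather than the individual transaction, which is the level at which structured activity stops looking unremarkable. An adversary who knows the band can widen their spread, so the margin would need the same periodic adjustment as the thresholds themselves.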
2.4 Model inversion against credit-decision engines
Credit-decisioning models trained on borrower data are valuable targets because their training data is intrinsically sensitive (PII, income, employment) and because the regulatory regime around credit decisions is strict.
Two model-inversion patterns are realistic against a deployed credit-decisioning API:
- Membership inference — given a candidate identity and credit profile, the adversary determines whether that identity was in the model’s training set, simply by observing how confidently the model scores it. Confirming that an individual’s data was used for training is itself a disclosure of personal information, which conflicts with GDPR and with regulatory expectations about the privacy of training data.
- Reconstruction by query — the adversary submits many synthetic borrower profiles and observes the model’s decisions, which leak information about the threshold structure and the feature importances. With enough queries, an adversary can approximate the training distribution closely enough to be commercially valuable (for example, to a competitor).
The defensive position is rate-limiting on the inference surface combined with monitoring for query patterns that look like characterisation. Both are weakly deployed in our experience.
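As an illustration of the rate-limiting half of that position, the sketch below keeps a per-caller sliding window of query timestamps. The class name, the limits, and the choice to pass `now` in explicitly (to keep the example deterministic) are all assumptions; a production limiter would normally sit in the API gateway rather than in application code.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_queries` per caller in any `window_seconds` span."""

    def __init__(self, max_queries, window_seconds):
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # caller_id -> accepted query timestamps

    def allow(self, caller_id, now):
        """Return True and record the query if the caller is under the limit."""
        timestamps = self._history[caller_id]
        # Drop timestamps that have aged out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_queries:
            return False
        timestamps.append(now)
        return True
```

Rejected queries are not recorded here, so a caller who keeps hammering the endpoint is not penalised further; logging those rejections separately would feed the query-pattern monitoring mentioned above.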
2.5 Prompt injection against financial-services LLMs
Generative-AI assistants are now in production at most large institutions for customer service, document analysis, and internal policy interpretation. The attack surface is large and the operational controls are immature.
The classes of attack we expect to land first in financial-services deployments:
- Direct prompt injection to the customer-service LLM — adversary-controlled text included in a customer query causes the model to reveal information from its system prompt, including internal guidance documents that institutions did not intend to expose.
- Indirect prompt injection through documents — a malicious invoice or contract uploaded for AI-assisted review carries hidden instructions that cause the assistant to misclassify the document or to recommend a different action than its training would otherwise yield.
- Jailbreak-driven exfiltration — getting the assistant to output information it has access to but is not supposed to share, for instance customer balances accessible to the underlying tools but gated behind policy guidance.
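A common mitigation idea for the exfiltration case is to enforce access policy in the tool layer rather than in the prompt, so that a jailbroken model still cannot fetch data the session is not entitled to. The sketch below is a hypothetical illustration: `Session`, `ToolCallDenied`, `authorise_tool_call`, and the `customer_id` argument convention are invented for this example and do not correspond to any specific LLM framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    customer_id: str
    authenticated: bool


class ToolCallDenied(Exception):
    """Raised when a model-requested tool call fails the policy check."""


def authorise_tool_call(session, tool_name, arguments, allowed_tools):
    """Check a model-requested tool call against session state, not model output.

    The check runs outside the model, so instructions injected into the
    prompt cannot alter it.
    """
    if tool_name not in allowed_tools:
        raise ToolCallDenied(f"tool {tool_name!r} is not permitted for this assistant")
    if not session.authenticated:
        raise ToolCallDenied("session is not authenticated")
    if arguments.get("customer_id") != session.customer_id:
        raise ToolCallDenied("tool call targets a different customer")
    return True
```

The design choice is that the model can ask for anything, but the caller of the tool decides what is executed using facts the model cannot influence.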
2.6 Model artefact manipulation in deployment pipelines
A trained model is a binary artefact (weights, architecture, optional tokenizer). The pipeline that builds, signs, registers, and deploys that artefact is the same kind of supply chain we have learned to worry about for software: a compromised CI/CD step that swaps weights for a subtly-modified version is functionally equivalent to a classical software supply-chain attack, with the difference that post-deployment behavioural monitoring of an ML system is much weaker than it is for software.
An institution that does not sign its model artefacts and verify the signatures at load time has no means of catching this attack class.
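The load-time check can be sketched with the Python standard library. Note the substitution: this example uses an HMAC-SHA256 tag with a shared key as a stand-in for a real asymmetric signature, purely to stay dependency-free. The function names are hypothetical, and `load_model` returns the raw bytes as a placeholder for actual deserialisation.

```python
import hashlib
import hmac


def sign_artefact(artefact_bytes, key):
    """Return a hex HMAC-SHA256 tag over the artefact bytes."""
    return hmac.new(key, artefact_bytes, hashlib.sha256).hexdigest()


def verify_artefact(artefact_bytes, key, expected_tag):
    """Constant-time comparison of the recomputed tag against the expected tag."""
    actual_tag = sign_artefact(artefact_bytes, key)
    return hmac.compare_digest(actual_tag, expected_tag)


def load_model(artefact_bytes, key, expected_tag):
    """Refuse to load an artefact whose tag does not verify."""
    if not verify_artefact(artefact_bytes, key, expected_tag):
        raise ValueError("model artefact failed integrity check; refusing to load")
    return artefact_bytes  # placeholder for real deserialisation
```

In a real pipeline the tag would be produced at build time and stored in the model registry, and the verification key would be held separately from the CI/CD system that the attacker is assumed to have compromised.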
3. Mapping attack classes to malicious-actor goals
Different adversaries target different combinations of these classes. The matrix below is our current view of which class is load-bearing for which goal in the financial-sector engagements we run.
| Goal \ Class | Characterisation (2.1) | Poisoning (2.2) | Adversarial input (2.3) | Inversion (2.4) | Prompt injection (2.5) | Artefact manipulation (2.6) |
|---|---|---|---|---|---|---|
| Direct fraud profit | + | + | + | — | + | — |
| AML evasion | + | + | + | — | — | — |
| Customer-data theft | — | — | — | + | + | + |
| Regulatory-breach engineering | — | + | — | + | + | + |
| Competitive intelligence | + | — | — | + | — | — |
| Market manipulation | + | — | + | — | — | + |
| Reputational damage | — | + | + | — | + | + |
The matrix is descriptive, not prescriptive. Each cell deserves a fuller treatment than this paper provides, and the institution-specific weighting will depend heavily on which models are in production and where they sit in the value chain.
The gaps where we see defensive research as currently thin, by class:
- Class 2.1: Detection of model characterisation in production with low false-positive rate
- Class 2.5: Tool-isolation patterns for LLM deployments without breaking utility
- Class 2.6: Signed-artefact loading for live ML inference at low latency
- All: Cross-institution sharing of observed adversarial telemetry
4. What financial institutions should do
The institutions we have worked with that do this well are characterised less by any specific tool than by three operational practices.
- Threat-model the model. Add an explicit step to the model-deployment lifecycle in which the model owner, the detection-engineering function, and the second-line validator enumerate which of the six classes above apply, and what the observable signature of an attack would be. The output is one page attached to the model risk record. The enumeration alone catches most of the gaps; the page is the artefact a regulator can read.
- Instrument the inference surface. Treat the inference surface as a first-class telemetry source, with the same retention and alerting maturity as the authentication or payment surfaces. Useful signals include per-caller query rates, distributional anomalies in inputs, and response-caching exploitation patterns. Most institutions have some of this for some models and none of it for others.
- Sign and verify model artefacts. Apply the SBOM/signing practices used for software to model artefacts. Trained-weights files are binary blobs; treat them as such for supply-chain hygiene.
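For the "distributional anomalies in inputs" signal in the second practice above, one widely used summary statistic is the population stability index (PSI), which compares the binned distribution of a feature in recent inference traffic against a reference window. The sketch below assumes the caller has already binned both windows into counts over the same bins; the `eps` floor for empty bins is an illustrative choice, and the alerting threshold is left to the deployment.

```python
import math


def psi(expected_counts, actual_counts, eps=1e-6):
    """Population stability index between two binned distributions.

    Both arguments are per-bin counts over the same bins. Shares are floored
    at `eps` so that empty bins do not produce log(0) or division by zero.
    """
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both distributions must use the same bins")
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    total = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        expected_share = max(expected / expected_total, eps)
        actual_share = max(actual / actual_total, eps)
        total += (actual_share - expected_share) * math.log(actual_share / expected_share)
    return total
```

Computed per caller as well as globally, the same statistic can help separate a general shift in customer behaviour from one caller submitting inputs that look unlike everyone else's.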
For further reading we point to NIST AI 100-2 (Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations), the MITRE ATLAS knowledge base, the UK government’s Code of Practice for the Cyber Security of AI, and the EBA’s discussion paper and follow-up report on machine learning for IRB models. These are the frameworks regulators will reference; they are also the most useful starting points for a threat-modelling practice that needs to be defensible.
We expect to publish a follow-up paper at the end of 2026 with specific defensive engineering patterns for classes 2.5 and 2.6, informed by client engagements through the year.