A practical diagnostic for South African fraud operations and CRO teams dealing with fraud scoring noise, AML false positives, and transaction monitoring alert fatigue caused by poor data quality.
Many fraud and financial crime teams are not overwhelmed because criminals have suddenly become more active. They are overwhelmed because their models and monitoring rules are being fed weak transaction and reference data.
This is the uncomfortable reality behind much of the banking fraud analytics and data quality work in South Africa: the fraud engine may be functioning as designed, but the data it receives is incomplete, inconsistent, late, duplicated, or poorly defined. The result is alert fatigue. Analysts spend their day clearing noise. Genuine risk is harder to see. Customers are inconvenienced by unnecessary holds. Executives are told the model needs to be “improved”, when the more urgent problem sits upstream.
For heads of fraud operations and the CRO office, the diagnostic question is not only “Is the model accurate?” It is: “Can we trust the data attributes that drive the alert?”
Fraud scoring and AML transaction monitoring systems depend on signals. These signals may include transaction amount, merchant type, location, device, beneficiary history, account age, customer segment, product type, channel, time of day, and prior behaviour.
If those signals are wrong, the output will be noisy.
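As a minimal illustration (the rule weights, thresholds, and field names below are hypothetical, not any bank's actual logic), a rules-based score is only as good as the attributes it reads. The same payment scores very differently when reference fields fail to load:

```python
# Minimal sketch (hypothetical rule weights and field names): a rules-based
# fraud score driven by transaction attributes. A missing or stale field
# changes the score, not because behaviour changed, but because the data did.

def fraud_score(txn: dict) -> int:
    score = 0
    if txn.get("amount", 0) > 10_000:
        score += 30                        # large amount
    if txn.get("beneficiary_is_new", True):
        score += 40                        # defaults to "unknown recipient" if the field is missing
    if txn.get("account_age_days", 0) < 90:
        score += 20                        # defaults to "young account" if the field is missing
    return score

# Same customer, same payment: complete data vs. a stale reference feed
complete = {"amount": 15_000, "beneficiary_is_new": False, "account_age_days": 2_000}
stale = {"amount": 15_000}                 # beneficiary history and account age failed to load

print(fraud_score(complete))  # 30 -> below a typical alert threshold
print(fraud_score(stale))     # 90 -> alert fires on a data defect, not on risk
```

The defect here is not the scoring logic; it is the silent defaulting of absent attributes, which the analyst reviewing the alert never sees.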
Consider a bank monitoring digital payments. A customer who usually pays local beneficiaries suddenly sends a large payment to a new recipient. That may be a valid risk signal. But if the beneficiary reference file is stale, the system may not recognise that the customer has paid the same person before through another channel. A normal payment then looks unusual.
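The beneficiary example above comes down to how payee history is keyed. A sketch (hypothetical identifiers and formats) of why channel-local records miss a known payee, and how normalising on stable attributes recovers the match:

```python
# Sketch (hypothetical identifiers): recognising a beneficiary across channels.
# If history is keyed on channel-specific raw records, the same payee looks
# "new" when paid through a different channel.

def normalise(beneficiary: dict) -> tuple:
    # Key on stable attributes (bank code + account number), not raw channel records.
    # Assumed defect: one channel stores account numbers with leading zeros.
    return (beneficiary["bank_code"], beneficiary["account_number"].lstrip("0"))

history_app = [{"bank_code": "ZA-001", "account_number": "0012345678"}]  # paid via the app

eft_payment = {"bank_code": "ZA-001", "account_number": "12345678"}      # now paid via EFT

known = {normalise(b) for b in history_app}

naive_new = eft_payment not in history_app            # True: raw records do not match
normalised_new = normalise(eft_payment) not in known  # False: same payee after normalisation
print(naive_new, normalised_new)
```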
In AML monitoring, the same issue appears in a different form. A rule may flag cash deposits just below a reporting threshold. If branch codes, depositor identifiers, or account ownership attributes are inconsistent, the monitoring system may group unrelated activity together or split related activity apart. Both outcomes damage the quality of alerts.
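The splitting effect is easy to demonstrate. In this sketch (illustrative threshold and deposit records), trailing and leading whitespace in the depositor identifier, a common artefact of differing branch systems, hides a structuring pattern that a cleaned key reveals:

```python
from collections import defaultdict

# Sketch (hypothetical field names and threshold): a structuring rule that
# counts cash deposits just below a R25,000 threshold per depositor. If the
# depositor identifier is captured inconsistently across branches, related
# deposits are split apart and the pattern is never seen.

THRESHOLD = 25_000

deposits = [
    {"depositor_id": "8001015009087",  "amount": 24_500},
    {"depositor_id": " 8001015009087", "amount": 24_000},  # leading space from one branch system
    {"depositor_id": "8001015009087 ", "amount": 24_800},  # trailing space from another
]

def count_near_threshold(deposits, key):
    counts = defaultdict(int)
    for d in deposits:
        if 0.9 * THRESHOLD <= d["amount"] < THRESHOLD:
            counts[key(d)] += 1
    return counts

raw = count_near_threshold(deposits, key=lambda d: d["depositor_id"])
clean = count_near_threshold(deposits, key=lambda d: d["depositor_id"].strip())

print(max(raw.values()))    # 1 each -> three "unrelated" depositors, no alert
print(max(clean.values()))  # 3 -> one depositor repeatedly depositing below the threshold
```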
Fraud teams often respond by tuning rules, raising thresholds, or adding manual review capacity. Those actions may be necessary, but they do not solve the underlying defect if the data remains unreliable.
Transaction data is created across many systems: card platforms, digital banking, branch systems, ATM networks, payment rails, call centres, and back-office processes. Each system may record the same event differently.
A card transaction may carry a merchant category code, terminal location, acquiring bank identifier, authorisation response, and timestamp. An electronic funds transfer may include beneficiary details, account number, reference text, channel, device identifier, and payment instruction time. A cash deposit may contain branch, teller, depositor, account, and source-of-funds information.
Problems start when these fields are not captured consistently or are transformed without clear rules.
Examples include:

- the same event recorded with different timestamps or time zones across channels;
- duplicate records created when delayed or retried transactions are reprocessed;
- free-text reference fields used where coded values are expected;
- identifiers for the same customer, merchant, or beneficiary that differ between systems;
- fields that arrive blank, defaulted, or truncated after transformation.
None of these issues is exotic. They are common in South African banks with legacy platforms, outsourced processing arrangements, multiple product systems, and ongoing digital channel growth. Load-shedding and connectivity interruptions can add further complications where transactions are delayed, batched, retried, or processed out of sequence.
A fraud model does not understand organisational complexity. It only sees attributes. If the attributes are unstable, the model learns instability.
Transaction data gets most of the attention because it is visible and high-volume. Reference data is less visible, but often more damaging.
Reference data includes customer master data, product hierarchies, branch lists, merchant classifications, employer records, account relationships, risk ratings, geographic mappings, politically exposed person indicators, sanctions screening inputs, and known fraud typologies.
In practice, these data sets are often owned by different teams. Fraud operations may depend on reference data maintained by product, onboarding, compliance, finance, channels, or external service providers. When ownership is unclear, quality deteriorates.
For example, a business banking customer may be classified as a small enterprise in one system and as a high-turnover commercial client in another. The transaction monitoring rules applied to that customer may therefore be inappropriate. A cash-intensive business such as a filling station, wholesaler, or hospitality operator may generate alerts because the monitoring system does not hold an accurate view of expected activity.
Similarly, merchant reference data can distort card fraud detection. If a legitimate local merchant is misclassified as a high-risk category, ordinary customer spend may be scored aggressively. If a risky merchant is classified too broadly, suspicious transactions may blend into normal activity.
This is why AML transaction monitoring false positives and the data attributes behind them should be reviewed together. It is not enough to ask how many alerts were generated. The better question is which attributes contributed to the alerts, and whether those attributes were correct at the time of scoring.
Alert fatigue is usually measured in volumes: number of alerts, number closed, backlog, ageing, and escalation rate. Those measures are useful, but they do not reveal the full cost.
The deeper risk appears in analyst behaviour.
When analysts see too many weak alerts, they adapt. They develop shortcuts. They learn which alerts are “usually nothing”. They may rely on narrative templates because investigation time is limited. Quality assurance may then focus on whether the case file is complete rather than whether the alert was meaningful.
This creates a dangerous cycle. Low-quality alerts consume capacity, reducing the time available for complex cases. Complex cases receive less attention, increasing the chance that genuine fraud, mule activity, account takeover, or laundering patterns are missed.
There is also a customer impact. A retail banking customer whose card is repeatedly blocked while travelling between Johannesburg, Durban, and Cape Town will lose trust in the bank’s controls. A small business whose supplier payment is delayed at month-end may suffer real operational harm. Under Treating Customers Fairly expectations, this is not only an efficiency concern. Unnecessary friction caused by poor data can become a conduct issue.
For the CRO office, the point is clear: false positives are not harmless. They consume scarce risk capacity and can create customer harm.
A useful diagnostic should avoid starting with the model. Start with the alert population and work backwards.
First, select a sample of recent alerts across fraud scoring and transaction monitoring. Include alerts closed as false positives, alerts escalated for investigation, customer-impacting alerts, and high-value alerts. Do not only review the easy cases.
Second, identify the top contributing attributes. For each alert type, determine which fields drove the score or rule trigger. This may include amount, velocity, location, beneficiary novelty, device change, customer risk rating, merchant category, occupation, business type, or expected turnover.
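This second step can be as simple as a tally over the sampled alerts. A sketch (hypothetical alert records and attribute names) that ranks contributing attributes, so the review starts with the fields that drive the most noise:

```python
from collections import Counter

# Sketch (hypothetical alert records): for a sample of alerts, tally which
# attributes drove the trigger. The ranking tells you which fields to trace
# first, before touching the model.

alerts = [
    {"id": 1, "triggered_by": ["beneficiary_novelty", "amount"]},
    {"id": 2, "triggered_by": ["merchant_category"]},
    {"id": 3, "triggered_by": ["beneficiary_novelty"]},
    {"id": 4, "triggered_by": ["beneficiary_novelty", "device_change"]},
]

contribution = Counter(attr for a in alerts for attr in a["triggered_by"])
print(contribution.most_common(3))
# beneficiary_novelty dominates -> trace that reference feed first
```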
Third, test the data lineage. Ask where each field originated, how it was transformed, when it was updated, and which team owns it. If nobody can explain this clearly, the field is not fit to drive high-impact decisions without controls.
Fourth, compare operational reality with system representation. For example, if a customer’s business profile says “general retail” but account activity shows payroll, supplier payments, cash deposits, and card acquiring settlement, the profile may be too generic for AML monitoring. If a mobile device identifier resets after every app reinstall, it may be a weak indicator for account takeover unless treated carefully.
Fifth, quantify avoidable noise. Estimate how many alerts would not have fired if specific data defects were corrected. This gives executives a business case that is more concrete than “improve data quality”.
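The fifth step can be sketched as a counterfactual replay, assuming the scoring logic can be re-run offline over the sample (the scoring rules, field names, and figures below are illustrative):

```python
# Sketch (hypothetical data and rules): quantify avoidable noise by replaying
# sampled alerts with a specific data defect corrected, then counting how
# many would not have fired.

ALERT_THRESHOLD = 60

def score(txn):
    s = 0
    if txn["amount"] > 10_000:
        s += 30
    if txn["beneficiary_is_new"]:
        s += 40
    return s

sample = [  # transactions that alerted in production
    {"amount": 15_000, "beneficiary_is_new": True, "known_via_other_channel": True},
    {"amount": 15_000, "beneficiary_is_new": True, "known_via_other_channel": False},
    {"amount": 12_000, "beneficiary_is_new": True, "known_via_other_channel": True},
]

def corrected(txn):
    # Counterfactual: fix the stale beneficiary flag using cross-channel history
    fixed = dict(txn)
    if txn["known_via_other_channel"]:
        fixed["beneficiary_is_new"] = False
    return fixed

fired = sum(score(t) >= ALERT_THRESHOLD for t in sample)
would_fire = sum(score(corrected(t)) >= ALERT_THRESHOLD for t in sample)
print(f"{fired - would_fire} of {fired} alerts were avoidable data-defect noise")
```

The output of this kind of replay, "x of y alerts would not have fired", is the concrete figure an executive can weigh against the cost of fixing the defect.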
This diagnostic can sit alongside broader banking data strategy work. For related examples, see Zorinthia’s banking advisory section and its dedicated discussion on fraud and financial crime data.
Fraud and AML teams sometimes treat privacy, compliance, and data governance as constraints that slow them down. In reality, good governance helps reduce noise.
POPIA requires responsible handling of personal information, including purpose limitation, security safeguards, and attention to information quality. In fraud analytics, this means institutions should know why a data attribute is used, whether it is accurate enough for that purpose, who can access it, and how long it is retained.
This matters where sensitive or high-impact decisions are made. If a transaction is blocked, an account is restricted, or a customer is escalated for investigation, the institution should be able to explain the decision internally. That explanation does not require disclosing detection logic to criminals. It does require a defensible record of the data, rule, score, and human decision path.
SARB prudential expectations, FSCA conduct oversight, and FIC Act obligations all reinforce the need for disciplined controls. The bank must detect financial crime, but it must also manage operational resilience, customer fairness, and regulatory accountability.
A practical governance model should assign ownership for critical fraud and AML data attributes. It should define acceptable quality thresholds, exception handling, change approval, and periodic review. If the merchant category hierarchy changes, fraud operations should know before alert volumes spike. If onboarding fields are redesigned, AML monitoring should assess the impact before production.
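The ownership model described above can be made operational with something as modest as an attribute register checked on each data-quality run. A sketch (the attributes, owners, and thresholds are illustrative, not a recommended standard):

```python
# Sketch (hypothetical register): a minimal ownership-and-threshold register
# for critical fraud/AML attributes. A periodic check flags any attribute
# whose measured quality falls below its agreed threshold, with a named owner
# accountable for remediation.

REGISTER = {
    "merchant_category_code": {"owner": "Card Operations",  "min_completeness": 0.99},
    "customer_risk_rating":   {"owner": "Compliance",       "min_completeness": 0.98},
    "beneficiary_history":    {"owner": "Digital Channels", "min_completeness": 0.95},
}

measured = {  # latest data-quality run (illustrative figures)
    "merchant_category_code": 0.97,
    "customer_risk_rating":   0.99,
    "beneficiary_history":    0.91,
}

breaches = [
    (attr, spec["owner"])
    for attr, spec in REGISTER.items()
    if measured[attr] < spec["min_completeness"]
]
for attr, owner in breaches:
    print(f"{attr}: below agreed threshold, escalate to {owner}")
```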
The goal is not zero false positives. That would be unrealistic and unsafe. The goal is better signal quality: fewer weak alerts, stronger prioritisation, and clearer accountability when data defects affect decisions.
A well-run improvement programme will show evidence in several ways.
Alert volumes may fall in specific categories without reducing detection of confirmed fraud or reportable suspicious activity. Analyst productivity should improve because cases contain more reliable context. Quality assurance findings should shift from missing documentation to better investigative reasoning. Customer complaints linked to unnecessary blocks or delays should reduce. Model performance reporting should separate data-quality issues from model-design issues.
Executives should also expect a more honest view of analytics maturity. If the institution cannot define core fields consistently across channels, it is premature to expect advanced fraud models to perform reliably at scale. More data science will not compensate for weak source data, unclear ownership, or poor change control.
In South African banking, this is particularly important because institutions are balancing cost pressure, digital fraud growth, regulatory scrutiny, and customer expectations. Fraud teams cannot simply add more analysts every time alert volumes rise. The more sustainable approach is to remove avoidable noise at source.
For the head of fraud operations or the CRO office, the next step is a focused review of the alerts that waste the most time and create the most customer friction.
Ask this question at the next risk or financial crime forum:
Which five data attributes are responsible for the highest volume of low-quality fraud or AML alerts, and who owns the accuracy of each one?
If that question cannot be answered with evidence, the institution does not yet have a model problem. It has a data accountability problem that is making the model look busy.