
How I Cut Chargeback Fraud 73% on a Fintech Project

An anonymized case study with the real numbers.

August 30, 2025 · 11 min read

Three months on a fintech engagement, three rule-engine refactors, and a small AI scoring layer. Here's how the numbers actually moved.

Anonymized case study. Numbers are real, identifying details are not.

The starting state

A mid-stage fintech doing $80M GMV/year. Chargeback rate had crept from 0.6% to 1.4% in 18 months. They'd hit Visa's monitoring program threshold. Penalties imminent.

Existing fraud system: a 200-rule hand-tuned engine, false-positive rate around 8%, slow to update.

My engagement

Three-month fixed-bid engagement. Goal: cut chargeback rate below 1% without sacrificing approval rate.

What I changed

Phase 1 (weeks 1-3): instrumentation. The team didn't know which rules were firing, which were catching real fraud, which were rejecting good customers. I built a dashboard tracking every rule's true-positive and false-positive rates over time. Discovered 40 rules fired but had near-zero true positives - pure noise.
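The per-rule tracking can be sketched in a few lines. This is a hypothetical minimal version, not the production dashboard: `rule_id` and the fraud label are assumed to come from joining rule-fire logs against later dispute data.

```python
from collections import defaultdict

class RuleStats:
    """Track true/false positives per rule so noisy rules stand out."""

    def __init__(self):
        self.counts = defaultdict(lambda: {"tp": 0, "fp": 0})

    def record(self, rule_id: str, was_fraud: bool) -> None:
        # A rule "fired" on this transaction; was it actually fraud?
        key = "tp" if was_fraud else "fp"
        self.counts[rule_id][key] += 1

    def precision(self, rule_id: str) -> float:
        c = self.counts[rule_id]
        fired = c["tp"] + c["fp"]
        return c["tp"] / fired if fired else 0.0

stats = RuleStats()
stats.record("velocity_check", True)
stats.record("velocity_check", False)
stats.record("geo_mismatch", False)
```

Plot each rule's precision over time and the near-zero-true-positive rules become obvious.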

Phase 2 (weeks 4-7): rule cleanup. Disabled the noise rules, tightened the high-value ones with better thresholds. False-positive rate dropped from 8% to 5.5%. No effect yet on chargeback rate.
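Once the stats exist, finding the noise rules is a filter. The rule names, counts, and the precision cutoff below are all illustrative assumptions; the cutoff has to be tuned against your own dispute data.

```python
# Hypothetical per-rule stats over a review window: (times fired, true positives).
rule_stats = {
    "geo_mismatch": (1200, 2),            # fires constantly, almost never fraud
    "card_testing": (300, 210),
    "new_device_high_amount": (90, 40),
}

MIN_PRECISION = 0.05  # assumed cutoff; below this, a rule is pure noise

noise_rules = [
    rule for rule, (fires, tps) in rule_stats.items()
    if fires > 0 and tps / fires < MIN_PRECISION
]
# noise_rules → ["geo_mismatch"]
```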

Phase 3 (weeks 8-12): scoring layer. Built a gradient-boosted model on 18 months of historical chargeback data. 47 features, mostly velocity/network/device. Output: a 0-1 risk score.
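To make "velocity features" concrete, here is a toy sketch of two trailing-window features for a single card. The tuple layout and window choices are assumptions for illustration; the real feature set covered velocity, network, and device signals.

```python
from datetime import datetime, timedelta

def velocity_features(txns, now, card_id):
    """Trailing-window counts and amounts for one card.

    `txns` is a list of (card_id, timestamp, amount) tuples.
    """
    feats = {}
    for label, window in (("1h", timedelta(hours=1)),
                          ("24h", timedelta(hours=24))):
        recent = [amt for cid, ts, amt in txns
                  if cid == card_id and timedelta(0) <= now - ts <= window]
        feats[f"txn_count_{label}"] = len(recent)
        feats[f"amount_sum_{label}"] = sum(recent)
    return feats

now = datetime(2025, 1, 1, 12, 0)
txns = [
    ("c1", datetime(2025, 1, 1, 11, 30), 50.0),
    ("c1", datetime(2025, 1, 1, 3, 0), 20.0),
    ("c2", datetime(2025, 1, 1, 11, 45), 99.0),
]
feats = velocity_features(txns, now, "c1")
```

Features like these feed the gradient-boosted model, which emits the 0-1 risk score.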

The score didn't replace the rules; it augmented them. Rules still ran first, and the model broke ties only on borderline, ambiguous rule outcomes.
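The decision flow can be sketched as below. The outcome labels and the 0.7 threshold are illustrative assumptions, not the production values.

```python
RISK_THRESHOLD = 0.7  # assumed cutoff on the model's 0-1 risk score

def decide(rule_outcome: str, risk_score: float) -> str:
    """Rules run first; the model only breaks ties on borderline outcomes."""
    if rule_outcome in ("approve", "decline"):
        return rule_outcome  # a clear rule verdict stands unchanged
    # rule_outcome == "review": the model breaks the tie
    return "decline" if risk_score >= RISK_THRESHOLD else "approve"
```

Keeping the rules in charge of clear-cut cases is what keeps the system auditable.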

Final numbers

  • Chargeback rate: 1.4% → 0.38% (73% reduction)
  • False positive rate: 8% → 4.2% (47% reduction - yes, both improved)
  • Approval rate on legit traffic: +2.1%

What I'd do differently

I overinvested in feature engineering on the model. The simpler version (with 12 features instead of 47) was within 4% of the final model. Diminishing returns.

I'd also have shipped a "shadow mode" earlier - the model running but not blocking - to validate it against production traffic before flipping the switch.
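Shadow mode is a small wrapper: score every transaction, log the score next to the rule decision, but let only the rules decide. A hedged sketch, assuming `rule_engine` and `model` are callables taking a transaction dict:

```python
import logging

logger = logging.getLogger("fraud.shadow")

def handle_transaction(txn, rule_engine, model, shadow=True):
    """Score every transaction; in shadow mode, only the rules decide."""
    decision = rule_engine(txn)
    score = model(txn)
    if shadow:
        # Log score vs. decision for offline comparison; never block here.
        logger.info("txn=%s rule_decision=%s model_score=%.3f",
                    txn.get("id"), decision, score)
        return decision
    # Live mode: the score can override toward decline (threshold assumed).
    return "decline" if score >= 0.7 else decision

result = handle_transaction({"id": "t1"},
                            rule_engine=lambda t: "approve",
                            model=lambda t: 0.95)
```

A week of shadow logs would have validated the model against production traffic before it blocked anything.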

Lessons for fintech teams

  1. Instrumentation first. You can't improve what you can't measure. 80% of "we need ML" problems are actually "we don't measure our existing rules" problems.
  2. Don't replace rules - augment them. Rules are auditable; ML models are much harder to explain. Regulators love rules. Use the model to break ties.
  3. Watch approval rate. It's easy to cut chargebacks by being overly restrictive. The right metric is profit-per-attempted-transaction, not chargeback rate alone.
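The profit-per-attempted-transaction point is worth a worked example. All numbers below are illustrative, but they show how a "better" chargeback rate can hide a worse business outcome:

```python
def profit_per_attempt(attempts, approvals, avg_margin, chargebacks, avg_cb_cost):
    """Margin earned minus chargeback losses, spread over ALL attempts."""
    return (approvals * avg_margin - chargebacks * avg_cb_cost) / attempts

# Looser policy: more approvals, more chargebacks.
loose = profit_per_attempt(10_000, 9_500, 2.0, 50, 30.0)   # 1.75 per attempt
# Stricter policy: chargebacks drop 5x, but so do approvals.
strict = profit_per_attempt(10_000, 7_000, 2.0, 10, 30.0)  # 1.37 per attempt
```

The strict policy has an 80% lower chargeback count yet earns less per attempt, which is exactly why chargeback rate alone is the wrong target.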

This is the kind of work I do on engagements. If your fraud numbers look like the starting state above, let's talk.


