
How I Cut Chargeback Fraud 73% on a Fintech Project

An anonymized case study with the real numbers.

August 30, 2025 · 11 min read

Three months on a fintech engagement, three rule-engine refactors, and a small AI scoring layer. Here's how the numbers actually moved.

Anonymized case study. Numbers are real, identifying details are not.

The starting state

A mid-stage fintech doing $80M GMV/year. Chargeback rate had crept from 0.6% to 1.4% in 18 months. They'd hit Visa's monitoring program threshold. Penalties imminent.

Existing fraud system: a 200-rule hand-tuned engine, false-positive rate around 8%, slow to update.

My engagement

Three-month fixed-bid engagement. Goal: cut chargeback rate below 1% without sacrificing approval rate.

What I changed

Phase 1 (weeks 1-3): instrumentation. The team didn't know which rules were firing, which were catching real fraud, which were rejecting good customers. I built a dashboard tracking every rule's true-positive and false-positive rates over time. Discovered 40 rules fired but had near-zero true positives - pure noise.
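The per-rule tracking can be sketched in a few lines. This is a hypothetical minimal version, not the production dashboard: `rule_id` and the fraud label are assumed to come from joining rule-fire logs against later dispute data.

```python
from collections import defaultdict

class RuleStats:
    """Track true/false positives per rule so noisy rules stand out."""

    def __init__(self):
        self.counts = defaultdict(lambda: {"tp": 0, "fp": 0})

    def record(self, rule_id: str, was_fraud: bool) -> None:
        # A rule "fired" on this transaction; was it actually fraud?
        key = "tp" if was_fraud else "fp"
        self.counts[rule_id][key] += 1

    def precision(self, rule_id: str) -> float:
        c = self.counts[rule_id]
        fired = c["tp"] + c["fp"]
        return c["tp"] / fired if fired else 0.0

stats = RuleStats()
stats.record("velocity_check", True)
stats.record("velocity_check", False)
stats.record("geo_mismatch", False)
```

Plot each rule's precision over time and the near-zero-true-positive rules become obvious.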

Phase 2 (weeks 4-7): rule cleanup. Disabled the noise rules, tightened the high-value ones with better thresholds. False-positive rate dropped from 8% to 5.5%. No effect yet on chargeback rate.
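Once the stats exist, finding the noise rules is a filter. The rule names, counts, and the precision cutoff below are all illustrative assumptions; the cutoff has to be tuned against your own dispute data.

```python
# Hypothetical per-rule stats over a review window: (times fired, true positives).
rule_stats = {
    "geo_mismatch": (1200, 2),            # fires constantly, almost never fraud
    "card_testing": (300, 210),
    "new_device_high_amount": (90, 40),
}

MIN_PRECISION = 0.05  # assumed cutoff; below this, a rule is pure noise

noise_rules = [
    rule for rule, (fires, tps) in rule_stats.items()
    if fires > 0 and tps / fires < MIN_PRECISION
]
# noise_rules → ["geo_mismatch"]
```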

Phase 3 (weeks 8-12): scoring layer. Built a gradient-boosted model on 18 months of historical chargeback data. 47 features, mostly velocity/network/device. Output: a 0-1 risk score.
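To make "velocity features" concrete, here is a toy sketch of two trailing-window features for a single card. The tuple layout and window choices are assumptions for illustration; the real feature set covered velocity, network, and device signals.

```python
from datetime import datetime, timedelta

def velocity_features(txns, now, card_id):
    """Trailing-window counts and amounts for one card.

    `txns` is a list of (card_id, timestamp, amount) tuples.
    """
    feats = {}
    for label, window in (("1h", timedelta(hours=1)),
                          ("24h", timedelta(hours=24))):
        recent = [amt for cid, ts, amt in txns
                  if cid == card_id and timedelta(0) <= now - ts <= window]
        feats[f"txn_count_{label}"] = len(recent)
        feats[f"amount_sum_{label}"] = sum(recent)
    return feats

now = datetime(2025, 1, 1, 12, 0)
txns = [
    ("c1", datetime(2025, 1, 1, 11, 30), 50.0),
    ("c1", datetime(2025, 1, 1, 3, 0), 20.0),
    ("c2", datetime(2025, 1, 1, 11, 45), 99.0),
]
feats = velocity_features(txns, now, "c1")
```

Features like these feed the gradient-boosted model, which emits the 0-1 risk score.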

The score didn't replace the rules; it augmented them. Rules still ran first, and the model broke ties only on borderline, ambiguous rule outcomes.
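The decision flow can be sketched as below. The outcome labels and the 0.7 threshold are illustrative assumptions, not the production values.

```python
RISK_THRESHOLD = 0.7  # assumed cutoff on the model's 0-1 risk score

def decide(rule_outcome: str, risk_score: float) -> str:
    """Rules run first; the model only breaks ties on borderline outcomes."""
    if rule_outcome in ("approve", "decline"):
        return rule_outcome  # a clear rule verdict stands unchanged
    # rule_outcome == "review": the model breaks the tie
    return "decline" if risk_score >= RISK_THRESHOLD else "approve"
```

Keeping the rules in charge of clear-cut cases is what keeps the system auditable.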

Final numbers

  • Chargeback rate: 1.4% → 0.38% (73% reduction)
  • False positive rate: 8% → 4.2% (47% reduction - yes, both improved)
  • Approval rate on legit traffic: +2.1%

What I'd do differently

I overinvested in feature engineering on the model. The simpler version (with 12 features instead of 47) was within 4% of the final model. Diminishing returns.

I'd also have shipped a "shadow mode" earlier - the model running but not blocking - to validate it against production traffic before flipping the switch.
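Shadow mode is a small wrapper: score every transaction, log the score next to the rule decision, but let only the rules decide. A hedged sketch, assuming `rule_engine` and `model` are callables taking a transaction dict:

```python
import logging

logger = logging.getLogger("fraud.shadow")

def handle_transaction(txn, rule_engine, model, shadow=True):
    """Score every transaction; in shadow mode, only the rules decide."""
    decision = rule_engine(txn)
    score = model(txn)
    if shadow:
        # Log score vs. decision for offline comparison; never block here.
        logger.info("txn=%s rule_decision=%s model_score=%.3f",
                    txn.get("id"), decision, score)
        return decision
    # Live mode: the score can override toward decline (threshold assumed).
    return "decline" if score >= 0.7 else decision

result = handle_transaction({"id": "t1"},
                            rule_engine=lambda t: "approve",
                            model=lambda t: 0.95)
```

A week of shadow logs would have validated the model against production traffic before it blocked anything.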

Lessons for fintech teams

  1. Instrumentation first. You can't improve what you can't measure. 80% of "we need ML" problems are actually "we don't measure our existing rules" problems.
  2. Don't replace rules - augment them. Rules are auditable; ML models are much harder to explain. Regulators love rules. Use the model to break ties.
  3. Watch approval rate. It's easy to cut chargebacks by being overly restrictive. The right metric is profit-per-attempted-transaction, not chargeback rate alone.
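The profit-per-attempted-transaction point is worth a worked example. All numbers below are illustrative, but they show how a "better" chargeback rate can hide a worse business outcome:

```python
def profit_per_attempt(attempts, approvals, avg_margin, chargebacks, avg_cb_cost):
    """Margin earned minus chargeback losses, spread over ALL attempts."""
    return (approvals * avg_margin - chargebacks * avg_cb_cost) / attempts

# Looser policy: more approvals, more chargebacks.
loose = profit_per_attempt(10_000, 9_500, 2.0, 50, 30.0)   # 1.75 per attempt
# Stricter policy: chargebacks drop 5x, but so do approvals.
strict = profit_per_attempt(10_000, 7_000, 2.0, 10, 30.0)  # 1.37 per attempt
```

The strict policy has an 80% lower chargeback count yet earns less per attempt, which is exactly why chargeback rate alone is the wrong target.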

This is the kind of work I do on engagements. If your fraud numbers look like the starting state above, let's talk.


