Marketplace · 5 months · Lead · 3 engineers and 1 data scientist

Marketplace Trust and Safety Platform

Cut fraud losses by 70% without slowing down good users

A two-sided marketplace client

A two-sided marketplace was hemorrhaging money to fraud and content abuse, but every defensive measure they tried hurt good-user conversion more than it hurt the bad actors. The team had a rules engine in one place, an ML model in another, and a moderation queue in a third, with no shared signal between them. I unified the three into a single decisioning platform - rules, model, and human review - that scores every risky action in under 100ms and routes the borderline cases to humans with full context. Fraud loss dropped 70%, false-positive rate fell, and the moderation team finally stopped reviewing the same user three different ways.

This is a representative architecture study based on real project patterns. Specific metrics and client details have been generalized to protect confidentiality.
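The sub-100ms claim implies a hard latency budget on the hot path. A minimal sketch of that pattern, assuming a fail-open policy on timeout (allow, but flag for async review) so fraud checks never stall checkout; the names `decide`, `risky_score`, and the thresholds are illustrative, not the client's actual API:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError

DECISION_BUDGET_S = 0.100  # the 100ms hot-path p95 target

_executor = ThreadPoolExecutor(max_workers=8)

def risky_score(action: dict) -> float:
    # Stand-in for the real rules + model scoring call.
    return 0.2 if action.get("amount", 0) < 1000 else 0.9

def decide(action: dict) -> str:
    """Score an action within the latency budget; fail open on timeout."""
    future = _executor.submit(risky_score, action)
    try:
        score = future.result(timeout=DECISION_BUDGET_S)
    except TimeoutError:
        # A slow scorer must not block checkout: allow now, review async.
        return "allow_flagged"
    if score >= 0.8:
        return "deny"
    if score >= 0.5:
        return "review"
    return "allow"
```

Whether a timeout fails open or closed is a product decision, not an engineering one; here the sketch assumes conversion wins on the margin.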

Results

What changed, in numbers

The metrics the engagement was measured by.

  • Fraud Loss: -70% (reduction in losses)

  • False Positives: -45% (reduction on good users)

  • Decision Latency: <100ms (p95 on hot path)

  • Moderator Throughput: +150% (cases per hour)

Challenge

What was broken

Fraud and abuse losses were growing faster than GMV. Rules were too blunt, the ML model was retrained quarterly and missed novel attacks, and the human moderators were drowning in context-free queues. Every team owned part of the problem; nobody owned the outcome.

Solution

The shape of the fix

A unified trust-and-safety platform with sub-100ms decisioning, a shared feature store, integrated human review, and a feedback loop that turns every moderator action into training data for the next model version.

Approach

How I tackled it

The concrete moves that took the project from broken to shipped.

1

Built a unified decisioning service that every risky action calls before it commits

2

Combined deterministic rules, an ML risk model, and human review in one decisioning graph instead of three silos

3

Built a feature store with point-in-time correctness so the model trains on what production actually saw

4

Created a moderator UI with full account context so humans review users, not isolated actions

5

Closed the feedback loop: every moderator decision becomes labeled training data within 24 hours

6

Added shadow-mode evaluation so new rules and models could be measured against production traffic before going live
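The point-in-time correctness in step 3 can be sketched as an append-only feature log read "as of" a timestamp, so a training join sees exactly the value production saw at decision time rather than a later, leaky one. Class and method names here are hypothetical:

```python
import bisect

class PointInTimeStore:
    """Append-only feature log; reads return the value as of a timestamp."""

    def __init__(self):
        # (entity_id, feature_name) -> list of (ts, value), sorted by ts
        self._log = {}

    def write(self, entity: str, feature: str, ts: int, value) -> None:
        rows = self._log.setdefault((entity, feature), [])
        rows.append((ts, value))
        rows.sort(key=lambda r: r[0])

    def read_as_of(self, entity: str, feature: str, ts: int):
        # Latest write with write_ts <= ts; None if nothing existed yet.
        rows = self._log.get((entity, feature), [])
        i = bisect.bisect_right([r[0] for r in rows], ts)
        return rows[i - 1][1] if i else None
```

Training then reads each feature as of the decision's own timestamp, which is what keeps online and offline metrics in agreement.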

Outcomes

What shipped, and what it changed

Measured results from the engagement, told as a story rather than a scoreboard.

  • Reduced fraud and abuse losses by 70% within six months of launch

  • Cut false-positive rate on legitimate users by 45%, recovering measurable GMV

  • Sub-100ms p95 decisioning latency on the hot path so checkout flows weren't slowed

  • Reduced moderator review time per case by 60% via better context surfacing

  • Enabled shadow-mode rollout for new policies, killing the 'big bang policy disaster' risk
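The shadow-mode rollout mentioned above reduces to a small wrapper: run the candidate policy on live traffic, enforce only the live one, and log disagreements for offline comparison. This is a generic sketch of that pattern, not the client's implementation:

```python
def shadow_evaluate(action, live_policy, candidate_policy, log=print):
    """Enforce the live policy; run the candidate silently alongside it."""
    live = live_policy(action)
    try:
        shadow = candidate_policy(action)
    except Exception as exc:
        # A buggy candidate must never affect production decisions.
        shadow = f"error:{exc}"
    if shadow != live:
        log({"action": action, "live": live, "shadow": shadow})
    return live  # only the live decision ever takes effect
```

A few days of disagreement logs gives an honest preview of a policy's false-positive cost before any user feels it.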


Lessons

What I would tell the next team

The takeaways I carry into every similar engagement.

Trust and safety is a product problem dressed as an ML problem. The decisioning surface matters more than the model.

A shared feature store is non-negotiable. Without it, online and offline metrics will not agree.

Shadow mode is the difference between confident policy changes and outage-driving ones.

More patterns and playbooks live in Insights.

Have a similar challenge?

If any of this looks like the project on your desk, the conversation is the cheapest part. You can also browse other marketplace work or the full service list.
