Marketplace Trust and Safety Platform
Cut fraud losses by 70% without slowing down good users
A two-sided marketplace client
A two-sided marketplace was hemorrhaging money to fraud and content abuse, but every defensive measure they tried hurt good-user conversion more than it hurt the bad actors. The team had a rules engine in one place, an ML model in another, and a moderation queue in a third, with no shared signal between them. I unified the three into a single decisioning platform - rules, model, and human review - that scores every risky action in under 100ms and routes the borderline cases to humans with full context. Fraud loss dropped 70%, false-positive rate fell, and the moderation team finally stopped reviewing the same user three different ways.
This is a representative architecture study based on real project patterns. Specific metrics and client details have been generalized to protect confidentiality.
Results
What changed, in numbers
The metrics the engagement is measured by.
-70%
Fraud Loss
reduction in losses
-45%
False Positives
reduction on good users
<100ms
Decision Latency
p95 on hot path
+150%
Moderator Throughput
cases per hour
Challenge
What was broken
Fraud and abuse losses were growing faster than GMV. Rules were too blunt, the ML model was retrained quarterly and missed novel attacks, and the human moderators were drowning in context-free queues. Every team owned part of the problem; nobody owned the outcome.
Solution
The shape of the fix
A unified trust-and-safety platform with sub-100ms decisioning, a shared feature store, integrated human review, and a feedback loop that turns every moderator action into training data for the next model version.
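The core of that decisioning service can be sketched in a few lines. This is a minimal illustration, not the production implementation: the thresholds, rule shapes, and `Decision` type are all assumptions, and the real system evaluates rules and model features against a shared feature store. Hard rules short-circuit, the model scores everything else, and borderline scores route to human review.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds -- in practice these are tuned per action type.
ALLOW_BELOW = 0.2
BLOCK_ABOVE = 0.9

@dataclass
class Decision:
    verdict: str                    # "allow" | "block" | "review"
    risk_score: float
    reasons: list = field(default_factory=list)

def decide(action, rules, model, features):
    """Score one risky action by combining rules and an ML risk model.

    Rules are callables returning a reason string (or None); the model
    maps a feature dict to a risk score in [0, 1]. Borderline scores
    go to a human with the rule hits attached as context.
    """
    reasons = [hit for rule in rules if (hit := rule(action, features))]
    if any(r.startswith("hard_block") for r in reasons):
        return Decision("block", 1.0, reasons)    # rules short-circuit the model

    score = model(features)
    if score < ALLOW_BELOW:
        return Decision("allow", score, reasons)
    if score > BLOCK_ABOVE:
        return Decision("block", score, reasons)
    return Decision("review", score, reasons)     # borderline -> moderator queue
```

The key design point is that every caller gets one verdict from one place, so the rules engine, model, and moderation queue can no longer disagree about the same user.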
Approach
How I tackled it
The concrete moves that took the project from broken to shipped.
Built a unified decisioning service that every risky action calls before it commits
Combined deterministic rules, an ML risk model, and human review in one decisioning graph instead of three silos
Built a feature store with point-in-time correctness so the model trains on what production actually saw
Created a moderator UI with full account context so humans review users, not isolated actions
Closed the feedback loop: every moderator decision becomes labeled training data within 24 hours
Added shadow-mode evaluation so new rules and models could be measured against production traffic before going live
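The point-in-time correctness mentioned above is worth making concrete. The sketch below is a toy in-memory version under my own naming (`PointInTimeStore`, `read_asof` are illustrative, not the actual store's API): features are written append-only with a timestamp, and training reads ask "what value did production see at time t?" rather than "what is the latest value?", which is what keeps offline labels honest.

```python
class PointInTimeStore:
    """Toy append-only feature store keyed by (entity_id, feature).

    Writes keep every version with its timestamp; reads are as-of a
    timestamp, never latest-wins, so training data matches what the
    online model actually saw at decision time.
    """
    def __init__(self):
        self._log = {}  # (entity_id, feature) -> sorted [(written_at, value)]

    def write(self, entity_id, feature, written_at, value):
        versions = self._log.setdefault((entity_id, feature), [])
        versions.append((written_at, value))
        versions.sort(key=lambda v: v[0])

    def read_asof(self, entity_id, feature, ts):
        # Latest version written at or before ts -- linear scan is fine
        # for a sketch; a real store would index this.
        best = None
        for written_at, value in self._log.get((entity_id, feature), []):
            if written_at <= ts:
                best = value
            else:
                break
        return best
```

Without this property, a feature updated after the fraud event leaks the outcome into training, and the offline metrics look better than the online ones ever will.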
Outcomes
What shipped, and what it changed
Measured results from the engagement, told as a story rather than a scoreboard.
Reduced fraud and abuse losses by 70% within six months of launch
Cut false-positive rate on legitimate users by 45%, recovering measurable GMV
Sub-100ms p95 decisioning latency on the hot path so checkout flows weren't slowed
Reduced moderator review time per case by 60% via better context surfacing
Enabled shadow-mode rollout for new policies, eliminating the risk of big-bang policy changes blowing up in production
Stack
Technologies used
Linked entries open the technology page with related studies, playbooks, and notes.
Services
How I helped
The specific services involved in this engagement. Each links to a deeper breakdown.
Lessons
What I would tell the next team
The takeaways I carry into every similar engagement.
Trust and safety is a product problem dressed as an ML problem. The decisioning surface matters more than the model
A shared feature store is non-negotiable. Without it, online and offline metrics will not agree
Shadow mode is the difference between confident policy changes and outage-driving ones
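The shadow-mode lesson reduces to a small harness. This is a simplified sketch with assumed names (`shadow_evaluate`, and policies as plain callables): the candidate policy runs on the same traffic as the live one, its verdicts are logged but never enforced, and the disagreement rate tells you what the change would have done before any user feels it.

```python
def shadow_evaluate(events, live_policy, candidate_policy):
    """Run a candidate policy alongside the live one on real traffic.

    Only the live verdict takes effect; candidate verdicts are recorded
    and diffed, so a bad policy is measured instead of shipped.
    """
    disagreements = []
    for event in events:
        live = live_policy(event)
        shadow = candidate_policy(event)   # logged, never enforced
        if shadow != live:
            disagreements.append(
                {"event": event, "live": live, "shadow": shadow}
            )
    rate = len(disagreements) / len(events) if events else 0.0
    return rate, disagreements
```

If the disagreement rate on a week of production traffic is higher than the fraud rate you are trying to catch, the new policy is mostly punishing good users, and you find that out in a dashboard rather than a postmortem.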
Related
Other studies you might recognize
Engagements with overlapping problem shapes, industries, or stacks.
Have a similar challenge?
If any of this looks like the project on your desk, the conversation is the cheapest part. You can also browse other marketplace work or the full service list.