The Observability Stack I Run in 2025
Datadog is great. So is the bill. Here's what I actually run on small teams.
For early-stage companies, commercial observability platforms are overpriced for what most teams need. Datadog can run you $20K/month before you blink. That's $240K a year, which for a 5-person startup is a hire.
Here's the leaner stack I run instead. In my experience it gets about 90% of the value at 20% of the cost.
The minimum viable observability stack
- OpenTelemetry SDK in all services. This is non-negotiable: vendor-neutral instrumentation means you can swap backends without re-instrumenting your code.
- Grafana Cloud free tier OR self-hosted Grafana + Loki + Tempo + Mimir on a single box.
- Sentry for application errors. The free tier is enough to get started.
- Better Stack for uptime + status page. Ten bucks a month.
That stack handles:
- Distributed tracing (Tempo via OTel)
- Logs (Loki via OTel)
- Metrics (Mimir/Prometheus via OTel)
- Errors (Sentry)
- Uptime (Better Stack)
Total monthly cost for a small team: under $200.
When to upgrade
Migrate to Datadog or Honeycomb when:
- Volume outgrows the free tier or your single box (you'll know)
- You need APM features beyond what OSS offers (rare, but real for some workloads)
- You have a dedicated platform team and the time savings justify the cost
My standard service template
Every Spring Boot service I ship now starts with:
- OTel auto-instrumentation
- Structured JSON logging with correlation IDs
- RED metrics (rate, errors, duration) for every endpoint
- Health and readiness endpoints
- /metrics Prometheus scrape endpoint
- Sentry for unhandled exceptions
This costs maybe 4 hours to set up the first time. After that, it's a template.
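The correlation-ID logging item is easy to describe and easy to get subtly wrong. My template is Spring Boot, but the pattern carries over to any language, so here's a minimal Python sketch using only the standard library. It assumes the correlation ID arrives in an X-Request-ID header (a common convention, not something the template mandates), and handle_request is a hypothetical stand-in for a real request handler. A context variable holds the ID for the current request, a logging filter copies it onto every record, and a formatter emits one JSON object per line.

```python
import contextvars
import io
import json
import logging
import uuid

# Correlation ID for the request currently being handled. ContextVar keeps
# the value isolated per thread and per asyncio task.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationIdFilter(logging.Filter):
    """Stamps the current correlation ID onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id.get()
        return True


class JsonFormatter(logging.Formatter):
    """Renders each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "correlation_id": getattr(record, "correlation_id", "-"),
        })


def build_logger(stream, name: str = "orders") -> logging.Logger:
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    handler.addFilter(CorrelationIdFilter())
    logger.addHandler(handler)
    logger.propagate = False  # don't also emit through the root logger
    return logger


def handle_request(logger: logging.Logger, headers: dict) -> None:
    """Hypothetical handler: reuse the caller's X-Request-ID or mint a new one."""
    token = correlation_id.set(headers.get("X-Request-ID") or str(uuid.uuid4()))
    try:
        logger.info("order received")
        logger.info("order persisted")
    finally:
        correlation_id.reset(token)  # don't leak the ID into the next request


if __name__ == "__main__":
    buf = io.StringIO()
    log = build_logger(buf)
    handle_request(log, {"X-Request-ID": "req-123"})
    print(buf.getvalue(), end="")
```

If you're already running OTel, using the active trace ID as the correlation ID is a common choice, since it lets you jump from a log line in Loki straight to the matching trace in Tempo.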
What I look at every Monday
- p95 / p99 latency by endpoint, week over week
- Error rate trend
- Active alerts that fired in the last 7 days
- Top 5 slow queries (Postgres pg_stat_statements)
That's 15 minutes, and in my experience it catches about 80% of issues before customers do.
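The first two checks are plain arithmetic over request samples. Grafana normally derives latency percentiles from Prometheus histograms (histogram_quantile in PromQL), which gives estimates. The Python sketch below instead computes exact nearest-rank percentiles from raw samples so the logic is visible. The Request record shape and holding a week of samples in memory are illustrative assumptions, not how the stack above stores data.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Request:
    endpoint: str
    duration_ms: float
    is_error: bool  # e.g. an HTTP 5xx response


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not values:
        raise ValueError("percentile of an empty sample")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def summarize(requests: list[Request]) -> dict[str, dict[str, float]]:
    """p95, p99 and error rate for each endpoint."""
    by_endpoint: dict[str, list[Request]] = defaultdict(list)
    for r in requests:
        by_endpoint[r.endpoint].append(r)
    summary = {}
    for endpoint, reqs in by_endpoint.items():
        durations = [r.duration_ms for r in reqs]
        summary[endpoint] = {
            "p95": percentile(durations, 95),
            "p99": percentile(durations, 99),
            "error_rate": sum(r.is_error for r in reqs) / len(reqs),
        }
    return summary


def week_over_week(this_week: list[Request], last_week: list[Request]) -> dict[str, dict[str, float]]:
    """This week minus last week, per metric, for endpoints present in both weeks."""
    now, before = summarize(this_week), summarize(last_week)
    return {
        endpoint: {metric: now[endpoint][metric] - before[endpoint][metric] for metric in now[endpoint]}
        for endpoint in now.keys() & before.keys()
    }


if __name__ == "__main__":
    last = [Request("/orders", float(ms), ms > 98) for ms in range(1, 101)]
    this = [Request("/orders", float(2 * ms), ms > 97) for ms in range(1, 101)]
    # Sort by p99 regression so the worst endpoints come first.
    for endpoint, delta in sorted(week_over_week(this, last).items(), key=lambda kv: -kv[1]["p99"]):
        print(endpoint, delta)
```

Endpoints that appear in only one of the two weeks drop out of the comparison, so new endpoints need a separate look.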