
Instrumentation Before Optimization

If you can't measure it, you can't fix it. Most teams skip this step.

October 22, 2024 · 7 min read

Half the projects I get pulled into are "we need to optimize X." Half of those don't actually have data showing X is the bottleneck. Instrumentation is the first move.

When a client says "we need to optimize the database / the front-end / the AI pipeline," my first question is: what does your data say?

About half the time, they don't have data. They have opinions. Opinions are how you optimize the wrong thing.

The instrumentation-before-optimization rule

Before any optimization work, I install:

  1. Distributed tracing (OTel everywhere; a bootstrap sketch follows this list)
  2. RED metrics on every endpoint (rate, errors, duration)
  3. Database query logging at the slow-query threshold (Postgres pg_stat_statements is gold)
  4. Browser performance metrics (Web Vitals + custom marks; see the second sketch after this list)
  5. A 7-day baseline of all of the above
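
For items 1 and 2, OpenTelemetry's Node SDK with auto-instrumentation covers a lot of ground: traces plus the span data that RED metrics are derived from, with almost no application code. A minimal sketch, assuming a Node service and an OTLP collector on the default local port; the service name is a placeholder:

```typescript
// tracing.ts: load before the rest of the app (e.g. node --require ./tracing.js).
import { NodeSDK } from "@opentelemetry/sdk-node";
import { getNodeAutoInstrumentations } from "@opentelemetry/auto-instrumentations-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";

const sdk = new NodeSDK({
  serviceName: "checkout-api", // placeholder; use your real service name
  traceExporter: new OTLPTraceExporter({
    url: "http://localhost:4318/v1/traces", // default OTLP/HTTP endpoint
  }),
  // Auto-instruments http, express, pg, redis, and friends. Rate, errors,
  // and duration per endpoint can then be derived from the spans (the
  // collector's spanmetrics connector is one way to do that).
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();
```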

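For item 4, the web-vitals package reports the standard metrics, and the browser's performance.mark/measure API covers the custom marks. A browser-side sketch; the /vitals endpoint is a made-up name for wherever you collect these:

```typescript
import { onCLS, onINP, onLCP } from "web-vitals";

// "/vitals" is a hypothetical collection endpoint; sendBeacon survives
// page unloads, which is what you want for these metrics.
const report = (metric: { name: string; value: number }) =>
  navigator.sendBeacon(
    "/vitals",
    JSON.stringify({ name: metric.name, value: metric.value }),
  );

onCLS(report);
onINP(report);
onLCP(report);

// Custom marks around a step the built-in metrics can't see:
performance.mark("search-render:start");
// ... render the search results ...
performance.mark("search-render:end");
performance.measure("search-render", "search-render:start", "search-render:end");
```
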
THEN we look at where the time actually goes.
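
Concretely, "where the time goes" on the database side often starts as one query against pg_stat_statements. A sketch assuming Postgres 13+ (earlier versions name the columns total_time and mean_time) and that the extension is already enabled:

```typescript
import { Client } from "pg";

// Assumes pg_stat_statements is in shared_preload_libraries and
// CREATE EXTENSION pg_stat_statements has been run in this database.
async function topQueries(limit = 10) {
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();
  const { rows } = await client.query(
    `SELECT query, calls, total_exec_time, mean_exec_time
       FROM pg_stat_statements
      ORDER BY total_exec_time DESC
      LIMIT $1`,
    [limit],
  );
  await client.end();
  return rows; // usually a handful of queries account for most of the time
}
```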

What I usually find

In rough order of frequency:

  1. N+1 queries. A page makes 1 query, then 50 more inside a loop. Fixing this often delivers 10x without any architectural change (the before/after shape is sketched in the example further down).
  2. Synchronous external API calls in serial. A page calls 5 services in sequence when they could run in parallel (sketched after this list).
  3. Unnecessary data being fetched. Pulling 500 columns when the page uses 12.
  4. Front-end JS bundle size. Bundle analyzer shows 600K minified for a page that needs maybe 80K (plugin setup sketched after this list).
  5. Algorithm inside a hot path. Less common but high-impact when found.
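
The item-2 fix is usually mechanical: independent awaits become a single Promise.all. A sketch where getUser, getOrders, and getPrefs are hypothetical stand-ins for real service calls:

```typescript
// Hypothetical service calls standing in for real HTTP clients.
declare function getUser(id: string): Promise<unknown>;
declare function getOrders(id: string): Promise<unknown>;
declare function getPrefs(id: string): Promise<unknown>;

// Before: three round trips paid one after another.
async function loadPageSerial(id: string) {
  const user = await getUser(id);
  const orders = await getOrders(id);
  const prefs = await getPrefs(id);
  return { user, orders, prefs };
}

// After: total latency is roughly the slowest single call.
async function loadPageParallel(id: string) {
  const [user, orders, prefs] = await Promise.all([
    getUser(id),
    getOrders(id),
    getPrefs(id),
  ]);
  return { user, orders, prefs };
}
```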

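And for item 4, the measurement itself is one plugin away if you're on webpack (other bundlers have equivalents). A sketch; the config around it is whatever your build already uses:

```typescript
// webpack.config.ts: bolt the analyzer onto an existing config.
import { BundleAnalyzerPlugin } from "webpack-bundle-analyzer";

export default {
  // ...your existing entry/output/module rules...
  plugins: [
    new BundleAnalyzerPlugin({
      analyzerMode: "static", // writes report.html instead of starting a server
      openAnalyzer: false,
    }),
  ],
};
```
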
Notice that "we need a more powerful database" and "we should rewrite in Rust" don't appear. They almost never do.

A real example

A client told me "Postgres is the bottleneck, we need to migrate to ScyllaDB." I asked for the data. They didn't have any.

We instrumented for two weeks. Postgres CPU sat at 30%. The bottleneck was the application layer doing N+1 queries against a small table. We fixed the N+1 in a day. Latency dropped 60%. The Postgres "migration" project never happened.
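
For the record, the shape of that fix. The schema here (orders, customers) is hypothetical, not the client's; the point is that the before version issues one query per row, while the after version issues one query total and fetches only the columns the page uses:

```typescript
import { Client } from "pg";

const db = new Client({ connectionString: process.env.DATABASE_URL });

// Before: 1 query per order. The N+1.
async function customersForOrdersSlow(orderIds: number[]) {
  const customers: unknown[] = [];
  for (const id of orderIds) {
    const { rows } = await db.query(
      `SELECT c.* FROM customers c
        JOIN orders o ON o.customer_id = c.id
       WHERE o.id = $1`,
      [id],
    );
    customers.push(rows[0]);
  }
  return customers;
}

// After: one batched query via ANY($1), selecting only the columns used.
async function customersForOrdersFast(orderIds: number[]) {
  const { rows } = await db.query(
    `SELECT c.id, c.name, c.email
       FROM customers c
       JOIN orders o ON o.customer_id = c.id
      WHERE o.id = ANY($1)`,
    [orderIds],
  );
  return rows;
}
```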

That conversation saved six months of engineering time and headed off a six-figure migration risk.

What instrumentation costs

For most teams: 2-3 weeks of focused work to get good observability across the stack. After that, the ongoing cost is marginal.

For teams that already have observability but treat it as alarm-only: a week of building dashboards that answer the questions you actually have.

The discipline

Teams that succeed at performance work treat observability as a deliverable. They have a "before" snapshot and an "after" snapshot for every optimization. They share the comparison.

Teams that don't succeed at performance work treat observability as a thing they'll add later. They never do.

Tags: performance, observability, engineering

