Instrumentation Before Optimization
If you can't measure it, you can't fix it. Most teams skip this step.
Half the projects I get pulled into start with "we need to optimize X": the database, the front-end, the AI pipeline. My first question is always the same: what does your data say?
About half the time, they don't have data. They have opinions. Opinions are how you optimize the wrong thing. Instrumentation is the first move.
The instrumentation-before-optimization rule
Before any optimization work, I install:
- Distributed tracing (OTel everywhere)
- RED metrics on every endpoint (rate, errors, duration; a minimal middleware sketch follows this list)
- Database query logging at the slow-query threshold, plus aggregate statement stats (Postgres's pg_stat_statements is gold; queried below)
- Browser performance metrics (Web Vitals + custom marks)
- A 7-day baseline of all of the above
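To make the RED bullet concrete, here is a minimal sketch for an Express app using prom-client. One duration histogram, labeled by method, route, and status, gives you all three signals: rate is the sample count, errors are the non-2xx statuses, duration is the histogram itself. The bucket boundaries and the Express/prom-client pairing are assumptions; swap in whatever your stack uses.

```typescript
// red-metrics.ts -- minimal RED-metrics sketch for Express using prom-client.
// Bucket boundaries and labels are assumptions; tune them for your traffic.
import express from "express";
import client from "prom-client";

const httpDuration = new client.Histogram({
  name: "http_request_duration_seconds",
  help: "Request duration; rate and errors fall out of the count and status label",
  labelNames: ["method", "route", "status"],
  buckets: [0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
});

export const app = express();

app.use((req, res, next) => {
  const stop = httpDuration.startTimer();
  res.on("finish", () => {
    // req.route is set once a handler matched; fall back to the raw path.
    stop({
      method: req.method,
      route: req.route?.path ?? req.path,
      status: res.statusCode,
    });
  });
  next();
});

// Expose everything for Prometheus to scrape.
app.get("/metrics", async (_req, res) => {
  res.set("Content-Type", client.register.contentType);
  res.send(await client.register.metrics());
});
```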
THEN we look at where the time actually goes.
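On the database side, the first "where does the time go" question comes straight out of pg_stat_statements. A sketch using node-postgres; the column names assume Postgres 13 or later (older versions call it total_time), and the extension has to be enabled first:

```typescript
// top-queries.ts -- where does database time actually go?
// Assumes pg_stat_statements is in shared_preload_libraries and the
// extension is created; column names are Postgres 13+.
import { Pool } from "pg";

const pool = new Pool(); // connection settings come from the PG* env vars

async function topQueriesByTotalTime(): Promise<void> {
  const { rows } = await pool.query(`
    SELECT query,
           calls,
           round(total_exec_time::numeric, 1) AS total_ms,
           round(mean_exec_time::numeric, 2)  AS mean_ms
    FROM pg_stat_statements
    ORDER BY total_exec_time DESC
    LIMIT 10`);
  console.table(rows);
  await pool.end();
}

topQueriesByTotalTime().catch(console.error);
```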
What I usually find
In rough order of frequency:
- N+1 queries. A page makes 1 query, then 50 more inside a loop. Fixing this often delivers a 10x improvement without any architectural change (see the before/after sketch following this list).
- External API calls made in series. A page calls 5 services one after another when the calls are independent and could run in parallel (also sketched below).
- Unnecessary data being fetched. Pulling 500 columns when the page uses 12.
- Front-end JS bundle size. Bundle analyzer shows 600K minified for a page that needs maybe 80K.
- An inefficient algorithm inside a hot path. Less common, but high-impact when found.
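The top two fixes are mechanical enough to sketch. First the N+1, using node-postgres against hypothetical orders and users tables; the names are illustrative, the shape is what matters:

```typescript
import { Pool } from "pg";
const pool = new Pool();

// BEFORE (N+1): one query for the page, then one more per row inside the
// loop. 50 rows means 51 round trips to the database.
async function ordersWithNamesSlow() {
  const { rows: orders } = await pool.query(
    "SELECT id, user_id FROM orders ORDER BY id LIMIT 50");
  for (const order of orders) {
    const { rows } = await pool.query(
      "SELECT name FROM users WHERE id = $1", [order.user_id]);
    order.user_name = rows[0]?.name;
  }
  return orders;
}

// AFTER: one round trip; the join happens where the data lives.
async function ordersWithNamesFast() {
  const { rows } = await pool.query(`
    SELECT o.id, o.user_id, u.name AS user_name
    FROM orders o
    JOIN users u ON u.id = o.user_id
    ORDER BY o.id
    LIMIT 50`);
  return rows;
}
```

And the serial external calls, assuming Node 18+ (built-in fetch) and that the calls are independent; the service URLs are made up:

```typescript
// BEFORE: sequential awaits; total latency is the SUM of the three calls.
async function loadPageSlow() {
  const a = await fetch("https://svc-a.internal/data").then(r => r.json());
  const b = await fetch("https://svc-b.internal/data").then(r => r.json());
  const c = await fetch("https://svc-c.internal/data").then(r => r.json());
  return { a, b, c };
}

// AFTER: issue the independent calls together; total latency is the MAX.
async function loadPageFast() {
  const [a, b, c] = await Promise.all([
    fetch("https://svc-a.internal/data").then(r => r.json()),
    fetch("https://svc-b.internal/data").then(r => r.json()),
    fetch("https://svc-c.internal/data").then(r => r.json()),
  ]);
  return { a, b, c };
}
```

If one call depends on another's result, only the independent subset can run concurrently, but in practice that's usually most of them.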
Notice "we need a more powerful database" or "we should rewrite in Rust" doesn't appear. They almost never do.
A real example
A client told me "Postgres is the bottleneck, we need to migrate to ScyllaDB." I asked for the data. They didn't have any.
We instrumented for two weeks. Postgres CPU sat at 30%. The bottleneck was the application layer doing N+1 queries against a small table. We fixed the N+1 in a day. Latency dropped 60%. The Postgres "migration" project never happened.
Two weeks of instrumentation saved six months of engineering time and a six-figure migration risk.
What instrumentation costs
For most teams: 2-3 weeks of focused work to get good observability across the stack. After that, the ongoing cost is marginal.
For teams that already have observability but treat it as alarm-only: a week of building dashboards that answer the questions you actually have.
The discipline
Teams that succeed at performance work treat observability as a deliverable. They have a "before" snapshot and an "after" snapshot for every optimization. They share the comparison.
Teams that don't succeed at performance work treat observability as a thing they'll add later. They never do.