01 — Purpose

See production behaviour clearly

Observability turns errors, slow pages, and regressions into signals teams act on — in production, not only in staging.

Shipping code is not the finish line. Problems happen on real devices, networks, and sessions. Instrumentation earns its keep when it changes what you ship next.

Instrumentation alone is not enough. Teams need workflows: what gets alerted, who responds, how incidents tie to deploys, and when dashboards get reviewed. Without that, observability becomes noise.

See Core Web Vitals and performance standard.

02 — Signals

What to capture

Errors, real-user metrics, and enough context to reproduce.

  • client errors with stack traces, route, and release version
  • Core Web Vitals in the field — LCP, CLS, INP at scale
  • segment by device class, connection, and geography where useful
  • tag events with a deployment ID so deploys and rollbacks can be correlated with metric shifts
  • performance dashboards for CWV and key custom metrics — each with an owner
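
As a rough illustration of the signals above, the sketch below tags each field sample with route, release, deployment ID, and device class, then reports the 75th percentile per segment. The `FieldSample` shape, its field names, and the `p75BySegment` helper are assumptions made for this example, not any vendor's SDK. The percentile uses the nearest-rank method, which is one of several common definitions.

```typescript
// Hypothetical sample shape: field names are illustrative only.
type DeviceClass = "mobile" | "tablet" | "desktop";

interface FieldSample {
  metric: "LCP" | "CLS" | "INP";
  value: number; // milliseconds for LCP and INP, unitless for CLS
  route: string;
  release: string;
  deploymentId: string;
  deviceClass: DeviceClass;
}

// Nearest-rank percentile: the smallest sample with at least p% of
// all samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Group samples of one metric by a segment key and return p75 per segment.
function p75BySegment(
  samples: FieldSample[],
  metric: FieldSample["metric"],
  segmentOf: (s: FieldSample) => string,
): Record<string, number> {
  const groups: Record<string, number[]> = {};
  for (let i = 0; i < samples.length; i++) {
    const s = samples[i];
    if (s.metric !== metric) continue;
    const key = segmentOf(s);
    if (!groups[key]) groups[key] = [];
    groups[key].push(s.value);
  }
  const result: Record<string, number> = {};
  Object.keys(groups).forEach((key) => {
    result[key] = percentile(groups[key], 75);
  });
  return result;
}
```

Calling `p75BySegment(samples, "LCP", (s) => s.deviceClass)` gives one number per device class. Swapping the key function for `(s) => s.deploymentId` gives a per-deploy view of the same data.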

03 — Workflows

From telemetry to action

Visible, actionable, reviewed — not a wall of graphs.

  • define SLOs and alert thresholds — error rate, LCP p75, failed API calls
  • on-call or triage rotation for frontend production alerts
  • weekly dashboard review — trends, not only firefighting
  • link errors to releases — rollback path when a deploy spikes JS errors
  • incident review with frontend + backend — postmortems that update alerts and budgets
  • review third-party impact when vendor tags change — see third-party scripts
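
One way to make the thresholds above concrete is a small check that runs over each monitoring window. This is a minimal sketch: the `WindowStats` and `Thresholds` shapes, the signal names, and the spike rule are all assumptions for illustration. The limits a team picks are its own decision.

```typescript
// Hypothetical aggregate for one monitoring window (for example, 5 minutes).
interface WindowStats {
  pageViews: number;
  jsErrors: number;
  apiCalls: number;
  failedApiCalls: number;
  lcpP75Ms: number;
}

// Assumed threshold shape: rates are fractions between 0 and 1.
interface Thresholds {
  maxJsErrorRate: number;
  maxApiFailureRate: number;
  maxLcpP75Ms: number;
}

interface Breach {
  signal: string;
  observed: number;
  limit: number;
}

// Divide safely so an empty window reports a rate of 0 instead of NaN.
function rate(part: number, total: number): number {
  return total > 0 ? part / total : 0;
}

// Return every threshold the window exceeds; an empty array means no alert.
function evaluateWindow(stats: WindowStats, t: Thresholds): Breach[] {
  const breaches: Breach[] = [];
  const jsErrorRate = rate(stats.jsErrors, stats.pageViews);
  const apiFailureRate = rate(stats.failedApiCalls, stats.apiCalls);
  if (jsErrorRate > t.maxJsErrorRate) {
    breaches.push({ signal: "js_error_rate", observed: jsErrorRate, limit: t.maxJsErrorRate });
  }
  if (apiFailureRate > t.maxApiFailureRate) {
    breaches.push({ signal: "api_failure_rate", observed: apiFailureRate, limit: t.maxApiFailureRate });
  }
  if (stats.lcpP75Ms > t.maxLcpP75Ms) {
    breaches.push({ signal: "lcp_p75_ms", observed: stats.lcpP75Ms, limit: t.maxLcpP75Ms });
  }
  return breaches;
}

// Flag a deploy when the post-deploy JS error rate exceeds both a multiple
// of the pre-deploy rate and an absolute floor (the floor avoids flagging
// tiny changes when the baseline is near zero).
function errorRateSpiked(
  before: WindowStats,
  after: WindowStats,
  factor: number,
  floor: number,
): boolean {
  const baseline = rate(before.jsErrors, before.pageViews);
  const current = rate(after.jsErrors, after.pageViews);
  return current > Math.max(baseline * factor, floor);
}
```

A breach list can feed whatever paging or triage tool the rotation uses. `errorRateSpiked` is the kind of signal that would prompt someone to consider the rollback path, not an automatic trigger.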

04 — Avoid

Blind spots and noise

Silent failures, alert fatigue, and unused dashboards waste the effort of instrumenting.

  • checkout, auth, or search untracked while homepage metrics look fine
  • only monitoring backend — frontend errors invisible
  • alert fatigue — pages of warnings nobody acts on
  • monitoring nobody reviews: dashboards with no named owner and no weekly review
  • no runbooks — on-call guesses instead of documented triage steps
  • PII in logs or session replay without consent and redaction
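
For the PII point above, redaction can happen before a payload leaves the client. The sketch below is a simplified illustration: the `SENSITIVE_KEYS` list and the email pattern are assumptions and will miss many forms of personal data, and consent handling is not covered here.

```typescript
// Assumed list of keys whose values are always replaced (matched case-insensitively).
const SENSITIVE_KEYS = ["email", "password", "token", "phone"];

// Loose email pattern for free-text fields; deliberately simple, not a validator.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Recursively copy a payload, replacing sensitive keys and masking
// email-like substrings in strings. The input is not mutated.
function redact(value: Json): Json {
  if (typeof value === "string") {
    return value.replace(EMAIL_PATTERN, "[redacted-email]");
  }
  if (Array.isArray(value)) {
    return value.map((item) => redact(item));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    const keys = Object.keys(value);
    for (let i = 0; i < keys.length; i++) {
      const key = keys[i];
      const sensitive = SENSITIVE_KEYS.indexOf(key.toLowerCase()) >= 0;
      out[key] = sensitive ? "[redacted]" : redact(value[key]);
    }
    return out;
  }
  return value; // numbers, booleans, and null pass through unchanged
}
```

Running every log and replay payload through one function like this keeps the redaction rules in a single reviewable place.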

05 — Close

Observe, then improve

Start with errors and CWV on the highest-traffic templates, then close the loop after each incident.

Add session context where debugging is hardest. If nobody reviews a chart, delete it or give it an owner. If an alert is routinely ignored, fix its threshold.

After an outage, ask: would our current alerts have caught it faster? Update thresholds, add missing spans, or fix noisy false positives.
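
One way to answer that question is to replay the incident's recorded metric against a candidate alert rule. A minimal sketch, assuming a per-minute error-rate series and a rule of the form "fire after N consecutive samples above the threshold". The `Point` shape and `firstAlertMinute` are hypothetical names for this example.

```typescript
// Hypothetical recorded series: one error-rate sample per minute.
interface Point {
  minute: number;
  errorRate: number;
}

// Return the minute at which the rule would have fired, or null if it
// never would have. The rule fires on the sample that completes
// `consecutive` breaching samples in a row.
function firstAlertMinute(
  series: Point[],
  threshold: number,
  consecutive: number,
): number | null {
  if (consecutive < 1) throw new Error("consecutive must be at least 1");
  let run = 0;
  for (let i = 0; i < series.length; i++) {
    run = series[i].errorRate > threshold ? run + 1 : 0;
    if (run >= consecutive) return series[i].minute;
  }
  return null;
}
```

Comparing the result for the current rule with the result for a proposed one shows how many minutes a change would have saved on this incident. Replaying the same rule over a quiet period shows whether it would also have fired falsely.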

See performance budgets, performance planning, performance roadmaps, and the release readiness checklist.