Golden Signals Monitoring is an SRE practice that focuses on four essential service metrics: latency, traffic, errors, and saturation. These metrics provide a high-level, user-centric view of system health. By observing them consistently, teams can quickly detect and diagnose production issues before they escalate.
How It Works
This approach centers on measuring how a service behaves from the user's perspective. Latency tracks how long requests take to complete and is usually reported as percentiles (such as the 99th) rather than an average, so that slow outliers are not hidden. Traffic measures demand on the system, such as requests per second or transactions per minute. Errors capture the rate of failed requests, and saturation indicates how "full" the system is, often reflected in CPU, memory, disk, or queue utilization.
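As a minimal sketch of how the four signals might be computed for one aggregation window, the Python below assumes a hypothetical `Request` record with a duration and a success flag. It reports latency as a nearest-rank 99th percentile and treats saturation as a single resource's used-to-capacity ratio. All names here are illustrative and do not belong to any particular monitoring library's API.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical per-request record collected during one window."""
    duration_ms: float  # time from receipt to response
    ok: bool            # False if the request failed


@dataclass
class GoldenSignals:
    p99_latency_ms: float  # latency: tail of the duration distribution
    traffic_rps: float     # traffic: requests per second over the window
    error_rate: float      # errors: fraction of requests that failed
    saturation: float      # saturation: used / capacity of one resource


def percentile(values, q):
    """Nearest-rank percentile for q in (0, 100]; returns 0.0 for no data."""
    if not values:
        return 0.0
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def compute_signals(requests, window_s, used, capacity):
    """Summarize one window of requests as the four golden signals."""
    total = len(requests)
    failed = sum(1 for r in requests if not r.ok)
    return GoldenSignals(
        p99_latency_ms=percentile([r.duration_ms for r in requests], 99),
        traffic_rps=total / window_s,
        error_rate=failed / total if total else 0.0,
        saturation=used / capacity,
    )
```

For example, ten requests lasting 10 to 100 ms with one failure over a 5-second window, on a host using 3 of 4 CPU cores, yield a p99 latency of 100 ms, 2 requests per second, a 10% error rate, and 75% saturation.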
Together, these signals describe both experience and capacity. For example, rising latency combined with increasing saturation may indicate resource exhaustion. A spike in errors with normal traffic could point to a deployment issue. Observing all four signals in parallel helps teams distinguish between load-related problems and software defects.
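The diagnostic patterns described in the paragraph above can be expressed as a rough triage rule. The sketch below compares a current snapshot of the signals against a baseline. The `Snapshot` type, the 1.5x rise factor, the 0.8 saturation limit, and the 1% error floor are all hypothetical choices for illustration; a real system would tune them per service.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical point-in-time view of the four signals."""
    latency_ms: float
    traffic_rps: float
    error_rate: float
    saturation: float  # 0.0 (idle) to 1.0 (full)


def diagnose(baseline, current, rise=1.5, sat_high=0.8, err_floor=0.01):
    """Return a coarse triage hint; thresholds are illustrative only."""
    latency_up = current.latency_ms > baseline.latency_ms * rise
    # The floor keeps a near-zero baseline from flagging every single error.
    errors_up = current.error_rate > max(baseline.error_rate * rise, err_floor)
    traffic_normal = current.traffic_rps <= baseline.traffic_rps * rise
    saturated = current.saturation >= sat_high

    if latency_up and saturated:
        return "possible resource exhaustion"
    if errors_up and traffic_normal:
        return "possible software defect, e.g. a bad deployment"
    if not traffic_normal:
        return "load-related: demand is above baseline"
    return "no clear pattern"
```

The rules are checked in order, so a saturated service with slow responses is reported as resource exhaustion even if errors are also rising. That ordering is one design choice among several.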
In practice, engineers instrument services to emit metrics, aggregate them in monitoring platforms, and define alert thresholds. Dashboards typically visualize trends over time, enabling rapid correlation during incident response.
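For the alerting step, one common way to reduce noise is to fire only after a threshold has been breached for several consecutive evaluations. The sketch below assumes metrics arrive as a dictionary keyed by signal name. `AlertRule`, `evaluate_all`, and any threshold values are hypothetical and do not correspond to a specific monitoring platform's configuration format.

```python
from dataclasses import dataclass, field


@dataclass
class AlertRule:
    """Hypothetical threshold rule evaluated once per scrape interval."""
    name: str
    signal: str             # metric key, e.g. "error_rate"
    threshold: float        # fire when the value exceeds this
    for_intervals: int = 3  # consecutive breaches required before firing
    _streak: int = field(default=0, init=False, repr=False)

    def evaluate(self, metrics):
        """Update the breach streak and report whether the alert fires."""
        value = metrics.get(self.signal)
        if value is not None and value > self.threshold:
            self._streak += 1
        else:
            self._streak = 0  # any healthy (or missing) sample resets it
        return self._streak >= self.for_intervals


def evaluate_all(rules, metrics):
    """Return the names of rules that fire for this evaluation."""
    return [rule.name for rule in rules if rule.evaluate(metrics)]
```

Requiring consecutive breaches trades a little detection delay for fewer alerts on brief spikes; with `for_intervals=1` a rule fires on the first breach.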
Why It Matters
Production systems are complex and distributed. Monitoring every possible metric creates noise and slows incident response. Focusing on four core signals reduces cognitive load and highlights what most directly affects users.
This model also supports better incident management. Clear, actionable metrics help shorten mean time to detection (MTTD) and mean time to resolution (MTTR). Teams also gain a shared framework for diagnosing issues, which improves reliability and customer trust.
Key Takeaway
Monitor latency, traffic, errors, and saturation to gain a clear, actionable view of service health and user impact.