While definitions vary, we think of observability as the ability of a site reliability engineer (SRE), DevOps engineer or operations professional to obtain the visibility required to understand both a problem and its source, typically by flexibly analyzing a comprehensive set of data about application and infrastructure operations.
Observability is often discussed in terms of the requirement to collect logs, metrics and traces. We think these types of data are important to enabling operators to understand their systems, but it’s clear that simply collecting lots of varied data isn’t the end of the story. We’re seeing an evolution in the concept of observability that emphasizes asking the right questions of your systems, starting with: What aspect of performance do I care about? Through that lens, SREs, DevOps teams and operations staff can identify the appropriate data to track in order to support a level of performance that’s meaningful to the business.
Getting there poses some challenges, particularly when organizations are deploying new technologies. If teams don’t reexamine their approach to monitoring when they adopt cloud-native technologies, they will likely discover that traditional tools and approaches don’t support the kind of visibility required in these complex and dynamic environments. Rather than gaining the insight they need, they may end up tracking performance characteristics that aren’t relevant to cloud-native technologies. Or they may simply be unable to collect and access the data required to understand performance.
The process of defining and tracking service level objectives (SLOs), which set targets for the aspects of performance that your organization cares about, is of increasing interest to forward-thinking organizations embracing the concept of observability. Best practices are still emerging as the market experiments with what kinds of SLOs are effective and which tools are required to set and track them.
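
To make the idea of tracking an SLO concrete, here is a minimal sketch in Python. It assumes one possible choice of "the aspect of performance you care about": the fraction of requests that succeed within a latency threshold. The `Request` type, the function names, the threshold and the target are all hypothetical, chosen for illustration rather than taken from any particular tool.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Request:
    """One observed request (hypothetical shape, for illustration only)."""
    latency_ms: float
    ok: bool


def good_fraction(requests: Iterable[Request], threshold_ms: float) -> float:
    """Fraction of requests that succeeded within threshold_ms.

    An empty window is treated as fully compliant (an assumption; some
    teams prefer to treat "no data" as unknown instead).
    """
    reqs = list(requests)
    if not reqs:
        return 1.0
    good = sum(1 for r in reqs if r.ok and r.latency_ms <= threshold_ms)
    return good / len(reqs)


def slo_met(requests: Iterable[Request], threshold_ms: float, target: float) -> bool:
    """True if the measured good fraction reaches the target (e.g. 0.99)."""
    return good_fraction(requests, threshold_ms) >= target
```

Calling `slo_met(window, threshold_ms=300, target=0.99)` on a window of recent requests would then answer a single question: did at least 99% of requests succeed in under 300 ms? Which measurement, threshold and target are meaningful depends on the business, which is the point of starting from the question rather than from the data.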