IT Glossary

Observability

What is Observability?

The concept and terminology of observability have only recently been applied to information technology and cloud computing. The term originated in the discipline of control systems engineering, where observability is defined as a measure of how well a system's internal states can be inferred from its external outputs.

A system is observable if its current state can be determined in a finite period of time using only its outputs (together with any known inputs). For such a system, all of its behaviors and activities can be evaluated from its outputs. Conversely, a system whose output sensors do not provide enough data for the operator to determine its behavior is considered unobservable.
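For linear time-invariant systems, control theory turns this definition into a concrete test, the Kalman rank condition: a system with n state variables, state matrix A, and output matrix C (so that y = Cx) is observable exactly when the matrix formed by stacking C, CA, CA², ..., CAⁿ⁻¹ has rank n. The sketch below checks that condition in pure Python with exact fractions. The helper names and the two-state example system (position and velocity) are hypothetical illustrations, not part of any particular library.

```python
from fractions import Fraction


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def rank(m):
    """Rank of a matrix via Gaussian elimination over exact rationals."""
    rows = [[Fraction(x) for x in row] for row in m]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0  # number of pivot rows found so far
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no usable pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(n_rows):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r


def observability_matrix(a, c):
    """Stack C, CA, CA^2, ..., CA^(n-1) for a system with n state variables."""
    blocks, current = [], c
    for _ in range(len(a)):
        blocks.extend(current)
        current = matmul(current, a)
    return blocks


def is_observable(a, c):
    """Kalman rank test: observable iff the observability matrix has rank n."""
    return rank(observability_matrix(a, c)) == len(a)


# Hypothetical system: state = (position, velocity), position' = velocity, velocity' = 0.
A = [[0, 1], [0, 0]]
print(is_observable(A, [[1, 0]]))  # sensor measures position -> True
print(is_observable(A, [[0, 1]]))  # sensor measures velocity only -> False
```

Measuring position is enough because velocity can be inferred from how position changes over time; measuring only velocity leaves the starting position undetermined, so that system is unobservable in exactly the sense described above.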

When we apply the concept of observability to IT infrastructure and cloud computing, the core idea still holds, but the definition needs some adjustment.

IT infrastructure consists of hardware and software components that automatically generate records of activity on the system. These records include application logs, system logs, security logs, and several other types that document everything from system sign-ins to security threats and events. The key to achieving true observability of IT infrastructure and cloud computing environments is not the event logs themselves but the ability to monitor and analyze those events, along with KPIs and other data; that capability is what drives observability and yields actionable insights. IT organizations can implement observability platform software tools that streamline the aggregation and analysis of event logs.

Three Data Formats of Observability: Event Logs, Metrics, and Traces

A cloud computing environment generates data in three formats that can be aggregated and analyzed to enhance network observability: event logs, metrics, and traces.

An event log is a record of an event that happened on a system. Event logs are automatically generated and timestamped by the system, then typically appended to a file that is treated as immutable. They provide a detailed record of discrete events, often including additional metadata about the system state when the event occurred. Log files may be written in plaintext or in a structured format such as JSON.
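Because logs can arrive as plaintext or in a structured format, an observability tool usually normalizes both into one record type before analysis. The sketch below assumes two hypothetical line formats (a space-separated plaintext line with `key=value` pairs, and a JSON object with `ts`, `level`, and `msg` keys); real log formats vary, and these field names are illustrative only. The frozen dataclass mirrors the idea that a log entry is not modified after it is written.

```python
import json
import re
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class LogEvent:
    """One normalized, timestamped event (fields are hypothetical)."""
    timestamp: datetime
    level: str
    message: str
    fields: dict = field(default_factory=dict)


# Assumed plaintext layout: "<ISO-8601 timestamp> <LEVEL> <message with key=value pairs>"
PLAIN = re.compile(r"^(?P<ts>\S+) (?P<level>[A-Z]+) (?P<msg>.*)$")


def parse_time(text):
    # fromisoformat() accepts a trailing "Z" only on Python 3.11+, so normalize it.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def parse_line(line):
    """Turn either a JSON log line or a plaintext log line into a LogEvent."""
    line = line.strip()
    if line.startswith("{"):
        record = json.loads(line)
        extra = {k: v for k, v in record.items() if k not in ("ts", "level", "msg")}
        return LogEvent(parse_time(record["ts"]), record["level"], record["msg"], extra)
    match = PLAIN.match(line)
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    msg = match["msg"]
    fields = dict(re.findall(r"(\w+)=(\S+)", msg))  # pull key=value metadata out of the text
    return LogEvent(parse_time(match["ts"]), match["level"], msg, fields)


SAMPLE_LINES = [
    "2024-05-01T12:00:00Z WARN auth: login failed user=alice ip=203.0.113.7",
    '{"ts": "2024-05-01T12:00:05Z", "level": "INFO", "msg": "login ok", "user": "bob"}',
]
events = [parse_line(line) for line in SAMPLE_LINES]
for event in events:
    print(event.timestamp.isoformat(), event.level, event.fields)
```

Once both formats share one shape, events from different sources can be sorted, filtered, and counted together.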

A metric is a numerical representation of data measured over a period of time. Unlike an event log, which records a specific event, a metric is a measured value derived from system performance. Metrics frequently describe resource usage, such as how much memory or processing power is being consumed, as well as application service level indicators (SLIs) such as request latency.
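To make "measured over some period of time" concrete, the sketch below reduces raw latency samples into metric values for one time window: a count, an average, and a 95th-percentile latency. The sample data and window size are hypothetical, and the percentile uses the nearest-rank method, which is only one of several common percentile definitions.

```python
import math
from statistics import mean


def percentile(values, p):
    """Nearest-rank percentile (0 < p <= 100) of a non-empty sequence."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize_window(samples, start, length):
    """Reduce (timestamp_seconds, latency_ms) samples in [start, start + length) to metrics."""
    in_window = [value for ts, value in samples if start <= ts < start + length]
    if not in_window:
        return None  # no data in this window
    return {
        "count": len(in_window),
        "avg_latency_ms": mean(in_window),
        "p95_latency_ms": percentile(in_window, 95),
    }


# Hypothetical request latencies: (seconds since start, latency in ms)
SAMPLES = [(0, 120), (10, 80), (20, 95), (30, 400), (45, 110), (70, 90)]
print(summarize_window(SAMPLES, 0, 60))
print(summarize_window(SAMPLES, 60, 60))
```

Reporting a high percentile alongside the average is a common design choice: a single slow request (400 ms here) pulls the p95 far above the average, which is exactly the kind of tail latency an SLI is meant to expose.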

A trace is the documented record of a series of causally related events that happen on a network. The events do not have to take place within a single application, but they do have to be a part of the same request flow. A trace can be formatted or presented as a list of event logs taken from different systems that were involved in fulfilling the request.
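A common way to represent a trace is as a set of spans, each tagged with a shared trace ID and a pointer to the span that caused it. The sketch below assumes that model (the `Span` fields, service names, and timings are hypothetical, loosely inspired by distributed-tracing conventions rather than any specific library's API). It groups spans collected from several services by trace ID and rebuilds the causal order of one request.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    trace_id: str             # shared by every span in one request flow
    span_id: str
    parent_id: Optional[str]  # None for the span that started the request
    service: str
    operation: str
    start_ms: int
    end_ms: int


def assemble_traces(spans):
    """Group spans from many services by trace ID."""
    traces = defaultdict(list)
    for span in spans:
        traces[span.trace_id].append(span)
    return traces


def flatten(trace_spans):
    """Return (depth, span) pairs: parents before children, siblings by start time."""
    children, roots = defaultdict(list), []
    for span in trace_spans:
        if span.parent_id is None:
            roots.append(span)
        else:
            children[span.parent_id].append(span)
    ordered = []

    def visit(span, depth):
        ordered.append((depth, span))
        for child in sorted(children[span.span_id], key=lambda s: s.start_ms):
            visit(child, depth + 1)

    for root in sorted(roots, key=lambda s: s.start_ms):
        visit(root, 0)
    return ordered


# Hypothetical spans, arriving out of order from different services.
SPANS = [
    Span("t1", "c", "b", "orders-db", "SELECT order", 30, 55),
    Span("t1", "a", None, "web-frontend", "GET /checkout", 0, 120),
    Span("t2", "x", None, "web-frontend", "GET /health", 5, 7),
    Span("t1", "b", "a", "order-service", "load order", 20, 70),
    Span("t1", "d", "a", "payment-service", "charge card", 75, 110),
]
traces = assemble_traces(SPANS)
for depth, span in flatten(traces["t1"]):
    print("  " * depth + f"{span.service}: {span.operation} ({span.end_ms - span.start_ms} ms)")
# web-frontend: GET /checkout (120 ms)
#   order-service: load order (50 ms)
#     orders-db: SELECT order (25 ms)
#   payment-service: charge card (35 ms)
```

The per-span durations in the output are the raw material for trace-derived KPIs such as request processing time.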

Events and KPIs: Machine Data Inputs that Promote Observability

IT infrastructure produces logs, metrics, and traces that tell a story about activity on the network. These three data formats deliver two types of information that observability platforms need to derive insights into network security and performance: events and KPIs. The ability to capture and isolate network events and compute KPIs from logs, metrics, and traces is the key to achieving business goals with enhanced observability.

Log files are the main source of data about events. A primary purpose of log files is to help developers debug their software by providing visibility into the events that the software produces.

Log files, metrics, and traces all contribute to KPI computation:

  • Log files can be used to compute KPIs. For example, a failed login is an event, but a high number of failed logins from an external IP address is a Key Risk Indicator (KRI) that could indicate a brute-force attempt to gain access to your application.
  • Metrics can include measurements of how much memory or processing power an application is using. These metrics can act as KPIs, indicating when application performance is poor or when a DDoS attack could be underway.
  • Traces provide insight into request flows and transaction times in the system. They can be used to inform KPI measurements like request processing time or time per transaction.
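The failed-login example from the list above can be sketched as a sliding-window count per IP address. The input tuple shape, the 60-second window, and the threshold of five failures are all hypothetical choices for illustration; the sketch also does not check whether an address is external, which a real detection rule would likely add.

```python
from collections import defaultdict, deque


def brute_force_suspects(events, window_s=60, threshold=5):
    """Flag IPs with >= threshold failed logins within any window_s-second span.

    events: iterable of (timestamp_seconds, ip, succeeded) tuples sorted by time.
    Returns {ip: timestamp at which the KRI first crossed the threshold}.
    """
    recent = defaultdict(deque)  # ip -> timestamps of failures still inside the window
    flagged = {}
    for ts, ip, succeeded in events:
        if succeeded:
            continue  # only failed logins count toward this KRI
        window = recent[ip]
        window.append(ts)
        while window and window[0] <= ts - window_s:
            window.popleft()  # drop failures older than the window
        if len(window) >= threshold and ip not in flagged:
            flagged[ip] = ts
    return flagged


# Hypothetical login events: (seconds, source IP, succeeded?)
EVENTS = [
    (0, "198.51.100.23", False),
    (1, "203.0.113.9", False),
    (5, "198.51.100.23", False),
    (10, "198.51.100.23", False),
    (12, "192.0.2.4", True),
    (15, "198.51.100.23", False),
    (20, "198.51.100.23", False),
    (100, "203.0.113.9", False),
]
print(brute_force_suspects(EVENTS))
```

Each individual failure is just an event; the KRI only appears once the events are counted against a time window, which is the distinction the list draws between events and KPIs.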

A software observability platform aggregates data in the three main formats (logs, metrics, and traces), processes it into events and KPI measurements, and uses that data to drive actionable insights into system security and performance.

What are the Objectives of Observability?

The observability of a cloud computing environment is not a goal in itself; it should be seen as a necessary step toward achieving key business objectives. The goal of developing observability is to enable security analysts, IT operators, and managers to better understand and address problems in the system that could negatively impact the business. There are three key objectives associated with developing observability of cloud computing networks:

Reliability

Reliability is one of the first goals of observability. To build an IT infrastructure that functions reliably and meets the needs of the customer, we need to measure its performance. With an observability platform software tool, we can monitor user behavior, network speed, system availability, capacity, and other metrics to ensure the system is performing as it should.
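One common way to turn an availability measurement into a reliability target is to compare it against a service level objective (SLO) and track the remaining "error budget". The sketch below assumes availability is measured by periodic health-check probes; the probe counts and the 99.9% target are hypothetical, and probe-based availability is only one of several ways to define it.

```python
def availability(probe_results):
    """Fraction of health-check probes that succeeded (True = success)."""
    if not probe_results:
        raise ValueError("no probe results")
    return sum(probe_results) / len(probe_results)


def error_budget_remaining(probe_results, slo_target=0.999):
    """Share of the allowed failures (1 - slo_target) still unused.

    1.0 means no failures yet, 0.0 means the budget is exactly spent,
    and a negative value means the SLO target has been missed.
    """
    if not probe_results:
        raise ValueError("no probe results")
    allowed_failures = (1 - slo_target) * len(probe_results)
    failures = len(probe_results) - sum(probe_results)
    return 1 - failures / allowed_failures


# Hypothetical month of probes: 10,000 checks, 4 of which failed.
probes = [True] * 9_996 + [False] * 4
print(f"availability: {availability(probes):.4%}")  # availability: 99.9600%
print(f"error budget left: {error_budget_remaining(probes):.0%}")  # error budget left: 60%
```

With a 99.9% target, 10,000 probes allow 10 failures; 4 have been used, so 60% of the budget remains.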

Security & Compliance

The observability of cloud computing environments is of the utmost importance to organizations with regulatory or compliance requirements to protect sensitive data from improper exposure. With full visibility into the cloud computing environment through event logs, organizations can detect potential intrusions and security threats, such as brute-force login attempts or DDoS attacks, early enough to respond before an attacker steals data or disrupts service.

Revenue Growth

Businesses can drive revenue growth with network observability. The ability to analyze events on the network can yield valuable information about user behaviors and how they may be affected by underlying variables like application format, availability, speed, and others. This data can be analyzed to develop actionable insights on how to optimize the network and applications to generate more revenue from customers (and attract new ones).

Optimize Your Cloud Observability with Sumo Logic

Observability of cloud computing platforms depends on your ability to capture logs, metrics, and traces, process them into a useful format, and analyze the data to discover actionable insights.

Sumo Logic's cloud-native platform is an all-in-one solution for observability of cloud computing environments. With Sumo Logic, your IT organization can aggregate log files, metrics and traces, evaluate network performance against the most critical KPIs and gain the insights and network visibility needed to meet your business objectives for system reliability, security and customer satisfaction.