When time is money: How SLOs optimize platform performance for accurate time tracking

Results at a glance

Improved insight into end-user experience

Reduced false alerts and escalations

Consolidated monitoring tools

A clear answer to “Are our customers happy?”

Challenge

Too many monitoring vendors left Laurel with numerous blind spots and no way to know if they were delivering on customer promises.

Laurel is committed to helping their customers automate time-keeping and billing processes. Most lawyers use Microsoft products exclusively, so they primarily interact with Laurel’s system through the Windows desktop application.

Nine outside vendors monitored Laurel’s systems, which made it challenging to track the reliability of their desktop application and per-customer infrastructure architecture.

Even with this battery of solutions, there were many reliability blind spots: they didn’t know if they were delivering on customer expectations to manage time efficiently. The only alerts they had covered CPU and memory on their containers, and many of these poorly enriched alerts were ignored. Their on-call rotation was disorganized and included people who were no longer with the company.

Laurel's product and engineering teams took a hard look at their current workflows — and found them lacking. They had to fix their general approach and understanding of infrastructure monitoring and, in the process, improve the well-being of their engineers through better clarity and less stress.

Laurel ran a request for proposal (RFP) with nine vendors, searching for a monitoring solution to provide broader visibility. Laurel's product and engineering teams also decided to take a service level objective (SLO) approach to reliability and set out to find solutions that aligned with this.

"We have alerts, we have metrics, we have traces, we have logs, but we don't really know if our customers are experiencing our product well, which is frustrating and not really the conversation I want to have with our senior-level leadership," said Nat Welch, Lead Cloud Platform Engineer at Laurel.

Solution

Ultimately, Laurel had two goals:

  1. Ensure customers are happy with their offering and can rely on their product.

  2. Care for the mental health of Laurel's employees by focusing their time on high-value tasks.

Laurel chose Sumo Logic as its monitoring platform. Sumo Logic worked seamlessly with the other tools they incorporated into their new workflows, such as OpsGenie for their on-call rotation and OpenTelemetry for simplifying and broadening collection.


“We want to provide a good product for our customers but also improve the mental health of our engineers as they build our product.”

Nat Welch

Lead Cloud Platform Engineer, Laurel

Results

Are our customers happy? This is the question that Laurel's engineers needed to be able to answer and communicate to senior management confidently.

Operationalizing SRE practices and app reliability through SLOs

Laurel’s engineering team needed deeper insight into the customer experience beyond base-level monitoring from out-of-the-box dashboards. The team defined and tracked SLOs using Sumo Logic's Reliability Management solution.

Reliability, as formalized in SLOs, helps engineers focus on monitoring and troubleshooting user experience by measuring what matters to end users while reducing potentially meaningless alerts and false escalations.

The Laurel team started with two SLOs — request latency and API success — to monitor the main API for their service endpoint, covering customer environments and shared services.
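As a rough illustration of what these two SLOs measure, the sketch below computes a success-rate SLI and a latency SLI from a handful of request records. It is not Laurel's or Sumo Logic's implementation: the `Request` type, the 500 ms threshold, the rule that any 5xx status counts as a failure, and the sample data are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Request:
    status: int        # HTTP status code returned to the client
    latency_ms: float  # time taken to serve the request, in milliseconds


def success_sli(requests):
    """Fraction of requests that did not end in a server error (5xx)."""
    if not requests:
        return 1.0  # assumption: no traffic counts as fully compliant
    good = sum(1 for r in requests if r.status < 500)
    return good / len(requests)


def latency_sli(requests, threshold_ms=500.0):
    """Fraction of requests served within the (hypothetical) latency threshold."""
    if not requests:
        return 1.0
    good = sum(1 for r in requests if r.latency_ms <= threshold_ms)
    return good / len(requests)


# Hypothetical sample: one slow request and one server error out of four.
sample = [
    Request(200, 120.0),
    Request(200, 640.0),
    Request(503, 90.0),
    Request(201, 300.0),
]

print(f"API success SLI: {success_sli(sample):.2%}")
print(f"Request latency SLI: {latency_sli(sample):.2%}")
```

Each SLI is a ratio of good events to total events, so it can be compared directly against an SLO target such as 99%.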


With automated SLO monitoring, when Laurel spins up new services in their application, they can easily create generic SLOs to start tracking user experience with the data already ingested within Sumo Logic. The same applies whenever they onboard new customers.

The SLO dashboard within the Sumo Logic UI provides an active view of the health and status of the services Laurel is monitoring.

The SLO dashboard shows:

  • Service-level indicators (SLIs): Quantitative measures (typically shown as percentages) of the system or service availability within the specified compliance period.

  • Error budget remaining: Based on the SLO, this is the number of errors that can still occur within the compliance period while staying compliant.

  • Error history: A way to track errors and see how fast they are resolved, reducing unnecessary alerts.
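The error budget in the list above follows from simple arithmetic on the SLO target. The sketch below shows one common way to compute it; the function name, the 99% target, and the event counts are assumptions for illustration and do not describe how the Sumo Logic dashboard calculates its values.

```python
def error_budget_remaining(target, total_events, bad_events):
    """Return the fraction of the error budget still unspent.

    target       -- SLO target as a fraction, e.g. 0.99 (must be below 1.0)
    total_events -- events observed so far in the compliance period
    bad_events   -- events that violated the SLI
    A negative result means the budget is overspent.
    """
    allowed_bad = (1.0 - target) * total_events
    if allowed_bad <= 0:
        raise ValueError("target must be below 1.0 and total_events above 0")
    return 1.0 - bad_events / allowed_bad


# Hypothetical numbers: a 99% target over 100,000 requests allows about
# 1,000 bad requests, so 250 bad requests leave roughly three quarters
# of the budget.
remaining = error_budget_remaining(0.99, 100_000, 250)
print(f"Error budget remaining: {remaining:.1%}")
```

Alerting on how quickly this budget is being spent, rather than on every individual error, is what lets a team avoid paging someone for each isolated failure.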


“In our previous alerting, any error probably would've fired an alert and woken someone up or interrupted someone during their workday. And since we only have 50 engineers, that's not great. We're moving away from that.”

Nat Welch

Lead Cloud Platform Engineer, Laurel

Seamless use and access of data with tool consolidation

Because Laurel uses Sumo Logic as their observability solution, there's no need to migrate data to other monitoring tools. Data is shared and accessible between features, eliminating steps in the SLO monitoring process. Laurel can also use the raw source data exported from Sumo Logic in custom dashboards.


“Sumo Logic's SLO solution used existing data. One of the big reasons we didn't want to spin up a new customer is we'd have to come up with some sort of pipe to send the data over.”

Nat Welch

Lead Cloud Platform Engineer, Laurel

Innovations for improved productivity

Sumo Logic's product team added Terraform support to streamline Laurel’s SLO setup and monitoring. With this integration, the Laurel team only needed to write 90 lines of code to create the few hundred SLOs needed for their clientele.
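A small amount of code can produce hundreds of SLOs because the definitions are templated: one generic SLO per indicator, expanded across every customer and service. The sketch below shows that idea in Python rather than Terraform. It is not Laurel's configuration; the template fields, targets, threshold, and customer and service names are all hypothetical.

```python
from itertools import product

# Hypothetical generic templates, one per indicator type.
SLO_TEMPLATES = [
    {"kind": "latency", "target": 0.99, "threshold_ms": 500},
    {"kind": "success", "target": 0.999},
]


def build_slos(customers, services):
    """Expand every template across every customer and service."""
    slos = []
    for customer, service, template in product(customers, services, SLO_TEMPLATES):
        slo = dict(template)  # copy so the shared template is not mutated
        slo["name"] = f"{customer}-{service}-{template['kind']}"
        slo["customer"] = customer
        slo["service"] = service
        slos.append(slo)
    return slos


# Hypothetical inventory: 150 customers, one main API, two SLOs each.
customers = [f"customer-{i}" for i in range(150)]
slos = build_slos(customers, ["main-api"])
print(f"{len(slos)} SLO definitions generated")
```

Onboarding a new customer then means adding one entry to the customer list rather than writing new SLO definitions by hand.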

"Some of the problems we had with some of our old monitoring companies is that they hadn't changed their product in multiple years. Sumo Logic was constantly innovating, and they also had a reasonable pricing structure, which, as a startup, is important to us," said Welch.