It’s been almost a year since I shared some thoughts about distributed tracing adoption strategies on this blog. We discussed how log vendors and application performance management (APM) vendors approach tracing differently, and how important it is to let users analyze the data, including custom telemetry, the way they want.
It is my strong belief that tracing data is not much different from log data, at least from the point of view of the analytics use cases required to get value from both. It is well structured and has certain mandatory fields, attributes, and tags, but with the recent market shift toward OpenTelemetry (which is great!), it is becoming easier and easier to add customer-specific data to spans and traces.
At the same time, most existing APM/observability vendors do not allow raw access to the data for true multidimensional analysis of such records. You can only visualize what they think is important, not what you believe brings value to your business.
Sumo Logic is one of the industry leaders in such analysis, so we wanted to provide it not only for logs and metrics but also for traces, giving you end-to-end analysis of your observability data. We want to do better and give you, our customers, what you need to ensure your applications operate at peak reliability.
I’m proud to introduce the beta of extended trace filtering and Search Query Language support. This allows customers not only to find and diagnose transactions that match any custom criteria, but also to perform advanced Sumo-style analysis on top of trace span data using Sumo Search Query Language (SQL), the same way as for log data, in the same familiar interface.
This capability allows you to access raw tracing data at the span level, treat it as structured or unstructured data for analysis, and filter, transform, or aggregate any part of the tracing span message (a single atomic request/response representation) to deliver meaningful results that drive smarter decisions. How do you use it in practice?
Let’s imagine a simple case where we want to add a microservice release number to our span data to indicate a code release version that is running. We are going to add this as a custom tag “assemblyVersion” in span data that we are sending to the Sumo Observability platform through the tracing collection layer.
Here’s how this data looks in the trace:
We can see assemblyVersion in the list of tags.
Let’s try to search for traces that have this tag set to some custom value in any span:
We get transactions executed by a microservice with this exact version. We can analyze them one by one and ensure they execute as desired. However, there may be millions of them, so how do we take this analysis one level higher?
Now let’s imagine we would like to graph the 95th percentile of our microservice’s latency as a function of time, comparing the different release versions we currently have in the staging environment. This is a much more difficult task: it requires not only the ability to filter and search, but also to aggregate by a custom attribute, calculate a time series from the result, and visualize it in a useful form such as a chart.
Fortunately, with our new beta capabilities and the help of Sumo SQL, such results are quick and easy to get. Here’s an example query:
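To make the search semantics concrete, here is a minimal Python sketch of what "traces that have this tag set to some value in any span" means. It is not how Sumo Logic implements the search; the span dictionaries and their field names (`trace_id`, `span_id`, `tags`) are hypothetical stand-ins for exported span data.

```python
def traces_with_tag(spans, key, value):
    """Return the IDs of traces in which at least one span has tags[key] == value."""
    return {
        span["trace_id"]
        for span in spans
        if span.get("tags", {}).get(key) == value
    }


# Hypothetical span records; real spans carry many more fields.
spans = [
    {"trace_id": "t1", "span_id": "a", "tags": {"assemblyVersion": "1.4.0"}},
    {"trace_id": "t1", "span_id": "b", "tags": {}},
    {"trace_id": "t2", "span_id": "c", "tags": {"assemblyVersion": "1.3.9"}},
    {"trace_id": "t3", "span_id": "d", "tags": {"assemblyVersion": "1.4.0"}},
]

matching = traces_with_tag(spans, "assemblyVersion", "1.4.0")
print(sorted(matching))
```

Note that trace t1 matches even though only one of its two spans carries the tag, which is the "in any span" behavior described above.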
By selecting the aggregates tab and switching to time-series visualization, we can draw our metric on the chart and break it down by release version.
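For readers who want to see what this aggregation computes, here is a rough Python sketch of the same idea outside of Sumo Logic: group span durations by time bucket and release version, then take the 95th percentile of each group. The field names (`start_epoch_s`, `duration_ms`, `tags`), the one-minute bucket size, and the nearest-rank percentile method are all assumptions made for illustration; the platform may calculate percentiles differently.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def p95_by_version(spans, bucket_seconds=60):
    """Map (bucket_start, version) to the p95 span duration within that bucket."""
    groups = defaultdict(list)
    for span in spans:
        version = span["tags"].get("assemblyVersion")
        if version is None:
            continue  # skip spans without the custom tag
        bucket = span["start_epoch_s"] // bucket_seconds * bucket_seconds
        groups[(bucket, version)].append(span["duration_ms"])
    return {key: percentile(durations, 95) for key, durations in groups.items()}


# Hypothetical spans from two releases running side by side in staging.
spans = [
    {"start_epoch_s": 0, "duration_ms": 100, "tags": {"assemblyVersion": "1.3.9"}},
    {"start_epoch_s": 30, "duration_ms": 120, "tags": {"assemblyVersion": "1.3.9"}},
    {"start_epoch_s": 45, "duration_ms": 80, "tags": {"assemblyVersion": "1.4.0"}},
    {"start_epoch_s": 70, "duration_ms": 90, "tags": {"assemblyVersion": "1.4.0"}},
]

series = p95_by_version(spans)
for (bucket, version), p95 in sorted(series.items()):
    print(bucket, version, p95)
```

Each (bucket, version) pair becomes one point on the chart, with one line per release version.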
We can see not only the lifecycle of our microservice - with all upgrades, introductions of new versions over time, and how long they stayed in the staging environment - but also any regressions or improvements in the latency of the microservice response (measured by the length of its spans in distributed traces). It is also easy to automate the query and use its results to drive the CI/CD pipeline, ensuring that no unexpected regression is introduced between releases.
To summarize, it is now possible not only to leverage any custom data added to your OpenTelemetry open-source tracing instrumentation as filters to search for matching traces, but also to perform any custom analysis of tracing data at the span level using Sumo Search Query Language, the same way you are used to doing it for logs. You can filter by any tag, regardless of its cardinality, define custom metrics and labels, aggregate them, and chart their count or value as time series, bars, columns, pie charts, and more. In a nutshell, you can do with tracing data whatever you could do so far with SQL for logs.
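As one possible shape for such a CI/CD gate, the sketch below compares the 95th-percentile latency of a candidate release against a baseline release and reports a regression when the candidate is slower by more than a tolerance. How the durations are fetched from the query results is left out; the function names, the sample numbers, and the 10% tolerance are hypothetical choices, not part of the Sumo Logic product.

```python
import math


def p95(durations_ms):
    """Nearest-rank 95th percentile of a non-empty list of durations."""
    ordered = sorted(durations_ms)
    rank = max(1, math.ceil(95 * len(ordered) / 100))
    return ordered[rank - 1]


def has_regression(baseline_ms, candidate_ms, tolerance=0.10):
    """True if the candidate p95 exceeds the baseline p95 by more than the tolerance."""
    return p95(candidate_ms) > p95(baseline_ms) * (1 + tolerance)


# Hypothetical span durations, as they might be returned by the automated query.
baseline = [100, 110, 120, 130]    # previous release
slow_build = [105, 115, 125, 200]  # candidate with a slow tail
good_build = [100, 112, 118, 135]  # candidate within tolerance

# A CI job could pass this value to sys.exit() to fail the pipeline step.
exit_code = 1 if has_regression(baseline, slow_build) else 0
print("slow build exit code:", exit_code)
print("good build regressed:", has_regression(baseline, good_build))
```

Comparing percentiles rather than averages keeps the gate sensitive to tail latency, which is usually where regressions show up first.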
Just think about the plethora of use cases this can bring to your daily life as an SRE or DevOps engineer, and about the analysis flexibility that Sumo Logic offers you: analyzing logs, metrics, and traces with the same depth and granularity.
I hope that you, as a Sumo Logic customer, will enjoy these new additions to our platform. You can learn more about this in the Sumo Dojo #sumo-tracing channel. And if you are new to Sumo Logic, take a look at your existing APM/tracing solution and let us know if you would like to learn how the capability described above could decrease the time you spend resolving and preventing problems in your cloud microservice environment. Just use the chat icon in the bottom right corner of your screen to reach out to us!