OpenTelemetry has gained significant adoption in the past year. This blog is about the OpenTelemetry demo application; for an introduction to OTel in general, you can refer to this primer.
Although it has gained recognition in the industry, there are still many people who haven’t started using OpenTelemetry. If you are interested in exploring its capabilities but you’re unsure where to start, keep reading.
The OpenTelemetry community recently introduced a demo application, Astronomy Shop (or simply the OpenTelemetry demo), that provides a quick and easy way to spin up a test, multi-language, microservice-based environment. It shows the level of visibility you can achieve when applying open-source, community-driven observability principles to a typical application stack.
The OTel demo consists of several interconnected microservices written in multiple languages that simulate an online shopping experience and let you send data to the analytics backend of your choice. If you would like to see how you can monitor Astronomy Shop with Sumo Logic, we’ve made that task even easier for you by preparing a slimmed-down version of the package (no Jaeger, Grafana, etc.) with a pre-configured Sumo Logic OpenTelemetry distro collector included. There’s no Sumo-specific code or tricks inside, just an embedded configuration that makes things quick to set up.
Part I: Setup
First, if you are not yet a Sumo Logic customer, go ahead and open a free trial account.
Optional: if you also want to try Real User Monitoring to understand the end-user experience of the Astronomy Shop front-end web interface, set up a RUM HTTP Traces source with a source name of your choice, the application name opentelemetry-demo, and the service name frontend-rum, leaving the other options as defaults. The URL from collectionSourceUrl will be needed later in the setup.
Once logged in to Sumo Logic as admin, go directly to Security -> Installation Tokens and create a new token.
Clone the Sumo Logic GitHub repo and, after exporting the above token and a proposed collector name (e.g. otelcol-sumo) as environment variables, execute docker compose up from the repository directory to pull and launch the stack.
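The Docker path can be sketched as a short shell session. The environment variable names below are assumptions for illustration; check the repository README for the exact names the compose file expects:

```shell
# Assumed variable names -- verify against the repo README before use.
export SUMO_INSTALLATION_TOKEN="<token from Security -> Installation Tokens>"
export SUMO_COLLECTOR_NAME="otelcol-sumo"

# Then, from the cloned repository directory, pull and launch the stack:
#   git clone https://github.com/SumoLogic/opentelemetry-demo.git
#   cd opentelemetry-demo
#   docker compose up -d
```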
Alternatively, to run the demo on Kubernetes: once logged in, go directly to Security -> Access Keys and create an access ID and key. You are going to use a Helm chart as described here; after providing the ID and key, set up a new cluster that will automatically launch your demo stack, connect to your Sumo account, and send all required telemetry back to our back-end.
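For the Kubernetes path, a minimal values file for the Sumo Logic Helm chart might look like the sketch below; the `sumologic.*` keys follow the chart’s documented settings, and the cluster name is an arbitrary example:

```yaml
# values.yaml -- minimal sketch; see the Helm chart docs for all options
sumologic:
  accessId: "<your-access-id>"
  accessKey: "<your-access-key>"
  clusterName: "otel-demo"
```

You would then install the collection with something like `helm upgrade --install collection sumologic/sumologic -f values.yaml --namespace sumologic --create-namespace`.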
That’s all! You’re done with the setup! Time to view the data.
Part II: Reviewing the data
Once you have the data in your account, leverage the following UI entry points, gathered under the blue “New” tab button, to analyze the collected telemetry:
Service Map / Services - for high-level application service health and dependencies graph
Explore - RUM & APM views for application, service and endpoint health and availability, K8S views for infrastructure monitoring (when running on Kubernetes)
Traces - to search and view individual traces and understand end-to-end performance bottlenecks
Spans - for multi-dimensional analytics on span level
Log search - to view and investigate logs
High-level overview of your application services health
This table gives you an at-a-glance view of all the important high-level aspects of the health of your microservices. You can filter by application (if you monitor more than one), environment (e.g. prod or dev), or service type and status. By clicking on each row, you can view these KPIs over time and further drill down to traces or dashboards that expand the details about the holistic health of the service and application.
Click on Open in -> Entity Dashboard on the application level.
You land in the Application Health dashboard of Explore’s APM: Application View, which gives more detailed insight into the high-level KPIs for all services of our demo application. You can compare service latencies, load and errors over time, and do the same for each service by expanding the tree on the left side of the panel.
You can click on any point in the chart and jump to traces if you are interested in more granular diagnostics of your end-to-end transactions in the particular context of a time window or service. We will cover this later; for now, let’s check service dependencies on the Service Map.
Something extra we have added in our distribution of the OpenTelemetry demo (if you are using the upstream package, you’d have to set this up as an additional step) is browser instrumentation, giving direct insight into the performance of our demo application as seen through the eyes of the user.
You are able to understand the performance of different user actions like clicking on buttons or navigation changes:
You can also track this performance through a set of various UI-specific KPIs like rendering times or core web vitals.
These views, accessible from Explore -> Real User Monitoring, help you understand the high-level, user-centric view of application performance as experienced by your end customers.
Automatically built application topology
Available under the Services -> Map button, or as a panel on the dashboard above, you can find an automatically created topology of your stack that you can filter by application or environment, zoom in and out of, and configure the KPIs that drive the colors of the circles. Use this view not only to quickly assess the health of your whole application stack, but also to better understand its complexity, dependencies, the breadth of technologies used within (including databases and other remotely called services), and the load indicated by the size of each circle.
Clicking on any of the circles opens what should already be a familiar view: the right-side panel with information about the service, its supporting infrastructure, and further drill-down options.
Supporting infrastructure and its health
Here you see the details of the checkout service, with all the services it communicates with, as well as the infrastructure components it runs on. If you deploy the OpenTelemetry demo on Kubernetes, full K8s stack info is available - from cluster name, through deployment and namespace, down to node and pod (only one in this case, but multiple in a typical production scenario).
This gives you a quick assessment of the health of the underlying infrastructure and its potential impact on application-level KPIs.
That’s not all: Sumo Logic is not only about traces or logs but is a full-stack observability platform, so if you happen to be a K8s admin or operator and are interested in more details about the health of your K8s infrastructure, you just need to select the appropriate entity level and click Open in -> Entity Dashboard. Let’s do this for the namespace level.
You land on the Namespace Overview dashboard, where you can quickly understand the health, resource utilization and any potential problems with the K8s stack supporting your demo application.
As before with applications and services, you are in the Explore view, which lets you quickly change the scope of analysis using the different views in the top left and re-focus on the whole cluster, a specific node or pod, or even a container. All logs and K8s metrics are gathered automatically just by deploying the Sumo Logic Helm-driven collection.
Detailed transaction traces
The tracing signal is at the heart of OpenTelemetry (that’s where everything started for this project), so you are probably waiting for this part. In fact, if you deploy the OpenTelemetry demo without Sumo Logic as an analytics back-end, tracing data is pretty much all you get. So yes, we have that too, in addition to all the full-stack application observability covered in other parts of this blog.
Traces are available either as an entry point from the blue “+New” tab button, if you want to start a new trace search (similar to what you would do in Jaeger, for example), or as a drill-down from almost every chart, table or map in the product. For example, if you are interested in seeing the traces going through a specific pod, you can either search for it in the Traces window or click on any chart in the Pod dashboard and select Entities -> Open in -> Traces.
I highly recommend looking into the Duration Breakdown chart and column, and noting that we show traces here (unlike most competitors, which show span-level data in corresponding views). This makes a massive difference: only in Sumo Logic can you quickly understand where time is spent in end-to-end trace execution without going to individual traces.
Just take a look at the chart! It’s quite clear that most of the time goes to the recommendationservice, productcatalogservice and frontend services. Calculating such aggregated insights takes a village.
Obviously, it is not always these three services that take most of the time, especially when errors happen. Let’s take a look at the Number of errors column: sort by it, filter for traces with errors > 0, or drill down to traces (for example from the errors chart in the Application Health dashboard in the second screenshot above) to find such traces:
As you can see there are two spans in the middle-bottom part of this transaction with red lines at the top, indicating an errored response.
Here, per-span details come in handy. Let’s click on the client-side (request) span to learn more about what happened. You can see the exact URL and the EOF response status here:
Normally you’d just jump to logs to learn more about this, but let’s first take the opportunity to understand the distribution of errors you may be getting from the checkout service. Let’s go to Entities and Open in -> Spans for this service:
Ad-hoc analysis using raw span data
Now you’ll land on another UI that proves Sumo Logic’s strength in big data analytics: analyzing, slicing and dicing your APM distributed tracing data at the lowest possible level, so you can extract insights about the most complex unknown unknowns hiding in your data. You don’t know what the reason for your next outage is going to be, and neither do we, so we give you the ability to analyze your spans in any way you want.
Let’s do a simple example where you draw the distribution of all status messages for the checkout service over time:
As you can see, in the last 15 minutes only one span had a status other than OK (200), and that was the EOF we found in our trace.
Note that you can perform this analysis with filters, aggregations and breakdowns on any span metadata tags, including custom ones, without any need to define schemas, add them to configuration, index them, etc.
If you are familiar with the Sumo Logic query language, you can use it to perform this type of analysis. Want to save the results of your analysis for later in the form of a dashboard panel? You can do that too. Sumo Logic treats all three telemetry signals as first-class citizens, allowing you to build not only dashboards but even single panels mixing all three of them.
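As a rough sketch, a spans query breaking down status messages over time for the checkout service could look like the following. The field names (`service`, `status.message`) are assumptions here; the actual span metadata tag names depend on your instrumentation and account:

```
service = "checkoutservice"
| timeslice 1m
| count by _timeslice, status.message
```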
The needle in the haystack - getting details from logs
Traces are good for understanding the end-to-end flow of a transaction, its bottlenecks along the way and error propagation, but often, to uncover the root cause of an error or slowdown, we need detailed information from logs. One of the most important characteristics of a modern observability backend is how quickly you can get from one signal to another, in context.
The OpenTelemetry collector used in the Sumo Logic K8s Helm chart automatically captures all the logs. For Docker (as of Dec 2022), you may add a Sumo Logic log collector for Docker to your OTel demo stack.
Many of the log messages produced by OpenTelemetry instrumentation have an automatically injected spanID and traceID, so to see these messages it is enough to use the Open logs tagged with… links present in any span’s summary tab. Also note that you can open traces from log lines containing a traceid/spanid: just right-click on the ID in your Log Search viewer.
Often, though, logs will not contain trace context, either because that part of the code is not instrumented or because the error was so severe that the context never reached it. In such cases, it is best to simply open all logs for a given container or pod in a specific time frame.
Most log management tools let you filter your logs like that if you know the container name and time (which sometimes needs to be translated between time zones), but with Sumo Logic it is even simpler: just open the Entities tab, where the full infrastructure stack for every span is displayed, find the interesting component (here: the checkoutservice container) and click Open in -> Log Search:
You’ll land directly in the Log Search interface with your query and time window automatically filled in, so you can focus on the few log lines that pertain directly to the EOF error you saw earlier:
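The auto-filled query might look roughly like the sketch below; the metadata field names come from the Kubernetes metadata the Helm collection attaches to logs, and both the field names and values here are illustrative assumptions:

```
cluster="otel-demo" namespace="default" container="checkoutservice"
| json auto
```

The time range is set for you from the span’s timestamp, so no manual time zone arithmetic is needed.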
OpenTelemetry detailed process metrics
OpenTelemetry instrumentation also gathers detailed per-process metrics that help you understand how the executables running your microservices behave. The perfect way to get into these is to use the same drill-down link as before: in the Entities tab, select Open in -> Metrics.
And that’s the walkthrough of the OpenTelemetry demo and Sumo Logic’s observability capabilities. If you are ready to try it with your own application, refer to our Getting Started with Transaction Tracing guide, which helps you set up OpenTelemetry tracing and send it to Sumo Logic in two steps. If you prefer to start with logs and metrics, refer to this article.