April 19, 2021 Theo Despoudis

Using Telegraf to collect infrastructure performance metrics

Telegraf is a server-based agent for collecting all kinds of metrics for further processing. It’s a piece of software that you can install anywhere in your infrastructure and it will read metrics from specified sources – typically application logs, events, or data outputs.

It consists of a main process and a plugin ecosystem of inputs and outputs. For example, you can use input plugins to collect metrics from servers and applications, and output plugins to send them to datastores or other services. These plugins exchange metrics in the InfluxDB line protocol, which defines a simple text format for metric points. The Telegraf agent then acts as an adapter that streams metrics from the configured sources to the registered outputs.

Read along to see how you can use Telegraf to collect and push application performance metrics locally or in the cloud.

Installing Telegraf

Before you can use Telegraf, you need to install it. There are plenty of deployment options, since the tool is written in Go and ships as a single binary. At the time of writing, the latest Telegraf release is v1.17.0, which you can download from the official downloads page. Here’s how you can install it locally and with Docker:

Installing the Binary on a Server or Locally

You can install Telegraf using a .deb package on Linux or an .exe on Windows. On a Mac, you can use Homebrew, which is as simple as:

$ brew update 
$ brew install telegraf

You can review the default Telegraf config file to get an idea of its format as follows:

$ less /usr/local/etc/telegraf.conf

To test the binary, you can use the following command:

$ telegraf --test
2021-01-25T15:42:02Z I! Starting Telegraf 1.17.0
2021-01-25T15:42:02Z E! [telegraf] Error running agent: No config file specified, and could not find one in $TELEGRAF_CONFIG_PATH, /Users/theo.despoudis/.telegraf/telegraf.conf, or /etc/telegraf/telegraf.conf

This immediately flags the missing config file. You can copy the default config into your working directory and point Telegraf at the copy like this:

$ cp /usr/local/etc/telegraf.conf . 
$ export TELEGRAF_CONFIG_PATH=$(pwd)/telegraf.conf

Add the following lines to the copied telegraf.conf to register an input collector for a PostgreSQL database running locally:

[[inputs.postgresql]]
  address = "postgresql://postgres:password@localhost/my_app_development"

If you have MySQL, you can use the [[inputs.mysql]] config instead.

Testing the binary again shows that the input collectors are working:

$ telegraf --test
2021-01-25T15:55:28Z I! Starting Telegraf 1.17.0
2021-01-25T15:55:28Z I! Using config file: /Users/theo.despoudis/Workspace/telegraf-example/telegraf.conf
> mem,host=theo-despoudis active=14726529024i,available=15191564288i,available_percent=44.213271141052246,free=1435623424i,inactive=13755940864i,total=34359738368i,used=19168174080i,used_percent=55.786728858947754,wired=4406173696i 1611590358000000000
...
> postgresql,db=postgres,host=theo-despoudis,server=dbname\=my_app_development\ host\=localhost\ user\=postgres blk_read_time=0,blk_write_time=0,blks_hit=2642i,blks_read=207i,conflicts=0i,datid=16404i,datname="postgres",deadlocks=0i,numbackends=1i,temp_bytes=0i,temp_files=0i,tup_deleted=0i,tup_fetched=1388i,tup_inserted=0i,tup_returned=7455i,tup_updated=0i,xact_commit=21i,xact_rollback=0i 1611590358000000000
...

In this output, each line starts with a measurement name (mem, cpu, disk, or postgresql) that identifies the input collector that produced it. The name is followed by comma-separated tags such as host, then the collected field values, and finally a timestamp in nanoseconds.
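
To make the structure of these lines concrete, here is a minimal Ruby sketch that splits one line into its parts. The helper names parse_line and parse_value are hypothetical, and the sample is a shortened version of the mem line above. The sketch deliberately ignores the line protocol's escaping rules and boolean values, so it would not handle the postgresql line, whose server tag contains escaped spaces.

```ruby
# Simplified line protocol parser (illustration only).
# Assumption: no escaped spaces or commas, and no spaces inside quoted strings.
def parse_line(line)
  # Drop the "> " prefix that `telegraf --test` prints, then split into
  # "measurement,tags", "fields", and an optional timestamp.
  head, fields_part, timestamp = line.sub(/\A>\s*/, '').split(' ', 3)
  measurement, *tag_pairs = head.split(',')

  tags = tag_pairs.to_h { |pair| pair.split('=', 2) }
  fields = fields_part.split(',').to_h do |pair|
    key, raw = pair.split('=', 2)
    [key, parse_value(raw)]
  end

  { measurement: measurement, tags: tags, fields: fields,
    timestamp: timestamp && Integer(timestamp) }
end

# Integers carry an "i" suffix and strings are double-quoted; anything else
# is treated as a float (booleans are not handled in this sketch).
def parse_value(raw)
  case raw
  when /\A-?\d+i\z/ then raw.to_i
  when /\A".*"\z/   then raw[1..-2]
  else Float(raw)
  end
end

sample = '> mem,host=theo-despoudis free=1435623424i,used_percent=55.786728858947754 1611590358000000000'
point = parse_line(sample)
puts point[:measurement]    # mem
puts point[:tags]['host']   # theo-despoudis
puts point[:fields]['free'] # 1435623424
```

A production parser would need to honor escaping, which is one reason to prefer an existing client library over hand-rolled parsing.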

Installing Telegraf on Docker

You can also install Telegraf on Docker using the official image:

$ docker run -v $PWD/telegraf.conf:/etc/telegraf/telegraf.conf:ro telegraf

If you run into issues where Telegraf complains that it cannot connect to InfluxDB, you may need to comment out the empty configuration for [[outputs.influxdb]] and add a file output instead:

[[outputs.file]]
  files = ["stdout"]

The Telegraf config has many options and configuration parameters. For example, you can configure the frequency with which Telegraf will transmit the data, use different protocols (UDP), or prepend extra tags.
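
As an illustration, the collection frequency and extra tags are set in the [agent] and [global_tags] sections of telegraf.conf. The values below are examples rather than recommendations, and the environment tag is an arbitrary name chosen for this sketch:

```toml
# Tags added to every metric this agent emits (tag name and value are examples)
[global_tags]
  environment = "development"

[agent]
  # How often the input plugins collect metrics
  interval = "10s"
  # How often collected metrics are written to the outputs
  flush_interval = "10s"
```

The default config file already contains both sections, so you would normally edit these values in place rather than append new sections.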

If you are interested in the complete list of input collector plugins and their configurations, see the plugin list in the official Telegraf documentation.

Collecting Performance Metrics in a Rails Application

In this part of the tutorial, we are going to collect performance metrics from a Rails application and send them to Telegraf. You aren’t required to use InfluxDB to store the metrics, although it pairs naturally with Telegraf as part of the TICK stack (Telegraf, InfluxDB, Chronograf, and Kapacitor). This stack is an open-source, developer-friendly way to run a complete monitoring stack with no upfront costs.

First, we add this gem dependency in our Gemfile:

gem 'telegraf'

Then we install the bundle:

$ bundle install

Before the Rails application can send anything, Telegraf must listen on a UDP port for incoming metrics. To do that, add this configuration to the telegraf.conf file:

[[inputs.socket_listener]]
  service_address = "udp://:8094"

Once you restart the agent, you can verify from its log and with lsof that it is listening on the port:

2021-01-25T19:27:58Z I! [inputs.socket_listener] Listening on udp://[::]:8094

$ lsof -i udp:8094
COMMAND    PID           USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
telegraf 50307 theo.despoudis   10u  IPv6 0x94fb417809f5003f      0t0  UDP *:8094

Now we need to configure the Rails application to report request metrics to the Telegraf agent. First, add the configuration values in application.rb:

require 'telegraf/railtie'

module MyApp
  class Application < Rails::Application
    # Address of the Telegraf socket_listener input
    config.telegraf.connect = 'udp://localhost:8094'

    # Instrument Rack requests and report them under the "requests" series
    config.telegraf.rack.enabled = true
    config.telegraf.rack.series = 'requests'
    config.telegraf.rack.tags = {}

    # Instrument Active Job and report it under the "active_job" series
    config.telegraf.active_job.enabled = true
    config.telegraf.active_job.series = 'active_job'
    config.telegraf.active_job.tags = {}

    # Other config values
  end
end

This Railtie provides all the necessary hooks and initializers to connect and log requests in Rails and send them to the Telegraf agent.

Now you can start the server and see the logs in the Telegraf console:

requests,host=theo-despoudis,status=200 app_ms=1096.0680000134744,send_ms=0.7249999907799065,request_ms=1096.7960000270978 1611604693530085000
requests,host=theo-despoudis,status=200 app_ms=1386.7479999898933,send_ms=1.8629999831318855,request_ms=1388.6119999806397 1611604693826514000
requests,host=theo-despoudis,status=200 app_ms=1375.6130000110716,send_ms=14.771999965887517,request_ms=1390.3859999845736 1611604693826616000
requests,host=theo-despoudis,status=200 app_ms=1526.415000029374,send_ms=4.373999952804297,request_ms=1530.7920000050217 1611604693968698000
requests,host=theo-despoudis,status=200 app_ms=1516.6709999903105,send_ms=181.5570000326261,request_ms=1698.2309999875724 1611604694133314000
requests,host=theo-despoudis,status=200 app_ms=448.731999960728,send_ms=0.47400000039488077,request_ms=449.20899998396635 1611604694444767000

For each line, you see the series name that we defined in the config (requests), the host name and status code as tags, and the request timings in milliseconds.

Sending metrics to Telegraf is simple. You just need the listener’s connection details, and you can send metrics using the InfluxDB line protocol. There are also community client libraries for Python and Rust, and there is the Jolokia2 Agent input plugin for Java.
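
As a rough illustration of how little is needed, the Ruby sketch below builds a line protocol string by hand and defines a helper that sends it as a single UDP datagram. It assumes the socket_listener input configured earlier is listening on localhost:8094. The helper names (build_line, format_field, send_point) and the checkout metric are made up for this example, and the sketch skips escaping of special characters in names and values.

```ruby
require 'socket'

# Format one point as: measurement,tag=value field=value timestamp
# Assumption: names and values contain no spaces, commas, or equals signs.
def build_line(measurement, tags, fields, timestamp_ns)
  tag_part = tags.map { |key, value| ",#{key}=#{value}" }.join
  field_part = fields.map { |key, value| "#{key}=#{format_field(value)}" }.join(',')
  "#{measurement}#{tag_part} #{field_part} #{timestamp_ns}"
end

# Integers need an "i" suffix and strings need double quotes;
# other values (such as floats) are written as-is.
def format_field(value)
  case value
  when Integer then "#{value}i"
  when String  then "\"#{value}\""
  else value.to_s
  end
end

# Send one line as a single UDP datagram (fire and forget, no delivery guarantee).
# The default host and port match the socket_listener example in this article.
def send_point(line, host: 'localhost', port: 8094)
  socket = UDPSocket.new
  socket.send(line, 0, host, port)
ensure
  socket&.close
end

line = build_line('checkout', { host: 'web-1' }, { duration_ms: 12.5, items: 3 },
                  1_611_604_693_000_000_000)
puts line
# checkout,host=web-1 duration_ms=12.5,items=3i 1611604693000000000
```

Calling send_point(line) would then hand the point to the listener. With the file output configured earlier, it should show up in Telegraf’s stdout after the next flush.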

Using the Sumo Logic Output Plugin

To view your Telegraf metrics in Sumo Logic, you need to use the Sumo Logic Output Plugin.

To use this plugin, you need to have a Hosted Collector available and configured with an HTTP Source for metrics.

Then you need to connect Telegraf to the Sumo Logic Source by adding the following configuration:

[[outputs.sumologic]]
  url = "https://events.sumologic.net/receiver/v1/http/<HTTPSourceCode>"
  data_format = "carbon2"

Once you’ve configured Telegraf to collect and transmit your metrics to Sumo Logic, you can log into your account to view your data. Just navigate to the App Catalog and select your application. You can use the Sumo Logic query language to run familiar queries, monitor performance, and build custom visualizations.

Telegraf in a Kubernetes Environment

The diagram below illustrates where Telegraf fits into a Kubernetes environment monitored by Sumo Logic. In this example, we’re monitoring an NGINX deployment in a Kubernetes cluster, with Prometheus and FluentD making up the rest of the metrics collection pipeline. The cluster contains two nodes, each running NGINX containers.

The first service in the pipeline is Telegraf, which collects metrics from NGINX. Here, we’re running Telegraf in each pod we want to collect metrics from. Telegraf obtains the metrics through an input plugin, in this case the NGINX input plugin.
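
A minimal sketch of that input configuration might look like the following. It assumes NGINX exposes its stub_status page at the URL shown. The path is an example and depends on how your NGINX server is configured.

```toml
[[inputs.nginx]]
  # URL of the NGINX stub_status endpoint (example path, adjust to your setup)
  urls = ["http://localhost/nginx_status"]
```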

The Sumo Logic Helm chart for Kubernetes collection packages all of these components up as part of the collection process for the Sumo Logic Kubernetes Solution.

Next Steps

We’ve only shown you a simplified example with one Telegraf agent. In a real production environment, you may have to configure a Telegraf agent on each node. Each agent would then collect and stream metrics into centralized collectors that can ingest large volumes of data. The Sumo Logic Output Plugin is an excellent resource for this.

Sumo Logic is a trusted cloud monitoring and observability platform that can meet the needs of all kinds of enterprises, from SMBs to large conglomerates. If you use Telegraf as a metrics collector but need a more seamless experience when analyzing metrics, you can take advantage of Sumo Logic’s free trial. Sign up here and see what they have to offer.

Theo Despoudis

Senior Software Engineer

Theo Despoudis is a Senior Software Engineer, a consultant, and an experienced mentor. He has a keen interest in open source software architectures, cloud computing, best practices, and functional programming. He occasionally blogs on several publishing platforms and enjoys creating projects from inspiration. Follow him on Twitter @nerdokto.