System administrators hold many key responsibilities within an IT organization. Most importantly, they must ensure that all systems, services, and applications are up, running, and performing as expected. When a system starts to lag or an application is down, the system administrators are called upon to troubleshoot and resolve the issue as quickly as possible to limit the impact on customers.
Reacting to and resolving these issues quickly requires useful metrics that can be leveraged to diagnose problems. One way to collect these metrics is with collectd, a Unix daemon for collecting system and application performance statistics. In this article, I’ll explain how to get started with collectd by showing you how to install and configure the daemon, along with detailed instructions for collecting system metrics that can be used to resolve performance-related issues.
What is collectd?
As mentioned above, collectd is a Unix daemon for collecting system and application performance statistics. As a daemon, collectd runs in the background and gathers key system metrics that can be used to produce valuable visualizations for gaining insight into issues within a particular system. If you are in the process of evaluating options for tools that record system metrics, there are several key advantages to working with collectd that you should take into consideration:
- collectd is open source - collectd is an open source project with a solid presence in the system metrics collection marketplace, so it is free to use and is actively developed. Open source software also means that there are more eyes on the source code, which often leads to a higher-quality product. According to the GitHub repository issues page, plenty of attention is being devoted to bugs and enhancements to ensure a high level of quality in each release.
- collectd has a high level of portability - collectd is driven by plugins. The master daemon does not rely upon any external dependencies. This allows it to be run on many different operating systems. According to the features page on the official website, collectd “should run on nearly anything that has heard of POSIX.” In addition, there is also support for Windows via SSC Serv.
- collectd is extensible - Since the needs of teams vary with each organization, it’s possible that collectd won’t fulfill a particular need by default. Fortunately, collectd is highly extensible. There are plugins for various languages (such as C, Perl, Java, and Python) that allow functionality to be enhanced to meet users’ needs - and all in a language with which the team is comfortable developing.
- collectd is scalable - Whether your goal is to monitor one host or many, collectd is up to the task. If you are interested in efficient resource management, the way in which collectd updates RRD files with the statistics that it gathers makes it a viable option for both small and large networks.
How to Set Up collectd
Now that we’ve familiarized ourselves with collectd and its advantages, let’s take a look at how to get started with it by gathering metrics on a machine running Ubuntu 18.04. In this example, I will install an Apache web server on an Ubuntu machine, and then install and configure collectd to gather metrics from our Apache web server.
Load a Python environment
As a first step, check whether Python is installed on your machine and install it if it is not. This is a relatively straightforward process and takes only a few simple commands.
Prior to installing any software on the machine, refresh the package lists by running the following command:
sudo apt-get update
This will update our package lists to ensure that we will be downloading and installing the latest and greatest versions of the software that we require. Once this command has been executed successfully, we can continue with our installations.
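Before installing anything, it can help to confirm whether a Python 3 interpreter is already present. The following is a minimal sketch, assuming the interpreter would be available on the PATH as python3; the function name python_status is made up for this example.

```shell
# Report whether a Python 3 interpreter is already on the PATH.
# python_status is a hypothetical helper name used only for this example.
python_status() {
  if command -v python3 >/dev/null 2>&1; then
    # Print the interpreter version string
    python3 --version 2>&1
  else
    echo "python3 not found"
  fi
}

python_status
```

If the function reports that python3 was not found, continue with the installation step that follows.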
For the purposes of this exercise, let’s install Python 3.7 on the Ubuntu 18.04 machine. To do so, run the following command:
sudo apt-get install python3.7
Next, let’s install our Apache web server before installing collectd. The following command will install the Apache web server with the default setup for an Ubuntu machine:
sudo apt-get install apache2
Once the install command for Apache executes successfully, we should start our web server to ensure that it has been properly installed:
sudo service apache2 start
After starting the web server, you should be able to reach it on the local machine. Open a web browser and go to http://localhost to confirm that Apache responds.
Now that we have our Apache web server running on our machine, it’s time to install collectd! This can be done using the apt package manager. Simply execute the following command and the apt package utility will install collectd on our host:
sudo apt-get install collectd
Now collectd is installed on the Ubuntu machine and you’re ready to collect a variety of system and application metrics.
The next step is to configure collectd for our purposes. This configuration is defined in the collectd.conf file located in /etc/collectd on the Linux machine. The command for using vim to modify this configuration file is as follows:
sudo vim /etc/collectd/collectd.conf
collectd makes configuration simple by providing as much information as possible to help you get started. You will find that many lines within the configuration file are commented out, and simply commenting or uncommenting them will help you set up a basic configuration that works for you. As we’ll see later, collectd also provides commented configurations for plugins that are disabled by default, which helps you format your configuration file properly when enabling them.
Right now, we’re just going to set the name of the host machine that we’re running collectd on and we’re going to disable the FQDNLookup option to prevent the daemon from trying to discern the fully qualified domain name. I am choosing “localhost” for my host name, so my configuration file looks like this:
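A minimal sketch of those two settings is shown here, drafted in a scratch file so that they can be reviewed before being copied into /etc/collectd/collectd.conf. Hostname and FQDNLookup are the global option names collectd uses for these settings; the scratch file name collectd-global.draft is hypothetical.

```shell
# Draft the two global settings in a scratch file for review.
# "collectd-global.draft" is a hypothetical name; collectd does not read it.
cat > collectd-global.draft <<'EOF'
# Report metrics under a fixed host name
Hostname "localhost"
# Do not try to resolve a fully qualified domain name
FQDNLookup false
EOF

# Print the draft so it can be copied into /etc/collectd/collectd.conf
cat collectd-global.draft
```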
While we will demonstrate the LoadPlugin option later in this tutorial, there are also a variety of other configuration options that are beyond the scope of this article. Please visit collectd’s configuration documentation for more insight.
Configure Apache to Report Stats
In order to gather metrics for the web server, the Apache plugin for collectd queries the status page generated via the Apache status module - mod_status. Thus, we must first ensure that the mod_status module is enabled for apache2 on the host machine. To see if it is enabled by default, visit http://localhost/server-status in a browser on the host machine.
If this link brings you to the Apache web server statistics page generated by your Apache instance, you are all set! If not, you must enable the mod_status module. There are a few ways to do this, and one is to run the following command in your terminal:
sudo a2enmod status
Another way to enable mod_status is to open the status.conf configuration file and either uncomment or add a few lines of code. On a machine running Ubuntu 18.04, the status.conf file will be located in /etc/apache2/mods-enabled.
Using vim, we can open the configuration file with the following command:
sudo vim /etc/apache2/mods-enabled/status.conf
Either uncommenting or adding the following lines should enable the mod_status module and allow the apache2 instance to generate the web statistics page at the /server-status endpoint.
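As a rough sketch, a status.conf that serves the statistics page might contain the directives below. They are standard Apache 2.4 mod_status directives, but restricting access with Require local is an assumption made for this single-host example, and the scratch file name status.conf.draft is hypothetical.

```shell
# Draft the mod_status settings in a scratch file for review.
# "status.conf.draft" is a hypothetical name; Apache does not read it.
cat > status.conf.draft <<'EOF'
<IfModule mod_status.c>
    # Serve the status page at /server-status
    <Location /server-status>
        SetHandler server-status
        # Assumption: only requests from the local machine are allowed
        Require local
    </Location>
    # Include per-request details in the status output
    ExtendedStatus On
</IfModule>
EOF

# Print the draft so it can be compared with the real status.conf
cat status.conf.draft
```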
Stop and restart the Apache web server with the following commands and revisit the /server-status link to view the statistics page:
sudo service apache2 stop
sudo service apache2 start
With the status module enabled within the Apache instance and the /server-status URL available, it’s time to configure the Apache plugin within the collectd configuration file.
Configuration documentation: https://collectd.org/wiki/index.php/Plugin:Apache
Once again, open the collectd.conf file located in /etc/collectd. Locate the LoadPlugin section within the configuration file and add or uncomment the following line:
Insert the following block of code:
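A minimal sketch of the two pieces this step calls for, the LoadPlugin line and the plugin block, might look like the following. It assumes the instance name “web-tracking” and the status URL described in this section, and it writes to a scratch file (the name collectd-apache.draft is hypothetical) so that the text can be reviewed before it is copied into collectd.conf.

```shell
# Draft the Apache plugin configuration in a scratch file for review.
# "collectd-apache.draft" is a hypothetical name; collectd does not read it.
cat > collectd-apache.draft <<'EOF'
# Goes in the LoadPlugin section
LoadPlugin apache

# One instance of the Apache plugin, named "web-tracking"
<Plugin apache>
  <Instance "web-tracking">
    URL "http://localhost/server-status?auto"
  </Instance>
</Plugin>
EOF

# Print the draft so it can be copied into /etc/collectd/collectd.conf
cat collectd-apache.draft
```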
In the code snippet above, we are configuring one instance for the Apache plugin. This instance will be referred to as “web-tracking,” and http://localhost/server-status?auto will be utilized by collectd to gather the web server metrics. Be sure to append “?auto” to the end of the URL, as a failure to do so will result in a MIME type of “text/html” being returned. This is incompatible with the plugin, and “?auto” will force the MIME type to be “text/plain.” Please consult the official collectd documentation for this plugin for further information on the process of configuration.
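To get a feel for the plain-text format that “?auto” produces, the sketch below pulls single fields out of that output. The helper name status_field and the sample values are made up for illustration; on the host, the input would come from the real status URL, for example via curl if it is installed.

```shell
# Print the value of one field from mod_status "?auto" output read on stdin.
# status_field is a hypothetical helper name used only for this example.
status_field() {
  awk -F': ' -v key="$1" '$1 == key { print $2 }'
}

# Sample input for illustration only; real values would come from
#   curl -s 'http://localhost/server-status?auto'
sample='Total Accesses: 42
Total kBytes: 128
BusyWorkers: 1
IdleWorkers: 49'

printf '%s\n' "$sample" | status_field BusyWorkers
printf '%s\n' "$sample" | status_field 'Total Accesses'
```

Each line of the machine-readable page is a "name: value" pair, which is why the text/plain form is straightforward to parse while the HTML form is not.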
Setting Storage Schema and Aggregation
Collecting metrics with collectd can lead to data overload for an organization. While it’s great that collectd can be so granular in its data collection, it is also useful to aggregate these statistics where doing so simplifies things for the SysAdmin analyzing the data. Fortunately, collectd has a plugin for that. The Aggregation plugin has a variety of applications and configuration options designed to let the user take the raw data gathered by collectd and consolidate it into a form that is easier to read. For example, it can take the CPU utilization statistics for each core on a host machine and calculate the average across those cores for that host.
Once you have identified the metrics being gathered that would be more useful to you when aggregated in a particular manner, open the collectd.conf file (located at /etc/collectd/collectd.conf) and add the following line in the LoadPlugin section:
From here, you will need to configure an instance of the plugin to aggregate the desired metric in a particular manner. This will include adding a block of code within the configuration file that will resemble the following:
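As an illustration, the sketch below drafts the LoadPlugin line together with one aggregation that averages CPU statistics across the cores of each host, matching the example described earlier in this section. The choice of the cpu plugin and the grouping fields is an assumption for this example, and the scratch file name collectd-aggregation.draft is hypothetical.

```shell
# Draft the Aggregation plugin configuration in a scratch file for review.
# "collectd-aggregation.draft" is a hypothetical name; collectd does not read it.
cat > collectd-aggregation.draft <<'EOF'
# Goes in the LoadPlugin section
LoadPlugin aggregation

<Plugin "aggregation">
  <Aggregation>
    # Select the per-core values reported by the cpu plugin
    Plugin "cpu"
    Type "cpu"
    # Combine cores that share a host and a CPU state (user, system, idle, ...)
    GroupBy "Host"
    GroupBy "TypeInstance"
    # Emit the average across the grouped cores
    CalculateAverage true
  </Aggregation>
</Plugin>
EOF

# Print the draft so it can be copied into /etc/collectd/collectd.conf
cat collectd-aggregation.draft
```

The cpu plugin must also be loaded for this aggregation to have any values to work with.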
Within the <Aggregation> tag you will implement your particular configuration for metrics aggregation. Keep in mind that these aggregated values generate new names based on this configuration, so you’ll want to understand the naming schema. If you wish to dive into these options a little further, I encourage you to check out the naming schema information as well as a few useful sample configurations, which are readily available in the official collectd wiki.
Reload the Services
Earlier, we mentioned stopping and starting the Apache web server, but what about the collectd service? In order to refresh the configuration values within collectd, you should stop (if running) and restart the collectd service. This can be done using straightforward stop and start commands in your Linux terminal:
sudo service collectd stop
sudo service collectd start
After executing these commands, I recommend checking the status of all services to ensure that both apache2 and collectd are running on the host machine:
sudo service --status-all
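Because the full status list can be long, a small filter makes the two services of interest easier to spot. The sketch below assumes the "[ + ]" marker that the service script on Ubuntu prints for running services; the helper name is_running and the sample lines are made up for illustration.

```shell
# Succeed if the named service is marked running ("[ + ]") in
# `service --status-all` style output read on stdin.
# is_running is a hypothetical helper name used only for this example.
is_running() {
  grep -Eq "^[[:space:]]*\[ [+] ][[:space:]]+$1\$"
}

# Sample input for illustration only; on the host, pipe in the real output:
#   sudo service --status-all 2>/dev/null | is_running collectd
sample=' [ + ]  apache2
 [ - ]  bluetooth
 [ + ]  collectd'

for svc in apache2 collectd; do
  if printf '%s\n' "$sample" | is_running "$svc"; then
    echo "$svc: running"
  else
    echo "$svc: not running"
  fi
done
```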
Tracking System Metrics with collectd
Tracking other system metrics with collectd is often as simple as enabling other plugins within the collectd configuration. There are a variety of plugins to consider, some of which are configured with the default installation of collectd.
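Enabling a plugin usually comes down to uncommenting its LoadPlugin line. The helper below is a minimal sketch of that edit, assuming GNU sed (as shipped with Ubuntu) and the "#LoadPlugin name" form used for commented-out lines; the function name enable_plugin and the scratch file collectd.conf.draft are hypothetical, and the example works on a scratch copy rather than the live configuration.

```shell
# Uncomment the "LoadPlugin <name>" line for one plugin in a config file.
# $1 = plugin name, $2 = path to the config file
# enable_plugin is a hypothetical helper name used only for this example.
enable_plugin() {
  sed -i -E "s/^#[[:space:]]*(LoadPlugin[[:space:]]+$1)[[:space:]]*\$/\1/" "$2"
}

# Try it on a scratch copy (hypothetical file name) rather than the live file
cat > collectd.conf.draft <<'EOF'
#LoadPlugin cpu
#LoadPlugin disk
LoadPlugin memory
EOF

enable_plugin cpu collectd.conf.draft
enable_plugin disk collectd.conf.draft

# List the plugins that are now enabled in the scratch copy
grep '^LoadPlugin' collectd.conf.draft
```

Lines that are already uncommented are left untouched, so running the helper twice for the same plugin does no harm.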
Steps to Tracking System Metrics
- Choose the metrics you wish to track - As mentioned above, collectd tracks specific system metrics via specific plugins and there are many plugins to consider. The first step in configuring collectd to track system metrics is to decide what you wish to track through your configuration of collectd. Below are a few examples of plugins you may wish to enable:
- cpu - This plugin collects CPU usage statistics.
- battery - Built for laptops, this plugin tracks metrics related to battery life, power, and voltage.
- disk - This plugin gathers metrics related to the usage of physical and logical disks.
- memory - This plugin collects statistics related to memory usage on a particular machine.
- Navigate to the directory where your configuration file is located and open the file - Much like the configuration of the Apache plugin, setting up tracking for any system metric is done via the configuration file. On an Ubuntu 18.04 machine, this would be the collectd.conf file located in /etc/collectd/.
- Enable and configure the plugins you wish to use - Enabling a plugin is done by either adding or uncommenting the LoadPlugin line for that plugin. After doing so, you can set plugin-specific options to configure the selected plugin to work as desired. Consult the collectd documentation for the configuration options of the specific plugins you wish to enable.
- Employ software for visualizing the data you collect - While collectd gathers statistics, it does not provide graphing functionality to produce visualizations. For the purpose of displaying the metrics gathered by collectd in this example, let’s take a look at the data using kcollectd.
The following command will install kcollectd using the apt package utility:
sudo apt-get install kcollectd
Once installed, and assuming that collectd is configured properly and the service is running, we can view our data using kcollectd. First, launch kcollectd by running the kcollectd command from a terminal.
The interface for kcollectd should now be open. You will see your configured instances in the left pane. In our case, let’s take a look at the graphs for the Apache plugin. Since we applied the name “web-tracking” to our instance, we will be looking for that name in the tree. Selecting that instance and the various categories nested within it will allow us to visualize the Apache metrics gathered by collectd.
How to Monitor collectd Logging Events with Sumo Logic
As we discussed earlier, there are simple programs (such as kcollectd) that enable data visualizations for gleaning insights from metrics gathered by collectd, but sometimes you need a more complete solution. Sumo Logic is a log management and analytics platform that (with a little help from an open source plugin) can read the statistics gathered by collectd and produce visualizations that assist system administrators with network management.
Once you’re set up with a Sumo Logic account (free trials are available), the process for integrating with collectd is relatively straightforward:
- As a first step, you will need Python (version 2.6 or higher) installed on your machine. Visit the section earlier in this article on loading a Python environment to get started working with Python on a Linux machine.
- The easiest way to download and install the Sumo Logic collectd plugin is to install it as a library. This can be done with the following command:
sudo pip install sumologic_collectd_metrics
- Within your Sumo Logic account, you will need to configure an HTTP source for use with collectd. This will be vital when configuring the Sumo Logic plugin within the collectd configuration file.
- Speaking of which, you must configure the Sumo Logic plugin within the collectd configuration file (located at /etc/collectd on an Ubuntu 18.04 machine). This plugin has both required and optional parameters that can be configured based upon the use case for the plugin. The URL endpoint for the HTTP source, configured in the last step within Sumo Logic, is required for telling collectd where to send the data. The README for the open source plugin provides greater detail for the proper set up of this plugin within the collectd configuration file.
- Finally, start your collectd service. With the Sumo Logic collectd plugin properly configured, collectd will now send the metrics that it gathers to Sumo Logic for visualization and analysis.