Mesosphere DC/OS (Data Center Operating System) lets you manage your data center as if it were a single powerful computer. It separates the infrastructure logic from the application logic, and makes it easy to manage distributed systems. Considering DC/OS is meant for highly scalable, distributed systems, logging and monitoring plays a key role in day-to-day operations with DC/OS. In this post, we’ll take a look at the goals of logging with DC/OS, and how you can set up DC/OS logging with Sumo Logic.
Why Mesosphere DC/OS Clusters Need Logging
When you work with Mesosphere DC/OS, you typically have hundreds, if not thousands of nodes that are grouped in clusters. Tasks are executed on these nodes by Mesos “agent” instances, which are controlled by “master” instances. By grouping the nodes in clusters, DC/OS ensures high availability so that if any node or cluster fails, its workload is automatically routed to the other clusters. DC/OS uses two scheduling tools—Chronos for scheduled tasks like ETL jobs, and Marathon for long-running tasks like running a web server. Additionally, it includes app services like Docker, Cassandra, and Spark. DC/OS supports hybrid infrastructure, allowing you to manage bare metal servers, VMs on-premises, or cloud instances, all from a single pane of glass. Together, all of these components make for a complex system that needs close monitoring.
There are two key purposes for collecting and analyzing DC/OS logs. The first is debugging. As new tasks are executed, DC/OS makes decisions in real time on how to schedule these tasks. While this is automated, it needs supervision. Failover needs logging so you can detect abnormal behavior early on. Also, as you troubleshoot operational issues on a day-to-day basis, you need to monitor resource usage at a granular level, and that requires a robust logging tool.
Second, for certain apps in enterprises, compliance is a key reason to store historic logs over a long period of time. You may need to comply to HIPAA or PCI DSS standards.
Viewing raw logs in DC/OS
DC/OS services and tasks write stdout and stderr files in their sandboxes by default. You can access logs via the DC/OS CLI or the console. You can also SSH into a node and run the following command to view its logs:
$ journalctl -u "dcos-*" -b
While this is fine if you’re running just a couple of nodes, once you scale to tens or hundreds of nodes, you need a more robust logging tool. That’s where a log analysis tool like Sumo Logic comes in.
Sharing DC/OS logs with Sumo Logic
DC/OS shares log data via a HTTP endpoint which acts as a source. The first step to share Mesosphere DC/OS logs with Sumo Logic is to configure a HTTP source in Sumo Logic. You can do this from the Sumo Logic console by following these steps. You can edit settings like timestamp, and allow multi-line messages like stack traces.
Your data is uploaded to a unique source URL. Once uploaded, the data is sent to a Sumo Logic collector. This collector is hosted and managed by Sumo Logic, which makes setup easy, and reduces maintenance later. The collector compresses the log data, encrypts it, and sends it to the Sumo Logic cloud, in real time.
During this setup process, you can optionally create Processing rules to filter data sent to Sumo Logic. Here are some actions you can take on the logs being shared:
- Exclude messages
- Include messages
- Hash messages
- Mask messages
- Forward messages
These processing rules apply only to data sent to Sumo Logic, not the raw logs in DC/OS.
It may take a few minutes for data to start showing in the Sumo Logic dashboard, and once it does, you’re off to the races with state-of-the-art predictive analytics for your log data. You gain deep visibility into DC/OS cluster health. You can setup alerts based on the log data and get notifications when failed nodes reach a certain number, or when a high priority task is running too slow, or if there is any suspicious user behavior. Whether it’s an overview, or a deep dive to resolve issues, Sumo Logic provides advanced data analysis that builds on the default metrics of DC/OS. It also has options to archive historic log data for years so you can comply with various security standards like HIPAA or PCI DSS.
DC/OS is changing the way we view data centers. It transforms the data center from hardware- centric to software-defined. A comprehensive package, it encourages hybrid infrastructure, prevents vendor lock-in, and provides support for container orchestration. DC/OS is built for modern web scale apps. However, it comes with a new set of challenges with infrastructure and application monitoring. This is where you need a tool like Sumo Logic so that you not only view raw log data, but are also able to analyze it and derive insights before incidents happen.
About the Author
Twain began his career at Google, where, among other things, he was involved in technical support for the AdWords team. Today, as a technology journalist he helps IT magazines, and startups change the way teams build and ship applications.
Mesosphere DC/OS Logging with Sumo Logic is published by the Sumo Logic DevOps Community. If you’d like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production.