
August 27, 2019 Sadequl Hussain

How to Read, Search, and Analyze AWS CloudTrail Logs

In a recent post, we talked about AWS CloudTrail and saw how CloudTrail can capture histories of every API call made to any resource or service in an AWS account. These event logs can be invaluable for auditing, compliance, and governance. We also saw where CloudTrail logs are saved and how they are structured.

Enabling CloudTrail in your AWS account is only half the task. Its real value comes from analyzing the logs: making sense of unusual patterns of events or finding the root cause of an incident.

In this post, we will talk about a few ways you can read, search and analyze data from AWS CloudTrail logs.

Understanding CloudTrail Log Structure

CloudTrail logs are nothing but JSON-formatted, compressed files. If you download a CloudTrail log file and open it in a text editor, you will see something like this:

{
  "Records": [{
    "eventVersion": "1.05",
    "userIdentity": {
      "type": "IAMUser",
      "principalId": "AIDAIPFJYALOFOP2DK2SY",
      "arn": "arn:aws:iam::xxxxxxx:user/Administrator",
      "accountId": "xxxxxxx",
      "accessKeyId": "xxxxxxx",
      "userName": "Administrator",
      "sessionContext": {
        "attributes": {
          "mfaAuthenticated": "false",
          "creationDate": "2018-10-21T09:17:01Z"
        }
      },
      "invokedBy": "signin.amazonaws.com"
    },
    "eventTime": "2018-10-21T12:59:14Z",
    "eventSource": "s3.amazonaws.com",
    "eventName": "GetBucketLocation",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "12.34.56.78",
    "userAgent": "signin.amazonaws.com",
    "requestParameters": {
      "bucketName": "mytest-cloudtrails",
      "location": [""]
    },
    "responseElements": null,
    "additionalEventData": {
      "vpcEndpointId": "vpce-xxxxxxxxxx"
    },
    "requestID": "544A415A398A169C",
    "eventID": "5989cc55-f752-468f-b669-f8abbeb008ba",
    "eventType": "AwsApiCall",
    "recipientAccountId": "xxxxxxxxxx",
    "vpcEndpointId": "vpce-xxxxxxxxxx"
  }]
}

If you look closely, this particular JSON document gives us quite a bit of information about an S3 event: GetBucketLocation. We can see that an IAM user called Administrator logged in through the console, without any multi-factor authentication, from the IP address 12.34.56.78, and invoked this event.

A single CloudTrail file can contain dozens or even hundreds of events like this. The JSON snippet shown here was part of a larger document; the other entries were deliberately removed.
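Because each file is just gzipped JSON, summarizing one takes only a few lines of scripting. Here is a minimal Python sketch (the file name in the usage comment is hypothetical) that decompresses a CloudTrail log file and counts how often each event name appears:

```python
import gzip
import json
from collections import Counter

def summarize_events(path):
    """Count events by eventName in one gzipped CloudTrail log file."""
    with gzip.open(path, "rt", encoding="utf-8") as log_file:
        records = json.load(log_file)["Records"]
    return Counter(record["eventName"] for record in records)

# Usage (file name hypothetical):
# summarize_events("123456789012_CloudTrail_us-east-1_20181021T1300Z_x.json.gz")
```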

As you can see, an event consists of multiple fields, and each field describes a specific attribute of the event. When you are looking at a CloudTrail file, here are the important fields you need to be aware of:

Field Name

Why it’s Important

eventTime

The date and time of the event - reported in UTC.

eventType

There can be three types of events:

  • AwsConsoleSignIn: when someone signs in to your AWS account through the AWS console
  • AwsServiceEvent: when the called service generates an event
  • AwsApiCall: when the public API of an AWS resource is accessed

eventSource

The AWS service that emitted the event. For example, s3.amazonaws.com means the event was generated by the S3 service; similarly, ec2.amazonaws.com represents an event generated by the EC2 service.

sourceIPAddress

The IP address where the request came from. If one AWS service calls another service, the DNS name of the calling service is used.

userAgent

The tool or application used to make the call. This can be the console, an application written with one of the AWS SDKs, the AWS CLI, a Lambda function, and so on.

errorMessage

Any error message returned by the called service.

requestParameters

The parameters that were passed with the API call. The list of parameters can vary depending on the type of resource or service called. For example, in the JSON snippet we just saw, there are two requestParameters: bucketName and location.

resources

List of AWS resources accessed in the event. This can be the resource's ARN, an AWS account number, or the resource type.

userIdentity

A collection of fields that describe the user or service which made the call. These fields can vary based on the type of user or service.
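Once the JSON is loaded, these fields make programmatic filtering straightforward. As a quick illustration, here is a small Python sketch (the record layout follows the sample event shown earlier) that flags console activity performed without multi-factor authentication:

```python
def non_mfa_console_events(records):
    """Return (time, name, user) tuples for console calls made without MFA.

    Walks the userIdentity.sessionContext.attributes structure that
    appears in CloudTrail records for console-initiated events.
    """
    flagged = []
    for record in records:
        identity = record.get("userIdentity", {})
        attributes = identity.get("sessionContext", {}).get("attributes", {})
        if (record.get("userAgent") == "signin.amazonaws.com"
                and attributes.get("mfaAuthenticated") == "false"):
            flagged.append((record.get("eventTime"),
                            record.get("eventName"),
                            identity.get("userName")))
    return flagged
```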

Searching CloudTrail Logs from Event History

Reading through hundreds of thousands of lines across hundreds of CloudTrail logs is not practical. It would be ideal if there were a search facility that let users specify one or more search criteria and have the tool search all log files for matching events.

Fortunately, there is one such facility: the CloudTrail event history console. It allows searching through events that occurred in the past 90 days. By default, the search console filters out read-only events such as ListKeys or DescribeInstances. The image below shows a search for events related to the "DBInstance" resource type:

You can expand any event in the result set to reveal further information about it.
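If you prefer the command line, the same 90-day event history can be queried with the AWS CLI's cloudtrail lookup-events command. The sketch below assumes the CLI is installed and configured with credentials allowed to read CloudTrail:

```shell
# Look up recent events that touched an RDS DB instance
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=ResourceType,AttributeValue=AWS::RDS::DBInstance \
  --max-results 20 \
  --query 'Events[].{Time:EventTime,Name:EventName,User:Username}' \
  --output table
```

The --query and --output flags are optional; they simply trim the JSON response down to a readable table.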

Analyzing CloudTrail Logs using Amazon Athena

Being able to search CloudTrail logs is great; an even better approach, however, is to analyze those logs. Analyzing CloudTrail logs can unearth useful information that might otherwise go unnoticed.

The event history console also allows AWS administrators to create an Amazon Athena table mapped to a CloudTrail logs bucket.

Amazon Athena is a serverless query tool that can run interactive SQL queries on S3 data. This data can be in different formats like CSV, JSON, Avro or Parquet. Athena also allows creating table structures from custom data sources, including CloudTrail logs.

Once an Athena table is created on top of CloudTrail logs, it can be queried using standard SQL.

To create the Athena table, click on the link “Run advanced queries in Amazon Athena” in the event history console. This opens up the “Create a table in Amazon Athena” dialog box:

Choosing the CloudTrail bucket name in the “Storage location” field changes the Athena table name. The naming pattern for the Athena CloudTrail table is cloudtrail_logs_<bucket name>.
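Behind the scenes, the console generates a CREATE EXTERNAL TABLE statement that maps Athena columns onto CloudTrail fields using a CloudTrail-specific SerDe. The abridged sketch below shows the general shape of that DDL; the column list is shortened, the bucket path is hypothetical, and the exact statement generated in your account may differ:

```sql
CREATE EXTERNAL TABLE cloudtrail_logs_mytest_cloudtrails (
  eventversion STRING,
  useridentity STRUCT<
    type: STRING,
    principalid: STRING,
    arn: STRING,
    accountid: STRING,
    username: STRING>,
  eventtime STRING,
  eventsource STRING,
  eventname STRING,
  awsregion STRING,
  sourceipaddress STRING,
  useragent STRING,
  errorcode STRING,
  errormessage STRING
  -- remaining CloudTrail fields omitted for brevity
)
ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde'
STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://mytest-cloudtrails/AWSLogs/111122223333/CloudTrail/';
```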

Once you click “Create table” in the dialog box, the Athena table is created and CloudTrail shows the following message:

Clicking on the “Go to Athena” button then navigates to the Amazon Athena console. From the Athena navigation pane, if you choose the default database, you will see the newly created table for CloudTrail. Note how the table fields correspond to CloudTrail fields:

If you are familiar with the SQL language, you can easily run some basic searches now.

In the query below, we are searching for the last 200 events that resulted in an error:

SELECT 
 eventtime
 ,eventname
 ,eventsource
 ,awsregion
 ,errormessage
FROM 
 default.cloudtrail_logs_athena_cloudtrails 
WHERE
 errorcode IS NOT NULL
ORDER BY 
 eventtime DESC
LIMIT 200;

Here is a portion of the resultset returned:

Similarly, another query lists the top 20 event types related to the AWS Key Management Service (KMS):

SELECT 
 eventname
 ,count(*) as num_events
FROM 
 default.cloudtrail_logs_athena_cloudtrails 
WHERE
 eventsource LIKE '%kms%'
GROUP BY
 eventname
ORDER BY 
 count(*) DESC
LIMIT 20;

The image below shows the Decrypt event has the highest number of occurrences, followed by ListAliases:

Analyzing CloudTrail Logs using CloudWatch Logs Insights

In our previous post, we showed how to send CloudTrail logs to Amazon CloudWatch. Another way to analyze CloudTrail logs is to use CloudWatch Logs Insights. With CloudWatch Logs Insights, you can run custom query language commands to analyze CloudWatch logs, including those coming from CloudTrail.

CloudWatch Logs Insights comes with a number of pre-built queries for different types of AWS service logs, including three templates for CloudTrail, as shown below:

The query language is similar to that used in traditional log management or SIEM tools. The query below shows how to find the number of EC2-related events, broken down by event name and AWS region:
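A query along these lines produces that breakdown. This is a sketch in the Logs Insights query language, assuming the selected log group receives CloudTrail events with their standard field names:

```
filter eventSource = "ec2.amazonaws.com"
| stats count(*) as eventCount by eventName, awsRegion
| sort eventCount desc
```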

Running the query shows the number of events and their distributions over time:

The query below shows the distribution of S3-related events in the US-East-1 region, grouped by event time and event name:
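A sketch of such a query might look like the following; bin(1h) buckets the events into one-hour intervals, which is what drives the time distribution:

```
filter eventSource = "s3.amazonaws.com" and awsRegion = "us-east-1"
| stats count(*) as eventCount by bin(1h), eventName
```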

Final Words

So now you have seen a few ways to make sense of AWS CloudTrail logs. There are other, more sophisticated tools on the market that go further than what we have covered here: they can build better visualizations from CloudTrail logs, surface anomalies and trend graphs in those visualizations, and send automated alerts based on error conditions that you define. One such tool is Sumo Logic, whose CloudTrail app can be a valuable addition to any AWS administrator’s toolset. We will talk about it in our next post, so stay tuned!


Sadequl Hussain


Sadequl Hussain is an information technologist, trainer, and guest blogger for Machine Data Almanac. He comes from a strong database background and has 20 years of experience in development, infrastructure engineering, database management, training, and technical authoring. He loves working with cloud technologies and anything related to databases and big data. When he is not blogging or making training videos, he can be found spending time with his young family.

