How to Implement & Enable Logging Across AWS Services (Part 2 of 2)

Filter and Query data with Amazon Athena

The course is part of these learning paths

  • DevOps Engineer – Professional Certification Preparation for AWS
  • SysOps Administrator – Associate Certification Preparation for AWS
  • Security - Specialty Certification Preparation for AWS
  • AWS Services Monitoring & Auditing

Overview

Difficulty: Advanced
Duration: 1h 4m
Students: 584
Rating: 5/5

Course Description

This course is part 2 of a two-part course series that focuses on a number of key AWS services and how they perform logging and monitoring across your environment. Being able to monitor data provides a number of key benefits to your organization, such as compliance, incident detection and resolution, trend analysis, and much more. Collating data and statistics about your solutions running within AWS also provides the ability to optimize their performance. This series looks at how to implement, configure, and deploy logging and monitoring mechanisms using the following AWS services and features.

Part 2:

  • Amazon CloudFront Access Logs
  • VPC Flow Logs
  • AWS Config Configuration History 
  • Filtering and searching data using Amazon Athena

Part 1: 

  • Amazon CloudWatch - CloudWatch Monitoring Agent
  • AWS CloudTrail Logs
  • Monitoring CloudTrail Logs with CloudWatch Metric Filters
  • Amazon S3 Access Logs

The course for Part 1 can be found here

Learning Objectives

By the end of this course series you will be able to:

  • Understand why and when you should enable logging of key services
  • Configure logging to enhance incident resolution and security analysis
  • Understand how to extract specific data from logging data sets

Intended Audience

The content of this course is centered around security and compliance. As a result, this course is beneficial to those in the following roles or their equivalents:

  • Cloud Security Engineers
  • Cloud Security Architects
  • Cloud Administrators
  • Cloud Support & Operations
  • Compliance Managers

Prerequisites

This is an advanced-level course series, so you should be familiar with the following services and understand their individual use cases and feature sets.

  • Amazon CloudWatch
  • AWS CloudTrail
  • Amazon EC2
  • CloudFront
  • Lambda
  • AWS Config
  • Amazon S3
  • IAM
  • EC2 Systems Manager (SSM)

This course includes

6 lectures

4 demonstrations

Feedback

If you have thoughts or suggestions for this course, please contact Cloud Academy at support@cloudacademy.com.

Transcript

Hello and welcome to this lecture. We will learn how to analyze and search for specific data across log files being stored in S3. If your log data is being stored in S3, such as your CloudTrail logs, then you can use Amazon Athena to query that data within S3 to search for specific entries. The following demonstration has been created by Jeremy Cook, one of our AWS expert trainers here at Cloud Academy. In this demonstration, Jeremy will walk through the steps required to set up Athena to allow you to query CloudTrail log data.

Start of demonstration

In this demo we'll walk through the steps required to set up Athena to allow us to query CloudTrail log data. This demo will involve configuring CloudTrail, S3, EC2, and the Athena services. The end result will allow us to perform SQL queries against CloudTrail data stored in an S3 bucket. This type of setup will aid your DevOps and SecOps experience when building on top of the AWS platform. 

Let's get started. Within the AWS console, select the Athena service. The first step involves us creating a new Athena database, which will host our custom data table created later in this demo. This task can be accomplished either through an Athena-provided wizard or manually through the query editor using an appropriate SQL create statement. We'll perform this by using SQL statements. But for learning purposes, I will show you where the wizard resides within the Athena console. Let's see where this is located. Clicking the Catalog Manager menu item in the Athena menu bar takes us into the catalog manager. Initially, we have only the default Athena database to work with. We can click the Add Table action. This opens the Add Table wizard. The first step requires us to either choose an existing database or create a new database. In this case, I'm going with the create new database option. I need to provide the name for the new database. In this case, I'm going with New DB. And then I need to provide the name for the new table. I'll go with New Table. If I were to complete this wizard, I would additionally need to specify the S3 bucket location of our input set. However, I'm now going to cancel out of this wizard and perform the same process manually using SQL statements within the query editor.

Clicking back on the query editor menu item, we're taken back into the query editor. Clicking within the editor itself clears the area. I'm now going to switch over to my Visual Code editor, which has some pre-baked SQL queries. I'm taking a copy here of a create database statement. Flipping back to the query editor, I paste the create database statement into the editor pane. As you can see, this particular create database statement will create a new database named CloudTrail DB1. The create database statement, in this example, specifies an S3 bucket location. This bucket will be used to store the new database's catalog. Let's go and create this S3 bucket now. We open the S3 console in a separate browser tab and click the Create Bucket button. In the resulting create bucket window, we paste in the bucket name that our create database statement references. We then hit the Create button at the bottom of the window. This creates the S3 bucket that will store the database catalog.

Okay, great. Our bucket has been created successfully as shown here. Next, we'll go back to the Athena query editor. The create database statement can be configured to execute with metadata that may be relevant to your use case. In this example, I'm simply going to set the creator as myself, the company to be Cloud Academy, and created to the current year. With all this in place, we can now go ahead and execute the create database statement. We do so by clicking the Run Query button. Let's do this now. We now wait for the query to complete. As we can now see, the create database statement has executed successfully, as per the query successful response in the result section. Additionally, we can see that our new database is now displayed in the left-hand side menu. We'll now create our first table within our new database.
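The statement pasted into the editor is not shown in the transcript, so the following is only a minimal sketch of what a create database statement with a catalog location and metadata properties might look like. The database identifier `cloudtrail_db1` is an assumed spelling of the spoken name, and the bucket URL is a placeholder.

```python
import datetime


def create_database_sql(name, catalog_location, properties):
    """Build an Athena CREATE DATABASE statement with metadata properties."""
    props = ", ".join(f"'{key}' = '{value}'" for key, value in properties.items())
    return (
        f"CREATE DATABASE IF NOT EXISTS {name}\n"
        f"LOCATION '{catalog_location}'\n"
        f"WITH DBPROPERTIES ({props})"
    )


# Assumed names: the real database identifier and bucket are not in the transcript.
sql = create_database_sql(
    "cloudtrail_db1",
    "s3://example-athena-catalog-bucket/",
    {
        "creator": "Jeremy Cook",
        "company": "Cloud Academy",
        "created": str(datetime.date.today().year),
    },
)
print(sql)
```

The printed text can be pasted into the query editor and run with the Run Query button, as in the demo.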

Let's flip over to Visual Code and take a copy of a pre-built table creation statement. Back within the query editor, we paste in the table creation statement. Here you can see that our new table will be named CloudTrail logs. The create table statement specifies all the relevant column attributes that CloudTrail tracks per audit record. Next, we highlight the serializer-deserializer, or SerDe in short form, that is used. In this case, we're using the Amazon-provided CloudTrail SerDe. Finally, I draw your attention to the S3 bucket location that needs to be provided. This particular S3 URL represents the location where our raw CloudTrail logs will reside once configured. Let's go ahead and create this bucket. Switching back over into the S3 console, click the Create Bucket button. In the resulting Create Bucket window, we paste in the bucket name that our create table statement will reference. In our case, we'll create our new bucket with the name CA CloudTrail Logs Demo. We then hit the Create button at the bottom of the window. This creates the S3 bucket that CloudTrail will next be configured to push logs into, and that Athena will scan when executing our SQL queries.
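As a rough illustration of such a table creation statement, the sketch below builds one using the CloudTrail SerDe and input format class names from the AWS documentation. The column list is deliberately abbreviated (the demo's statement covers every CloudTrail attribute), and the table name `cloudtrail_logs` and bucket name `ca-cloudtrail-logs-demo` are assumed spellings of the spoken names.

```python
# Abbreviated, illustrative subset of CloudTrail record attributes.
CLOUDTRAIL_COLUMNS = [
    ("eventversion", "STRING"),
    ("eventtime", "STRING"),
    ("eventsource", "STRING"),
    ("eventname", "STRING"),
    ("awsregion", "STRING"),
    ("sourceipaddress", "STRING"),
    ("useragent", "STRING"),
    ("errorcode", "STRING"),
    ("errormessage", "STRING"),
    ("requestparameters", "STRING"),
    ("responseelements", "STRING"),
]


def create_table_sql(table, logs_location):
    """Build a CREATE EXTERNAL TABLE statement over raw CloudTrail logs in S3."""
    columns = ",\n".join(f"  {name} {col_type}" for name, col_type in CLOUDTRAIL_COLUMNS)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n{columns}\n)\n"
        "ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde'\n"
        "STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'\n"
        "OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'\n"
        f"LOCATION '{logs_location}'"
    )


# Assumed table and bucket names; run with the target database selected.
table_sql = create_table_sql("cloudtrail_logs", "s3://ca-cloudtrail-logs-demo/")
print(table_sql)
```

Because the table is external, the statement only registers metadata. The log files stay in the bucket and are read at query time.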

Okay, great. Our bucket has been successfully created, as can be seen here. Switching back into the query editor window, we now paste in the S3 bucket name we just created. Okay, everything looks ready. Let's now click on the Run Query button. And again, our query has executed successfully. And in this case, our new CloudTrail logs table has been created. On the left-hand side, we see our newly created table listed. Clicking on the preview icon to the right of the table name executes the sample query now shown in the editor pane. This query will perform a Select All across the table, but limited to the first 10 rows. Since our CloudTrail bucket has yet to be populated, it's expected that the query will return an empty result set, as it does. Next, if we expand the table name itself, we see the column names and types that define it. Finally, clicking on the table properties icon, we are presented with a view of all the respective table properties associated with our new table. Important properties include table name, database name, S3 bucket location, and serialization library. Let's now go and establish a new CloudTrail trail and configure it to push its logs into our S3 CloudTrail bucket.

Under services, select the CloudTrail service. Once in the CloudTrail console, click the Create Trail button. Give the new trail a name. Here we're going to call ours CA CloudTrail Logs Demo. Leave all defaults as they are until we get to the storage location section. Disable the create new S3 bucket option, and instead, select the name of the S3 CloudTrail bucket that we built earlier. Next, under Advanced, disable the enable log file validation option. This is unnecessary for this demo. Finally, click the Create button at the bottom of the screen. If all goes well, we should see fairly quickly that our new trail has been provisioned successfully, as we do now.
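The demo creates the trail in the console. For comparison, a scripted equivalent might look like the sketch below, which uses the CloudTrail `create_trail` and `start_logging` operations as exposed by boto3. It assumes the bucket policy already permits CloudTrail to write to the bucket. The trail and bucket names are assumed spellings, and `RecordingClient` is a hypothetical stand-in so the sketch runs without AWS credentials.

```python
def create_demo_trail(cloudtrail, trail_name, bucket_name):
    """Create a trail that delivers to an existing bucket, then start logging."""
    cloudtrail.create_trail(
        Name=trail_name,
        S3BucketName=bucket_name,
        EnableLogFileValidation=False,  # validation is switched off in the demo
    )
    # Trails created through the API do not log until explicitly started.
    cloudtrail.start_logging(Name=trail_name)


class RecordingClient:
    """Hypothetical stand-in for a boto3 client: records calls, contacts nothing."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, operation):
        def record(**kwargs):
            self.calls.append((operation, kwargs))
            return {}

        return record


# With real credentials, pass boto3.client("cloudtrail") instead of the stand-in.
client = RecordingClient()
create_demo_trail(client, "ca-cloudtrail-logs-demo", "ca-cloudtrail-logs-demo")
for operation, arguments in client.calls:
    print(operation, arguments)
```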

Let's switch over into the S3 console, and check to see if our newly created trail is publishing events into our bucket. Clicking on our CloudTrail-configured bucket and drilling down into the lowest folder, we can see that we are indeed receiving logs from CloudTrail. This is great. Let's go back into the Athena console and perform a couple of queries against this data. Clicking on the preview icon to the right of our CloudTrail table kicks off the sample query for us again. And we're now successfully seeing some early results coming through. Next, we'll flip across to Visual Code and copy a pre-configured SQL select statement. Back within the query editor, we paste in the select statement. Before we execute, let's click the Format Query button and have the editor reformat the query for us. This is a great feature that aids the readability of any SQL statements that we craft by hand.

Okay, running the formatted query still returns just four rows of data. This implies we're still waiting for more CloudTrail logs to be delivered into our S3 bucket. Okay, now that we have all of the individual parts wired up successfully, let's try out the following scenario. We'll create a new example-only security group within the EC2 service. The security group itself won't be attached to anything. We're creating it only to generate and capture the associated API calls within CloudTrail, which we will eventually query within Athena. We'll add in some inbound rules on this new security group. Performing these actions will generate CloudTrail data that will be published into our CloudTrail S3 bucket. The end result is that we should be able to query and discover these actions within Athena. Right, let's start by heading over to the EC2 console. Click into the Security Group section, and then click the Create Security Group button. Give the security group a name. Here we name ours DemoSG. Don't worry about setting the VPC. In the inbound rule section we'll add a couple of rules. Clicking the Add Rule button, we add the first rule, allowing incoming traffic from source IP address 1.1.1.1/32 to port 1000. Add a second rule, this time to allow incoming traffic from source IP address 2.2.2.2/32 to port 2000, and then click the Create button. Next, we need to take a copy of the security group ID for the security group we just created. We'll use this within our Athena query.
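The same scenario could be generated from a script rather than the console. The sketch below uses the EC2 `create_security_group` and `authorize_security_group_ingress` operations as exposed by boto3. The transcript does not state the protocol of the two rules, so TCP is an assumption here, and `FakeEc2` is a hypothetical stand-in that returns a made-up group ID so the sketch runs offline.

```python
def create_demo_security_group(ec2, name, rules):
    """Create an unattached security group and add one inbound rule per (cidr, port)."""
    response = ec2.create_security_group(
        GroupName=name,
        Description="Example-only group for the Athena demo",
    )
    group_id = response["GroupId"]
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=[
            {
                "IpProtocol": "tcp",  # assumption: protocol is not given in the demo
                "FromPort": port,
                "ToPort": port,
                "IpRanges": [{"CidrIp": cidr}],
            }
            for cidr, port in rules
        ],
    )
    return group_id


class FakeEc2:
    """Hypothetical stand-in for boto3.client("ec2"); returns a made-up group ID."""

    def __init__(self):
        self.ingress_calls = []

    def create_security_group(self, **kwargs):
        return {"GroupId": "sg-0123456789abcdef0"}

    def authorize_security_group_ingress(self, **kwargs):
        self.ingress_calls.append(kwargs)
        return {}


fake_ec2 = FakeEc2()
demo_group_id = create_demo_security_group(
    fake_ec2, "DemoSG", [("1.1.1.1/32", 1000), ("2.2.2.2/32", 2000)]
)
print(demo_group_id)
```

The returned group ID is the value that gets copied into the Athena query in the next step.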

Jump back into the Athena query editor and update our like clause to reference the security group ID we just copied. This now tells Athena to search for all records whose request parameters attribute contains the security group ID we pasted into the like clause. Okay, let's now execute this query and see if we get any results. As you can see, no results have come back, likely due to the latency involved in CloudTrail receiving, processing, and saving out to the S3 bucket. Let's try again in approximately five minutes' time. Okay, running the query again now provides us with results. As you can see, there are six rows in our output. We scroll across the fourth row until we see the request parameters column. Here we can see the two inbound rules we attached to the security group earlier. The first inbound rule allows incoming traffic to port 1000 from source IP address 1.1.1.1/32. And the second inbound rule allows incoming traffic to port 2000 from source IP address 2.2.2.2/32.
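The select statement itself is not reproduced in the transcript. A minimal sketch of a query with such a like clause is shown below; the selected columns, the table name `cloudtrail_logs`, and the group ID are all illustrative assumptions.

```python
import re


def security_group_query(table, group_id):
    """Build a query for records whose request parameters mention a security group."""
    # Reject anything that does not look like a group ID before embedding it in SQL.
    if not re.fullmatch(r"sg-[0-9a-f]+", group_id):
        raise ValueError(f"unexpected security group ID: {group_id!r}")
    return (
        "SELECT eventtime, eventname, sourceipaddress, requestparameters\n"
        f"FROM {table}\n"
        f"WHERE requestparameters LIKE '%{group_id}%'"
    )


# Placeholder group ID; substitute the ID copied from the EC2 console.
like_query = security_group_query("cloudtrail_logs", "sg-0123456789abcdef0")
print(like_query)
```

The `%` wildcards on both sides match the ID anywhere inside the request parameters text, which is why the rows for both the group creation and the rule changes can come back.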

Okay, let's now expand our query by adding an additional clause. This time we'll filter out all events except for the AuthorizeSecurityGroupIngress event. Running this query now gives us back just the one row, as expected. Okay, let's now take a quick look at some of the other useful features within the Athena console. Each query that you author in the editor can be saved and replayed at a later stage. So I'll save our current query. Click the Save As button. Give the saved query a name and description. In this example, we'll call ours Important SG Query for both name and description. Click the Save button, and our query is saved and accessible in the saved queries area. Clicking on the Saved Queries top menu item shows us all of the previously saved queries, including our just saved Important SG Query at the bottom of the list. If we click on the query, it will be recalled back into the editor pane, as can now be seen.
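To illustrate the extra clause and the save step together, the sketch below builds a query restricted to the AuthorizeSecurityGroupIngress event and stores it through Athena's `create_named_query` operation as exposed by boto3. Treating a named query as the scripted counterpart of the console's Save As is an assumption, as are the database, table, and group ID values. `FakeAthena` is a hypothetical stand-in so the sketch runs offline.

```python
def save_ingress_query(athena, database, table, group_id):
    """Build the narrowed query and store it as a named query."""
    query = (
        f"SELECT * FROM {table}\n"
        f"WHERE requestparameters LIKE '%{group_id}%'\n"
        "AND eventname = 'AuthorizeSecurityGroupIngress'"
    )
    response = athena.create_named_query(
        Name="Important SG Query",
        Description="Important SG Query",
        Database=database,
        QueryString=query,
    )
    return query, response.get("NamedQueryId")


class FakeAthena:
    """Hypothetical stand-in for boto3.client("athena"); stores queries in memory."""

    def __init__(self):
        self.saved = []

    def create_named_query(self, **kwargs):
        self.saved.append(kwargs)
        return {"NamedQueryId": f"example-id-{len(self.saved)}"}


fake_athena = FakeAthena()
ingress_query, named_query_id = save_ingress_query(
    fake_athena, "cloudtrail_db1", "cloudtrail_logs", "sg-0123456789abcdef0"
)
print(ingress_query)
```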

Next, let's look at the history feature. This allows us to examine all past executed queries. Here we can see the most recent query at the top of the list. This was the last query that we ran, where we added the extra AND clause to filter on the event name column. If we click this query, once again, it's recalled to the editor pane. But additionally, it also shows us the results that were returned at the time the query was actually executed. Bonus points. Going back into the history feature, I'll now highlight a couple of the other important attributes for each captured query. Firstly, each query has a state associated with it. Here our query succeeded. If it hadn't, it would track as an error. Next, there is the time the query took to run, captured in seconds. This is useful for performance tuning and troubleshooting. Then there is the amount of data scanned, recorded in kilobytes. This is useful to understand how much each query is going to cost you. Finally, there is a download results link that allows you to get a local copy of the results. Clicking the link for this row downloads the results locally. We'll now use our local terminal to output the contents of the file to the screen.
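The same state, run time, and data scanned attributes are available programmatically in the `QueryExecution` structure returned by Athena's `get_query_execution` operation. The sketch below summarizes such a structure. The sample values are made up, and the price of 5 USD per terabyte scanned is an assumed figure to be checked against current Athena pricing, which may also apply a per-query minimum.

```python
def summarize_execution(execution, usd_per_tb=5.0):
    """Summarize state, run time, and data scanned for one query execution."""
    statistics = execution.get("Statistics", {})
    scanned_bytes = statistics.get("DataScannedInBytes", 0)
    return {
        "state": execution["Status"]["State"],
        "seconds": statistics.get("EngineExecutionTimeInMillis", 0) / 1000,
        "scanned_kb": scanned_bytes / 1024,
        # Assumed price; ignores any per-query minimum charge.
        "estimated_usd": scanned_bytes / 1024**4 * usd_per_tb,
    }


# Made-up sample shaped like the "QueryExecution" part of a get_query_execution response.
sample_execution = {
    "Status": {"State": "SUCCEEDED"},
    "Statistics": {"EngineExecutionTimeInMillis": 1500, "DataScannedInBytes": 2048},
}
summary = summarize_execution(sample_execution)
print(summary)
```

Tracking the scanned volume per query is the practical way to see which queries drive cost, since the amount of data read rather than the run time is what is billed.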

Again, we can see the details of the two inbound rules we attached to the security group in question. The first inbound rule allowed incoming traffic to port 1000 from source IP address 1.1.1.1/32. And the second inbound rule allowed incoming traffic to port 2000 from source IP address 2.2.2.2/32. This concludes the demo. But before we finish, let's quickly go through the process of doing some cleanup within Athena. Firstly, we'll drop our CloudTrail table. Back within the query editor we type the statement, drop table CloudTrail Logs. Running this query will drop our table, allowing us to then drop the database. Next, clear the editor, type the statement, drop database CloudTrail DB1, and execute it. This will drop our custom Athena database. Don't forget to delete the CloudTrail trail and remove the S3 buckets used in this demo.
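The ordering of the cleanup matters, since the table is dropped first so that the database can then be dropped. A minimal sketch of the two statements, using the assumed identifiers `cloudtrail_logs` and `cloudtrail_db1`, could be:

```python
def cleanup_statements(database, table):
    """Return the drop statements in the order they need to run."""
    return [
        f"DROP TABLE IF EXISTS {table}",  # run with the database selected
        f"DROP DATABASE IF EXISTS {database}",
    ]


# Assumed identifiers for the table and database created in the demo.
statements = cleanup_statements("cloudtrail_db1", "cloudtrail_logs")
for statement in statements:
    print(statement)
```

Dropping the table and database only removes the Athena metadata. The log files in S3 and the trail itself still have to be deleted separately.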

End of demonstration

That now brings me to the end of this lecture. Coming up next, I will summarize the key points taken from the previous lectures of this course.

About the Author

Students: 55517
Labs: 1
Courses: 55
Learning paths: 36

Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data centre and network infrastructure design, to cloud architecture and implementation.

To date, Stuart has created 50+ courses relating to Cloud, most within the AWS category, with a heavy focus on security and compliance.

He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.

In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.

Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.