image
Filter and Query data with Amazon Athena

Contents

Course Introduction
1
Introduction
PREVIEW2m 22s
Amazon CloudWatch
AWS Config
8
What is AWS Config?
PREVIEW6m 14s
AWS Control Tower
13
AWS Control Tower
PREVIEW19m 56s
AWS Resource Access Manager
AWS Management
16
AWS Service Catalog
PREVIEW10m 34s
AWS Trusted Advisor Best Practices
AWS Health Dashboard
AWS Data Visualization
Finding Compliance Data with AWS Artifact
Observability in AWS
46
Start course
Difficulty
Intermediate
Duration
6h 2m
Students
387
Ratings
4.2/5
Description

This section of the AWS Certified Solutions Architect - Professional learning path introduces the AWS management and governance services relevant to the AWS Certified Solutions Architect - Professional exam. These services are used to help you audit, monitor, and evaluate your AWS infrastructure and resources and form a core component of resilient and performant architectures. 

Want more? Try a Lab Playground or do a Lab Challenge!

Learning Objectives

  • Understand the benefits of using AWS CloudWatch and audit logs to manage your infrastructure
  • Learn how to record and track API requests using AWS CloudTrail
  • Learn what AWS Config is and its components
  • Manage multi-account environments with AWS Organizations and Control Tower
  • Learn how to carry out logging with CloudWatch, CloudTrail, CloudFront, and VPC Flow Logs
  • Learn about AWS data transformation tools such as AWS Glue and data visualization services like Amazon Athena and QuickSight
  • Learn how AWS CloudFormation can be used to represent your infrastructure as code (IaC)
  • Understand SLAs in AWS
Transcript

Transcript

Hello and welcome to this lecture. We will learn how to analyze and search for specific data across log files being stored in S3. If your log data is being stored in S3, such as your CloudTrail logs, then you can use Amazon Athena to query that data within S3 to search for specific entries. The following demonstration has been created by Jeremy Cook, one of our AWS expert trainers here at Cloud Academy. In this demonstration, Jeremy will walk through the steps required to set up Athena to allow you to query CloudTrail log data.

Start of demonstration

In this demo we'll walk through the steps required to set up Athena to allow us to query CloudTrail log data. This demo will involve configuring CloudTrail, S3, EC2, and the Athena services. The end result will allow us to perform SQL queries against CloudTrail data stored in an S3 bucket. This type of setup will aid your DevOps and SecOps experience when building on top of the AWS platform

Let's get started. Within the AWS console, select the Athena service. The first step involves us creating a new Athena database, which will host our custom data table created later in this demo. This task can be accomplished either through an Athena provided wizard or manually through the query editor using an appropriate SQL create statement. We'll perform this by using SQL statements. But for learning purposes, I will show you where the wizard resides within the Athena console. Let's see where this is located. Clicking the catalog manager, menu item in the Athena menu bar, takes us into the catalog manager. Initially, we have only the default Athena database to work with. We can click the Add Table action. This opens the Add Table Wizard. The first step requires us to either choose an existing database or create a new database. In this case, I'm going with the create new database option. I need to provide the name for the new database. In this case, I'm going with New DB. And then I need to provide the name for the new table. I'll go with New Table. If I were to complete this wizard, I would additionally need to specify the S3 bucket location of our input set. However, I'm now going to cancel out of this wizard and perform the same process manually using SQL statements within the query editor. 

Clicking back on the query editor menu item, we're taken back into the query editor. Clicking within the editor itself clears the area. I'm now going to switch over to my visual code editor that has some pre baked SQL queries. I'm taking a copy here of a create database statement. Flipping back to the query editor, I paste the create database statement into the editor pane. As you can see, this particular create database statement will create as a new database named CloudTrail DB1. The create database statement, in this example, specifies an S3 bucket location. This bucket will be used to store the new database's catalog. Let's go and create this S3 bucket now, opening the S3 console in a separate browser tab. Clicking the Create Bucket button. In the resulting create bucket window, we paste in our bucket name that our create database statement references. We then hit the Create button at the bottom of the window. This creates our S3 bucket that will store the database catalog. 

Okay, great. Our bucket has been created successfully as shown here. Next, we'll go back to the Athena query editor. The create database statement can be configured to execute with metadata that may be relevant to your cause. In this example, I'm simply going to set the creator as myself, the company to be Cloud Academy, and created to the current year. With all this in place, we can now go ahead and execute the create database statement. We do so by clicking the Run Query button. Let's do this now. We now wait for the query to complete. As we can now see, the create database statement has executed successfully, as per the query successful response in the result section. Additionally, we can see that we have our new database now displaying in the left hand side menu. We'll now create our first table within our new database. 

Let's flip over to Visual Code and take a copy of a pre built table creation statement. Back within the query editor, we paste in the table creation statement. Here you can see that our new table will be named CloudTrail logs. The create table statement specifies all the relevant column attributes that CloudTrail tracks per order record. Next, we highlight the serializer-deserializer, or SERDE in short form, that is used. In this case, we're using the Amazon Provided CloudTrail SERDE. Finally, I draw your attention to the S3 bucket location that needs to be provided. This particular S3 URL represents the location where our raw CloudTrail logs will reside once configured. Let's go ahead and create this bucket. Switching back over into the S3 console, click the Create Bucket button. In the resulting Create Bucket window, we paste in our bucket name that our create table statement will reference. In our case, we'll create our new bucket with the name CA CloudTrail Logs Demo. We then hit the Create button at the bottom of the window. This creates our S3 bucket that CloudTrail will be next to configure to push logs into, and for which Athena will scan from when executing our SQL queries. 

Okay, great. Our bucket has been successfully created, as can be seen here. Switching back into the query editor window, we now paste in the S3 bucket name we just created. Okay, everything looks ready. Let's now click on the Run Query button. And again, our query has executed successfully. And in this case, our new CloudTrail logs table has been created. On the left hand side, we see our newly created table listed. Clicking on the preview icon to the right of the table name executes the sample query now shown in the editor pane. This query will perform a Select All across the table, but limited to the first 10 rows. Since our CloudTrail bucket has yet to be populated, it's expected that the query will return an empty result set as it does. Next, if we expand the table name itself, we see the column names and types that define it. Finally, clicking on the table properties icon, we are presented with a view of all the respected table properties associated with our new table. Important properties include, table name, database name, S3 bucket location, and serialization library. Let's now go and establish a new CloudTrail trail and configure it to push its logs into our S3 CloudTrail bucket. 

Under services, select the CloudTrail service. Once in the CloudTrail console, click the Create Trail button. Give the new trail a name. Here we're gong to call ours CA CloudTrail Logs Demo. Leave all defaults as is until we get to the storage location section. Disable the create new S3 bucket option, and instead, select the name of the S3 CloudTrail bucket that we built earlier. Next, under Advanced, disable the enable log file validation option. This is unnecessary for this demo. Finally, click the Create button at the bottom of the screen. If all goes well, we should see fairly quickly our new trail has been provisioned successfully, as we do now. 

Let's switch over into the S3 console, and check to see if our newly created trail is publishing events into our bucket. Clicking on our CloudTrail configured bucket and drilling down into the lowest folder, we can see that we are indeed receiving logs from CloudTrail. This is great. Let's go back into the Athena console and perform a couple of queries against this data. Clicking on the preview icon in the right of our CloudTrail table, kicks off the sample query for us again. And we're now successfully seeing some early results coming through. Next, we'll flip across to the Visual Code and copy a pre configured SQL select statement. Back within the query editor, we paste in the select statement. Before we execute, let's click the Format Query button and have the editor reformat the query for us. This is a great feature that aids the readability of any SQL statements that we craft by hand. 

Okay, running the formatted query still returns just four rows of data. This implies we're still waiting for more CloudTrail logs to be delivered into our S3 bucket. Okay, now that we have all of the individual parts wired up successfully, let's try out the following scenario. We'll create a new example only security group within the EC2 service. The security group itself won't be attached to anything. We're creating it only to generate and capture the associated API calls within CloudTrail, for which, we will eventually query for within Athena. We'll add in some inbound rules on this new security group. Performing these actions will generate CloudTrail data that will be published into our CloudTrail S3 bucket. The end result being that we should be able to query and discover these actions within Athena. Right, let's start by heading over to the EC2 console. Click into the Security Group section, and then click the Create Security Group button. Give the security group a name. Here we name ours DemoSG. Don't worry about setting the VPC. In the inbound rule section we'll add a couple of rules. Clicking the Add Rule button, we add the first rule, allowing incoming traffic from source IP address 1.1.1.1/32 and to port 1000. Add a second rule. This time to allow incoming traffic from source IP address 2.2.2.2/32 and to port 2000, and then click the Create button. Next, we need to take a copy of the security group ID for the security group we just created. We'll use this within our Athena query. 

Jump back into the Athena query editor and update our like clause referencing the security group ID we just copied. This now tells Athena to search for all records who your request parameters attribute contains the security group ID we pasted into the like clause. Okay, let's now execute this query and see if we get any results. As you can see, no results have come back, likely due to relatency involved in CloudTrail receiving, processing, and saving out to the S3 bucket. Let's try again at approximately five minutes time. Okay, running the query again now provides us with results. As you can see, there are six rows in our output. Scrolling across the fourth row until we see the request parameters column. Here we can see two of the inbound rules we attached to the security group earlier. The first inbound rule allows incoming traffic to port 1000 from source IP address of 1.1.1.1/32. And the second inbound rule allowing incoming traffic to port 2000 from source IP address of 2.2.2.2/32. 

Okay, let's now expand our query by adding an additional clause. This time we'll filter out all events except for the authorize security group ingress event. Running this query now gives us back just the one row, as expected. Okay, let's now take a quick look at some of the other useful features within the Athena console. Each query that you author in the editing will be saved and replayed at a later stage. So I'll save our current query. Click the Save As button. Give the saved query a name and description. In this example, we'll call ours Important SG Query for both name and description. Click the Save button, and our query is saved and accessible in the saved queries area. Clicking on the Saved Queries top menu item shows us all of the previously saved queries, including our just saved Important SG Query at the bottom of the list. If we click on the query, it will be recalled back into the editor pane as can now be seen. 

Next, let's look at the history feature. This allows us to examine all past executed queries. Here we can see the most recent query at the top of the list. This was our last query that we ran, where we added the extra and the clause to filter on the event name column. If we click this query, once again, it's recalled to the editor pane. But additionally, it also shows us the results that were returned at the time the query was actually executed. Bonus points. Going back into the history feature, I'll now highlight a couple of the other important attributes for each captured query. Firstly, each query has a state associated with it. Here our query succeeded. If it hadn't, it would track as an error. Next, there is a time the query took to run captured in seconds. This is useful for performance tuning and troubleshooting. Then there is the amount of data scanned recorded in kilobytes. This is useful to understand how much each query is going to cost you. Finally, there is a download results link that allows you to get a local copy of the results. Clicking the link for this row downloads the results locally. We'll now use our local terminal to output the contents of the file to the screen. 

Again, we can see that the details of the two inbound rules we attached to the security group in question. The first inbound allowed incoming traffic to port 1000 from source IP address of 1.1.1.1/32. And the second inbound allowed incoming traffic to port 2000 from source IP address of 2.2.2.2/32. This concludes the demo. But before we finish, let's quickly go through the process of doing some clean up within Athena. Firstly, we'll drop our CloudTrail table. Back within the query editor we type the statement, drop table CloudTrail Logs. Running this query will drop our table, allowing us to then drop the database. Next, clear the editor and type the statement, drop database CloudTrail DB1. And execute it. This will drop our custom Athena database. Don't forget to delete the CloudTrail trail and remove the S3 buckets as used in this demo.

End of demonstration

That now brings me to the end of this lecture. Coming up next, I will summarize the key points taken from the previous lectures of this course.

About the Author
Students
61889
Courses
32
Learning Paths
30

Danny has over 20 years of IT experience as a software developer, cloud engineer, and technical trainer. After attending a conference on cloud computing in 2009, he knew he wanted to build his career around what was still a very new, emerging technology at the time — and share this transformational knowledge with others. He has spoken to IT professional audiences at local, regional, and national user groups and conferences. He has delivered in-person classroom and virtual training, interactive webinars, and authored video training courses covering many different technologies, including Amazon Web Services. He currently has six active AWS certifications, including certifications at the Professional and Specialty level.