Amazon Athena - In-depth Review
The course is part of this learning path
In this demonstration we’ll walk you through the steps required to get a fully functioning Athena demo up and running. The demo will involve 3 main steps. We setup CloudTrail to publish event logs into an S3 bucket. We then make some ad-hoc security group changes to generate a few CloudTrail events . Finally we’ll use Athena to search and find the captured security group API update calls.
We'll now move onto performing an Athena demonstration. In this demo, we'll walk through the steps required to set up Athena to allow us to query CloudTrail log data. This demo will involved CloudTrail, S3, EC2 and the Athena Services. The end result will allow us to perform SQL queries against CloudTrail data stored in an S3 bucket. This type of setup will aid your devops and sysops experience when building on top of the AWS platform. Let's get started. Within the AWS console, select the Athena service. The first step involves us creating a new Athena database which will host our custom data table created later in this demo. This task can be accomplished either through an Athena-provided wizard or manually through the query editor using an appropriate SQL Create statement. We'll perform this by using SQL statements. But for learning purposes, I'll show you where the wizard resides within the Athena console. Let's see where this is located. Clicking the catalog manager, menu item in the Athena menu bar takes us into the catalog manager. Initially, we have only the default Athena database to work with. We can click the Add Table action. This opens the Add Table wizard. The first step requires us to either choose an existing database or create a new database. In this case, I'm going with the Create New Database option. I'll need to provide the name for the new database. In this case, I'm going with NewDB. And then I need to provide the name for the new table. I'll go with NewTable. If I were to complete this wizard, I would additionally need to specify the S3 bucket location of our input set. However, I'm now going to cancel out of this wizard and perform the same process manually using SQL statements within the Query Editor. Clicking back on the Query Editor menu item, we're taken back into the Query Editor. Clicking within the Editor itself clears the area. I'm now going to switch over to my Visual Code Editor that has some pre-based SQL queries.
I'm taking a copy here of a Create Database statement. Flipping back to the Query Editor, I paste the Create Database statement into the editor pane. As you can see, this particular Create Database statement will create us a new database named cloudtrail-db1. The Create Database statement in this example specifies an S3 bucket location. This bucket will be used to store the new database's catalog. Let's go and create this S3 bucket now, opening the S3 console in a separate browser tab, clicking, the Create Bucket button. In the resulting Create Bucket window, we paste in our bucket name that our Create Database statement references. We then hit the Create button at the bottom of the window. This creates our S3 bucket that will store the database catalog. Okay, great, our bucket has been created successfully as shown here. Next, we'll go back to the Athena Query Editor. The Create Database statement can be configured to execute with metadata that may be relevant to your cause. In this example, I'm simply going to set the creator as myself, the company to be CloudAcademy and created to the current year. With all this in place, we can now go ahead and execute the create database statement. We do so by clicking the Run Query button. Let's do this now. We now wait for the query to complete. As we can now see, the Create Database statement has executed successfully as per the query successful response in the results section.
Additionally, we can see that we have our new database now displaying in the left-hand side menu. We'll now create our first table within our new database. Let's flip over to Visual code and take a copy of our prebuilt Table Creation statement. Back within the query editor, we paste in the table creation statement. Here you can see that our new table will be named cloudtrail_logs. The Create Table statement specifies all the relevant column attributes that CloudTrail tracks per order record. Next, we highlight the serializer, deserializer, or Serde in short form, that is used. In this case, we're using the Amazon-provided CloudTrail serde. Finally, I draw your attention to the S3 bucket location that needs to be provided. This particular S3 URL represents the location where our raw CloudTrail logs will reside once configured. Let's go ahead and create this bucket. Switching back over into the S3 console, click the Create Bucket button. In the resulting Create Bucket window, we paste in our bucket name that our Create Table statement will reference.
In our case, we'll create our new bucket with the name ca-cloudtrail-logs-demo. We then hit the Create button at the bottom of the window. This creates our S3 bucket. But CloudTrail will be next to configure to push logs into, and for which Athena will scan for when executing our SQL queries. Okay, great, our bucket has been successfully created as can be seen here. Switching back into the query editor window, we now paste in the S3 bucket name we just created. Okay, everything looks ready. Let's now click on the Run Query button. And, again, our query has executed successfully. And in this case, our new CloudTrail logs table has been created. On the left-hand side, we see our newly created table listed. Clicking on the preview icon, to the right of the table name, executes the sample query now shown in the editor pane. This query will perform a select all across the table, but limited to the first 10 rows. Since our CloudTrail bucket has yet to be populated, it's expected that the query will return an empty results set as it does. Next, if we expand the table name itself, we see the column names and types that define it. Finally, clicking on the Table Properties icon, we're presented with a view of all the respected table properties associated with our new table. Important properties include table name, database name, S3 bucket location, and serialization library. Let's now go and establish a new CloudTrail trail and configure it to push its logs into our S3 CloudTrail bucket. Under Services, select the CloudTrail Service. Once in the CloudTrail console, click the Create Trail button. Give the new trail a name.
Here, we're going to call ours ca-cloudtrail-logs-demo. Leave all defaults as is until we get to the Storage Location section. Disable the Create new S3 Bucket option and instead select the name of the S3 CloudTrail bucket that we built earlier. Next, under Advanced, disable the Enable Log File Validation option. This is unnecessary for this demo. Finally, click the Create button at the bottom of the screen. If all goes well, we should see fairly quickly our new trail has been provisioned successfully as we do now. Let's switch over into the S3 console and check to see if our newly created trail is publishing events into our bucket. Clicking on our CloudTrail configured bucket and drilling down into the lowest folder, we can see that we are indeed receiving logs from CloudTrail. This is great. Let's go back into the Athena console and perform a couple of queries against this data. Clicking on the preview icon in the right of our CloudTrail Table kicks off the sample query for us again. And we're now successfully seeing some earlier results coming through. Next, we'll flip across to visual code and copy our pre-configured SQL select statement. Back within the Query Editor, we paste in the select statement. Before we execute, let's click the Format Query button and have the editor reformat that query for us. This is a great feature that aids the readability of any SQL statements that we craft by hand. Okay, running the formatted query still returns just four rows of data. This implies we're still waiting for more CloudTrail logs to be delivered into our S3 bucket. Okay, now that we have all the individual parts wired up successfully, let's try out the following scenario.
We'll create a new example-only security group within the EC2 service. The security group itself won't be attached to anything. We're creating it only to generate and capture the associated API calls within CloudTrail, for which we will eventually query for within Athena. We'll add in some inbound rules in this new security group. Performing these actions will generate CloudTrail data that will be published into our CloudTrail S3 bucket. The end result being that we should be able to query and discover these actions within Athena. Right, let's start by heading over to the EC2 console. Click into the Security Group section. And then click the Create Security Group button. Give the security group a name. Here we name our DemosSG. Don't worry about setting the VPC. In the Inbound Rules section, we'll add a couple of rules. Clicking the Add Rule button, we add the first rule, allowing incoming traffic from source IP address, 18.104.22.168/32 and to port 1000. Add a second row, this time to allow incoming traffic from source IP address, 22.214.171.124/32 and to port 2000. And then click the Create button. Next, we need to take a copy of the security group ID for the security group we just created. We'll use this within our Athena query. Jump back into the Athena Query editor and update our like clause, referencing the security group ID we just copied. This now tells Athena to search for all records whose request parameters attribute contains the security group ID. We paste it into the like clause. Okay, let's now execute this query and see if we get any results. As you can see no results have come back. Likely due to a latency involved in CloudTrail receiving, processing and saving out to the S3 bucket. Let's try again in approximately five minutes time. Okay, running the query again now provides us with results. As you can see there are six rows in our output. Scrolling across the fourth row until we see the Request Parameters column. Here, we can see two of the inbound rules we attached to the security group earlier. The first inbound rule allows incoming traffic to port 1000 from source IP address of 126.96.36.199/32. And the second inbound rule allowing incoming traffic to port 2000 from source IP address of 188.8.131.52/32. Okay, let's now expand our query by adding an additional clause. This time, we'll filter out all events except for the authorized security group ingress event. Running this query now gives us back just the one row as expected. Okay, let's now take a quick look at some of the other useful features within the Athena console. Each query that you author in the editor are gonna be saved and replayed at a later stage. So I'll save our current query, click the Save As button.
Give the saved query a name and description. In this example, we'll call our ImportantSGQuery, both name and description. Click the Save button and our query is saved and accessible in the Saved Queries area. Clicking the Saved Queries top menu item shows us all of the previously saved queries, including our just saved ImportantSGQuery at the bottom of the list. If we click on the query, it will be recalled back into the editor pane as can now be seen. Next, let's look at the history feature. This allows us to examine all past executed queries. Here we can see the most recent query at the top of the list. This was our last query that we ran where we added the extra and clause to filter on the Event Name column. If we click this query, once again, it's recalled to the Add To pane. But, additionally, it also shows us the results that were returned at the time the query was actually executed, bonus points. Going back into the history feature, I'll now highlight a couple of the other important attributes for each captured query. Firstly, each query has a state associated with it. Here, our query succeeded. If it hadn't, it would track as an error. Next, there is time the query took to run captured in seconds. This is useful for performance tuning and troubleshooting. Then there is the amount of data scan recorded in kilobytes.
This is useful to understand how much each query is going to cost you. Finally, there is a Download results link that allows you to get a local copy of the results. Clicking the link for this row downloads the results locally. We'll now use our local terminal to output the contents of the file to the screen. Again, we can see the details of the two inbound rules we attached to the security group in question. The first inbound allowed incoming traffic to Port 1000 from source IP address of 184.108.40.206/32. And the second inbound allowed incoming traffic to Port 2000 from source IP address of 220.127.116.11/32. This concludes the demo. But before we finish, let's quickly go through the process of doing some cleanup within Athena. Firstly, we'll drop our CloudTrail table. Back within the Query Editor, we type the statement DROP TABLE cloudtrail_logs. Running this query will drop our table allowing us to then drop the database. Next, clear the editor. And type the statement DROP DATABASE cloudtraildb1 and execute it. This will drop our custom Athena database. Both get to delete the CloudTrail trail and remove the S3 buckets as used in this demo.
About the Author
Jeremy is the DevOps Content Lead at Cloud Academy where he specializes in developing technical training documentation for DevOps.
He has a strong background in software engineering, and has been coding with various languages, frameworks, and systems for the past 20+ years. In recent times, Jeremy has been focused on DevOps, Cloud, Security, and Machine Learning.
Jeremy holds professional certifications for both the AWS and GCP cloud platforms.