
ELK Stack



Modern AWS cloud deployments are increasingly distributed systems, comprising many different components and services that interact with each other to deliver software. To ensure quality delivery, companies and DevOps teams need more sophisticated methods of monitoring their clouds, collecting operational metrics, and logging system occurrences.

This course aims to teach advanced techniques for logging on AWS, going beyond the basic uses of CloudWatch Metrics, CloudWatch Logs, and health monitoring systems. Students of this course will learn:

  • How to use logs as first-class building blocks
  • How to move away from thinking of logs as files
  • How to treat monitoring, metrics, and log data as events
  • How to reason about using streams as log transport buffers
  • How CloudWatch Log Groups are structured internally
  • How to build an ELK stack for log aggregation and analysis
  • How to build Slack ChatOps systems

If you have thoughts or suggestions for this course, please contact Cloud Academy at support@cloudacademy.com.


Welcome back to CloudAcademy's course on Advanced Amazon Web Services Monitoring, Metrics and Logging. In this lecture, we're going to try out the ELK stack, which was one of the log stream sinks we talked about earlier. It's a log aggregation system that lets us aggregate, across multiple CloudWatch log streams, any kind of events that we want to index into the system. It integrates nicely with the Amazon Web Services stack thanks to the recently introduced AWS Elasticsearch service, which runs the E and the K in ELK (Elasticsearch and Kibana), while a Lambda function ends up being our L (in place of Logstash), indexing the logs.

Without further delay, let's get started. As a reminder of what we're trying to build, let's look quickly at this flow chart. I'm using flow-charting software so we can take peeks at these diagrams, and I've already got some of this built out here. We're going to simulate a little API, but the important piece is that we'll be emitting events into CloudWatch Logs, pulling logs out, and feeding them into a Lambda, which then inserts them into the Elasticsearch service that runs the E and the K in our ELK stack. Then we'll go through the log inspection cycle, where we try out that fast, searchable, unified graphical logs UI.

First, I need to navigate over to my CloudWatch console to see what these log events might look like. I'm going to use AWS Lambda to create some logs, just because Lambda integrates with CloudWatch Logs really nicely. It's important to note that anything that can submit logs into CloudWatch Logs will work for this part; Lambda just happens to be a lightweight, easy thing to test during a screen share.

Here we're looking at some function code inside of a Lambda. The code itself doesn't really matter; the idea is that I'm running a test event that produces some log output. That might be a little small on your screen, but these logs should now be visible if we go look in CloudWatch Logs. So I'm going to open these logs up. If we look at the raw log stream here, we can see the function-loading entries and some events coming through. That's all well and good.

Let's go back to the Lambda console, and rather than echoing, I'm now going to configure my test event to run a ping. We should see it come back with "pong", and if I go back to my log streams, we can see the last echo execution I ran; eventually that pong will show up too. There's a little bit of latency there, so I won't bore us to death waiting for it.
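The demo function's code isn't shown in full on screen, and the on-screen Lambda appears to be Node.js. Here is a minimal Python sketch of the behavior described, just to make the echo/ping semantics concrete; the `operation` and `payload` field names are assumptions for illustration:

```python
import json

def handler(event, context=None):
    """Toy Lambda-style handler mirroring the demo: echoes input,
    answers ping with pong, and raises on unknown operations."""
    # Log the incoming event as JSON, as the demo does with console.log.
    print(json.dumps({"received_event": event}))

    operation = event.get("operation")
    if operation == "echo":
        return event.get("payload")
    elif operation == "ping":
        return "pong"
    else:
        # Unknown operations raise, producing the "angry" error output
        # we'll see later in this lecture.
        raise ValueError("Unrecognized operation: %s" % operation)
```

Anything that writes to standard output in a Lambda ends up in the function's CloudWatch log stream, which is all this part of the demo relies on.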

Here we're navigating inside of the CloudWatch log group interface. We can navigate into individual streams, which we talked about in our previous lecture. In general this is helpful, but it's not optimal, because I might also want to search for events at the log group level, not just the log stream level. As you can see, at the group level all I can do is search by stream prefix, and I can only search by text once I navigate into the user interface for an individual stream.

Without further ado, let's set up an ELK stack. You'd want to navigate over to the Elasticsearch service; if you're looking in the list of all AWS services, it's alphabetical, so we can just click up here. Once we click on the Elasticsearch service, we'll navigate to this tab. I've already set this up to accelerate the pace of this video, but I'll show you what it takes to set one of these domains up, since it usually takes about 10 minutes.

This is exactly how I set up that other domain. All of the default values work; I'm just selecting the size and count of the instances that will join the Elasticsearch cluster, then hitting next. For this demonstration I would use open access to the domain, just so we can easily see what's going on. I'd hit next, and then confirm and create. I actually did this step already to accelerate the demonstration, so we have a complete cluster here.

If I click on that Kibana 4 user interface, we should see a loading screen like this; the first time you load it, it takes a while. We need to configure an index pattern, but we're not going to do that until I go and create some more log events to submit into the system. Again, I'm going to run another test with my ping and my pong. We may see more events showing up here... not yet. Okay, so even though no events have shown up from that run yet, what we can do is move back to the log groups. I actually want CloudAcademyDynamoLambda, which I'll copy from up here; I can use that identifier to search for it. Then I click on the radio button and choose to stream into the Elasticsearch service.

Since I only have the one cluster, I select it. I can create a new role to allow the Lambda that does the pushing to publish into Elasticsearch; I'm granting it the ability to post into my cluster, so I hit allow and move forward. Note that my logs will be coming through in the AWS Lambda format, and we can see some sample events coming up here. The filter pattern should be familiar if you were watching carefully for what my log formats look like on this system. Once all of the configuration is done, I can start streaming.
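Under the hood, the Lambda that this console flow wires up receives CloudWatch Logs subscription payloads as a base64-encoded, gzipped JSON blob, and reindexes each log event into a daily index. A rough Python sketch of that transformation, assuming the standard subscription payload shape; the function names and document fields here are illustrative, not the actual blueprint code:

```python
import base64
import gzip
import json
from datetime import datetime, timezone

def decode_payload(event):
    """Decode the base64+gzip blob that CloudWatch Logs delivers to a
    subscribed Lambda under event["awslogs"]["data"]."""
    raw = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(raw))

def to_bulk_actions(payload):
    """Turn decoded log events into Elasticsearch bulk-index lines,
    using a cwl-YYYY.MM.DD daily index naming scheme."""
    lines = []
    for log_event in payload["logEvents"]:
        ts = datetime.fromtimestamp(log_event["timestamp"] / 1000,
                                    tz=timezone.utc)
        index = "cwl-" + ts.strftime("%Y.%m.%d")
        # Action line, then document line, per the bulk API format.
        lines.append(json.dumps({"index": {"_index": index,
                                           "_id": log_event["id"]}}))
        lines.append(json.dumps({
            "@timestamp": ts.isoformat(),
            "message": log_event["message"],
            "logGroup": payload["logGroup"],
            "logStream": payload["logStream"],
        }))
    return "\n".join(lines) + "\n"
```

The resulting newline-delimited string is what gets POSTed to the cluster's `_bulk` endpoint; each event keeps its source log group and stream as searchable fields.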

What this is doing is setting up this portion of the diagram, with one arrow going into a Lambda and another going into the Elasticsearch service. We'll wait just a second to allow that to activate. We're provided with a link to Kibana 4, but we already have it open. The next thing I can do is configure test events to actually run these operations, and we're going to demonstrate what happens if we have a broken Dynamo configuration. I'm going to use a slightly different formatting scheme and log my incoming event as JSON. If I just log my event, then I can save and test; I should see operation "ping" and message "Hello World!", since I just added the console.log of my JSON, and now any invocation that comes into this Lambda will appear in my log output. That log output will also show up if we look at the log stream: I've navigated back to my CloudWatch log group, and I can see the JSON line showing up inside here.

Now that we've configured our streaming into the ELK stack, I'm going to reload so we can see if those indices start showing up. We can also go directly to our cluster, run a search, and see that events are starting to show up: we've got some Kibana 4 indices, and we've got data at the cluster's search endpoint. If we scroll around, we can see that the index it set up is cwl- (for CloudWatch Logs) followed by the date on which the events occurred, typed by log stream name. Given that, we now know how to configure our index setting.

So I'm going to set the index pattern to cwl- for CloudWatch Logs, and use a timestamp field name of @timestamp. This is the correct configuration when we're doing CloudWatch log aggregation in Kibana. Once I hit create, I should see all of my metadata start showing up, since we already acquired some events from running before. That's sufficient for us to begin using the Discover module, where I start seeing all of the events that came through. Now, this is not particularly useful until we actually start creating some more logs, so let's create an error and see what happens.

I'm going to create an error using my default switch case: if I select an operation that is unknown, the function throws an error. So let's do that. I'll set the operation to a string of question marks, hit save and test, and run that a couple of times. We can see some very angry execution results, with JSON representations of an error as well as the input value itself.

If we think about this, this could be somebody fuzzing your API and sending input that you don't understand; perhaps your API doesn't support Unicode. Rather than an intentional error like I've created here, this could easily be a 500 error, a 404, or some other problem that occurs organically. And if this were a stream being logged to standard out, even on an EC2 instance, we would also want it to show up on this end.

What happens if I just search? We can see a whole bunch more events, because I ran the test a couple more times: the older ping event up here, and then the events where I sent question marks. We can see the incoming events themselves as well as some error message events. Say, for instance, I want to know how many times there were unrecognized operations: I can see that there were three such events, and the frequency with which they occurred. This is excellent if you're trying to find specific values. You could also use this if you're trying to debug specific customers.
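Kibana's query bar runs real Lucene searches across the index, but at its simplest the counting described above amounts to free-text matching over log messages. A toy Python illustration; the messages below are made up to mirror the demo (one ping, three failing invocations):

```python
# Made-up log lines mirroring the demo's output: a ping invocation
# and three failures with unrecognized operations.
log_messages = [
    '{"received_event": {"operation": "ping"}}',
    '{"received_event": {"operation": "????"}}',
    'ValueError: Unrecognized operation: ????',
    'ValueError: Unrecognized operation: ????',
    'ValueError: Unrecognized operation: ????',
]

def count_matches(messages, phrase):
    """Count log messages containing a phrase, the way a free-text
    query narrows the Discover result set."""
    return sum(1 for message in messages if phrase in message)

print(count_matches(log_messages, "Unrecognized operation"))  # prints 3
```

The point of the aggregation system is that this kind of match runs across every stream in the group at once, rather than one stream's UI at a time.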

So again, this is contrived, but say we also set a customer ID field on our events. We can see that we have this customer ID string, and once I do the search for it, after allowing some time for those events to propagate, we can search for just the customer ID and see the matching events show up. I can also go back and expand by different properties, see the messages that are coming through, and so on. So this is what a log aggregation system looks like.

Now, we can also do log analytics. If I want to visualize, for instance, an area chart from a new search, I can set my X-axis to a date histogram on the timestamp over an automatic interval, and add a sub-aggregation where I split the area by the terms inside of a field. We can then see the different frequencies of terms as we look inside the system: where different numbers or terms showed up. I can further refine this and say, "Okay, I only want to see a histogram of these events."
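Behind the Kibana UI, a chart like that corresponds to an Elasticsearch aggregation query roughly like the following sketch; the field name `operation`, the index pattern, and the one-minute interval are illustrative assumptions, not values from the demo:

```python
import json

# Sketch of the query behind a Kibana 4 area chart: a date histogram
# on @timestamp, with a terms sub-aggregation splitting the area.
visualization_query = {
    "size": 0,  # we only want aggregation buckets, not raw hits
    "aggs": {
        "over_time": {
            "date_histogram": {"field": "@timestamp", "interval": "1m"},
            "aggs": {
                "split_by_term": {"terms": {"field": "operation"}}
            }
        }
    }
}

# This body would be POSTed to the cluster, e.g. against cwl-*/_search.
print(json.dumps(visualization_query, indent=2))
```

Kibana builds and sends queries of this shape for you as you click through the visualization editor, which is why each chart stays live against incoming log data.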

So we can do all kinds of things, like graphing on different fields and just generally performing magic. You can also add visualizations to different dashboards, and set up any number of things based on these sub-aggregations. And of course, we have this excellent search capability.

There are commercial products that offer similar capabilities, but we're big fans of open source and of being able to do value-added automation on top of our own systems. We can see this big, nice log stream here, with the original streams and sources the events came from and the AWS account associated with them, and we can do free-text search. So again, we can search for specific error types if we so please.

This kind of thing is very helpful if you're trying to run a debugging session while a customer is complaining. We of course want to be able to see what the customer is talking about, for instance by searching for the error code that was dumped onto their page while they're on the phone with support. You can use this for any number of things. It's very helpful, and it was very easy: all I had to do was create one of these domains, generate some events so that they show up in CloudWatch Logs, go to the log group, check the streaming box to send logs to Elasticsearch, and grant the role that allows the Lambda to post. So that's it for the ELK stack demonstration.

I hope to see you soon in the next lecture, in which we'll do a little bit of ChatOps: another trivial solution where I make something pop up in my Slack channel whenever a certain event occurs in my Amazon account.

About the Author

Nothing gets me more excited than the AWS Cloud platform! Teaching cloud skills has become a passion of mine. I have been a software and AWS cloud consultant for several years. I hold all 5 possible AWS Certifications: Developer Associate, SysOps Administrator Associate, Solutions Architect Associate, Solutions Architect Professional, and DevOps Engineer Professional. I live in Austin, Texas, USA, and work as development lead at my consulting firm, Tuple Labs.