Modern AWS cloud deployments are increasingly distributed systems, comprising many different components and services that interact with each other to deliver software. To ensure quality delivery, companies and DevOps teams need more sophisticated methods of monitoring their clouds, collecting operational metrics, and logging system events.
This course aims to teach advanced techniques for logging on AWS, going beyond the basic uses of CloudWatch Metrics, CloudWatch Logs, and health monitoring systems. Students of this course will learn:
- How to use logs as first-class building blocks
- How to move away from thinking of logs as files
- How to treat monitoring, metrics, and log data as events
- How to reason about using streams as log transport buffers
- How CloudWatch Log Groups are structured internally
- How to build an ELK stack for log aggregation and analysis
- How to build Slack ChatOps systems
If you have thoughts or suggestions for this course, please contact Cloud Academy at firstname.lastname@example.org.
Welcome back to Cloud Academy's course on Advanced Amazon Web Services - Monitoring Metrics and Logging. Today, we're going to look at a demo of a ChatOps system, which is a system whereby we post operational messages or alerts into a chat system. I'll be using Slack today, since I like their API. It's nice and easy, and I already have a domain that I could use. So without further ado, let's get started.
So the first thing to realize is that we can reuse the same logic from the previous demo that we used to generate these unhandled errors. I'm going to keep using that same Lambda, simply because it's already integrated with CloudWatch: I don't have to do any additional work to get it to generate logs and submit them to CloudWatch, and I don't have to install a logging agent the way I would on an EC2 instance.
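If you don't have the previous demo's function at hand, here is a minimal Python stand-in for an error-generating Lambda. The operation names are hypothetical, since the real function's cases aren't shown. It illustrates why no agent is needed: whatever the handler prints, and any uncaught exception the runtime reports, is written to the function's CloudWatch Logs log group (by default `/aws/lambda/<function-name>`).

```python
import json

# Hypothetical operation names; the demo function's actual cases are not shown.
KNOWN_OPERATIONS = {"create", "read", "update", "delete"}


def lambda_handler(event, context):
    """Stand-in for the demo's error-generating Lambda (illustrative only)."""
    operation = event.get("operation")
    # print() output is captured by the Lambda runtime and sent to CloudWatch Logs.
    print(json.dumps({"operation": operation}))
    if operation not in KNOWN_OPERATIONS:
        # Left uncaught on purpose: the runtime logs the error and its traceback.
        raise ValueError(f"Unsupported operation: {operation}")
    return {"status": "ok", "operation": operation}
```

Invoking it with `{"operation": "Fatal Operation"}` raises, which produces the kind of unhandled-error log lines this demo forwards to Slack.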
Looking at the function, all we need to do is give it an operation value that doesn't match any of these cases, and we'll get this uncaught exception. That's what I did: I gave it the operation "Fatal Operation". So how do we get the logs from this function to show up in a Slack channel whenever there's an error? It's relatively simple: I create a Lambda function that posts into Slack. One way to give that function somewhere to post is a Slack incoming webhook integration. If I go to my custom integrations, I can configure an incoming webhook: click Yes, then click Add Configuration. I have this one set up to post into AWS, and then I copy my Slack webhook URL, which I'm not going to show you, because anyone with it could post into my account. Once I have an endpoint that I can post to, I need to create the Lambda function that actually does the posting. I'll call it "SlackChatOpsDemo2" and enter some code that handles the stream events coming out of CloudWatch Logs.
So there are a couple of things going on here, really only three: I receive the event, I Base64-decode it, and I gunzip it. CloudWatch Logs saves bandwidth by gzip-compressing the events, then Base64-encodes the compressed data so it can travel inside a JSON event. So all I'm doing here is decoding and decompressing, then parsing and re-stringifying the data in a nicer format to post into Slack. When I post into Slack, all it does is add some Markdown-style code block formatting, join the lines with newlines, and then make an HTTPS request against my Slack webhook's host and path.
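The three steps described above can be sketched in Python as follows. This is an illustrative version, not the code from the video: it assumes the webhook URL is stored in a hypothetical `SLACK_WEBHOOK_URL` environment variable rather than a hard-coded host and path, and it uses only the standard library (`base64`, `gzip`, `json`, `urllib.request`), so nothing extra has to be packaged with the function.

```python
import base64
import gzip
import json
import os
import urllib.request

FENCE = "`" * 3  # Slack renders text between triple backticks as a code block


def decode_awslogs(event):
    """Undo the transport encoding: Base64-decode, gunzip, then parse the JSON."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def format_for_slack(payload):
    """Wrap each log message in a code block and join the blocks with newlines."""
    return "\n".join(
        f"{FENCE}{e['message'].rstrip()}{FENCE}" for e in payload.get("logEvents", [])
    )


def post_to_slack(webhook_url, text, timeout=4):
    """POST a JSON {"text": ...} body to a Slack incoming webhook."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def lambda_handler(event, context):
    payload = decode_awslogs(event)
    # CloudWatch Logs can emit CONTROL_MESSAGE records (e.g. reachability checks); skip them.
    if payload.get("messageType") == "CONTROL_MESSAGE":
        return None
    text = format_for_slack(payload)
    if text:
        # SLACK_WEBHOOK_URL is a hypothetical environment variable holding the webhook URL.
        post_to_slack(os.environ["SLACK_WEBHOOK_URL"], text)
    return len(payload.get("logEvents", []))
```

The 4-second HTTP timeout is chosen to stay below the roughly five-second function timeout mentioned in the transcript.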
Next, I need to assign a role to the Lambda that allows it both to create new logs and to read from log streams. I'm going to create a new role policy and edit it manually, because there's no out-of-the-box role for this kind of ChatOps system. For now, I can just give Lambda full access to CloudWatch Logs (all Logs actions on all log resources), then raise the timeout to about five seconds so that slow responses from Slack don't cause the function to time out, and create the function.
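If you'd rather not leave the consumer with full CloudWatch Logs access, a narrower policy is an option. The sketch below builds both policy documents as Python dicts: the broad one roughly mirrors the "full access to logs" the demo grants, and the narrow one covers only what a Lambda function needs to write its own logs. The region, account ID, and function name are placeholders. Because the subscription delivers log data by invoking the function, the consumer shouldn't need read permissions on the source log group.

```python
def broad_logs_policy():
    """Roughly what the demo grants: every CloudWatch Logs action on every resource."""
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": "logs:*", "Resource": "*"}],
    }


def minimal_consumer_policy(region, account_id, function_name):
    """Narrower alternative: the consumer may only create and write its own log group."""
    log_group_arn = (
        f"arn:aws:logs:{region}:{account_id}:log-group:/aws/lambda/{function_name}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "logs:CreateLogGroup",
                "Resource": f"arn:aws:logs:{region}:{account_id}:*",
            },
            {
                "Effect": "Allow",
                "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
                # The trailing ":*" covers the log streams inside the group.
                "Resource": f"{log_group_arn}:*",
            },
        ],
    }
```

Passing either result through `json.dumps(..., indent=2)` gives a document you can paste into the IAM policy editor.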
The next thing I need to do is start streaming data from the log group that corresponds to my Cloud Academy Dynamo Lambda, which is where I'm generating my error, and choose Stream to AWS Lambda. If I can actually find my demonstration Lambda, the one we just created that does the Base64 decoding and gunzipping, that's where we want to ship the events. Since our event generator is again this Cloud Academy Dynamo Lambda, we choose the AWS Lambda log format.
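The console steps above can also be scripted. This sketch assumes boto3 clients for CloudWatch Logs and Lambda are passed in, and that the source function writes to the default `/aws/lambda/<function-name>` log group; the function names, region, and account ID are placeholders. It first grants CloudWatch Logs permission to invoke the consumer (which the console typically handles for you), then attaches the subscription filter.

```python
def subscribe_lambda_to_log_group(logs_client, lambda_client, *, region, account_id,
                                  source_function, consumer_function, filter_pattern=""):
    """Subscribe a consumer Lambda to a source Lambda's log group (illustrative sketch)."""
    log_group = f"/aws/lambda/{source_function}"
    log_group_arn = f"arn:aws:logs:{region}:{account_id}:log-group:{log_group}:*"
    consumer_arn = f"arn:aws:lambda:{region}:{account_id}:function:{consumer_function}"

    # Allow CloudWatch Logs to invoke the consumer, scoped to this one log group.
    lambda_client.add_permission(
        FunctionName=consumer_function,
        StatementId=f"{source_function}-logs-subscription",
        Action="lambda:InvokeFunction",
        Principal="logs.amazonaws.com",
        SourceArn=log_group_arn,
    )

    # Attach the subscription filter; an empty pattern forwards every log event.
    logs_client.put_subscription_filter(
        logGroupName=log_group,
        filterName=f"{consumer_function}-filter",
        filterPattern=filter_pattern,
        destinationArn=consumer_arn,
    )
    return consumer_arn
```

For example, call it with `boto3.client("logs")` and `boto3.client("lambda")`, using hypothetical names such as `CloudAcademyDynamoLambda` for the source and `SlackChatOpsDemo2` for the consumer. Passing a non-empty `filter_pattern` would forward only matching events, for instance just the error lines.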
So we should now be streaming from our Dynamo Lambda. It automatically writes log lines into CloudWatch Logs, so it sits at the head of the stream: it's the function publishing into our log stream. This other Lambda, Slack ChatOps Demo 2, is the consumer. I set up that subscription when I went over and selected this option, which subscribed the new Lambda to that log group. So now we should be able to run a test, have the "Fatal Operation" fail, check the actual logs, see that we have some fatal operations, and then see them posted into Slack. And you can see the message showing up in Slack here: this is the Base64-decoded, gunzipped message.
Again, we started from a Lambda, which generates log event lines and sends them into CloudWatch Logs here. We then went to the CloudWatch Logs log group in the console, opened our subscriptions, and added a subscription filter to Lambda. That recipient Lambda receives these events, which correspond to our CloudWatch Logs event data. To save bandwidth, that data arrives gzip-compressed and Base64-encoded, so we had to undo those steps: decode the Base64, gunzip the result, and then simply publish to Slack via the webhook endpoint, and voilà, we have our ChatOps system. We can see that something terrible happened and we have fatal operations, and now everyone on our team who is in the chat should see a message like this.
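To exercise the consumer without waiting for real log traffic, you can reproduce the encoding described above locally. This is a sketch that assumes the payload fields CloudWatch Logs uses for subscription data (`messageType`, `owner`, `logGroup`, `logStream`, `subscriptionFilters`, `logEvents`); the account ID, event IDs, and filter name are placeholders, not values AWS would send.

```python
import base64
import gzip
import json
import time


def make_subscription_event(log_group, log_stream, messages, filter_name="demo-filter"):
    """Build a test event shaped like a CloudWatch Logs subscription delivery to Lambda."""
    now_ms = int(time.time() * 1000)
    payload = {
        "messageType": "DATA_MESSAGE",
        "owner": "123456789012",  # placeholder account ID
        "logGroup": log_group,
        "logStream": log_stream,
        "subscriptionFilters": [filter_name],
        "logEvents": [
            # Real event IDs are long numeric strings; small indices suffice for testing.
            {"id": str(i), "timestamp": now_ms, "message": m}
            for i, m in enumerate(messages)
        ],
    }
    # Mirror the transport encoding: JSON, then gzip, then Base64 under "awslogs.data".
    compressed = gzip.compress(json.dumps(payload).encode("utf-8"))
    return {"awslogs": {"data": base64.b64encode(compressed).decode("ascii")}}
```

Feeding the result to a consumer handler locally runs the same decode path that live CloudWatch Logs deliveries take.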
So hopefully you enjoyed seeing a practical way to implement a very simple ChatOps system using fully serverless technologies along with CloudWatch Logs streaming, treating logs as first-class citizens for insight and automation. ChatOps is a very simple way to alert people on our team: when certain log events are scary or frequent, or meet whatever other metric we want to use, we publish them into Slack and notify the entire team.
About the Author
Nothing gets me more excited than the AWS Cloud platform! Teaching cloud skills has become a passion of mine. I have been a software and AWS cloud consultant for several years. I hold all 5 possible AWS Certifications: Developer Associate, SysOps Administrator Associate, Solutions Architect Associate, Solutions Architect Professional, and DevOps Engineer Professional. I live in Austin, Texas, USA, and work as development lead at my consulting firm, Tuple Labs.