This section provides detail on the AWS management services relevant to the Solutions Architect Associate exam. These services help you audit, monitor, and evaluate your AWS infrastructure and resources, and they form a core component of running resilient and performant architectures.
- Understand the benefits of using Amazon CloudWatch and audit logs to manage your infrastructure
- Learn how to record and track API requests using AWS CloudTrail
- Learn what AWS Config is and its components
- Manage your accounts with AWS Organizations, including single sign-on with AWS SSO
- Learn how to carry out logging with CloudWatch, CloudTrail, CloudFront, and VPC Flow Logs
- Understand how to design cost-optimized architectures in AWS
- Learn about AWS data transformation tools such as AWS Glue, and data query and visualization services like Amazon Athena and QuickSight
Hello and welcome to this lecture which will provide you with a high-level overview of what Amazon CloudWatch is and does.
Amazon CloudWatch is a monitoring service that has been designed to be your window into the health and operational performance of your applications and infrastructure. It’s able to collate and present meaningful operational data from your resources, allowing you to monitor and review their performance. The insights that CloudWatch presents can in turn trigger automated responses, or give you the opportunity and time to make manual operational changes and decisions to optimize your infrastructure if required.
Understanding the health and performance of your environment is one of the fundamental operations you can do to help you minimize incidents, outages and errors. As a result Amazon CloudWatch is heavily used by those in an operational role and site reliability engineers.
There is a wide range of components to Amazon CloudWatch, making this an extremely powerful service. Let me now run through at a high level some of these features and what they allow you to do, including CloudWatch Dashboards, CloudWatch Metrics and Anomaly Detection, CloudWatch Alarms, CloudWatch EventBridge, CloudWatch Logs, and CloudWatch Insights.
Starting with CloudWatch Dashboards. Using the AWS Management Console, the AWS CLI, or the PutDashboard API, you can build and customize a page using different visual widgets displaying metrics and alarms relating to your resources to form a unified view. These dashboards can then be viewed from within the AWS Management Console.
Here is an example of the different types of widgets you can select to build your dashboard.
The resources within your customized dashboard can be from multiple different regions making this a very useful feature. Being able to build your own views, you can quickly and easily design and configure different dashboards to represent the data that you need to see from a business and operational perspective. For example, you might need to view all performance metrics and alarms from resources relating to a particular project, or a specific customer. Or you might want to create a different dashboard for a specific region or application deployment. The key point is that they are fully customizable to be designed how YOU want to represent your data.
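To make the multi-region point a little more concrete, here is a minimal sketch of the JSON body that a dashboard is built from. The instance IDs, titles, and dashboard name are made up for illustration, and the boto3 call is only shown as a comment, so treat this as one possible shape rather than a definitive implementation.

```python
import json


def metric_widget(title, region, instance_id, x, y):
    """Return one metric widget showing average CPU for a single EC2 instance."""
    return {
        "type": "metric",
        "x": x,
        "y": y,
        "width": 12,
        "height": 6,
        "properties": {
            "title": title,
            "region": region,  # each widget names the region its metric lives in
            "metrics": [["AWS/EC2", "CPUUtilization", "InstanceId", instance_id]],
            "stat": "Average",
            "period": 300,
        },
    }


def build_dashboard_body():
    """Build the JSON string that PutDashboard takes as its DashboardBody."""
    widgets = [
        # Hypothetical instance IDs in two different regions, side by side.
        metric_widget("Web tier (us-east-1)", "us-east-1", "i-0123456789abcdef0", 0, 0),
        metric_widget("Web tier (eu-west-1)", "eu-west-1", "i-0fedcba9876543210", 12, 0),
    ]
    return json.dumps({"widgets": widgets})


dashboard_body = build_dashboard_body()
# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("cloudwatch").put_dashboard(
#       DashboardName="project-overview", DashboardBody=dashboard_body)
```

Because every widget carries its own region, a single dashboard can sit two regions next to each other.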
For more information on selecting the right chart type to visualize data, please see our existing course here: https://cloudacademy.com/course/data-visualization-how-to-convey-your-data-1112/
Once you have built your Dashboards, you can easily share them with other users, even those who may not have access to your AWS account. This allows you to share the findings gathered by CloudWatch with those who may find the results interesting and beneficial to their day-to-day operational role, but who don’t need access to your AWS account.
Metrics are a key component and fundamental to the success of Amazon CloudWatch. They enable you to monitor a specific element of an application or resource over a period of time while tracking these data points. For example, DiskReadOps and DiskWriteOps, which count the disk read and write operations on an EC2 instance, are just 2 of the EC2 metrics that you can monitor. Different services offer different metrics. For example, there are no disk read metrics for Amazon S3 as it’s not a compute service, so instead metrics relevant to the service are available, such as NumberOfObjects, which tracks the number of objects in a specified bucket.
By default when working with Amazon CloudWatch, everyone has access to a free set of Metrics, and for EC2, these are collated over a time period of 5 minutes. However, for a small fee, you can enable detailed monitoring, which will allow you to gain a deeper insight by collating data across the metrics every minute. In addition to detailed monitoring, you can also create your own custom metrics for your applications, using any time-series data points that you need. Be aware, though, that metrics are regional, meaning that any metrics created in 1 region will not be available in another.
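As a rough illustration of a custom metric, the sketch below builds the request that the PutMetricData API takes. The namespace, metric name, and dimension are invented for the example, and the boto3 call is left as a comment so the snippet runs without AWS credentials.

```python
from datetime import datetime, timezone


def build_custom_metric(namespace, name, value, unit="Count", dimensions=None):
    """Build keyword arguments in the shape PutMetricData expects."""
    datum = {
        "MetricName": name,
        "Timestamp": datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": unit,
    }
    if dimensions:
        datum["Dimensions"] = [{"Name": k, "Value": v} for k, v in dimensions.items()]
    return {"Namespace": namespace, "MetricData": [datum]}


# Hypothetical application metric: number of orders placed.
request = build_custom_metric(
    "MyApp/Orders", "OrdersPlaced", 42, dimensions={"Environment": "prod"}
)
# With boto3, the region is chosen when the client is created, which is
# why a metric published this way only exists in that one region:
#   boto3.client("cloudwatch", region_name="eu-west-1").put_metric_data(**request)
```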
CloudWatch metrics also allow you to enable a feature known as anomaly detection. This allows CloudWatch to implement machine learning algorithms against your metric data to help detect any activity that sits outside of the normal baseline parameters that are generally expected. Advance warning of this can help you detect an issue long before it becomes a production problem.
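To give a feel for what "outside of the normal baseline" means, here is a deliberately simplified stand-in. It is not the model CloudWatch uses; it just draws a band of two standard deviations around the mean of some made-up CPU readings and flags anything outside it.

```python
from statistics import mean, stdev


def anomaly_band(history, width=2.0):
    """Return (lower, upper) bounds: mean +/- `width` standard deviations."""
    centre = mean(history)
    spread = stdev(history)
    return centre - width * spread, centre + width * spread


def is_anomalous(value, history, width=2.0):
    """Return True when the value falls outside the expected band."""
    lower, upper = anomaly_band(history, width)
    return value < lower or value > upper


# Made-up CPU utilization readings that represent "normal" behaviour.
cpu_history = [41, 39, 44, 40, 42, 38, 43, 41]
print(anomaly_band(cpu_history))
print(is_anomalous(90, cpu_history))  # a reading far outside the band
print(is_anomalous(42, cpu_history))  # a reading inside the band
```

The real feature also accounts for things like trends and recurring patterns in the metric, which a fixed band like this cannot do.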
Amazon CloudWatch Alarms tightly integrate with the Metrics that I just discussed, and they allow you to implement automatic actions based on specific thresholds that you can configure for each metric.
For example, you could set an alarm to activate an auto scaling operation, such as provisioning another instance, if the CPUUtilization of an EC2 instance stayed above 75% for more than 5 minutes. You could also configure an alarm to send a message to an SNS Topic when the same instance drops back below the 75% threshold, causing it to come out of an ‘alarm’ state and notifying engineers of the change.
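The example above could be sketched roughly as follows, using the request shape of the PutMetricAlarm API. The alarm name and both ARNs are placeholders, and a single 5-minute evaluation period is my own assumption about how to express "more than 5 minutes".

```python
def build_cpu_alarm(instance_id, scale_out_policy_arn, sns_topic_arn):
    """Build keyword arguments in the shape PutMetricAlarm expects."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,  # one datapoint covers 5 minutes
        "EvaluationPeriods": 1,  # assumption: one breaching period is enough
        "Threshold": 75.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [scale_out_policy_arn],  # run when entering ALARM
        "OKActions": [sns_topic_arn],  # run when returning to OK
    }


alarm = build_cpu_alarm(
    "i-0123456789abcdef0",
    "arn:aws:autoscaling:placeholder-scale-out-policy",  # placeholder, not a real ARN
    "arn:aws:sns:us-east-1:123456789012:ops-alerts",  # placeholder topic
)
# With boto3 this could be sent as:
#   boto3.client("cloudwatch").put_metric_alarm(**alarm)
```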
For more information on SNS, please see our existing course here: https://cloudacademy.com/course/using-sqs-sns-ses/
Speaking of Alarm states, there are 3 different states for any alarm associated with a metric: OK, where the metric is within the configured threshold; ALARM, where the metric has exceeded the threshold; and INSUFFICIENT_DATA, where the alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state.
CloudWatch alarms are also easily integrated with your dashboards, allowing you to quickly visualize the status of each alarm. When an alarm is triggered into a state of ALARM, it will turn red on your dashboard, giving a very obvious indication.
CloudWatch EventBridge is a feature that has evolved from an existing feature called CloudWatch Events. So if you have any prior experience working with CloudWatch Events then this will be fairly familiar to you.
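The 3 states can be pictured with a small simulation. This is a simplified sketch of my own, not how CloudWatch evaluates alarms internally: it assumes the alarm fires only when every one of the most recent datapoints is over the threshold, and it treats any missing datapoint as insufficient data.

```python
def alarm_state(datapoints, threshold, evaluation_periods):
    """Classify the most recent datapoints as OK, ALARM or INSUFFICIENT_DATA.

    Simplification: ALARM requires every one of the last
    `evaluation_periods` datapoints to exceed the threshold, and a
    missing datapoint (None) always counts as insufficient data.
    """
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods or any(p is None for p in recent):
        return "INSUFFICIENT_DATA"
    if all(p > threshold for p in recent):
        return "ALARM"
    return "OK"


print(alarm_state([60, 80, 82], threshold=75, evaluation_periods=2))  # ALARM
print(alarm_state([80, 82, 70], threshold=75, evaluation_periods=2))  # OK
print(alarm_state([80], threshold=75, evaluation_periods=2))  # INSUFFICIENT_DATA
```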
CloudWatch EventBridge provides a means of connecting your own applications to a variety of different targets, typically AWS services, to allow you to implement a level of real-time monitoring, allowing you to respond to events that occur in your application as they happen.
But what is an event? Basically, an event is anything that causes a change to your environment or application.
The big benefit of using CloudWatch EventBridge is that it offers you the opportunity to implement a level of event-driven architecture in a real-time decoupled environment.
EventBridge establishes a connection between your applications and specified targets to allow a data stream of events to be sent. Currently, there is a wide range of targets that can be used as a destination for events as you can see here.
For the latest list of targets, please see the relevant documentation here: https://docs.aws.amazon.com/eventbridge/latest/userguide/eventbridge-targets.html
Let me provide a quick high-level overview of some of the elements of this feature, and these include Rules, Targets, and Event Buses.
So starting with Rules. A rule acts as a filter for incoming streams of event traffic and then routes these events to the appropriate target defined within the rule. The rule itself can route traffic to multiple targets, however the target must be in the same region.
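To show the idea of a rule acting as a filter, here is a small sketch that matches events against a pattern and returns the targets to route to. It only handles exact-value matching, which is a simplification of what real event patterns support, and the target names are hypothetical.

```python
def matches(pattern, event):
    """Return True if the event satisfies the pattern (exact values only)."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the matching part of the event.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf pattern: a list of allowed values.
            return False
    return True


def route(rules, event):
    """Return the targets of every rule whose pattern matches the event."""
    targets = []
    for rule in rules:
        if matches(rule["pattern"], event):
            targets.extend(rule["targets"])
    return targets


rules = [
    {
        "pattern": {
            "source": ["aws.ec2"],
            "detail-type": ["EC2 Instance State-change Notification"],
            "detail": {"state": ["stopped", "terminated"]},
        },
        "targets": ["notify-ops-lambda", "audit-queue"],  # hypothetical names
    }
]

stopped = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
}
running = {**stopped, "detail": {"instance-id": "i-0123456789abcdef0", "state": "running"}}

print(route(rules, stopped))  # both targets receive the event
print(route(rules, running))  # no rule matches, so nothing is routed
```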
Next, we have Targets. We saw a list of these just a few moments ago. Targets are where the events are sent by the Rules, such as AWS Lambda, SQS, Kinesis, or SNS. All events are delivered to the target in JSON format.
Now finally, Event Buses. An Event Bus is the component that actually receives the Events from your applications, and each of your rules is associated with a specific event bus. CloudWatch EventBridge uses a default Event Bus to receive events from AWS services; however, you are able to create your own Event Bus to capture events from your own applications.
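Sending one of your own application events to a custom Event Bus might look something like this sketch, which builds a single entry in the shape the PutEvents API takes. The bus name, source, and event fields are all invented for the example.

```python
import json


def build_event_entry(bus_name, source, detail_type, detail):
    """Build one entry in the shape PutEvents expects."""
    return {
        "EventBusName": bus_name,
        "Source": source,
        "DetailType": detail_type,
        "Detail": json.dumps(detail),  # the detail is passed as a JSON string
    }


# Hypothetical custom bus and application event.
entry = build_event_entry(
    "orders-bus",
    "com.example.orders",
    "OrderPlaced",
    {"orderId": "1234", "total": 59.90},
)
# With boto3 this could be sent as:
#   boto3.client("events").put_events(Entries=[entry])
print(entry["Detail"])
```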
CloudWatch Logs gives you a centralized location to house all of your logs from different AWS services that provide logs as an output, such as CloudTrail, EC2, VPC Flow logs, etc, in addition to your own applications.
When log data is fed into CloudWatch Logs, you can utilize CloudWatch Log Insights to monitor the log stream in real time and configure filters to search for specific entries and actions that you need to be alerted on or respond to. This allows CloudWatch Logs to act as a central repository for real-time monitoring of log data.
An added advantage of CloudWatch Logs comes with the installation of the Unified CloudWatch Agent, which can collect logs and additional metric data from EC2 instances as well as from on-premises servers running either a Linux or Windows operating system. This metric data is in addition to the default EC2 metrics that CloudWatch automatically configures for you. The list of these additional metrics collected by the agent can be found at this link here.
There are now 3 different types of insights within CloudWatch: Log Insights, Container Insights, and Lambda Insights.
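One way to act on specific log entries is a metric filter, which is a separate CloudWatch Logs feature that turns matching log lines into a metric you can alarm on. The sketch below builds a request in the shape of the PutMetricFilter API and, purely for illustration, counts matching lines locally. The log group, filter name, and namespace are assumptions.

```python
def build_error_metric_filter(log_group):
    """Build keyword arguments in the shape PutMetricFilter expects."""
    return {
        "logGroupName": log_group,
        "filterName": "error-count",
        "filterPattern": "ERROR",  # match log events containing this term
        "metricTransformations": [
            {
                "metricName": "ErrorCount",
                "metricNamespace": "MyApp/Logs",
                "metricValue": "1",  # add 1 to the metric per matching event
            }
        ],
    }


def count_matching(lines, term):
    """Local stand-in for the filter: count the lines containing the term."""
    return sum(1 for line in lines if term in line)


metric_filter = build_error_metric_filter("/my-app/prod")
sample_lines = ["INFO service started", "ERROR disk full", "ERROR request timed out"]
print(count_matching(sample_lines, metric_filter["filterPattern"]))
# With boto3 the filter could be created as:
#   boto3.client("logs").put_metric_filter(**metric_filter)
```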
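As a rough idea of what the agent is told to collect, here is a sketch of a small agent configuration for a Linux server, built in Python and printed as JSON. The file path, log group name, and choice of measurements are example values of my own, so check them against the agent documentation before relying on them.

```python
import json

# Assumed example values; adjust the paths and names for your own servers.
agent_config = {
    "agent": {"metrics_collection_interval": 60},
    "metrics": {
        "metrics_collected": {
            # Memory and disk usage are not part of the default EC2 metrics.
            "mem": {"measurement": ["mem_used_percent"]},
            "disk": {"measurement": ["used_percent"], "resources": ["/"]},
        }
    },
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": [
                    {
                        "file_path": "/var/log/messages",
                        "log_group_name": "my-app-system-logs",
                        "log_stream_name": "{instance_id}",
                    }
                ]
            }
        }
    },
}

config_json = json.dumps(agent_config, indent=2)
print(config_json)
```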
But what exactly are insights? Well as the name suggests, they provide the ability to get more information from the data that CloudWatch is collecting. So let’s look at each of these at a high level to understand the role that they perform, starting with Log Insights.
This is a feature that can analyze your logs that are captured by CloudWatch Logs at scale in seconds using interactive queries delivering visualizations that can be represented as bar, line, pie, or stacked area charts. The versatility of this feature allows you to work with any log file formats that AWS services or your applications might be using.
Using a flexible approach, you can use Log Insights to filter your log data and retrieve just the entries that you are interested in, and then use the visual capabilities of the feature to display the results as charts.
Much like Log Insights, Container Insights allows you to collate and group different metric data from different container services and applications within AWS, for example, the Amazon Elastic Kubernetes Service (EKS) and the Elastic Container Service (ECS).
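A query of this kind might look like the sketch below, which filters for error entries and counts them in 5-minute buckets so the result can be charted. It builds the request in the shape of the StartQuery API; the log group name is an assumption, and the boto3 calls are shown only as comments.

```python
import time

# Filter for error entries, then count them per 5-minute bucket.
QUERY = (
    "fields @timestamp, @message "
    "| filter @message like /ERROR/ "
    "| stats count(*) as errors by bin(5m)"
)


def build_insights_query(log_group, query, minutes_back, now=None):
    """Build keyword arguments in the shape StartQuery expects.

    Start and end times are expressed in seconds since the epoch.
    """
    end = int(now if now is not None else time.time())
    return {
        "logGroupName": log_group,
        "startTime": end - minutes_back * 60,
        "endTime": end,
        "queryString": query,
    }


query_request = build_insights_query("/my-app/prod", QUERY, minutes_back=60)
# With boto3 the query could be started and its results fetched with:
#   logs = boto3.client("logs")
#   query_id = logs.start_query(**query_request)["queryId"]
#   results = logs.get_query_results(queryId=query_id)
```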
In addition to the standard metrics collected for these services by CloudWatch, Container Insights also allows you to capture and monitor diagnostic data giving you additional insights into how to resolve issues that arise within your container architecture. This monitoring and insight data can be analyzed at the cluster, node, pod, and task level making it a valuable tool to help you understand your container applications and services.
As you may have guessed by now, Lambda Insights provides you the opportunity to gain a deeper understanding of your applications using AWS Lambda. Working on the same principles as we have seen with the previous 2 insight features, it gathers and aggregates system and diagnostic metrics related to AWS Lambda to help you monitor and troubleshoot your serverless applications.
To enable Lambda Insights, you need to enable the feature for each Lambda function that you create, within the Monitoring Tools section of your function.
This ensures that a CloudWatch extension is enabled for your function allowing it to collate system-level metrics which are recorded every time the function is invoked.
Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.
To date, Stuart has created 150+ courses relating to Cloud reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.
Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.
He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.
In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.
Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.