Observability of a System

This section of the AWS Certified Solutions Architect - Professional learning path introduces the AWS management and governance services relevant to the AWS Certified Solutions Architect - Professional exam. These services help you audit, monitor, and evaluate your AWS infrastructure and resources, and they form a core component of resilient and performant architectures.

Learning Objectives

  • Understand the benefits of using Amazon CloudWatch and audit logs to manage your infrastructure
  • Learn how to record and track API requests using AWS CloudTrail
  • Learn what AWS Config is and its components
  • Manage multi-account environments with AWS Organizations and Control Tower
  • Learn how to carry out logging with CloudWatch, CloudTrail, CloudFront, and VPC Flow Logs
  • Learn about AWS data transformation tools such as AWS Glue, query services like Amazon Athena, and data visualization services like Amazon QuickSight
  • Learn how AWS CloudFormation can be used to represent your infrastructure as code (IaC)
  • Understand SLAs in AWS

Let’s walk through an architecture to get an idea of how some of the AWS observability services can work together to help you solve a problem. Take an example application: a website that sells cat photos. This e-commerce site runs on an Amazon Elastic Container Service (ECS) cluster backed by Amazon EC2 instances. It also has a series of Lambda functions that read from a DynamoDB table and update the status of the top buyers of cat photos on your site. So you have the e-commerce side and the leaderboard side of your application.

Let’s say you want to begin instrumenting this app with the basics of metrics, traces, and logs. 

For metrics and logs, you use CloudWatch and install the CloudWatch agent on your EC2 instances. CloudWatch provides out-of-the-box metric functionality for AWS services like EC2, ECS, Lambda, RDS, and DynamoDB. While the AWS-provided metrics are helpful, most customers need more visibility into their system, so you can also choose to create CloudWatch custom metrics to get the additional information you need from your app. For example, perhaps you’d want to collect data on page views for your cat photo site. That won’t come out of the box from CloudWatch, so you’d need to create a custom metric. 
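
As a rough illustration of the page-view example, here is a minimal sketch of publishing a custom metric. It assumes the boto3 CloudWatch client and its put_metric_data call; the namespace CatPhotoStore, the metric name PageViews, the Page dimension, and both helper functions are hypothetical names made up for this example.

```python
from datetime import datetime, timezone


def build_page_view_metric(page: str, count: int = 1) -> dict:
    """Build keyword arguments for a CloudWatch PutMetricData request."""
    return {
        "Namespace": "CatPhotoStore",  # hypothetical custom namespace
        "MetricData": [
            {
                "MetricName": "PageViews",  # hypothetical metric name
                "Dimensions": [{"Name": "Page", "Value": page}],
                "Timestamp": datetime.now(timezone.utc),
                "Value": float(count),
                "Unit": "Count",
            }
        ],
    }


def publish_page_view(cloudwatch_client, page: str, count: int = 1) -> dict:
    """Send one data point through any client exposing put_metric_data."""
    params = build_page_view_metric(page, count)
    cloudwatch_client.put_metric_data(**params)
    return params


# With boto3 installed and credentials configured, a call might look like:
#   import boto3
#   publish_page_view(boto3.client("cloudwatch"), "/photos/tabby")
```

Passing the client in as an argument keeps the sketch runnable without AWS credentials, since any object with a put_metric_data method will do.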

Custom metrics used to be especially important for container workloads. In the past, if you wanted to get service- or task-level metrics for ECS, you’d have to create custom metrics.

However, you can now use a CloudWatch feature called Container Insights to get this information. It makes it easier to monitor Amazon ECS and Amazon EKS workloads. You can think of it as your “one-stop shop” for data about your ECS or EKS clusters, with alarms, metrics, and logs for your containers.

With Container Insights, you get additional metrics such as task count, service count, deployment count, and container instance count, so you no longer need to create custom metrics for these data points. Once you set up Container Insights for Amazon ECS or EKS, you can use these metrics like any other, building dashboards or setting alarms on them.
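
For an existing ECS cluster, turning Container Insights on comes down to one cluster setting. The sketch below only builds the arguments for the boto3 ECS client’s update_cluster_settings call; the helper name and the cluster name are hypothetical, and the EC2 instances behind the cluster may need further agent setup that isn’t shown here.

```python
def container_insights_settings(cluster_name: str, enabled: bool = True) -> dict:
    """Build keyword arguments for the ECS UpdateClusterSettings request."""
    return {
        "cluster": cluster_name,
        "settings": [
            {
                "name": "containerInsights",
                "value": "enabled" if enabled else "disabled",
            }
        ],
    }


params = container_insights_settings("cat-photo-cluster")  # hypothetical cluster
print(params)

# With boto3 installed and credentials configured, a call might look like:
#   import boto3
#   boto3.client("ecs").update_cluster_settings(**params)
```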

So, with Container Insights and CloudWatch metrics, you can gain insight into the e-commerce side of your application. For additional metrics on the leaderboard side of your app, you can use CloudWatch Lambda Insights. With Lambda Insights, you can track Lambda-specific metrics like cold starts and Lambda worker shutdowns.

To turn it on, you edit your Lambda function’s monitoring settings and enable Lambda Insights with the push of a button. Behind the scenes, AWS adds a Lambda layer to your function and adds the permissions its execution role needs to collect the data.

Once Container Insights and Lambda Insights are enabled, they begin generating log events using the embedded metric format, which allows metric data to be captured in logs. These log events are called performance logs. CloudWatch then extracts the metric data from the performance logs.
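
To make the embedded metric format a little more concrete, here is a minimal sketch that builds one such log line by hand. The "_aws" metadata block with its Timestamp and CloudWatchMetrics fields follows the published format; the namespace, dimension, and metric names are hypothetical, and this is not the exact record that Container Insights or Lambda Insights writes.

```python
import json
import time


def emf_record(namespace, dimensions, metrics, timestamp_ms=None):
    """Return one JSON log line in CloudWatch embedded metric format.

    dimensions maps dimension names to values.
    metrics maps a metric name to a (value, unit) tuple.
    """
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    record = {
        "_aws": {
            "Timestamp": timestamp_ms,
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    # One dimension set listing every dimension name.
                    "Dimensions": [list(dimensions)],
                    "Metrics": [
                        {"Name": name, "Unit": unit}
                        for name, (_, unit) in metrics.items()
                    ],
                }
            ],
        }
    }
    # Dimension values and metric values sit at the top level of the record.
    record.update(dimensions)
    record.update({name: value for name, (value, _) in metrics.items()})
    return json.dumps(record)


line = emf_record(
    namespace="CatPhotoStore",                  # hypothetical namespace
    dimensions={"Service": "leaderboard"},      # hypothetical dimension
    metrics={"TopBuyerUpdates": (1, "Count")},  # hypothetical metric
    timestamp_ms=1700000000000,
)
print(line)
```

Because the metric definition travels inside the log event, the same line can be searched as a log and graphed as a metric.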

Performance logs, along with any other logs you collect in CloudWatch, can be queried and analyzed using CloudWatch Logs Insights.

With Logs Insights, you can run queries to search and analyze your log data. It uses its own simple query language to display, filter, sort, and limit your log data, making it easier to find trends and correlate data.
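
A query in this language is a chain of commands separated by pipes. The sketch below assembles one from the fields, filter, sort, and limit commands; the helper function is hypothetical and only builds the query string, which you could paste into the console or pass to the boto3 CloudWatch Logs client’s start_query call.

```python
def build_logs_insights_query(fields, filter_expr=None, sort_field=None,
                              descending=True, limit=None):
    """Join Logs Insights commands into a single pipe-separated query."""
    parts = ["fields " + ", ".join(fields)]
    if filter_expr:
        parts.append("filter " + filter_expr)
    if sort_field:
        parts.append(f"sort {sort_field} {'desc' if descending else 'asc'}")
    if limit is not None:
        parts.append(f"limit {limit}")
    return " | ".join(parts)


query = build_logs_insights_query(
    fields=["@timestamp", "@message"],
    filter_expr="@message like /ERROR/",
    sort_field="@timestamp",
    limit=20,
)
print(query)
# fields @timestamp, @message | filter @message like /ERROR/ | sort @timestamp desc | limit 20

# With boto3 installed and credentials configured, running it might look like
# this (the log group name is a placeholder, times are epoch seconds):
#   import boto3, time
#   boto3.client("logs").start_query(
#       logGroupName="/ecs/cat-photo-store",
#       startTime=int(time.time()) - 3600,
#       endTime=int(time.time()),
#       queryString=query,
#   )
```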

Next, you can begin instrumenting your application with traces. To do this, you can install the X-Ray daemon on your EC2 instances, or create a Docker image that runs the X-Ray daemon on your ECS cluster. With Lambda, no installation is needed: you just enable tracing with the push of a button in the Lambda function’s monitoring settings.

Once X-Ray is enabled, you can instrument your application using the AWS X-Ray SDKs or the AWS Distro for OpenTelemetry. After that, you can view a service map of your infrastructure, latency information, request metadata, and more to help improve the performance of your application.
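
Part of what instrumentation does is tag each request with a trace ID and pass it between services in the X-Amzn-Trace-Id header, which is how X-Ray can stitch a service map together. The sketch below only illustrates that ID and header layout in plain Python. Both helper functions are hypothetical, and in practice the X-Ray SDK or OpenTelemetry generates and propagates these values for you.

```python
import secrets
import time


def new_trace_id(now=None):
    """Build an ID shaped like an X-Ray trace ID.

    Layout: version 1, epoch seconds as 8 hex digits, 24 random hex digits.
    """
    epoch = int(time.time() if now is None else now)
    return f"1-{epoch:08x}-{secrets.token_hex(12)}"


def parse_trace_header(header):
    """Split an X-Amzn-Trace-Id header value into its key/value fields."""
    fields = {}
    for part in header.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep:
            fields[key] = value
    return fields


header = "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1"
print(parse_trace_header(header))
print(new_trace_id())
```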

X-Ray integrates deeply with CloudWatch ServiceLens, CloudWatch Synthetics, and CloudWatch RUM. ServiceLens provides an end-to-end view of your application, and is often the first place people look to troubleshoot their app. In ServiceLens, you can see bottlenecks, and identify which users of your services are impacted, as well as look at metrics and log data. 

With Synthetics, you can use canaries to perform the same actions as your users and monitor for issues like dead links, transaction issues, latency issues, and more, helping to ensure a positive customer experience. And with CloudWatch RUM, you can look at client-side data for your application to get better insight into actual user sessions.
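
To give a feel for what a dead-link canary checks, here is a plain-Python sketch of the idea. It does not use the Synthetics runtime or its libraries; the function names and URLs are hypothetical, and a real canary would run on a schedule and report its results to CloudWatch.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_status(url, timeout=5.0):
    """Return the HTTP status for url, or None if the request fails outright."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        return err.code  # the server answered, but with an error status
    except OSError:
        return None  # DNS failure, refused connection, timeout, and so on


def find_dead_links(urls, fetch=fetch_status):
    """Return the URLs whose status is missing or is 400 and above."""
    dead = []
    for url in urls:
        status = fetch(url)
        if status is None or status >= 400:
            dead.append(url)
    return dead


# Example call (placeholder URLs, makes real HTTP requests):
#   find_dead_links(["https://example.com/", "https://example.com/missing"])
```

The fetch argument is injectable so the check can be exercised without touching the network.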

So a typical journey to debug a problem might look like this: 

  • A CloudWatch alarm is raised.

  • You receive the alarm notification and start to investigate the problem by looking at the metric associated with the alarm.

  • From there, you can look at an end-to-end view of your app in ServiceLens to see if there’s high latency or a bottleneck.

  • After that, you can view X-Ray trace data to see how customer requests are flowing through your application.

  • Once you pinpoint which service you think may be the issue, you can look at its metrics and logs to verify the impact and correlate data on what might have been the root cause.

  • And then, you can query logs using CloudWatch Logs Insights and query metrics using CloudWatch Metrics Insights to search for patterns and answer questions like “why did my metric spike?” and more.
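
The first step of that journey, the alarm itself, can be sketched as the arguments to the boto3 CloudWatch client’s put_metric_alarm call. Everything specific here is an assumption for illustration: the helper function, the CheckoutLatency metric, the CatPhotoStore namespace, the thresholds, and the SNS topic ARN.

```python
def build_latency_alarm(service_name, threshold_ms, topic_arn):
    """Build keyword arguments for a CloudWatch PutMetricAlarm request."""
    return {
        "AlarmName": f"{service_name}-high-latency",
        "Namespace": "CatPhotoStore",     # hypothetical custom namespace
        "MetricName": "CheckoutLatency",  # hypothetical custom metric
        "Dimensions": [{"Name": "Service", "Value": service_name}],
        "Statistic": "Average",
        "Period": 60,            # evaluate the metric in 60-second periods
        "EvaluationPeriods": 3,  # alarm after 3 consecutive breaching periods
        "Threshold": float(threshold_ms),
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [topic_arn],  # for example, an SNS topic that notifies you
    }


alarm = build_latency_alarm(
    service_name="storefront",
    threshold_ms=500,
    topic_arn="arn:aws:sns:us-east-1:123456789012:ops-alerts",  # placeholder ARN
)
print(alarm["AlarmName"])
# storefront-high-latency

# With boto3 installed and credentials configured, a call might look like:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**alarm)
```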

That’s all for this one! See you next time.


About the Author

Danny has over 20 years of IT experience as a software developer, cloud engineer, and technical trainer. After attending a conference on cloud computing in 2009, he knew he wanted to build his career around what was still a very new, emerging technology at the time — and share this transformational knowledge with others. He has spoken to IT professional audiences at local, regional, and national user groups and conferences. He has delivered in-person classroom and virtual training, interactive webinars, and authored video training courses covering many different technologies, including Amazon Web Services. He currently has six active AWS certifications, including certifications at the Professional and Specialty level.