What is Observability?

This section of the AWS Certified Solutions Architect - Professional learning path introduces the AWS management and governance services relevant to the AWS Certified Solutions Architect - Professional exam. These services are used to help you audit, monitor, and evaluate your AWS infrastructure and resources and form a core component of resilient and performant architectures. 


Learning Objectives

  • Understand the benefits of using Amazon CloudWatch and audit logs to manage your infrastructure
  • Learn how to record and track API requests using AWS CloudTrail
  • Learn what AWS Config is and what its components are
  • Manage multi-account environments with AWS Organizations and Control Tower
  • Learn how to carry out logging with CloudWatch, CloudTrail, CloudFront, and VPC Flow Logs
  • Learn about AWS data transformation tools such as AWS Glue, query services such as Amazon Athena, and data visualization services such as Amazon QuickSight
  • Learn how AWS CloudFormation can be used to represent your infrastructure as code (IaC)
  • Understand SLAs in AWS

Let’s say you are a cloud operations engineer for a website that produces quality photos of cats. The photos are so high quality that talk show hosts and influencers are constantly talking about how great they are. Because of this, the traffic to your website has increased over time. This sometimes leads to performance issues on your site. This past Friday, there was a huge issue where customers could no longer see their cat photos. 

You, of course, were on vacation when your boss called and said “The cat photo system is down. What’s the problem and when are we going to be back online?” So you go back to your hotel, pull out the work laptop you brought just in case, and log in to your AWS account. Unfortunately, this has become a typical Friday for you: you log in to the AWS console to fix some problem with the infrastructure for the cat photos website. You follow your usual problem-solving method: first detect the issue, then investigate it, and finally remediate the problem. 

The detect stage begins when the error occurs. Ideally, this is followed by an alert: you may be paged, and a trouble ticket gets created. In this case, the alert was your boss calling you on your vacation. Lucky you. From there, you investigate the problem by looking at traces, logs, and metrics and trying to correlate the data to find the cause. 
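
In CloudWatch, the detect step is usually automated with an alarm that watches a metric and changes state when the metric crosses a threshold. The sketch below is a simplified local model of that idea, loosely following CloudWatch’s “M out of N” evaluation (its EvaluationPeriods and DatapointsToAlarm settings) and its OK, ALARM, and INSUFFICIENT_DATA states. The class name, the sample error rates, and the exact evaluation rule are assumptions for illustration; the real service also handles missing data and other details not modeled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ThresholdAlarm:
    """A simplified "M out of N" alarm (hypothetical model, not the CloudWatch API)."""
    threshold: float          # a datapoint above this value counts as breaching
    evaluation_periods: int   # N: how many of the most recent periods to examine
    datapoints_to_alarm: int  # M: how many of those N must breach to alarm

    def state(self, values: List[float]) -> str:
        window = values[-self.evaluation_periods:]
        if len(window) < self.evaluation_periods:
            # Not enough history yet to make a decision.
            return "INSUFFICIENT_DATA"
        breaching = sum(1 for v in window if v > self.threshold)
        return "ALARM" if breaching >= self.datapoints_to_alarm else "OK"


# Hypothetical per-minute error rates (%) for the cat photo site.
error_rates = [0.5, 0.7, 6.0, 8.2, 9.1]
alarm = ThresholdAlarm(threshold=5.0, evaluation_periods=3, datapoints_to_alarm=3)
print(alarm.state(error_rates))  # prints ALARM: the last three minutes all exceed 5%
```

In a real setup, the ALARM transition is what would page you and open the trouble ticket, instead of your boss calling.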

Once you find the cause, you can react by issuing a fix, which remediates the issue. At this point, you should understand the root cause and can collaborate with others to make sure the problem doesn’t happen again. This scenario contains examples of both monitoring and observability. Monitoring tells you whether a system is working properly; you learned that it wasn’t when you were alerted. Observability tells you WHY a system isn’t working, which you discovered by looking at logs, metrics, and traces. 

Logs, metrics, and traces are what we call the foundation of observability. Metrics are numerical measurements collected over time, such as CPU utilization or a system’s error rate.
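
To make the idea of a metric concrete, here is a minimal sketch of how raw samples become the per-period statistics you typically see on a dashboard. The statistic names mirror CloudWatch’s Average, Maximum, and SampleCount statistics, but the function, its inputs, and the 60-second period are assumptions for illustration; CloudWatch computes these statistics for you.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def aggregate(datapoints: List[Tuple[int, float]], period: int) -> Dict[int, Dict[str, float]]:
    """Roll raw (epoch_seconds, value) samples up into fixed-length periods."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in datapoints:
        # Align each sample to the start of the period it falls in.
        buckets[ts - ts % period].append(value)
    # One summary per period, ordered by period start time.
    return {
        start: {"Average": mean(vals), "Maximum": max(vals), "SampleCount": len(vals)}
        for start, vals in sorted(buckets.items())
    }


# Hypothetical CPU utilization samples (%) taken every 20 seconds.
cpu = [(0, 10.0), (20, 20.0), (40, 30.0), (60, 90.0), (80, 70.0)]
for start, stats in aggregate(cpu, period=60).items():
    print(start, stats)
```

The choice of period matters: a long period smooths out short spikes, while a short one exposes them at the cost of noisier graphs.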

Logs are timestamped records of events that happened over a period of time. With logs, you can get information about your resources and requests, and you can even create counters for how often things happened. You can also see your debugging data, including any warnings or errors, to help you troubleshoot issues. 
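
The sketch below shows two of those uses on a handful of made-up log lines: parsing timestamped events and turning them into a counter of errors per minute. The line format, service names, and helper functions are hypothetical; in CloudWatch, an equivalent counter would typically come from a metric filter or a Logs Insights query rather than code like this.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, NamedTuple


class LogEvent(NamedTuple):
    timestamp: datetime
    level: str
    service: str
    message: str


def parse(lines: Iterable[str]) -> List[LogEvent]:
    """Parse lines in a hypothetical '<ISO-8601 time> <LEVEL> <service> <message>' format."""
    events = []
    for line in lines:
        ts, level, service, message = line.split(" ", 3)
        events.append(LogEvent(datetime.fromisoformat(ts), level, service, message))
    return events


def errors_per_minute(events: Iterable[LogEvent]) -> Counter:
    """A counter built from logs: how many ERROR events happened in each minute."""
    return Counter(e.timestamp.strftime("%H:%M") for e in events if e.level == "ERROR")


# Made-up log lines for illustration.
LOGS = [
    "2024-05-10T14:03:12+00:00 INFO cart-service Item added",
    "2024-05-10T14:03:40+00:00 ERROR payment-service Card processor timeout",
    "2024-05-10T14:03:55+00:00 WARN profile-service Slow response",
    "2024-05-10T14:04:02+00:00 ERROR payment-service Card processor timeout",
    "2024-05-10T14:04:30+00:00 ERROR payment-service Card processor timeout",
]
events = parse(LOGS)
print(Counter(e.level for e in events))  # events per log level
print(errors_per_minute(events))         # errors bucketed by minute
```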

Traces record the paths taken by requests, typically made by an app or a user on the site. For example, if someone presses “buy cat photo” on your website, many systems work behind the scenes: a shopping cart service, a payment service, and a user profile service all work together so the customer can successfully buy a cat photo. Tracing shows you how these backend systems interact to fulfill the user’s request. 
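
A trace can be thought of as a tree of timed spans, where each span records one service’s part of the request and points to the span that called it. The sketch below rebuilds that tree for a hypothetical “buy cat photo” request and picks out the slowest downstream call. The Span fields, service names, and timings are assumptions; X-Ray’s own data model uses segments and subsegments and carries much more detail.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks the root of the trace
    service: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def render_trace(spans: List[Span]) -> List[str]:
    """Return an indented call tree, with sibling spans ordered by start time."""
    children: Dict[Optional[str], List[Span]] = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)
    lines: List[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for span in sorted(children.get(parent_id, []), key=lambda s: s.start_ms):
            lines.append(f"{'  ' * depth}{span.service} ({span.duration_ms} ms)")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


# Hypothetical spans for one "buy cat photo" request.
trace = [
    Span("a", None, "web-frontend", 0, 420),
    Span("b", "a", "cart-service", 10, 60),
    Span("c", "a", "payment-service", 70, 400),
    Span("d", "a", "user-profile-service", 5, 30),
]
print("\n".join(render_trace(trace)))

# The downstream call that dominates the request's latency.
slowest = max((s for s in trace if s.parent_id is not None), key=lambda s: s.duration_ms)
print(f"Slowest downstream call: {slowest.service}")
```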

In AWS, the observability stack starts with these monitoring primitives: metrics, traces, and logs. To instrument AWS applications, you can use two main services: Amazon CloudWatch and AWS X-Ray. Logs and metrics are captured in Amazon CloudWatch, and traces are captured in AWS X-Ray.
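
When a request crosses service boundaries, X-Ray ties the pieces together with a trace ID that is passed along in the X-Amzn-Trace-Id HTTP header. The sketch below builds an ID in X-Ray’s trace ID format (the version number 1, the request start time as 8 hex digits of Unix epoch seconds, then 24 random hex digits) and splits a header value into its Root, Parent, and Sampled fields. In practice, the X-Ray SDKs or AWS Distro for OpenTelemetry generate and propagate these values for you; the helper names here are hypothetical.

```python
import secrets
import time
from typing import Dict, Optional


def new_trace_id(epoch_seconds: Optional[int] = None) -> str:
    """Build an ID shaped like an X-Ray trace ID: 1-<8 hex epoch>-<24 hex random>."""
    ts = int(time.time()) if epoch_seconds is None else epoch_seconds
    return f"1-{ts:08x}-{secrets.token_hex(12)}"  # token_hex(12) yields 24 hex digits


def parse_trace_header(value: str) -> Dict[str, str]:
    """Split an X-Amzn-Trace-Id header value into its key=value fields."""
    fields: Dict[str, str] = {}
    for part in value.split(";"):
        key, sep, val = part.strip().partition("=")
        if sep:  # ignore empty or malformed parts
            fields[key] = val
    return fields


header = "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1"
fields = parse_trace_header(header)
print(fields["Root"], fields["Parent"], fields["Sampled"])
print(new_trace_id())
```

Because every service forwards the same Root value, X-Ray can stitch the segments each service reports into a single end-to-end trace.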

These two services are considered the backbone of the observability stack, and AWS has built other native monitoring services on top of their functionality. For example, Amazon CloudWatch ServiceLens builds on both: it uses X-Ray to provide an end-to-end view of your application and combines that with CloudWatch metrics and logs, so you have metrics, traces, and logs all in one place.
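
ServiceLens does this correlation for you, but the underlying idea is simple: if every log record carries the trace ID of the request that produced it, logs and traces can be joined on that ID. Here is a minimal local sketch of that join, assuming a hypothetical trace_id field in both your log records and your trace segments; it is not how ServiceLens is implemented.

```python
from collections import defaultdict
from typing import Dict, List


def correlate(log_records: List[dict], segments: List[dict]) -> Dict[str, dict]:
    """Group trace segments and log records that share a trace ID."""
    view: Dict[str, dict] = defaultdict(lambda: {"segments": [], "logs": []})
    for seg in segments:
        view[seg["trace_id"]]["segments"].append(seg)
    for rec in log_records:
        # Records without a trace ID cannot be correlated and are skipped.
        if "trace_id" in rec:
            view[rec["trace_id"]]["logs"].append(rec)
    return dict(view)


def failed_traces(view: Dict[str, dict]) -> List[str]:
    """Trace IDs where at least one segment reported an error."""
    return sorted(tid for tid, data in view.items() if any(s.get("error") for s in data["segments"]))


# Hypothetical trace segments and structured log records.
segments = [
    {"trace_id": "t1", "service": "payment-service", "error": True},
    {"trace_id": "t2", "service": "cart-service", "error": False},
]
logs = [
    {"trace_id": "t1", "message": "Card processor timeout"},
    {"message": "Health check OK"},
]
view = correlate(logs, segments)
for tid in failed_traces(view):
    print(tid, [r["message"] for r in view[tid]["logs"]])
```

The skipped health-check record shows the practical lesson: logs that don’t carry a trace ID can’t be tied back to the request that failed.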

Over the years, CloudWatch has become a suite of services, with ServiceLens being just one of them. Other services, like Amazon CloudWatch Synthetics and Amazon CloudWatch RUM (Real User Monitoring), were created to better monitor and test the end-user experience. 

Eventually, CloudWatch functionality was expanded to include a set of Insights services, such as:

  • Container Insights
  • Lambda Insights
  • Contributor Insights
  • Application Insights
  • Logs Insights
  • Metrics Insights

These services provide additional metrics and logging information for your containers, Lambda functions, and applications, and they add querying functionality for both metrics and logs. In summary, metrics, traces, and logs are the foundation of observability in AWS. However, many other services now help you correlate data more easily and instrument your application properly.
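
Contributor Insights answers “who or what is contributing the most?” questions, such as which client IP addresses send the most requests. The function below is a local, hypothetical illustration of that top-N idea using Python’s collections.Counter; it is not how Contributor Insights is configured (the service uses rules you define in CloudWatch). The closing comment shows a roughly equivalent Logs Insights query, with the clientIp field name assumed.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_contributors(records: Iterable[dict], key: str, n: int = 3) -> List[Tuple[str, int]]:
    """Rank the values of `key` by how many records contain them (a top-N view)."""
    return Counter(r[key] for r in records if key in r).most_common(n)


# Hypothetical access-log records; the field name clientIp is an assumption.
requests = (
    [{"clientIp": "203.0.113.7"}] * 3
    + [{"clientIp": "198.51.100.2"}] * 2
    + [{"clientIp": "192.0.2.10"}]
)
print(top_contributors(requests, "clientIp", n=2))
# prints [('203.0.113.7', 3), ('198.51.100.2', 2)]

# Roughly the same question as a CloudWatch Logs Insights query:
#   stats count(*) as requests by clientIp | sort requests desc | limit 2
```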

About the Author

Danny has over 20 years of IT experience as a software developer, cloud engineer, and technical trainer. After attending a conference on cloud computing in 2009, he knew he wanted to build his career around what was still a very new, emerging technology at the time — and share this transformational knowledge with others. He has spoken to IT professional audiences at local, regional, and national user groups and conferences. He has delivered in-person classroom and virtual training, interactive webinars, and authored video training courses covering many different technologies, including Amazon Web Services. He currently has six active AWS certifications, including certifications at the Professional and Specialty level.