What is Observability?

This course covers the core learning objectives needed to meet the requirements of the 'Architecting for Management & Governance in AWS - Level 2' skill.

Learning Objectives:

  • Understand the different AWS management services available to monitor the performance of a solution
  • Apply Amazon CloudWatch monitoring controls to respond to system-wide performance changes
  • Apply AWS Config controls to manage compliance based upon business guidelines

Let’s say you are a cloud operations engineer for a website that produces quality photos of cats. The photos are so high quality that talk show hosts and influencers are constantly talking about how great they are. Because of this, the traffic to your website has increased over time. This sometimes leads to performance issues on your site. This past Friday, there was a huge issue where customers could no longer see their cat photos. 

You, of course, were on vacation when your boss called and said, “The cat photo system is down. What’s the problem and when are we going to be back online?” So you go back to your hotel, pull out the work laptop that you brought just in case, and log in to your AWS account. Unfortunately, this has become a typical Friday for you: you log in to the AWS console to fix some problem with the infrastructure for the cat photos website. You follow your typical problem-solving method: you first detect the issue, then investigate it, and finally remediate the problem.

The detect stage usually is when the error occurs. This is ideally followed by an alert, where you may be paged and a trouble ticket gets created. In this case, the alert was your boss calling you on your vacation. Lucky you. From there, you investigate the problem by looking at traces, logs, and metrics and attempt to correlate the data to find the issue. 

And once you find the cause, you can then react to the problem and issue a fix, thus remediating the issue. At this point, you should understand the root cause and can then collaborate with others to ensure this problem doesn’t happen again in the future. This scenario contains examples of both monitoring and observability. Monitoring tells you whether a system is working properly or not; in this case, the alert told you it wasn’t. And observability gives you information about WHY a system isn’t working, which you discovered by looking at logs, metrics, and traces.
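To make the monitoring half of this scenario concrete, here is a minimal sketch of a threshold check that decides whether to raise an alert. It is plain Python rather than a CloudWatch API call, and the error-rate values, the 5% threshold, and the three-minute rule are all assumptions made up for illustration.

```python
def breaches(values, threshold, periods):
    """Return True if the last `periods` values are all above `threshold`."""
    recent = values[-periods:]
    return len(recent) == periods and all(v > threshold for v in recent)


# Hypothetical error rates (percent of failed requests), one value per minute.
error_rate = [0.5, 0.7, 6.0, 8.5, 9.1]

# Assumed rule: alert when the error rate stays above 5% for 3 minutes in a row.
should_alert = breaches(error_rate, threshold=5.0, periods=3)
print(should_alert)
```

A check like this only tells you that something is wrong. Finding out why still means looking at the logs, metrics, and traces.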

Logs, metrics, and traces are what we call the foundation of observability. Metrics are typically numerical data from a specific time period, such as information about CPU utilization or system error rate.
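As a rough illustration, a metric can be modeled as a series of timestamped numbers that are aggregated over a period. The sketch below is plain Python, not an AWS API call; the `Datapoint` class, the `average_over_period` helper, and the sample CPU values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from statistics import mean


@dataclass
class Datapoint:
    timestamp: datetime
    value: float  # e.g. CPU utilization in percent


def average_over_period(points, start, period):
    """Average the datapoints whose timestamp falls in [start, start + period)."""
    end = start + period
    in_window = [p.value for p in points if start <= p.timestamp < end]
    return mean(in_window) if in_window else None


# Hypothetical CPU utilization samples, one per minute.
t0 = datetime(2024, 1, 5, 12, 0, tzinfo=timezone.utc)
cpu = [Datapoint(t0 + timedelta(minutes=i), v)
       for i, v in enumerate([40.0, 55.0, 85.0, 90.0, 95.0, 70.0])]

# Average of the first five one-minute samples.
five_minute_avg = average_over_period(cpu, t0, timedelta(minutes=5))
print(five_minute_avg)
```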

Logs represent timestamped events that happened over a period of time. With logs, you can get information about your resources and requests, and even create counters for how often things happened. You can also see your debugging data, including any warnings or errors, to help you troubleshoot issues.
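The idea of turning logs into counters can be sketched in a few lines of Python. The log format and the sample lines below are invented for this example and are not what any particular AWS service emits.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log lines: ISO-8601 timestamp, level, message.
RAW_LOGS = """\
2024-01-05T12:00:01+00:00 INFO GET /photos/42 200
2024-01-05T12:00:02+00:00 WARNING slow response from image store
2024-01-05T12:00:03+00:00 ERROR GET /photos/43 500
2024-01-05T12:00:04+00:00 ERROR GET /photos/44 500
"""


def parse_line(line):
    """Split one log line into (timestamp, level, message)."""
    stamp, level, message = line.split(" ", 2)
    return datetime.fromisoformat(stamp), level, message


events = [parse_line(line) for line in RAW_LOGS.splitlines()]

# Counter of how often each log level occurred.
level_counts = Counter(level for _, level, _ in events)

# Only the warnings and errors, which is what you read while troubleshooting.
problems = [msg for _, level, msg in events if level in ("WARNING", "ERROR")]

print(level_counts)
print(problems)
```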

And traces record the paths taken by requests, typically made by an app or a user on the site. For example, if someone presses “buy cat photo” on your website, tons of systems are working behind the scenes: systems to update the shopping cart, process the payment, user profile services, all working together so the customer can successfully buy a cat photo. Tracing helps you see how the backend systems interact together to fulfill the user’s request. 
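One way to picture a trace is as a set of spans, each pointing at the span that called it. The sketch below uses a simplified model that is not the AWS X-Ray data format; the `Span` class, the service names, and the durations are assumptions chosen to match the "buy cat photo" example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks the entry point of the request
    service: str
    duration_ms: int


# Hypothetical spans for a single "buy cat photo" request.
trace = [
    Span("a", None, "web-frontend", 480),
    Span("b", "a", "shopping-cart", 60),
    Span("c", "a", "payment", 350),
    Span("d", "c", "user-profile", 40),
]


def render(spans, parent_id=None, depth=0):
    """Return the call tree as indented lines, children listed under their parent."""
    lines = []
    for span in spans:
        if span.parent_id == parent_id:
            lines.append(f"{'  ' * depth}{span.service} ({span.duration_ms} ms)")
            lines.extend(render(spans, span.span_id, depth + 1))
    return lines


tree = render(trace)
print("\n".join(tree))

# The slowest downstream call is a natural place to start investigating.
slowest_child = max((s for s in trace if s.parent_id is not None),
                    key=lambda s: s.duration_ms)
```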

In AWS, the observability stack starts with these monitoring primitives: metrics, traces, and logs. To instrument AWS applications, you can use two main services: Amazon CloudWatch and AWS X-Ray. Logs and metrics are captured in Amazon CloudWatch, and traces are captured in AWS X-Ray.

These two services are considered the backbone of the observability stack, and AWS has built other native monitoring services using their functionality. For example, X-Ray functionality and CloudWatch functionality made the creation of Amazon CloudWatch ServiceLens possible. This service uses X-Ray to provide an end-to-end view of your application and combines that with CloudWatch metrics and logs, so you have metrics, traces, and logs all in one place.  

Over the years, CloudWatch has become a suite of services, with ServiceLens being just one of them. Other services like Amazon CloudWatch Synthetics and CloudWatch RUM were created to better monitor and test the end-user experience. 

And eventually, CloudWatch functionality was expanded to include a set of insights services, such as:

  • Container Insights
  • Lambda Insights
  • Contributor Insights
  • Application Insights
  • Logs Insights
  • And Metrics Insights

These services are meant to provide additional metrics and logging information for your containers, Lambda functions, and applications, and to provide querying functionality for both metrics and logs. In summary, metrics, traces, and logs are the foundation of observability in AWS. However, there are now many other services you can use to correlate data more easily and to instrument your application properly.
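As an example of the log querying mentioned above, the sketch below builds the arguments for a CloudWatch Logs Insights query that returns the 20 most recent log events containing "ERROR". The log group name `/cat-photos/web` and the `build_query_request` helper are hypothetical, and the code only constructs the request; it does not contact AWS.

```python
from datetime import datetime, timedelta, timezone

# CloudWatch Logs Insights query: newest 20 log events whose message contains "ERROR".
QUERY = """\
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 20"""


def build_query_request(log_group, end, lookback):
    """Build keyword arguments for the boto3 CloudWatch Logs `start_query` call."""
    start = end - lookback
    return {
        "logGroupName": log_group,
        "startTime": int(start.timestamp()),  # seconds since the epoch
        "endTime": int(end.timestamp()),
        "queryString": QUERY,
    }


end = datetime(2024, 1, 5, 13, 0, tzinfo=timezone.utc)
request = build_query_request("/cat-photos/web", end, timedelta(hours=1))
print(request)

# With boto3 installed and AWS credentials configured, this could be sent as:
#   boto3.client("logs").start_query(**request)
```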

About the Author

Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.

To date, Stuart has created 150+ courses relating to Cloud reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.

Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.

He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.

In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.

Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.