Overview of the AWS Health Dashboard
7h 20m

This course provides detail on the AWS Management & Governance services relevant to the AWS Certified DevOps Engineer - Professional exam.


Learning Objectives

  • Learn how AWS AppConfig can reduce errors in configuration changes and prevent application downtime
  • Understand how the AWS Cloud Development Kit (CDK) can be used to model and provision application resources using common programming languages
  • Get a high-level understanding of Amazon CloudWatch
  • Learn about the features and use cases of the service
  • Create your own CloudWatch dashboard to monitor the items that are important to you
  • Understand how CloudWatch dashboards can be shared across accounts
  • Understand the cost structure of CloudWatch dashboards and the limitations of the service
  • Review how monitored metrics go into an ALARM state
  • Learn about the challenges of creating CloudWatch Alarms and the benefits of using machine learning in alarm management
  • Know how to create a CloudWatch Alarm using Anomaly Detection
  • Learn what types of metrics are suitable for use with Anomaly Detection
  • Create your own CloudWatch log subscription
  • Learn how AWS CloudTrail enables auditing and governance of your AWS account
  • Understand how Amazon CloudWatch Logs enables you to monitor and store your system, application, and custom log files
  • Explain what AWS CloudFormation is and what it’s used for
  • Determine the benefits of AWS CloudFormation
  • Understand what the core components are and what they are used for
  • Create a CloudFormation Stack using an existing AWS template
  • Learn what VPC flow logs are and what they are used for
  • Determine options for operating programmatically with AWS, including the AWS CLI, APIs, and SDKs
  • Learn about the capabilities of AWS Systems Manager for managing applications and infrastructure
  • Understand how AWS Secrets Manager can be used to securely encrypt application secrets

The Health dashboard is divided into two main sections: events that affect everyone, in the top-left menu, and events that affect your account's resources, right below that. Let's go over each one. First, you have Open and recent issues. This is where you can see current issues happening on the AWS platform. More often than not, this option will show as disabled if there's nothing of interest going on. Service history, on the other hand, shows a historical view of issues. This is really helpful if something happened over a weekend or a holiday and you want to get the details about which services and regions were affected. Let's look at an example of a possible outage. Each one of these tickets has a header showing the latest status, in this case Resolved, and a short description of the issue, in this case increased API error rates. You want to pay close attention to the affected services list. In this case, it's a list of 20 services, and chances are that if you were using one of these services when this event happened, in this case in the US-EAST-1 region, your application would have experienced some problems.

Information like this is useful for shortening troubleshooting times and for deciding whether to consider multi-region solutions, if your business would be significantly impacted by an issue like this occurring in the AWS region that you're using. If we switch over to Your account health, this is where the Health dashboard becomes really useful, because it correlates AWS global issues with the resources and services that you're currently using. This way you can see if there's any impact on your business. For example, let's say you're running an EC2 instance and it's been running nonstop for 12 months. You may go here under the Scheduled events tab, or you may get an email from AWS, and see something like this.

One or more of your instances have scheduled events. Essentially, this means that the physical hardware running your EC2 instance may need to be taken down for repairs, upgrades, or simply maintenance. The solution is simple, by the way: stop and then start your EC2 instance (a reboot alone is not enough), and it will come online on a different physical host, allowing AWS to perform maintenance without further interruption to you or any other customers. It's totally understandable if you don't want to have to manually visit a web page to find out if there's an outage affecting your AWS infrastructure. For this, there's a solution: EventBridge. EventBridge can be used to monitor and react to AWS Health dashboard events and then take action, including sending a notification to the Ops team, identifying affected resources, and executing custom Lambda functions to perform pretty much any task, such as creating a Zendesk or Jira ticket related to the event.
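As a rough illustration of the Lambda side of this, here is a minimal sketch of a handler that pulls the details an Ops team would typically want out of an AWS Health event delivered by EventBridge. The function names, the sample event, and its values (instance ID, description text) are made up for illustration, and the hand-off to a ticketing or notification system is left as a placeholder rather than a real integration.

```python
import json


def summarize_health_event(event):
    """Extract the fields an Ops team usually cares about from an AWS Health event."""
    detail = event.get("detail", {})
    descriptions = detail.get("eventDescription", [])
    entities = detail.get("affectedEntities", [])
    return {
        "service": detail.get("service", "UNKNOWN"),
        "category": detail.get("eventTypeCategory", "UNKNOWN"),
        "event_type": detail.get("eventTypeCode", "UNKNOWN"),
        "region": event.get("region", "UNKNOWN"),
        "description": descriptions[0].get("latestDescription", "") if descriptions else "",
        "affected_resources": [e["entityValue"] for e in entities if "entityValue" in e],
    }


def lambda_handler(event, context):
    """Entry point invoked by an EventBridge rule that targets this function."""
    summary = summarize_health_event(event)
    # Placeholder: this is where a notification or ticket (for example in
    # Zendesk or Jira) would be created. Here the summary is only logged.
    print(json.dumps(summary))
    return summary


# Hand-written sample event for local testing; all values are illustrative.
SAMPLE_EVENT = {
    "source": "aws.health",
    "detail-type": "AWS Health Event",
    "region": "us-east-1",
    "detail": {
        "service": "EC2",
        "eventTypeCategory": "scheduledChange",
        "eventTypeCode": "AWS_EC2_INSTANCE_RETIREMENT_SCHEDULED",
        "eventDescription": [
            {"language": "en_US", "latestDescription": "Example: instance scheduled for retirement."}
        ],
        "affectedEntities": [{"entityValue": "i-0123456789abcdef0"}],
    },
}

if __name__ == "__main__":
    lambda_handler(SAMPLE_EVENT, None)
```

Keeping the parsing in a separate function from the handler makes it easy to test locally with sample events before wiring the function up as an EventBridge target.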

We will be looking at this in more detail, but here's an EventBridge pattern to catch events related to notifications, scheduled changes, or issues sent to your account via the Health dashboard. With this pattern in EventBridge, you can quickly react to potential issues without human intervention and notify the right people so they can decide what to do. Note the service filter here, which includes AUTOSCALING, EC2, and VPC. This is important because if you're not using Amazon S3, for example, you don't want to send out alerts for a service that won't impact you directly.
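The pattern shown on screen is not reproduced in this transcript, so the following is only a sketch of what such a pattern might look like, built as a Python dictionary and printed as JSON. It assumes the event source `aws.health`, the detail type `AWS Health Event`, and the three event type categories `issue`, `accountNotification`, and `scheduledChange`. The `matches` helper is a hypothetical, heavily simplified stand-in for EventBridge's matching logic, included only so the service filter can be tried out locally.

```python
import json

# Event pattern limited to the three services mentioned above.
HEALTH_EVENT_PATTERN = {
    "source": ["aws.health"],
    "detail-type": ["AWS Health Event"],
    "detail": {
        "service": ["AUTOSCALING", "EC2", "VPC"],
        "eventTypeCategory": ["issue", "accountNotification", "scheduledChange"],
    },
}


def matches(pattern, event):
    """Simplified local check: every pattern key must be present in the event,
    and the event's value must be one of the values listed in the pattern.
    This is NOT the full EventBridge matching algorithm."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Print the pattern as JSON, the form an EventBridge rule expects.
print(json.dumps(HEALTH_EVENT_PATTERN, indent=2))
```

With this filter, an EC2 scheduled change would match the rule, while an event for a service outside the list, such as S3, would be ignored and generate no alert.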


About the Author

Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.

To date, Stuart has created 150+ courses relating to Cloud reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.

Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.

He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.

In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.

Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.