This course looks at the AWS Health Dashboard, a tool that can help you plan for and work around AWS events such as outages, scheduled maintenance, and service degradation. So, let's get you prepared for these events with this awesome tool.
By the end of this course, you should have a greater understanding of the AWS Health Dashboard, its features, and how to integrate it into your high-availability solutions. Some of the key points we'll be covering in this course include the following:
- Service history and open issues
- Service events specific to your AWS accounts
- EventBridge integration for the Health Dashboard
- Enterprise-level offerings (such as the AWS Health API)
This course is intended for:
- Systems Administrators
- DevOps Engineers
- AWS students preparing for certification
To get the most out of this course, you should:
- Have a general understanding of Amazon EventBridge and AWS infrastructure
- Have general knowledge of the AWS services your organization uses that could be affected by scheduled maintenance and outages
The Health Dashboard is divided into two main sections: Service health, which covers events that affect everyone and sits in the top left menu, and Your account health, which covers events that affect your account's resources and sits right below it. Let's go over each one. First, you have Open and recent issues. This is where you can see current issues on the AWS platform. More often than not, this view will show as disabled when there are no active issues. Service history, on the other hand, shows a historical view of issues. This is really helpful if something happened over a weekend or a holiday and you want details about which services and Regions were affected.
Let's look at an example of a possible outage. Each of these tickets has a header showing the latest status, in this case Resolved, and a short description of the issue, in this case increased API error rates. Pay close attention to the affected services list. In this case, it lists 20 services, and chances are that if you were using one of these services in the US-EAST-1 Region when this event happened, your application experienced some problems.
Information like this helps shorten troubleshooting time. It can also prompt you to consider a multi-Region solution if an issue like this in the AWS Region you use would significantly impact your business. If we switch over to Your account health, this is where the Health Dashboard becomes really useful, because it correlates AWS-wide issues with the resources and services you're currently using, so you can see whether there's any impact to your business. For example, let's say you have an EC2 instance that has been running nonstop for 12 months. You might see something like this under the Scheduled changes tab, or in an email from AWS.
One or more of your instances have scheduled events. Essentially, this means that the physical hardware running your EC2 instance may need to be taken down for repairs, upgrades, or routine maintenance. The fix is simple, by the way: stop and then start your instance (a reboot isn't enough, because a rebooted instance stays on the same host). The instance will typically come back online on different physical hardware, which lets AWS perform the maintenance without further interruption to you or other customers. This works for EBS-backed instances; instances with an instance store root volume can't be stopped.
It's understandable if you don't want to manually visit a web page to find out whether an outage is affecting your AWS infrastructure. For this, there's a solution: Amazon EventBridge. EventBridge can monitor and react to AWS Health events and then take action, such as sending a notification to the ops team, identifying affected resources, or running custom Lambda functions to perform pretty much any task, for example creating a Zendesk or Jira ticket for the event.
We'll look at this in more detail later, but here's an EventBridge pattern that catches issue, scheduled change, and account notification events sent to your account through AWS Health. With this pattern in EventBridge, you can react to potential issues without human intervention and notify the right people so they can decide what to do. Note the service filter, which includes AUTOSCALING, EC2, and VPC. This matters because if you're not using Amazon S3, for example, you don't want to send out alerts about a service that won't affect you directly.
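An EventBridge pattern is plain JSON. Here is a Python sketch that builds a pattern matching the description above (source `aws.health`, the AUTOSCALING, EC2, and VPC service codes, and the issue, scheduled change, and account notification categories), together with a deliberately simplified local matcher and a helper that formats an alert line. The helper names are hypothetical, and the matcher only handles exact-value lists; EventBridge's own matching supports more operators, so treat its `TestEventPattern` API as the authority.

```python
"""Sketch of an AWS Health event pattern plus a toy local matcher."""
import json
from typing import Any, Dict

HEALTH_PATTERN: Dict[str, Any] = {
    "source": ["aws.health"],
    "detail-type": ["AWS Health Event"],
    "detail": {
        # Only alert on services this workload actually uses.
        "service": ["AUTOSCALING", "EC2", "VPC"],
        "eventTypeCategory": ["issue", "accountNotification", "scheduledChange"],
    },
}

# Paste this JSON into the EventBridge console's event pattern editor.
PATTERN_JSON = json.dumps(HEALTH_PATTERN, indent=2)


def matches(pattern: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Simplified check: every pattern key must exist, nested dicts recurse,
    and list values are treated as "event value must equal one of these".
    (Real EventBridge also handles array-valued event fields and operators
    such as prefix and exists.)"""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


def summarize(event: Dict[str, Any]) -> str:
    """Build a one-line alert, e.g. for an SNS topic or a ticketing system."""
    detail = event["detail"]
    entities = [e.get("entityValue", "?") for e in detail.get("affectedEntities", [])]
    return (f"[{detail['eventTypeCategory']}] {detail['service']} "
            f"{detail['eventTypeCode']} in {event.get('region', 'unknown')}; "
            f"affected: {', '.join(entities) or 'none listed'}")
```

Filtering on `detail.service` inside the rule, rather than in downstream code, is what keeps events for services you don't use (such as S3) from ever reaching your notification targets.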
Software development has been my craft for over two decades. In recent years, I was introduced to the world of "Infrastructure as Code" and cloud computing.
I loved it! It re-sparked my interest in staying on the cutting edge of technology.
Colleagues regard me as a mentor and leader in my areas of expertise and also as the person to call when production servers crash and we need the App back online quickly.
My primary skills are:
★ Software Development (Java, PHP, Python, and others)
★ Cloud Computing Design and Implementation
★ DevOps: Continuous Delivery and Integration