How to get failure information from the Service Health Dashboard in the event of a region-wide failure?
This article shows how to use the AWS Service Health Dashboard Notification Tool to receive region-wide failure information from the Service Health Dashboard through SNS, Slack, and other destinations.
When we set up event notifications by linking Personal Health Dashboard and CloudWatch Events, we were only able to receive account-specific events.
AWS Health only sends account-specific events to CloudWatch Events. Region-wide issues listed in the Service Health Dashboard are public events and are not sent to CloudWatch Events.
How to receive region-wide notifications from the Service Health Dashboard (SHD)?
As noted above, the Personal Health Dashboard and CloudWatch Events integration does not deliver region-wide failure information.
Instead, let's consider using the AWS Service Health Dashboard notification tool (shd-notifier).
The SHD notification tool uses a polling approach to send event information to SNS topics, Chime, and Slack each time SHD failure information is updated.
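The polling idea can be illustrated with a minimal sketch. This is not the tool's actual code: the event shape (`id`, `updated`, `summary`) and the function names are hypothetical, chosen only to show how each cycle can compare what it fetched against what was already sent and notify only on new updates.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical event shape: {"id": ..., "updated": ..., "summary": ...}
Event = Dict[str, str]


def select_new_updates(events: Iterable[Event], seen: Set[Tuple[str, str]]) -> List[Event]:
    """Return events that have not been notified yet and remember them in `seen`."""
    fresh = []
    for event in events:
        # An event counts as "new" when its id or its update marker changes.
        key = (event["id"], event["updated"])
        if key not in seen:
            seen.add(key)
            fresh.append(event)
    return fresh


def poll_once(
    fetch: Callable[[], Iterable[Event]],
    notify: Callable[[Event], None],
    seen: Set[Tuple[str, str]],
) -> int:
    """Run one polling cycle and return the number of notifications sent."""
    fresh = select_new_updates(fetch(), seen)
    for event in fresh:
        notify(event)
    return len(fresh)
```

In this sketch, `fetch` stands in for reading the SHD and `notify` for posting to SNS, Chime, or Slack. Calling `poll_once` on a schedule sends nothing when the fetched events are unchanged.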
The steps below show how to start using the tool.
Step 1. Download and install shd-notifier
Run the following command to download and install shd-notifier.
$ git clone https://github.com/aws/aws-health-tools.git && cd aws-health-tools
$ git filter-branch --subdirectory-filter shd-notifier/ HEAD
Step 2. Set the notification destination endpoint and Webhook
Set endpoints and webhooks for event notification destinations (SNS topics, Chime, Slack).
We use SNS as the notification destination, so we created an SNS topic and made a note of its ARN.
- If you use Chime or Slack, set up a webhook by referring to the documentation below.
- For an Amazon Chime webhook, refer to the AWS documentation on adding webhooks to chat rooms; for a Slack webhook, refer to the Slack documentation on using Incoming Webhooks.
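When SNS is the destination, a loose format check on the noted ARN can catch copy-and-paste mistakes before it is entered as a stack parameter. The sketch below is an assumption-laden sanity check, not an official validator: it only tests the general `arn:partition:sns:region:account-id:topic-name` layout, and the sample ARN in the usage note is made up.

```python
import re

# Loose pattern for an SNS topic ARN: partition, region, 12-digit account ID, topic name.
SNS_TOPIC_ARN = re.compile(
    r"arn:(aws|aws-cn|aws-us-gov):sns:[a-z0-9-]+:\d{12}:[A-Za-z0-9_-]{1,256}(\.fifo)?"
)


def is_sns_topic_arn(value: str) -> bool:
    """Return True if `value` looks like an SNS topic ARN (format check only)."""
    return SNS_TOPIC_ARN.fullmatch(value.strip()) is not None
```

For example, `is_sns_topic_arn("arn:aws:sns:us-west-2:123456789012:shd-alerts")` passes, while a webhook URL does not. The check says nothing about whether the topic actually exists.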
Step 3. Deploy the CloudFormation stack
Deploy the CloudFormation stack provided by AWS.
Make sure the template is ready and you can see the URL in the Amazon S3 URL field, then click Next.
Enter the stack name and fill in the parameters.
Each parameter is explained below:
- AppName: Used as the base for Lambda function names created after stack deployment.
- Bail: Allows you to choose whether to send a message if the SHD has no updates.
0 (Send "no update" messages every 15 minutes)
1 (Send messages only when a new update has been made)
- ChatClient: Select the notification destination set in the previous step.
- DEBUG: You can choose to enable debug settings.
0 (Disable Debugging)
1 (Enable Debugging)
- EndpointArray: Enter the notification destination endpoint or webhook generated in the previous step.
- LambdaRate: Sets how often Lambda checks for new SHD events and updates.
- MessagePrefix: Set the prefix for each update message.
- RegionFilter: Enter the region for which you want to receive failure event notifications.
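If you prefer to prepare the parameters in a script rather than typing them into the console, they can be collected as follows. This is a minimal sketch: the list-of-pairs output is the `ParameterKey`/`ParameterValue` shape that the CloudFormation API accepts, but every value shown is illustrative. The `ChatClient` spelling and the ARN are assumptions, and LambdaRate and MessagePrefix are left out on the assumption that the template defines defaults for them.

```python
from typing import Dict, List, Union

# Parameters described above whose values are limited to 0 or 1.
BINARY_FLAGS = ("Bail", "DEBUG")


def build_parameters(values: Dict[str, Union[str, int]]) -> List[Dict[str, str]]:
    """Convert a plain dict into a list of ParameterKey/ParameterValue pairs."""
    for flag in BINARY_FLAGS:
        if flag in values and str(values[flag]) not in ("0", "1"):
            raise ValueError(f"{flag} must be 0 or 1, got {values[flag]!r}")
    return [
        {"ParameterKey": key, "ParameterValue": str(value)}
        for key, value in values.items()
    ]


# Illustrative values only; check the template for the accepted formats.
example = build_parameters({
    "AppName": "Health-Event",
    "Bail": 1,                # send messages only when there is a new update
    "ChatClient": "sns",      # assumed spelling of the SNS option
    "DEBUG": 0,
    "EndpointArray": "arn:aws:sns:us-west-2:123456789012:shd-alerts",  # made-up ARN
    "RegionFilter": "us-west-2",
})
```

Rejecting anything other than 0 or 1 for Bail and DEBUG mirrors the two choices listed above and catches a mistyped flag before deployment.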
Select “Next” when you have finished entering and selecting the parameters.
On the stack options page, if you do not need to change any settings, just select “Next”.
On the review page, select “I acknowledge that AWS CloudFormation might create IAM resources” and then click the “Create stack” button to create the stack.
After the stack has been created, check the contents of the four Lambda functions it created. At this point, index.py in each function does not yet contain the tool's code.
Follow the steps below to fix this.
Step 4. Run deploy.sh
Run deploy.sh in the aws-health-tools directory downloaded in the first step.
<shell_name> deploy.sh <CF_APPNAME> <REGION>
- CF_APPNAME = the AppName defined in the previous step
- REGION = the region where the CloudFormation template was deployed
In our case, we use zsh, CF_APPNAME is left at the default name (i.e. Health-Event), and the stack was deployed in Oregon (us-west-2), so after moving to the target directory, we run the following command.
$ zsh deploy.sh Health-Event us-west-2
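If this step is scripted, the same invocation can be assembled programmatically. The helper names below are hypothetical, and the sketch assumes only what is shown above: that deploy.sh takes the AppName and the region as its two positional arguments.

```python
import shlex
import subprocess
from typing import List


def deploy_command(app_name: str, region: str, shell: str = "zsh") -> List[str]:
    """Build the argument list for: <shell> deploy.sh <CF_APPNAME> <REGION>."""
    if not app_name or not region:
        raise ValueError("both app_name and region are required")
    return [shell, "deploy.sh", app_name, region]


def run_deploy(app_name: str, region: str, cwd: str, shell: str = "zsh") -> None:
    """Run deploy.sh from the directory `cwd`; raises CalledProcessError on failure."""
    subprocess.run(deploy_command(app_name, region, shell), cwd=cwd, check=True)


# Show the command line that would be run, without executing it.
print(shlex.join(deploy_command("Health-Event", "us-west-2")))
# prints: zsh deploy.sh Health-Event us-west-2
```

Passing the arguments as a list avoids shell quoting issues, and `check=True` makes a failed deployment stop the script instead of passing silently.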
Check the Lambda console to make sure the code has been written into the four Lambda functions.
This completes the setup. Next, we confirm that a notification is sent in the event of a region-wide failure.
Around 12:00 (PDT) on May 24th, the email address subscribed to the SNS topic received a notification that the error rate for the EC2 API had increased in the Oregon (US-WEST-2) region.
In the status history of the Service Health Dashboard, we checked the status of EC2 in the Oregon region on May 24th and confirmed that it matched the notification we received by email.