1. Home
  2. Training Library
  3. Amazon Web Services
  4. Courses
  5. AWS Databases used with Data Analytics

Amazon Relational Database Service

Contents

keyboard_tab
Introduction
1
Course Introduction
PREVIEW1m 39s

The course is part of this learning path

play-arrow
Amazon Relational Database Service
Overview
DifficultyIntermediate
Duration1h 6m
Students81
Ratings
5/5
starstarstarstarstar

Description

This course introduces a number of different AWS database services that are commonly used with data analytics solutions and that will likely be referenced within the AWS Data Analytics Specialty certification.

As such, this course explores the following database services: Amazon RDS, Amazon DynamoDB, Amazon ElastiCache, Amazon Redshift. There are also guided demos on the AWS platform showing you exactly how to use each of these services.

If you have any questions relating to this course, feel free to contact us at support@cloudacademy.com.

Learning Objectives

  • Gain an overall understanding of the different database services available in AWS, and which service would be best suited to your needs
  • Learn how to create databases with both Amazon RDS and Amazon DynamoDB
  • Learn how to create clusters with Amazon ElastiCache and Amazon Redshift

Intended Audience

This course has been designed to help those who are preparing to take the AWS Data Analytics Specialty certification.

Prerequisites

As a prerequisite to this course, you should have a basic understanding of database architectures and of the AWS global architecture. For more info on the latter, please see our existing blog post hereYou should also have a general understanding of the principles behind different EC2 instance families.

Transcript

Hello and welcome to this lecture, where I shall be discussing the first of the AWS database services that I will be covering in this course, the Amazon Relational Database Service, commonly known as RDS. I will look at a number of different common features of the service to give you a general idea of how it's configured. So as the name suggests, this is a relational database service that provides a simple way to provision, create, and scale a relational database within AWS. It's a managed service, which takes many of the mundane administrative operations out of your hands, and it's instead managed by AWS, such as backups and the patching of both the underlying operating system in addition to the database engine software that you select.

Amazon RDS allows you to select from a range of different database engines. These currently include MySQL, and this is considered the number one open source relational database management system. MariaDB. This is the community-developed fork of MySQL. PostgreSQL. This database engine comes in a close second behind MySQL as the preferred open source database. Amazon Aurora. Amazon Aurora is AWS's own fork of MySQL, which provides ultrafast processing and availability, as it has its own cloud-native database engine. Oracle. The Oracle database is a common platform in corporate environments. And SQL Server. This is a Microsoft database with a number of different licensing options.

In addition to so many different database engines, you also have a wide choice when it comes to selecting which compute instance you'd like to run your database on. The varying different options offer different performance and allowed to architect your environment based on your expected load. When you create your RDS database, you must select an instance to support your database from a processing and memory perspective, as shown here in this screenshot, using the MySQL database engine. Currently, these are the different instance types available to you for each of the database engines, which are categorized between general purpose and memory-optimized.

For a breakdown of the performance of each of these instance types, please refer to the following AWS documentation. For each of these instance types, you also have various instance sizes, each equating to a different performance level from a vCPU and memory perspective. For example, when looking at the T3 instance, we have the following sizes available.

You can deploy your RDS instance in a single availability zone. However, if high availability and resiliency is of importance when it comes to your database, then you might want to consider a feature known as Multi AZ, which stands for multi availability zones. When Multi AZ is configured, a secondary RDS instance is deployed within a different availability zone within the same region as the primary instance. The primary purpose of the second instance is to provide a failover option for your primary RDS instance. The replication of data between the primary RDS database and the secondary replica instance happens synchronously.

Let's look at how Multi AZ would work in a production environment. If you have configured Multi AZ and an incident occurs which causes an outage to the primary RDS instance, then the RDS failover process takes over automatically. This process is managed by AWS, and it's not something that you need to manually perform or trigger. RDS will update the DNS record to point to the secondary instance. This process can typically take between 60 and 120 seconds. The length of time is very dependent on the size of the database, its transactions, and the activity of the database at the time of failover. This automatic changeover enables you to continue using the database without the need of an engineer making any changes to your environment. The failover process will happen in the following scenarios. If patching maintenance has been performed in the primary instance, if the instance of the primary database has a host failure, if the availability zone of the primary database fails, if the primary instance was rebooted with failover, and if the primary database instance class on the primary database is modified.

As you can see, activating Multi AZ is an effective measure and precaution to implement to ensure you have resiliency built in should an outage occur. For detailed information on RDS Multi AZ, please refer to our existing course here.

Over time, your workloads on your database will fluctuate. And so how can you optimize your RDS database to ensure it is capable of meeting the demands of your load, both from a storage and compute perspective? When it comes to scaling your storage, you can use a feature called storage autoscaling. MySQL, PostgreSQL, MariaDB, Oracle, and SQL Server all use Elastic Block Store, EBS volumes, for both data and log storage. However, Amazon Aurora uses a shared cluster storage architecture and does not use EBS. The database engines that use EBS support general purpose SSD storage, provisioned IOPS SSD storage, and magnetic storage.

The general purpose SSD storage is a good option for a broad range of use cases which provides single-digit millisecond latencies and offers a cost-effective storage solution. The minimum SSD storage volume for your primary dataset is 20 gibibytes with a maximum of 64 tebibytes for MySQL, PostgreSQL, MariaDB, and Oracle. However, the maximum for SQL Server is 16 tebibytes. Provisioned IOPS SSD. Now, this option is great for when you have workloads that operate at a very high I/O. You can provision a minimum of 8,000 IOPS and a maximum of 80,000 for MySQL, PostgreSQL, MariaDB, and Oracle, but the maximum for SQL Server is 40,000.

In addition to being able to provision the IOPS needed for your workload, the minimum storage for your primary dataset is a 100 gibibytes with a maximum of 64 tebibytes for MySQL, PostgreSQL, MariaDB, and Oracle, and 16 tebibytes for SQL Server. Magnetic storage is simply supported to provide backwards compatibility. And so instead, AWS recommends that you select general purpose. The following screenshot shows the configuration screen when setting your storage requirements during the creation of a MySQL RDS database. 

n this example, general purpose has been selected as the storage with a minimum of 100 gibibytes of primary storage. Under the storage autoscaling section, I've enabled the feature with the checkbox and set a maximum storage threshold of 1000 gibibytes. This means that one storage will start at 100 gibibytes when the database is created and the storage will automatically scale with demand up to a maximum of 1000 gibibytes without you having to intervene in any way. The maximum storage threshold can be set at 65,536 gibibytes. Aurora doesn't use EBS and instead uses a shared cluster storage architecture which is managed by the service itself.

When configuring your Aurora database in the console, the option to configure and select storage options like we saw previously does not exist. Your storage will scale automatically as your database grows. To scale your compute size, which is effectively your instance, is easy to do in RDS both vertically and horizontally. Vertical scaling will enhance the performance of your database instance. For example, scaling up from an m4.large to an m4.2xlarge. Horizontal scaling will see an increase in the quantity of your current instance. For example, moving from a single m4.large to three m4.large instances in your environment through the means of read replicas.

At any point you can scale your RDS database vertically, changing the size of your instance. When doing so, you can select to perform the change immediately or wait for a scheduled maintenance window. For horizontal scaling, read replicas can be used by application and other services to save read only access to your database data via a separate instance. So, for example, let's assume we have a primary RDS instance which serves both read and write traffic. Due to the size of the instance and the amount of read-intensive traffic being directed to the database for queries, the performance of the instance has taken a hit. So to help resolve this, you can create a read replica. 

A snapshot will be taken of your database, and if you're using Multi AZ, then this snapshot will be taken of your secondary database instance to ensure that there are no performance impacts during this process. Once the snapshot is complete, a read replica instance is created from this data. The read replica then maintains a secure asynchronous link between itself and the primary database. At this point, read only traffic can be directed to the read replica to serve queries. Before implementing read replicas, please check with the latest AWS documentation to identify database engine read replica compatibility.

As I mentioned previously, many of the administrative tasks for RDS are taken care of by AWS. For example, patching and automated backups. As Amazon RDS is a managed service, and from a shared responsibility model is considered a container service where you have no access to the underlying operating system on which your database runs on. As you can see, both platform and application management and operating systems falls under the realm of AWS responsibilities. As a result, AWS is responsible for both the patching of the operating system and any patching for the database engine themselves. More information on the AWS shared responsibility model can be found in my existing blog post found here.

From a backup perspective, by default, Amazon RDS provides an automatic feature seen here. This is enabled on all new RDS databases, which backs up your RDS instance to Amazon S3. You are able to configure the level of retention in days from zero to 35 and implement a level of encryption using the key management service, or KMS. More information on KMS can be found in our existing course here.

You can also perform manual backups anytime you need to, which are known as snapshots. However, these snapshots are not bound by the retention period set in the automatic backup configuration and are only deleted through a manual process. When using a MySQL-compatible Aurora database, you can also use a feature called backtrack, and this allows you to go back in time on the database to recover from an error or incident without having to perform a restore or create another database cluster.

As you can see from the configuration page during the Aurora database creation process, it is enabled via a checkbox and allows you to enter a number of hours of how far you would like to backtrack to with a maximum of 72 hours. In this example, I've entered 12 hours, and so Aurora will retain log data of all changes for 12 hours as specified. We've now covered the fundamentals of Amazon RDS and some of the key features to be aware of. If you'd like to get some hands-on experience of using Amazon RDS, feel free to check out the following labs.

Creating your first Amazon RDS Database

Getting started with Amazon Aurora Database Engine

Lectures

Course Introduction - DEMO: Creating an Amazon RDS Database - Amazon DynamoDB - DEMO: Creating a DynamoDB Database - Amazon ElastiCache - DEMO: Creating an ElastiCache Cluster - Amazon Redshift - DEMO: Creating an Amazon Redshift Cluster

About the Author
Students113956
Labs1
Courses95
Learning paths63

Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.

To date, Stuart has created 90+ courses relating to Cloud reaching over 100,000 students, mostly within the AWS category and with a heavy focus on security and compliance.

Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.

He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.

In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.

Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.