The course is part of this learning path
This course introduces a number of different AWS database services that are commonly used with data analytics solutions and that will likely be referenced within the AWS Data Analytics Specialty certification.
As such, this course explores the following database services: Amazon RDS, Amazon DynamoDB, Amazon ElastiCache, Amazon Redshift. There are also guided demos on the AWS platform showing you exactly how to use each of these services.
If you have any questions relating to this course, feel free to contact us at firstname.lastname@example.org.
- Gain an overall understanding of the different database services available in AWS, and which service would be best suited to your needs
- Learn how to create databases with both Amazon RDS and Amazon DynamoDB
- Learn how to create clusters with Amazon ElastiCache and Amazon Redshift
This course has been designed to help those who are preparing to take the AWS Data Analytics Specialty certification.
As a prerequisite to this course, you should have a basic understanding of database architectures and of the AWS global architecture. For more info on the latter, please see our existing blog post here. You should also have a general understanding of the principles behind different EC2 instance families.
Hello and welcome to this lecture covering Amazon DynamoDB. Amazon DynamoDB is a NoSQL database, which means that it doesn't use the common Structured Query Language, SQL. It falls into a category of databases known as key-value stores. A key value store is simply a collection of items or records, and you can look up data by using a primary key for each item or through the use of indexes.
Amazon DynamoDB is designed to be used for ultra high performance, which could be maintained at any scale with single-digit latency, making this a very powerful database choice used commonly for gaming, web, mobile, and IoT applications to name but a few. Much like Amazon RDS, DynamoDB is also a fully managed service, taking many of the day-to-day administration operations out of your hands, giving you more time to focus on the business logic of your database. That's one of the great things about Amazon DynamoDB, there's no database administration required by us as a customer, no service to manage and nothing to back up. Instead, AWS handles all of this for you. This makes the creation of a DynamoDB database very easy. All you have to do is set up your tables and configure the level of provision throughput that each table should have. Provision throughput refers to the level of read and write capacity that you want AWS to reserve for your table. You are charged for the total amount of throughput that you configure for your tables plus the total amount of storage space used by your data.
If we actually look at the configuration screen when creating a new DynamoDB database, as seen here, you can see that there are very few options required to create a new database. And in fact, in its simplest form, you can just provide a table name and a primary key, which is used to partition data across hosts for scalability and availability. You can then accept any remaining defaults and create your database, it's as simple as that. DynamoDB tables are considered schemaless because there's no strict design and schema that every record must conform to. As long as each item has an appropriate primary key, the item can contain varying sets of attributes. The records in a table do not need to have the same attributes or even the same number of attributes. This can be very convenient for rapid application development and if you want to add a new column to a table, you don't need to alter the table, you can just start including the new field as an attribute when you insert new records. Likewise, you never need to adjust the data type for a column as DynamoDB generally isn't interested in data types for individual attributes.
If when creating your DynamoDB database, you choose not to reset all the defaults, what other options exist? Let's take a look. Unchecking the use default settings from the Table settings section provides you with the following. Firstly, you'll be asked about secondary indexes, which allow you to perform queries on attributes that are not part of the table's primary key. The default option provides no secondary index. However, you can add them here if required. DynamoDB lets you create additional indexes so that you can run queries to search your data by other attributes. If you've worked with relational databases, you've probably used indexes with those, but there are a couple of big differences in how indexes operate in DynamoDB.
First, each query can only use one index. If you want to query and match on two different columns, you need to create an index that can do that properly. Second, when you write your queries, you need to specify exactly which index should be used for each query. It's not like a relational database that has a query analyzer, which can decide which indexes to use for our query. Here you need to be explicit and tell DynamoDB what index to use. DynamoDB has two different kinds of secondary indexes, global indexes let you query across the entire table to find any record that matches a particular value and by contrast, local secondary indexes can only help find data within a single partition key.
Following secondary indexes, you can modify the default settings applied to your table's read/write capacity mode. When you create a table in DynamoDB, you need to tell AWS how much capacity you want to reserve for the table. You don't need to do this for disk space as DynamoDB will automatically allocate more space for your table as it grows. However, you do need to reserve capacity for input and output for reads and writes. Amazon charges you based on the number of read capacity units and write capacity units that you allocate. It's important to allocate enough for your workload, but don't allocate too much or DynamoDB could become prohibitively expensive.
By default, when you create a table in the AWS Console, Amazon will configure your table with five read capacity units and five write capacity units. There are two modes that you can choose from, provisioned and on-demand. Provisioned mode allows you to provision set read and writes allowed against your database per second by your application and is measured in capacity units, RCUs for reads and WCUs for writes. Depending on the transaction, each action will use one or more RCUs or WCUs. Provisioned mode is used generally when you have a predicted and forecasted workload of traffic. On-demand mode does not provision any RCUs or WCUs, instead they are scaled on demand. The downside is that it is not as cost effective as provisioned. This mode is generally used if you do not know how much workload you are expected to experience. Over time, you are likely to get more of an understanding of load and you can change your mode across to provisioned.
Once you have selected the provisioned mode, you will then have the opportunity to add configuration information relating to how your RCU and WCU are scaled as demand increases and decreases. As you can see, by entering your minimum and maximum provisioned capacity along with your target threshold utilization as a percentage, you can confidently rely on Amazon DynamoDB to manage the scaling operations of your throughput.
The last main point of the configuration allows you to set encryption of your tables, which is enabled by default for data at rest. Through the use of the key management service, KMS, you are able to select either a customer managed or AWS managed CMK to use for the encryption of your table instead of the default keys used by DynamoDB. For more information on CMKs and the key management service in general, please refer to our existing course found here.
Before I finish this lecture covering DynamoDB, I just want to cover some of its advantages and also what can be considered disadvantages. Some of the advantages of DynamoDB is that it's fully managed by AWS, you don't have to worry about backups or redundancy, although you're welcome to set up these kinds of safeguards using some more advanced DynamoDB features.
As mentioned previously, DynamoDB tables are schemaless so you don't have to define the exact data model in advance, the data model can change automatically to fit your application's needs.
DynamoDB is designed to be highly available and your data is automatically replicated across three different availability zones within a geographic region. In the case of an outage or an incident affecting the entire hosting facility, DynamoDB transparently routes around the affected availability zone.
DynamoDB is designed to be fast, read and writes take just a few milliseconds to complete and DynamoDB will be fast no matter how large your table grows, unlike relational database, which can slow down as the table gets large. DynamoDB performance is constant and stays consistent even with tables that are many terabytes in size. You don't have to do anything to handle this, except adjusting the provisioned throughput levels to make sure you've preserved enough read and write capacity for your transaction volume.
There are also some downsides to using DynamoDB too. As I just mentioned, your data is automatically replicated. Three copies are stored in three different availability zones and that replication usually happens quickly in milliseconds, but sometimes it can take longer and this is known as eventual consistency. This happens transparently and many operations will make sure that they're always working on the latest copy of your data, but there are certain kinds of queries and table scans that may return older versions of data before the most recent copy. You need to be aware of how this works and you may need to adjust certain queries to require strong consistency.
DynamoDB's queries aren't as flexible as what you can do with SQL. If you are used to writing advanced queries with joins and groupings and summaries, you won't be able to do that with DynamoDB. You'll have to do more of the computation in your application code. This is done for performance reasons to ensure that every query finishes quickly and that complicated queries can't hog the resources on a database server.
DynamoDB also has some strict limitations in the way you're allowed to work with it. Two important limitations are the maximum record size of 400 kilobytes and the limit of 20 global indexes and five secondary indexes per table. There are other limitations that can be adjusted by contacting AWS customer support like the maximum number of tables in an AWS account.
Finally, although DynamoDB performance can scale up as your needs grow, your performance is limited to the amount of read and write throughput that you've provisioned for each table. If you expect a spike of the database use, you'll need to provision more throughput in advance or database requests will fail with a ProvisionedThroughputExceededException message. Fortunately, you can adjust throughput at any time and it only takes a couple of minutes to adjust. Still, this means that you'll need to monitor the throughput being used in each table or you'll risk running out of throughput if your usage grows.
Course Introduction - Amazon Relational Database Service - DEMO: Creating an Amazon RDS Database - DEMO: Creating a DynamoDB Database - Amazon ElastiCache - DEMO: Creating an ElastiCache Cluster - Amazon Redshift - DEMO: Creating an Amazon Redshift Cluster
Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.
To date, Stuart has created 90+ courses relating to Cloud reaching over 100,000 students, mostly within the AWS category and with a heavy focus on security and compliance.
Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.
He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.
In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.
Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.