This section of the AWS Certified Solutions Architect - Professional learning path introduces you to the AWS database services relevant to the SAP-C02 exam. You will then explore the options each service offers and learn how to select and apply AWS database services to meet the kinds of design scenarios you may encounter on the exam.
Learning Objectives
- Understand the various database services that can be used when building cloud solutions on AWS
- Learn how to build databases using Amazon RDS, DynamoDB, Redshift, DocumentDB, Keyspaces, and QLDB
- Learn how to create ElastiCache and Neptune clusters
- Understand which AWS database service to choose based on your requirements
- Discover how to use automation to deploy databases in AWS
- Learn about data lakes and how to build a data lake in AWS
In DynamoDB, you are responsible for your data and how you model that data. To do this, it’s helpful to understand the terminology the service uses.
In DynamoDB, you store your data in tables. When you create a table, you give it a name and specify the AWS Region it lives in - for example, a cars table in us-east-1. Just like with any database, tables are collections of objects. Depending on the database you use, these objects can be called many things - but in DynamoDB, they’re called items. An item is essentially a “row” or a “record”.
You then have attributes that describe the item. You can consider your attributes as “columns”. For example, you may have an item for each car that has attributes of “make”, “model”, and “year”.
The only requirement for each item is that it has a key to uniquely identify it. These keys are called partition keys in DynamoDB, and DynamoDB uses them to store your data in logical data partitions behind the scenes. For example, a partition key for cars might be a car ID or a VIN.
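As a rough sketch of what that looks like in code, here is how you might create that cars table with the AWS SDK for Python (boto3). The table name, attribute name, and Region are just illustrative assumptions, not values from this lecture.

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Hypothetical "cars" table with the VIN as its partition key
dynamodb.create_table(
    TableName="cars",
    AttributeDefinitions=[
        {"AttributeName": "vin", "AttributeType": "S"},  # only key attributes are declared up front
    ],
    KeySchema=[
        {"AttributeName": "vin", "KeyType": "HASH"},  # HASH = partition key
    ],
    BillingMode="PAY_PER_REQUEST",  # capacity modes are covered later in this lecture
)
```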
As long as each item has an appropriate partition key, items can have varying sets of attributes. They don’t all need to have the same attributes, or even the same number of attributes, as other items. For example, one of the cars may have a “trim” attribute that no other car has. This is a benefit of NoSQL databases and why they are often considered “schemaless”.
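To illustrate the point, here is a small sketch (again using boto3, with made-up VINs and attribute names) of two items in the same table with different attribute sets:

```python
import boto3

table = boto3.resource("dynamodb", region_name="us-east-1").Table("cars")

# Both items supply the required partition key ("vin"), but their other attributes differ
table.put_item(Item={"vin": "1HGCM82633A004352", "make": "Honda", "model": "Accord", "year": 2003})
table.put_item(Item={"vin": "5YJ3E1EA7HF000337", "make": "Tesla", "model": "Model 3", "year": 2017, "trim": "Long Range"})
```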
Along with your partition key, you can optionally choose to use a sort key as well. Sort keys sort your data within the same partition, which comes in handy for fast querying. For example, you could have a partition key of the car’s VIN and a sort key of the customer ID for the customer that owns the car. You can now perform powerful queries based on this sort key. By using both the partition and sort key, your partition key no longer needs to be unique: many items can share the same partition key value, as long as the combination of partition key and sort key is unique. Using both a partition key and a sort key is referred to as a composite primary key.
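As a hedged sketch of that kind of query (the table name car_ownership and the attribute names vin and customer_id are assumptions for illustration), a lookup against a composite primary key might look like this:

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb", region_name="us-east-1").Table("car_ownership")

# Fetch the ownership records for one car, narrowed by a sort key condition
response = table.query(
    KeyConditionExpression=Key("vin").eq("1HGCM82633A004352") & Key("customer_id").begins_with("CUST#")
)
for item in response["Items"]:
    print(item)
```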
So, all you have to do is set up these tables and model your data with items and attributes. When you create the table, you also have to configure the level of read and write throughput that each table should have.
There are two methods of configuring this throughput: provisioned throughput mode and on-demand capacity mode. The first, provisioned throughput mode, enables you to specify the amount of reads and writes for your table by choosing a number of Read Capacity Units (or RCUs) and Write Capacity Units (or WCUs).
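For instance, a table created in provisioned throughput mode might be defined like this (a minimal sketch; the table name and capacity numbers are arbitrary examples):

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

dynamodb.create_table(
    TableName="cars_provisioned",
    AttributeDefinitions=[{"AttributeName": "vin", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "vin", "KeyType": "HASH"}],
    # Provisioned throughput mode: you choose the RCUs and WCUs up front
    ProvisionedThroughput={"ReadCapacityUnits": 10, "WriteCapacityUnits": 5},
)
```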
For an item of up to 4 KB in size, one strongly consistent read per second requires one RCU, an eventually consistent read requires half an RCU, and a transactional read requires two RCUs.
One WCU represents one write per second for an item up to 1 KB in size. If you use transactions, they require two WCUs to perform one write per second for the same item size.
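To make that calculation concrete, here is a small sketch of the arithmetic. The item sizes and request rates are made up, but the rounding follows the 4 KB read and 1 KB write increments described above.

```python
import math

def rcus_needed(item_size_kb: float, reads_per_sec: int, strongly_consistent: bool = True) -> float:
    # Each read consumes capacity in 4 KB increments; eventually consistent reads cost half
    units_per_read = math.ceil(item_size_kb / 4)
    total = units_per_read * reads_per_sec
    return total if strongly_consistent else total / 2

def wcus_needed(item_size_kb: float, writes_per_sec: int) -> int:
    # Each write consumes capacity in 1 KB increments
    return math.ceil(item_size_kb) * writes_per_sec

print(rcus_needed(6, 100))                             # 200 RCUs for strongly consistent reads
print(rcus_needed(6, 100, strongly_consistent=False))  # 100 RCUs for eventually consistent reads
print(wcus_needed(2.5, 10))                            # 30 WCUs
```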
So using the provisioned throughput mode does require some calculation to ensure that your table is getting enough bandwidth. This is a great choice when your traffic is mostly steady.
However, if your traffic rises and falls (like most traffic patterns), you can use provisioned throughput mode in combination with DynamoDB Auto Scaling. With Auto Scaling, you can specify upper and lower limits of RCUs and WCUs. So, if traffic goes down, it will decrease your provisioned reads and writes. If traffic rises, it will increase your provisioned reads and writes. All while staying within the boundaries of your RCU and WCU limitations.
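As a sketch of how that is configured (the table name, limits, and target utilization are assumptions), DynamoDB Auto Scaling is driven by the Application Auto Scaling API; a similar pair of calls covers write capacity:

```python
import boto3

autoscaling = boto3.client("application-autoscaling", region_name="us-east-1")

# Set the lower and upper limits for the table's read capacity
autoscaling.register_scalable_target(
    ServiceNamespace="dynamodb",
    ResourceId="table/cars_provisioned",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    MinCapacity=5,
    MaxCapacity=100,
)

# Scale reads up or down to hold roughly 70% utilization of the provisioned RCUs
autoscaling.put_scaling_policy(
    PolicyName="cars-read-scaling",
    ServiceNamespace="dynamodb",
    ResourceId="table/cars_provisioned",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {"PredefinedMetricType": "DynamoDBReadCapacityUtilization"},
    },
)
```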
Without Auto Scaling, if you expect a spike in database use, you will need to provision more throughput in advance or database requests will fail with a ProvisionedThroughputExceededException.
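If you stay in provisioned mode without Auto Scaling, your client code should be prepared to catch that error. A minimal sketch with boto3 (the item values are placeholders, and the single retry is just for illustration; the AWS SDKs also retry throttled requests automatically):

```python
import time
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb", region_name="us-east-1").Table("cars_provisioned")
item = {"vin": "1HGCM82633A004352", "make": "Honda"}

try:
    table.put_item(Item=item)
except ClientError as err:
    if err.response["Error"]["Code"] == "ProvisionedThroughputExceededException":
        # Out of provisioned capacity: back off briefly and retry once
        time.sleep(1)
        table.put_item(Item=item)
    else:
        raise
```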
With provisioned mode, you can also choose to reserve this capacity by committing to a minimum provisioned level of usage. This, in turn, will give you a discounted hourly rate for the RCUs and WCUs you provisioned.
The second mode is the on-demand capacity mode. This mode is far simpler: DynamoDB decides how many reads and writes your table needs based on the traffic you receive. This provides a just-in-time approach, giving your application exactly the amount of capacity it needs as soon as it needs it. All of the WCU and RCU calculations go away, and DynamoDB handles the scaling for you.
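As a minimal sketch, moving an existing table onto on-demand capacity (or creating one that way) comes down to a single billing-mode setting; the table name here is just the earlier example:

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Switch an existing table from provisioned capacity to on-demand
dynamodb.update_table(TableName="cars_provisioned", BillingMode="PAY_PER_REQUEST")
```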
At this point, you might be saying to yourself - “wow, this is great! I don’t have to calculate WCUs and RCUs or mess with Auto Scaling with the on-demand mode. Why would anyone even use the provisioned throughput mode?”
Well, it all comes down to pricing. With either mode, you are charged for the total amount of read and write requests that you use, plus the total amount of storage space used by your data. The biggest downside of using the on-demand mode is that it costs more per request than the provisioned throughput mode does. However, if you don’t have any requests to your database, you don’t have to pay. You only pay for what you’re storing. Whereas, with the provisioned throughput mode, if you don’t have any requests - you still have to pay for the RCUs and WCUs you provisioned.
That’s it for this one - see you next time!
Danny has over 20 years of IT experience as a software developer, cloud engineer, and technical trainer. After attending a conference on cloud computing in 2009, he knew he wanted to build his career around what was still a very new, emerging technology at the time — and share this transformational knowledge with others. He has spoken to IT professional audiences at local, regional, and national user groups and conferences. He has delivered in-person classroom and virtual training, interactive webinars, and authored video training courses covering many different technologies, including Amazon Web Services. He currently has six active AWS certifications, including certifications at the Professional and Specialty level.