Introduction to Partitioning
Difficulty: Intermediate
Duration: 3h 3m
Students: 1633
Ratings: 4.7/5
Description

This course covers the AWS database services relevant to the AWS Certified Developer - Associate exam: Amazon RDS, Aurora, DynamoDB, MemoryDB for Redis, and ElastiCache.


Learning Objectives

  • Obtain a solid understanding of the following Amazon database services: Amazon RDS, Aurora, DynamoDB, MemoryDB for Redis, and ElastiCache
  • Create an Amazon RDS database
  • Create a DynamoDB database
  • Create an ElastiCache cluster
Transcript

When you think about a relational database, you commonly think of one node hosting all of your data. It’s easy to think about non-relational databases the same way and assume a database like DynamoDB has one node that handles it all.

But as you probably already know, DynamoDB can scale virtually without limit, and a single node that keeps scaling forever simply isn't feasible, at least not today.

The reality is that DynamoDB splits your data behind the scenes into segments called partitions, and these partitions are spread across multiple storage nodes.

This is why the first part of a DynamoDB primary key (or the whole key, if it is a simple primary key) is called the partition key: DynamoDB uses it to decide which partition an item is stored in. So what happens if you have a simple primary key, made up of just a partition key, and you perform a PutItem request? DynamoDB takes the new item, hashes its partition key value, and uses the result to determine which partition to store it in.
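A minimal sketch of hash-based placement, assuming a fixed, hypothetical count of four partitions and using MD5 modulo the partition count as a stand-in hash. DynamoDB's actual hash function, and how it maps hash values to partitions, are internal and not documented. The helper names `partition_for` and `put_item` are hypothetical.

```python
import hashlib

# Hypothetical partition count; DynamoDB does not expose the real number.
NUM_PARTITIONS = 4


def partition_for(partition_key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a partition key value to a partition index by hashing it.

    MD5 is an illustrative stand-in; DynamoDB's internal hash is not public.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    # Treat the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_partitions


def put_item(partitions: list, item: dict, key_name: str) -> int:
    """Store an item in the partition chosen by hashing its partition key.

    With a simple primary key, writing the same key again replaces the item.
    """
    index = partition_for(item[key_name], len(partitions))
    partitions[index][item[key_name]] = item
    return index


partitions = [dict() for _ in range(NUM_PARTITIONS)]
where = put_item(partitions, {"author": "Stephen King", "title": "Carrie"}, "author")
print(f"Stored in partition {where}")
```

Because the hash is deterministic, the same partition key always maps to the same partition. The plain modulo mapping keeps the sketch short; it is a simplification and should not be read as DynamoDB's actual scheme.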

When you read an item from the table, you provide the partition key, and DynamoDB feeds that value into the same hash function to work out which partition holds the data. This hashing is why the partition key is also called a hash key.
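The "hash key" name also survives in the DynamoDB API, where a partition key is declared with the key type `HASH`. A minimal sketch of table-creation parameters for a simple primary key, written as a plain Python dict in the shape boto3's `create_table` accepts. The table name `Books` and attribute name `isbn` are hypothetical, and nothing is sent to AWS here.

```python
# Parameters in the shape accepted by boto3's DynamoDB client.create_table().
# Table and attribute names are hypothetical; nothing is sent to AWS here.
create_table_params = {
    "TableName": "Books",
    "AttributeDefinitions": [
        {"AttributeName": "isbn", "AttributeType": "S"},  # "S" = string
    ],
    "KeySchema": [
        # A simple primary key: one attribute, declared with KeyType "HASH".
        {"AttributeName": "isbn", "KeyType": "HASH"},
    ],
    "BillingMode": "PAY_PER_REQUEST",
}

# With boto3 installed and AWS credentials configured, this could be sent as:
#   boto3.client("dynamodb").create_table(**create_table_params)

hash_keys = [
    k["AttributeName"]
    for k in create_table_params["KeySchema"]
    if k["KeyType"] == "HASH"
]
print(hash_keys)  # ['isbn']
```

For a composite primary key, `KeySchema` takes a second entry whose `KeyType` is `RANGE`, the API's name for the sort key.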

If you have a composite primary key, made up of a partition key and a sort key, DynamoDB still uses only the partition key to choose a partition. When you write a new item, it hashes the partition key value and determines which partition to store the item in. All items that share the same partition key are then stored physically close together, ordered by their sort key values.
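A minimal in-memory sketch of a composite key: only the partition key is hashed to pick a partition, and inside that partition each item collection is kept in sort-key order. The four-partition layout, the MD5 stand-in hash, and the helper names `partition_for`, `put_item`, and `query` are all hypothetical; `query` here is a toy, not the DynamoDB Query API.

```python
import bisect
import hashlib

# Hypothetical partition count; DynamoDB does not expose the real number.
NUM_PARTITIONS = 4


def partition_for(partition_key: str) -> int:
    """Pick a partition from the partition key alone (MD5 is a stand-in)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


# partitions[i] maps a partition key value to its item collection:
# a list of (sort_key_bytes, item) tuples kept in ascending sort-key order.
partitions = [dict() for _ in range(NUM_PARTITIONS)]


def put_item(item: dict, pk: str, sk: str) -> None:
    """Insert or replace an item, keeping its collection ordered by sort key."""
    collection = partitions[partition_for(item[pk])].setdefault(item[pk], [])
    sort_bytes = item[sk].encode("utf-8")  # string sort keys compare as UTF-8 bytes
    keys = [entry[0] for entry in collection]
    pos = bisect.bisect_left(keys, sort_bytes)
    if pos < len(keys) and keys[pos] == sort_bytes:
        collection[pos] = (sort_bytes, item)  # same partition + sort key: replace
    else:
        collection.insert(pos, (sort_bytes, item))


def query(pk_value: str) -> list:
    """Return all items sharing a partition key, in sort-key order, from one partition."""
    collection = partitions[partition_for(pk_value)].get(pk_value, [])
    return [item for _, item in collection]


for title in ["Misery", "It", "Dolores Claiborne"]:
    put_item({"author": "Stephen King", "title": title}, "author", "title")

print([i["title"] for i in query("Stephen King")])
# ['Dolores Claiborne', 'It', 'Misery']
```

Writing an item with the same partition key and sort key replaces the existing one, mirroring how a full composite key identifies exactly one item.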

For example, say we have a table of books, with a partition key of author name and a sort key of book title. The table holds three items:

  • The first item has a partition key of “Stephen King” and a sort key of “The Shining” 

  • The second item has a partition key of “Stephen King” and sort key of “Cujo” 

  • The third item has a partition key of “Stephen King” and a sort key of “Carrie” 

Since they all have the same partition key, all three books will typically be stored together on the same partition. Within the partition, the sort key determines the order, so the items are stored with Carrie first, Cujo second, and The Shining third. Each partition is limited to about 10 GB of data, and as the table grows, DynamoDB adds more partitions behind the scenes.
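DynamoDB orders String sort keys by their UTF-8 bytes, which explains the Carrie, Cujo, The Shining ordering. A small sketch that reproduces the ordering by sorting Python strings on their encoded bytes; the lowercase title "cujo" in the second call is a hypothetical variant added only to show that byte order is case-sensitive.

```python
# Sort titles the way DynamoDB orders String sort keys: by UTF-8 bytes.
titles = ["The Shining", "Cujo", "Carrie"]

ordered = sorted(titles, key=lambda title: title.encode("utf-8"))
print(ordered)  # ['Carrie', 'Cujo', 'The Shining']

# Byte order is case-sensitive: lowercase "c" (0x63) sorts after uppercase
# "T" (0x54), so a hypothetical "cujo" would land after "The Shining".
mixed_case = sorted(["The Shining", "cujo", "Carrie"], key=lambda t: t.encode("utf-8"))
print(mixed_case)  # ['Carrie', 'The Shining', 'cujo']
```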

DynamoDB adds a new partition whenever one of two thresholds is crossed:

  1. The data in a partition grows larger than 10 GB, or

  2. The throughput a single partition must serve exceeds 3,000 read capacity units or 1,000 write capacity units.
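The two thresholds can be turned into a back-of-the-envelope estimate. This sketch is modeled on the formula that older AWS documentation gave for provisioned-capacity tables, where reads and writes share a partition's throughput (RCU/3,000 + WCU/1,000) and the larger of the throughput and storage counts wins. Treat it as an approximation: DynamoDB does not expose its real partitioning logic, and the function name `estimate_partitions` is hypothetical.

```python
import math
from fractions import Fraction

# Per-partition limits quoted in the lecture.
MAX_PARTITION_GB = 10
MAX_PARTITION_RCU = 3000
MAX_PARTITION_WCU = 1000


def estimate_partitions(size_gb: float, rcu: int, wcu: int) -> int:
    """Rough partition-count estimate for a provisioned-capacity table.

    An approximation only; DynamoDB's actual placement logic is internal.
    Fractions keep the arithmetic exact, so boundary cases round correctly.
    """
    # Throughput: read and write demand share a partition's capacity.
    by_throughput = math.ceil(
        Fraction(rcu, MAX_PARTITION_RCU) + Fraction(wcu, MAX_PARTITION_WCU)
    )
    # Storage: each partition holds about 10 GB.
    by_size = math.ceil(Fraction(size_gb) / MAX_PARTITION_GB)
    # Even an empty table needs at least one partition.
    return max(1, by_throughput, by_size)


# 25 GB needs 3 partitions by size; 4,000 RCU + 500 WCU needs 2 by throughput.
print(estimate_partitions(25, 4000, 500))  # 3
```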

DynamoDB doesn’t advertise how many partitions your table has, and in fact this number shouldn’t matter much to you. That’s because DynamoDB continually reshards your data and redistributes your partitions behind the scenes in response to changes in read throughput, write throughput, and storage. It does all of the balancing for you, transparently to your application. 

What’s really important is to understand the theory behind how DynamoDB partitions data so that you can better understand how to model your data effectively and efficiently for your access patterns. All right, that’s it for this one - see you next time.

About the Author
Students: 236830
Labs: 1
Courses: 232
Learning Paths: 187

Stuart has been working within the IT industry for two decades, covering a huge range of topic areas and technologies, from data center and network infrastructure design to cloud architecture and implementation.

To date, Stuart has created 150+ courses relating to cloud, reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.

Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.

He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.

In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.

Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.