Balancing Partitions in Large Tables
Start course
3h 3m

This course provides detail on the AWS Database services relevant to the AWS Certified Developer - Associate exam. This includes Amazon RDS, Aurora, DynamoDB, MemoryDB for Redis, and ElastiCache.

Want more? Try a lab playground or do a Lab Challenge!

Learning Objectives

  • Obtain a solid understanding of the following Amazon database services: Amazon RDS, Aurora, DynamoDB, MemoryDB for Redis, and ElastiCache
  • Create an Amazon RDS database
  • Create a DynamoDB database
  • Create an ElastiCache cluster

We know a couple of key details about DynamoDB

  1. DynamoDB chooses partitions for your data based on the partition key

  2. Each partition gets a certain amount of read and write capacity - specifically a hard limit of 3000 RCUs and 1000 WCUs. 

In cases where a compound primary key is used, DynamoDB also stores all of the items with the same partition key physically close to one another. 

This is helpful if you have a partition key with high cardinality - meaning, with a lot of unique values. For example, if you had a partition key that was a book ID field, that would provide a lot of distinct values and each book would end up in a different partition. But if you had a partition key with low cardinality, for example, a true/false field - where many items could have the same partition key, that would mean that you’re constantly accessing the same partitions over and over again. 

This matters because DynamoDB uses the table limits for RCUs and WCUs to specify throughput for each partition. For example, say you use provisioned throughput capacity mode and create a table with 400 RCUs and 400 WCUs. Let’s also say this table has four partitions. DynamoDB would divide your capacity evenly among each partition. So each partition would get 100 RCUs and 100 WCUs. 

Let’s say the first 3 partitions only use 50 WCUs, but the last partition goes over its soft limit and uses 150 WCUs. If you have a partition that is getting more requests than others, and exceeding the normal limits of capacity, this is referred to as a “hot partition”. 

In fact, if your partitions become wildly unbalanced and one partition is getting far more read and write traffic than the allotted read and write capacity can support, what happens is DynamoDB reads and writes stop functioning in that partition. You might experience throttling or bottlenecks, and get an error called ProvisionedThroughputExceeded. 

If you get this error and you’re using the AWS SDK, the SDK will automatically retry several times with built-in exponential backoff. The idea behind exponential backoff is that it uses progressively longer waits between retries. 

If you’re still throttled after all of these retries, you’ll eventually get a ProvisionedThroughputExceededException. The best thing to do when you get this error is to reduce your request rate and use exponential backoff. 

If you’re constantly hitting the same partition over and over again - you might be running into this error often - which can be annoying. This means your primary key is extremely important - as most access patterns are going to be based on your primary key and you may not be able to efficiently query on other attributes. The goal is to have many smaller partitions instead of few larger partitions so that your queries are balanced across the nodes supporting DynamoDB. 

So while you might not care about the number of partitions you have or how DynamoDB partitions your data - you have to care about how to split up your data so you don’t end up in a situation where your read and write traffic is not evenly distributed across your system. 

However, there’s a few things DynamoDB does to help manage imbalances across your system. There’s also a few things you can do to better manage this as well. 

So let’s talk about what DynamoDB does for you.

The first thing it provides is a bit of burst capacity for free. This helps with short, mild spikes in traffic. Basically, DynamoDB reserves unused capacity for partitions. When traffic spikes for a partition, it uses these extra capacity units, allowing you to go above the throughput of that partition for a short period of time.

However, DynamoDB found that burst capacity wasn’t enough for some customer’s workloads. So, they started offering adaptive capacity.

With adaptive capacity, DynamoDB redirects unused capacity from other partitions to boost the hot partition, enabling it to consume more RCU’s and WCU’s indefinitely, enabling your workload to remain imbalanced and still operate.

Besides using retries and exponential backoffs, there are some bigger structural changes you can make as well to ease some of the burden on certain partitions. 

The first thing you can do is to over provision your table's capacity. The downside of this, of course, is cost and unused capacity. 

If your workload is read heavy, the second thing you can do is consider using DAX or DynamoDB Accelerator. By using DAX, you can alleviate some of the burden on your DynamoDB tables by caching your reads - so they won’t be consuming RCUs from your table or from any particular partition. 

But really, the best thing you can do is have a good partition key strategy to keep your partitions relatively small and balanced. 

In summary, as your tables scale, it’s imperative to choose a high cardinality partition key that keeps partitions relatively small so DynamoDB can balance these partitions across the nodes supporting the database. Adaptive capacity has come a long way with helping with imbalances as you scale, but having a good primary key strategy is still vital. That being said, DynamoDB is constantly improving how it manages imbalanced partitions and I can only assume that there will be greater innovations down the road with flow control. That’s it for this one - see you next time.

About the Author
Learning Paths

Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.

To date, Stuart has created 150+ courses relating to Cloud reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.

Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.

He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.

In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.

Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.