This course explores how to scale your RDS databases. It covers scaling based on reads or writes, and what it means to scale horizontally or vertically.
Additionally, it covers sharding databases as a way to increase write performance, and when you should consider it as an option.
Learning Objectives
- Explain the difference between horizontal and vertical scaling of RDS databases
- Understand the differences between scaling for reads and scaling for writes
- Address these scaling issues inside Amazon RDS
- Recognize when it is appropriate to shard an RDS database, and understand the complexities involved
Intended Audience
This course is recommended for anyone who wants to understand the basics of scaling RDS. It includes information about database sharding as a way to scale write performance.
Prerequisites
To get the most out of this course, you should have a basic understanding of cloud computing using Amazon Web Services. You also need to know how relational databases work at a high level.
Feedback
If you have any feedback relating to this course, please contact us at support@cloudacademy.com.
One of the greatest benefits of using Amazon RDS is that it is an AWS managed service. AWS takes care of patching, security updates, and other low-level, undifferentiated heavy lifting. Taking this burden off your hands also smooths over many of the annoying aspects of scaling your database up or out.
Let's begin by looking at how RDS scales vertically for both reads and writes.
Scaling your RDS database vertically is probably the simplest way to relieve pressure on your read or write throughput. Switching the underlying instance for one with more CPU and RAM is literally a button click away. The largest instance class you can choose depends on the DB engine and Region, and current classes go well beyond 32 vCPUs and 244 GiB of RAM. Note that changing the instance class does cause downtime for your database, but you can schedule the change to take place during your normal maintenance window.
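If you prefer to script the change rather than click through the console, here is a minimal sketch, assuming Python and the AWS SDK (boto3). It only builds the keyword arguments for the RDS modify_db_instance call; the instance identifier orders-db and the target class are hypothetical placeholders.

```python
def resize_request(instance_id: str, new_class: str, apply_now: bool = False) -> dict:
    """Build keyword arguments for boto3's RDS modify_db_instance call.

    With ApplyImmediately=False, RDS defers the instance class change to the
    next maintenance window instead of applying it right away.
    """
    return {
        "DBInstanceIdentifier": instance_id,
        "DBInstanceClass": new_class,
        "ApplyImmediately": apply_now,
    }


# Hypothetical instance and target class. With credentials configured, you
# would pass the result to a real client:
#   import boto3
#   boto3.client("rds").modify_db_instance(**resize_request("orders-db", "db.r5.2xlarge"))
print(resize_request("orders-db", "db.r5.2xlarge"))
```

Leaving apply_now at False is what lets the resize wait for the maintenance window rather than causing downtime immediately.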
When scaling RDS, it's important to note that the storage volume and the instance class are decoupled: when you scale the instance up or down, your storage stays the same size. To grow storage, you can enable RDS storage autoscaling or increase the allocated storage yourself.
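Storage is changed through the same modify_db_instance call. The sketch below, again assuming Python and boto3 with a hypothetical instance name, builds the arguments either to enable storage autoscaling (by setting a MaxAllocatedStorage ceiling in GiB) or to set AllocatedStorage manually.

```python
from typing import Optional


def storage_request(instance_id: str,
                    max_allocated_gib: Optional[int] = None,
                    allocated_gib: Optional[int] = None) -> dict:
    """Build keyword arguments for boto3's RDS modify_db_instance call.

    MaxAllocatedStorage enables storage autoscaling up to that ceiling;
    AllocatedStorage sets a new fixed storage size by hand.
    """
    if max_allocated_gib is None and allocated_gib is None:
        raise ValueError("set max_allocated_gib, allocated_gib, or both")
    request = {"DBInstanceIdentifier": instance_id, "ApplyImmediately": True}
    if max_allocated_gib is not None:
        request["MaxAllocatedStorage"] = max_allocated_gib
    if allocated_gib is not None:
        request["AllocatedStorage"] = allocated_gib
    return request


# Hypothetical instance: let RDS grow its storage automatically up to 1,000 GiB.
print(storage_request("orders-db", max_allocated_gib=1000))
```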
Vertical scaling is a fine answer to many throughput problems, but it isn't very cost-effective when your workload is heavily skewed toward reads or toward writes. Upgrading the hardware increases capacity in both dimensions, so you only get half of the benefit you pay for.
With that in mind, let's see what options are available for read-heavy workloads.
RDS provides a fantastic way to increase your read throughput without having to change the size of your underlying database instance. It does this with a horizontal scaling method called read replicas.
A read replica is a copy of your database that gives users another access point from which to retrieve data, which relieves the bottleneck on your primary database. The read replica is kept in sync with the primary database and only allows its users to read data. If replicas also accepted writes, you would face synchronization issues and race conditions that are troublesome to deal with.
To create a read replica, RDS takes a snapshot of your primary database instance and builds a full, read-only copy of the database from it. While the snapshot is taken, your source database experiences a brief I/O suspension that typically lasts about a minute. If the source is a Multi-AZ deployment, RDS takes the snapshot from the standby instead, which avoids the suspension.
Amazon RDS then uses the DB engine's native asynchronous replication to update the read replica whenever the source DB instance changes.
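To see why a replica can briefly return older data than the primary, here is a toy model of asynchronous replication in Python. It is a simplified illustration under our own assumptions, not how RDS or any DB engine actually ships changes: writes land on the primary immediately and only reach the replica when replicate() runs.

```python
from collections import deque


class ToyReplicatedStore:
    """Toy model of asynchronous replication (not the real RDS mechanism).

    Writes commit on the primary right away and are queued for the replica;
    the replica only sees them after replicate() drains the queue, so reads
    from the replica can be stale in between (replica lag).
    """

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = deque()

    def write(self, key, value):
        self.primary[key] = value          # committed on the primary immediately
        self.pending.append((key, value))  # shipped to the replica later

    def replicate(self):
        # Apply queued changes to the replica in commit order.
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value

    def read_primary(self, key):
        return self.primary.get(key)

    def read_replica(self, key):
        return self.replica.get(key)


store = ToyReplicatedStore()
store.write("balance", 100)
print(store.read_primary("balance"))  # 100
print(store.read_replica("balance"))  # None: the change has not replicated yet
store.replicate()
print(store.read_replica("balance"))  # 100
```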
RDS lets you create several read replicas for each source DB instance; the exact limit depends on the engine (up to 15 for MySQL, MariaDB, and PostgreSQL). Read replicas are supported by Amazon RDS for MySQL, MariaDB, PostgreSQL, Oracle, and SQL Server.
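Creating replicas can also be scripted. This minimal sketch assumes Python and boto3 and builds the keyword arguments for the RDS create_db_instance_read_replica call, one dict per replica; the "-replica-N" naming scheme and the instance class are hypothetical.

```python
def replica_requests(source_id: str, count: int, instance_class: str) -> list:
    """Build keyword arguments for boto3's RDS create_db_instance_read_replica
    call, one dict per replica. The naming scheme is a hypothetical choice."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return [
        {
            "DBInstanceIdentifier": f"{source_id}-replica-{i}",
            "SourceDBInstanceIdentifier": source_id,
            "DBInstanceClass": instance_class,
        }
        for i in range(1, count + 1)
    ]


# With boto3 configured, each dict would be passed to a real client:
#   import boto3
#   rds = boto3.client("rds")
#   for kwargs in replica_requests("orders-db", 2, "db.r5.large"):
#       rds.create_db_instance_read_replica(**kwargs)
for kwargs in replica_requests("orders-db", 2, "db.r5.large"):
    print(kwargs["DBInstanceIdentifier"])
```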
If your primary database goes down or becomes corrupted, you can promote a read replica to be a new, standalone primary database. Because replication is asynchronous, any changes that had not yet reached the replica are lost. You can then move your traffic to the promoted instance using Amazon Route 53 failover routing and health checks.
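Promotion itself is a single API call. The sketch below, assuming Python and boto3, builds the arguments for the RDS promote_read_replica call; the replica name and the seven-day backup retention are hypothetical choices.

```python
def promote_request(replica_id: str, backup_retention_days: int = 7) -> dict:
    """Build keyword arguments for boto3's RDS promote_read_replica call.

    Once promoted, the instance is a standalone, writable DB instance and no
    longer receives changes from its former source.
    """
    return {
        "DBInstanceIdentifier": replica_id,
        "BackupRetentionPeriod": backup_retention_days,
    }


# Hypothetical replica name; with boto3 installed and configured:
#   import boto3
#   boto3.client("rds").promote_read_replica(**promote_request("orders-db-replica-1"))
print(promote_request("orders-db-replica-1"))
```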
To distribute requests across your read replicas, you can use Amazon Route 53 weighted record sets. Create one record set for each read replica's DNS endpoint, give them all the same name and the same weight, and then direct read requests to that shared record name.
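The steps above can be sketched in Python, assuming boto3 and the Route 53 change_resource_record_sets API. The function builds a ChangeBatch with one weighted CNAME record per replica; the record name, hosted zone ID, and replica endpoints are hypothetical, and the short TTL is our own choice so that clients re-resolve the name frequently.

```python
def weighted_change_batch(record_name: str, replica_endpoints: list,
                          weight: int = 1, ttl: int = 60) -> dict:
    """Build a Route 53 ChangeBatch with one weighted CNAME per read replica.

    Every record shares the same name and weight, so Route 53 spreads DNS
    answers roughly evenly across the replica endpoints.
    """
    changes = []
    for i, endpoint in enumerate(replica_endpoints, start=1):
        changes.append({
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "CNAME",
                "SetIdentifier": f"replica-{i}",  # must be unique per record
                "Weight": weight,
                "TTL": ttl,
                "ResourceRecords": [{"Value": endpoint}],
            },
        })
    return {"Changes": changes}


batch = weighted_change_batch(
    "reads.example.com",
    ["orders-db-replica-1.example.rds.amazonaws.com",
     "orders-db-replica-2.example.rds.amazonaws.com"],
)
# With boto3 configured and a real hosted zone:
#   import boto3
#   boto3.client("route53").change_resource_record_sets(
#       HostedZoneId="Z123EXAMPLE", ChangeBatch=batch)
print(len(batch["Changes"]))
```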
Scaling your RDS database for writes can be very difficult. Apart from scaling the whole instance vertically, RDS has no simple, built-in way to improve write throughput. However, a technique called sharding can work around this.
Sharding is similar to using read replicas in that you create an additional database to share the load of the primary. Unlike a read replica, however, each shard is a fully working database that accepts both reads and writes. The catch is that each database handles a different part of your entire dataset.
For example, one shard could handle all customers whose last names begin with A through M, and a second shard all customers whose last names begin with N through Z.
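A minimal sketch of this A-M / N-Z lookup in Python follows. The shard names are hypothetical, and keying on the first letter of the last name comes straight from the example; storing the ranges as sorted lower bounds and searching them with the standard bisect module makes it easy to add more ranges later.

```python
import bisect

# Hypothetical layout: each shard owns last names from its lower bound up to
# (but not including) the next shard's lower bound.
SHARD_LOWER_BOUNDS = ["A", "N"]               # shard 0: A-M, shard 1: N-Z
SHARD_NAMES = ["customers-a-m", "customers-n-z"]


def shard_for(last_name: str) -> str:
    """Return the shard that stores a customer, keyed on the last name's first letter."""
    first = last_name.strip()[:1].upper()
    if not ("A" <= first <= "Z"):
        raise ValueError(f"no shard for last name {last_name!r}")
    # bisect_right finds the last lower bound that is <= first.
    index = bisect.bisect_right(SHARD_LOWER_BOUNDS, first) - 1
    return SHARD_NAMES[index]


print(shard_for("Meadows"))   # customers-a-m
print(shard_for("Nakamura"))  # customers-n-z
```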
Because the shards share no data, there is no need to keep them synchronized with each other. In addition, each shard has all the scaling capabilities we have already discussed: it can have its own read replicas, and its underlying instance can be scaled vertically.
When thinking about sharding your database, remember that RDS does not handle it natively. Your application must contain the logic that determines which database holds the data you are looking for, and must read from and write to the appropriate one.
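Here is a minimal sketch of that application-side logic in Python. Everything in it is hypothetical: the endpoint names, the Shard and ShardRouter classes, and the choice to send writes to each shard's primary while rotating reads across that shard's read replicas. It reuses the A-M / N-Z split from the customer example.

```python
import itertools


class Shard:
    """One shard: a writable primary plus zero or more read replicas."""

    def __init__(self, primary: str, replicas: list):
        self.primary = primary
        # Rotate reads across replicas; fall back to the primary if there are none.
        self._reads = itertools.cycle(replicas or [primary])

    def endpoint_for(self, is_write: bool) -> str:
        return self.primary if is_write else next(self._reads)


class ShardRouter:
    """Hypothetical application-side router; RDS does not do this for you."""

    def __init__(self, shards: dict):
        self.shards = shards  # shard key -> Shard

    def route(self, last_name: str, is_write: bool) -> str:
        first = last_name.strip()[:1].upper()
        if not ("A" <= first <= "Z"):
            raise ValueError(f"no shard for last name {last_name!r}")
        key = "a-m" if first <= "M" else "n-z"
        return self.shards[key].endpoint_for(is_write)


router = ShardRouter({
    "a-m": Shard("am-primary", ["am-replica-1", "am-replica-2"]),
    "n-z": Shard("nz-primary", []),
})
print(router.route("Meadows", is_write=True))    # am-primary
print(router.route("Meadows", is_write=False))   # am-replica-1
print(router.route("Meadows", is_write=False))   # am-replica-2
print(router.route("Nakamura", is_write=False))  # nz-primary (no replicas)
```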
Sharding should be considered at the outset, when you first design your architecture and database. One of the downsides of sharding is that you lose the ability to easily join data that lives on different shards. You would have to engineer that ability yourself, which adds another layer of complexity.
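To illustrate what engineering the join yourself looks like, the sketch below uses in-memory SQLite databases from Python's standard library as stand-ins for separate RDS instances (an assumption made so the example runs anywhere). Customers are split across two shards and orders live in a third database, so a single SQL JOIN is impossible; the application queries each database and joins the results in memory. The tables and data are hypothetical.

```python
import sqlite3


def make_shard(rows):
    """Create an in-memory SQLite database standing in for one customer shard."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, last_name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    return conn


# Hypothetical data: customers split A-M / N-Z, orders kept in a separate database.
shards = [
    make_shard([(1, "Adams"), (2, "Meadows")]),
    make_shard([(3, "Nakamura"), (4, "Zhou")]),
]
orders_db = sqlite3.connect(":memory:")
orders_db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
orders_db.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                      [(10, 1, 25.0), (11, 3, 40.0), (12, 4, 15.5)])


def orders_with_names():
    """Application-side join: SQL cannot JOIN across separate connections."""
    orders = orders_db.execute(
        "SELECT id, customer_id, total FROM orders ORDER BY id").fetchall()
    wanted = {customer_id for _, customer_id, _ in orders}
    # Only "?" characters go into the SQL text; the ids are bound as parameters.
    placeholders = ",".join("?" * len(wanted))
    names = {}
    # Scatter: ask every shard for the customers it holds, then gather the rows.
    for shard in shards:
        rows = shard.execute(
            f"SELECT id, last_name FROM customers WHERE id IN ({placeholders})",
            tuple(wanted)).fetchall()
        names.update(rows)
    return [(order_id, names.get(customer_id), total)
            for order_id, customer_id, total in orders]


print(orders_with_names())
# [(10, 'Adams', 25.0), (11, 'Nakamura', 40.0), (12, 'Zhou', 15.5)]
```

Querying every shard is the simplest approach; an application that already knows which shard holds each customer can send each lookup to just that shard.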
There are many ways to shard a database, so it is extremely important to design a scheme that will work in the long term. If the shards become overburdened again, you can reshard them, but doing so adds more downtime and complexity.
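To show why resharding is painful, this sketch assumes a simple hash-modulo scheme (one of the many possible approaches, not one the course prescribes) and counts how many keys would change shards when going from two shards to three. Every key that moves is a row you would have to copy to a different database. The customer keys are hypothetical.

```python
import hashlib


def shard_index(key: str, shard_count: int) -> int:
    """Hash-modulo placement: hash the key, then take it modulo the shard count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def fraction_moved(keys, old_count: int, new_count: int) -> float:
    """Share of keys whose shard changes when the shard count changes."""
    moved = sum(1 for k in keys
                if shard_index(k, old_count) != shard_index(k, new_count))
    return moved / len(keys)


customer_ids = [f"customer-{i}" for i in range(10_000)]
print(f"{fraction_moved(customer_ids, 2, 3):.0%} of rows change shards going from 2 to 3")
```

With this scheme, roughly two-thirds of the keys move when a third shard is added, since a key stays put only when its hash gives the same remainder modulo 2 and modulo 3.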
William Meadows is a passionately curious human currently living in the Bay Area in California. His career has included working with lasers, teaching teenagers how to code, and creating classes about cloud technology that are taught all over the world. His dedication to completing goals and helping others is what brings meaning to his life. In his free time, he enjoys reading Reddit, playing video games, and writing books.