This course covers how to scale Amazon Redshift, focusing on how it can scale storage and compute resources to meet demand.
Learning Objectives
- Learn about scaling Redshift Clusters
- Understand how to scale Redshift both horizontally and vertically
- Learn about the four ways a Redshift Cluster can be scaled
- Gain an understanding of classic resize and elastic resize
Intended Audience
This course is intended for anyone who wants to enhance their knowledge of Amazon Redshift, specifically how to resize clusters.
Prerequisites
To get the most out of this course, you should have some basic experience with Amazon Redshift.
Amazon Redshift Resize Operations
Hello, my name is Stephen Cole and I'm a trainer here at Cloud Academy. Today, I am here to talk to you about scaling Amazon Redshift. Specifically, I want to teach you how Amazon Redshift can scale storage and compute resources to meet demand. By the end of this short course, you will have learned about scaling Redshift clusters. You will learn that it is possible to scale Redshift both horizontally and vertically and that there are four ways a Redshift cluster can scale. You will gain an understanding of how two of these four ways work: Classic Resize and Elastic Resize. As part of this understanding, you will know when it is appropriate to use them, what happens--in general--when a cluster is resized, and how to check the status of a resize operation. That's a lot, so let's get started.
There are four primary ways Redshift can scale: Classic Resize, Elastic Resize, Concurrency Scaling, and Redshift Spectrum. There is a fifth way, but it's not automatic. If needed, it's possible to take a snapshot of a Redshift cluster and use it to create a new cluster. While there are use cases for such an approach, it is far easier to use the tools AWS provides to right-size Amazon Redshift clusters.
My focus, in this course, is on the Classic and Elastic Resize operations. Amazon Redshift clusters are elastic, cloud-based data warehouses that can be scaled both horizontally and vertically. Elasticity is important because charges are based on consumption. Over-provision and money is wasted. Under-provision and end-users are waiting and wasting their time. Often, that wasted time is also wasted money and lost opportunity. Scaling, then, is--in part--an exercise in finding the best size-to-cost ratio. Not only that, it's a never-ending process that is part of nearly every cloud-based service. It's hard to know how workloads will change over time. For that matter, it can be difficult to know how much capacity is needed when a new resource is provisioned.
Being elastic means finding the appropriate balance of computing resources that matches the demand and price point. I've found, throughout my career, that people often think of elasticity as being able to grow with demand and forget that it is also about shrinking as demand declines. Elasticity, then, is about balance. Most of the charges incurred using the AWS cloud are based on consumption. This means it's important to turn resources off when they're not being used.
Vertical scaling is the process of making a Redshift cluster's nodes more or less powerful by changing the node type. Horizontal scaling, in contrast, involves adding or removing nodes, often to address expected temporary spikes in utilization with minimal disruption. There are also two features of Redshift that act as scaling operations without changing the size or shape of the cluster.
While these topics are outside the scope of this course, I want to take a moment to make you aware of them. They are Concurrency Scaling and Amazon Redshift Spectrum. The Concurrency Scaling feature temporarily adds more compute power--on demand--to process queries. Amazon Redshift Spectrum allows data stored in Amazon S3 to be queried by the Redshift cluster using standard SQL.
Natively, Redshift can currently store and process up to two petabytes of data. Spectrum raises that limit, allowing Redshift to query exabytes of data. An exabyte is one thousand petabytes. It's hard to visualize that much data. If a gigabyte were the size of the Earth, Jupiter would be about 1.3 terabytes, a petabyte would be almost 770 Jupiters, and an exabyte would be almost 770 suns.
As capacity and performance needs change or grow, an Amazon Redshift cluster can be scaled to make the best use of the available computing and storage options. A cluster can be scaled in or out by adding or removing nodes. Scaling a cluster up or down involves changing the node type. This means more nodes can be added, the node type can be changed, a single-node cluster can be changed into a multi-node cluster, and a multi-node cluster can be changed to a single-node cluster. The resulting cluster must be large enough to store the data in the existing cluster or the resize will fail.
Depending on the node configuration, AWS provides two ways to resize a Redshift cluster: the Classic Resize and the Elastic Resize. The Classic Resize copies tables and metadata to the new cluster; it won't retain system tables and records marked for deletion. If you have enabled audit logging in a cluster, access to those logs remains available in Amazon S3 after the source cluster has been deleted.
Unlike the Classic Resize, the Elastic Resize retains system log tables. The Classic Resize operation needs to know two things: the new node type and the number of nodes required. Once chosen and applied, data is copied in parallel from the compute node or nodes in the source cluster to the compute node or nodes in the target cluster.
Redshift first takes an incremental snapshot of the cluster. Then, it provisions an entirely new cluster using data from the snapshot. Once provisioned, Redshift will move the endpoint from the old cluster to the new one. The time that it takes to resize depends on the amount of data and the number of nodes in the smaller cluster.
There are some interesting things about the classic resize operation. The cluster is put into read-only mode. This means it is available for read queries but not writes while being resized. The classic resize can be canceled up until the time the endpoint has been migrated from the original cluster. Then, because the operation does not move data marked for deletion to the new cluster, the process does the equivalent of a VACUUM DELETE operation. If you're new to Amazon Redshift, you might not be aware of how data is deleted in a cluster.
Disk operations can be expensive and, because of this, Amazon Redshift does not automatically reclaim free space when a record is deleted or updated. This is a design choice that was inherited from Postgres. When the DELETE command is run, the row is marked as deleted but not physically removed. Similarly, when the UPDATE command is run, the old row is marked as deleted and the new data is appended. Over time, this increases the amount of table storage space used and degrades the performance of a cluster.
To reclaim disk space on a Redshift cluster, a VACUUM operation must be performed. There are several types of VACUUM processes in Redshift but, in the case of the Classic Resize, the operation performed is a pseudo VACUUM DELETE. I say pseudo because the Classic Resize doesn't really run the VACUUM DELETE process. It simply ignores records marked as deleted while copying blocks to the new cluster, and it copies those blocks without sorting or reindexing the data.
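If you ever want to reclaim that space yourself, a manual VACUUM DELETE ONLY does the real thing. As a quick sketch--using the Redshift Data API from the CLI, with a made-up cluster name, database, user, and table--it might look like this:

aws redshift-data execute-statement \
    --cluster-identifier my-redshift-cluster \
    --database dev \
    --db-user awsuser \
    --sql "VACUUM DELETE ONLY sales;"

The same statement could, of course, be run from any SQL client connected to the cluster.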
Since the resize operation uses a snapshot to create the new cluster, it maintains the existing sort order of the cluster. This means that data is no more or no less sorted than it was prior to the resize operation. System tables are not moved to the new cluster. As a result, no historical data will be available in the new cluster. Even though the system tables are not moved, table statistics such as the size of the table, the distribution style of the table, and sort keys are gathered automatically in the resize process.
These statistics are important because they are used by Redshift to create query plans. Query plans outline what operations the Redshift execution engine will perform, what each step does, what tables and columns are used, how much data is processed, and the relative cost of the total query. The relative cost has nothing to do with calculating the charges of a query. Its purpose is to indicate which operations in a query consume the most resources and nothing else. It cannot be used to predict usage charges or even to compare costs between different queries.
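If you'd like to see those relative costs for yourself, the EXPLAIN command prints a query plan without actually running the query. Here is a rough sketch, again using the Redshift Data API with placeholder cluster, database, user, table, and column names; the first call submits the EXPLAIN and returns a statement Id, and the second call fetches the plan text:

aws redshift-data execute-statement \
    --cluster-identifier my-redshift-cluster \
    --database dev \
    --db-user awsuser \
    --sql "EXPLAIN SELECT region, SUM(amount) FROM sales GROUP BY region;"

aws redshift-data get-statement-result --id <statement-id>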
The time to complete a classic resize operation can vary. It can range from hours to days based on several factors, including the workload on the source cluster, the number and size of the tables being transferred, how evenly data is distributed across the compute nodes and slices, and the node configuration in the source and target clusters.
Since the Redshift cluster is available for reads during the resize, there's an interesting performance optimization built into the process. The first blocks moved to the new cluster are the ones used by current queries. This gives those queries a quick response time. In a way, I think of this like the wind chill factor in winter. Even though it isn't really colder outside, it feels like it is. Similarly, queries made after the resize operation has started can feel like they're going faster because they are accessing the data that was just moved to the new cluster.
In 2018, AWS announced the Elastic Resize feature of Amazon Redshift. As with the Classic Resize, Elastic Resize can be used to change node types, the number of nodes, or both. AWS has said in various forums that the most common type of resize operation is adding or removing nodes to address a spike in demand.
When adding nodes of the same type, instead of provisioning a new cluster--which could take hours or even days--it does something different. It adds additional nodes to the main cluster and automatically redistributes data across the new configuration within minutes. While this is happening, queries are paused but, when possible, connections are held open. The time to complete a resize operation depends on multiple factors but, according to the AWS documentation, typical resize operations range between 5 and 15 minutes.
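To put that in concrete terms, bumping a cluster up to six nodes with an elastic resize can be a single CLI call. This is just a sketch with a placeholder cluster name and node count; I'll cover the resize-cluster command in more detail shortly. Because no classic option is supplied, the operation defaults to an elastic resize:

aws redshift resize-cluster \
    --cluster-identifier my-redshift-cluster \
    --number-of-nodes 6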
There might be an increase in execution time for some queries while the data is being redistributed in the background. When an Elastic Resize is requested, Redshift starts the process by taking an incremental snapshot of the current cluster. This is similar to the Classic Resize. Redshift also checks to see if there will be space available for the existing data in the resized cluster. If not, the operation will fail and the cluster will remain as is.
There are situations where, even when adding additional nodes, there might not be enough available storage in the new cluster. It's odd because--when adding more nodes--it seems that there should be more space available. The data's distribution style impacts the size requirements of a cluster. This means that, while the volume of data remains unchanged, a resize operation could spread it out in such a way that it will no longer fit.
A quick review: Amazon Redshift has four distribution styles: AUTO, EVEN, KEY, and ALL. They are outside the scope of this course. However, they are critically important to understand when working with Amazon Redshift.
Back to the topic at hand. Similar to the Classic Resize operation, to minimize the impact on performance, the first blocks moved to the new nodes are from current queries. This makes the operation--again--feel as if it is running faster.
In addition to changing the number of nodes, Redshift's Elastic Resize can also be used to change the node type. When changing the node type with an Elastic Resize operation, it starts with taking a snapshot of the cluster. If you've noticed a pattern, it's true. Every resize operation starts with taking a snapshot of the cluster.
While data is being transferred to the new cluster, the cluster is temporarily unavailable for write operations but remains available for read queries. When the resize process nears completion, Amazon Redshift moves the endpoint to the new cluster and all connections to the original cluster are terminated.
Both the Classic and Elastic resize operations have appropriate use cases as well as some limitations. The Elastic Resize operation cannot be used to double a cluster's size, nor can it be used to cut the size of a cluster in half. For these two operations, use the Classic Resize. The Classic Resize is also the best choice when making a more-or-less permanent change to the size of a cluster.
Elastic resize is best for those times when a temporary spike in demand is expected. If month-end or quarter-end reporting needs are going to temporarily add load on a Redshift cluster, use Elastic Resize to add--and then later remove--capacity. Another limitation of Elastic resize is that it cannot be performed on a single-node cluster.
When using Elastic Resize, disk space is not reclaimed the way it is with a Classic Resize. During both types of resize operations, tables are not sorted. Instead, Redshift moves data to the new compute nodes based on each table's distribution style and then runs the ANALYZE command to update table statistics for the query planner.
Both the Classic and Elastic resize operations can be triggered from the AWS Console, the Redshift API, or the command line interface. From the CLI, the command is resize-cluster. Here are all of the options. The only required option is cluster-identifier. The cluster-type is either single-node or multi-node. If multi-node is chosen, the number-of-nodes option is required. If the node-type is omitted, the cluster's current node type is used. There is a short list of valid node types; consult the documentation to see what's currently available.
If the classic or no-classic option is omitted, the operation defaults to the Elastic resize type. I'm not sure why AWS chose classic and no-classic as the options. Personally, I like describing things as they are and would have preferred an option called elastic. However, they didn't call me. They never do. I just wait by the phone hoping that, one day, someone at AWS will call me and ask for my advice about naming things. Or, maybe not. Still, a person can dream, right?
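For what it's worth, here is a sketch of what an explicit classic resize to a different node type might look like; the cluster name, node type, and node count are placeholders, and dropping the classic flag--or passing no-classic--would make it an elastic resize instead:

aws redshift resize-cluster \
    --cluster-identifier my-redshift-cluster \
    --cluster-type multi-node \
    --node-type ra3.4xlarge \
    --number-of-nodes 4 \
    --classic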
It's possible to create a file that holds these options, formatted as either JSON or YAML. Since creating these files by hand can be tricky, some AWS commands have the generate-cli-skeleton option. It outputs a correctly formatted sample file that can be saved and edited. By itself, this option generates JSON output. However, adding yaml-input to the option creates a skeleton file in YAML.
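As a hedged example--assuming a reasonably recent version of the AWS CLI and a file name I made up--generating and then using a YAML skeleton might look like this:

aws redshift resize-cluster --generate-cli-skeleton yaml-input > resize.yaml

# edit resize.yaml, then run the resize using it
aws redshift resize-cluster --cli-input-yaml file://resize.yaml

To my knowledge, the yaml-input value and the cli-input-yaml option are features of version 2 of the AWS CLI; with version 1, the JSON skeleton and cli-input-json work the same way.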
The status of the resize is visible in the console, programmatically, or using the command line. From the command line, the Redshift command is describe-resize:

aws redshift describe-resize --cluster-identifier cluster-name

Just be sure to give it the name of the cluster. Otherwise, you'll get an error. It will return a result like this. The status will be in one of five possible states: NONE, IN_PROGRESS, FAILED, SUCCEEDED, and CANCELLING.
The TargetClusterType will be either single-node or multi-node. There is other data that can be returned as well. Check the documentation for the full list.
A couple of interesting and/or notable outputs include the names of tables that have been completely imported, the amount of data that's been processed, and the estimated time to completion. Again, see the documentation for the full list.
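If you only care about a few of those fields, the CLI's query option can trim the output down. Here's a sketch with a placeholder cluster name; the field names are the ones I believe the DescribeResize API returns, so double-check them against the documentation:

aws redshift describe-resize \
    --cluster-identifier my-redshift-cluster \
    --query '{Status: Status, TablesCompleted: ImportTablesCompleted, MegaBytesTransferred: ProgressInMegaBytes, SecondsRemaining: EstimatedTimeToCompletionInSeconds}'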
There you have it; information about resizing Amazon Redshift clusters. You now know it's possible to scale Redshift both horizontally and vertically. Horizontal scaling is done by adding or removing nodes. Vertical scaling is accomplished by upgrading or downgrading the individual nodes. There are four ways a Redshift Cluster can automatically scale. Automatically, in this sense, means that AWS performs the steps for you.
Two of these methods were not addressed in this course, Concurrency Scaling and Amazon Redshift Spectrum. The other two, Classic Resize and Elastic Resize, are used to change the size and shape of the cluster depending on the need and use case. The Elastic Resize operation is best used to address expected spikes in demand.
The Classic Resize operation is best for changes to the size and shape of the Redshift cluster that will be--more or less--permanent. This includes operations that would either cut the size of the cluster in half or double it.
Finally, I covered how to check the status of a resize operation. I'm going to take a moment to give you a general piece of advice about working with AWS. The more you use the command line interface, the more comfortable you will be using it. That comfort comes with a prize inside, efficiency. While I can't scale myself horizontally or vertically, I can make myself more productive and effective. It will work for you too.
For Cloud Academy, I'm Stephen Cole. Enjoy your cloud journey. I know you have the power to change the world.
Stephen is the AWS Certification Specialist at Cloud Academy. His content focuses heavily on topics related to certification on Amazon Web Services technologies. He loves teaching and believes that there are no shortcuts to certification but it is possible to find the right path and course of study.
Stephen has worked in IT for over 25 years in roles ranging from tech support to systems engineering. At one point, he taught computer network technology at a community college in Washington state.
Before coming to Cloud Academy, Stephen worked as a trainer and curriculum developer at AWS and brings a wealth of knowledge and experience in cloud technologies.
In his spare time, Stephen enjoys reading, sudoku, gaming, and modern square dancing.