Amazon Redshift is a fully managed petabyte-scale cloud data warehouse service offered by Amazon Web Services. It removes the overhead of months of efforts required in setting up the data warehouse and managing the hardware and software associated with it.
In this series of posts, we will be setting up a Redshift cluster, ingest some volume of data and play around with it. We will also take a look at some of the advanced options available such as understanding query plan to improve performance, workload management, cluster re-sizing, integration with other AWS Services.
Image courtesy: Amazon Web Services
Redshift based Cloud Data Warehouse Architecture
Let’s begin with a brief introduction of the Redshift architecture.
- Leader Node – the leader node parses the query, develops the query execution plan and distributes it to the compute nodes. The Leader Node is provisioned automatically by the service and is not billed
- Compute Node – this is the node that stores data and executes the query. Each Compute Node has its down compute, memory and storage
- Client Applications – client applications can be the standard ETL, BI and analytics tools
- Internal Networking – All the nodes are internally connected through a 10g network enabling faster data transfer between the nodes. The compute nodes are also not exposed to client applications. Client applications always talk to the Leader Node.
Here are some key features of Amazon Redshift:
In row-wise database storage (typically used in OLTP databases), data blocks store values sequentially for consecutive columns that make up a single row. This works for OLTP applications where most transactions read/write most of the columns in a row. Amazon Redshift employs columnar storage where data blocks store values of a single column of multiple rows. This means that reading the same number of column field values for the same number of records requires less I/O operations when compared to row-wise storage. This provides increased I/O performance and savings in storage space.
Redshift employs a Massively Parallel Processing (MPP) architecture that can distribute SQL operations across all available resources (nodes) resulting in very high query performance. A Redshift cluster comprises of a Leader Node automatically provisioned whenever there is more than one compute node. The leader node parses and develops execution plans to carry out database operations, in particular, the series of steps necessary to obtain results for complex queries. The leader node compiles code for individual elements of the execution plan and assigns the code to individual compute nodes. The compute nodes execute the compiled code and send intermediate results back to the leader node for final aggregation.
The number of nodes in a Redshift cluster can be dynamically changed through the AWS Management Console or the API. We can add more nodes to the cluster for increased performance or if we need more storage. We can start with a single 160GB DW2. Large node and scale all the way up to a petabyte. During the scaling activity, the cluster is placed in a read-only mode and all the data is copied to a new cluster. Once the new cluster is fully operational, the old cluster is terminated and this process is entirely transparent to the clients. During this activity, the query performance can be slower.
Data stored in Redshift is automatically (by default) compressed. Compressed data reduce disk usage and data is uncompressed after loading it into memory during query execution. Since Redshift employs columnar storage, Redshift can apply appropriate compression encodings that are tied to the column type.
Redshift comes with loads of security features including:
- Virtual Private Cloud: You can launch Redshift within VPC and control access to the cluster through the virtual networking environment
- Encryption: Data stored in Redshift can be encrypted. This can be configured when creating the tables in Redshift
- SSL: To encrypt connections between clients and Redshift, SSL encryption can be used
- Data in transit encryption: Redshift uses hardware accelerated SSL while connecting to Amazon S3 or DynamoDB (during import, export, backup)
From backups to monitoring to applying patches to upgrades, Redshift is fully managed by AWS. Data stored in Redshift is replicated in all the cluster nodes and automatically backed up as Snapshots and stored (for a user-defined time period) in S3. Redshift continuously monitors the health of the cluster and automatically re-replicates data from failed drives and replaces nodes as necessary.
New Content: Azure DP-100 Certification, Alibaba Cloud Certified Associate Prep, 13 Security Labs, and Much More
This past month our Content Team served up a heaping spoonful of new and updated content. Not only did our experts release the brand new Azure DP-100 Certification Learning Path, but they also created 18 new hands-on labs — and so much more! New content on Cloud Academy At any time, y...
AWS Certification Practice Exam: What to Expect from Test Questions
If you’re building applications on the AWS cloud or looking to get started in cloud computing, certification is a way to build deep knowledge in key services unique to the AWS platform. AWS currently offers 12 certifications that cover major cloud roles including Solutions Architect, De...
Overcoming Unprecedented Business Challenges with AWS
From auto-scaling applications with high availability to video conferencing that’s used by everyone, every day — cloud technology has never been more popular or in-demand. But what does this mean for experienced cloud professionals and the challenges they face as they carve out a new p...
Constant Content: Cloud Academy’s Q3 2020 Roadmap
Hello — Andy Larkin here, VP of Content at Cloud Academy. I am pleased to release our roadmap for the next three months of 2020 — August through October. Let me walk you through the content we have planned for you and how this content can help you gain skills, get certified, and...
New Content: Alibaba, Azure AZ-303 and AZ-304, Site Reliability Engineering (SRE) Foundation, Python 3 Programming, 16 Hands-on Labs, and Much More
This month our Content Team did an amazing job at publishing and updating a ton of new content. Not only did our experts release the brand new AZ-303 and AZ-304 Certification Learning Paths, but they also created 16 new hands-on labs — and so much more! New content on Cloud Academy At...
Blog Digest: Which Certifications Should I Get?, The 12 Microsoft Azure Certifications, 6 Ways to Prevent a Data Breach, and More
This month, we were excited to announce that Cloud Academy was recognized in the G2 Summer 2020 reports! These reports highlight the top-rated solutions in the industry, as chosen by the source that matters most: customers. We're grateful to have been nominated as a High Performer in se...
Which Certifications Should I Get?
The old AWS slogan, “Cloud is the new normal” is indeed a reality today. Really, cloud has been the new normal for a while now and getting credentials has become an increasingly effective way to quickly showcase your abilities to recruiters and companies. With all that in mind, the s...
New Content: AWS, Azure, Typescript, Java, Docker, 13 New Labs, and Much More
This month, our Content Team released a whopping 13 new labs in real cloud environments! If you haven't tried out our labs, you might not understand why we think that number is so impressive. Our labs are not “simulated” experiences — they are real cloud environments using accounts on A...
Kickstart Your Tech Training With a Free Week on Cloud Academy
Are you looking to make a jump in your technical career? Want to get trained or certified on AWS, Azure, Google Cloud Platform, DevOps, Kubernetes, Python, or another in-demand skill? Then you'll want to mark your calendar. Starting Monday, June 22 at 12:00 a.m. PDT (3:00 a.m. EDT), ...
New Content: AZ-500 and AZ-400 Updates, 3 Google Professional Exam Preps, Practical ML Learning Path, C# Programming, and More
This month, our Content Team released tons of new content and labs in real cloud environments. Not only that, but we introduced our very first highly interactive "Office Hours" webinar. This webinar, Acing the AWS Solutions Architect Associate Certification, started with a quick overvie...
Azure vs. AWS: Which Certification Provides the Brighter Future?
More and more companies are using cloud services, prompting more and more people to switch their current IT position to something cloud-related. The problem is most people only have that much time after work to learn new technologies, and there are plenty of cloud services that you can ...
Blog Digest: 5 Reasons to Get AWS Certified, OWASP Top 10, Getting Started with VPCs, Top 10 Soft Skills, and More
Thank you for being a valued member of our community! We recently sent out a short survey to understand what type of content you would like us to add to Cloud Academy, and we want to thank everyone who gave us their input. If you would like to complete the survey, it's not too late. It ...