Skip to main content

Cloud Data Warehouse With AWS Redshift

Amazon Redshift is a fully managed petabyte-scale cloud data warehouse service offered by Amazon Web Services. It removes the overhead of months of efforts required in setting up the data warehouse and managing the hardware and software associated with it.

In this series of posts, we will be setting up a Redshift cluster, ingest some volume of data and play around with it. We will also take a look at some of the advanced options available such as understanding query plan to improve performance, workload management, cluster re-sizing, integration with other AWS Services.

Cloud Data Warehouse with Redshift
Image courtesy: Amazon Web Services

Redshift based Cloud Data Warehouse Architecture

Let’s begin with a brief introduction of the Redshift architecture.

  • Leader Node – the leader node parses the query, develops the query execution plan and distributes it to the compute nodes. The Leader Node is provisioned automatically by the service and is not billed
  • Compute Node – this is the node that stores data and executes the query. Each Compute Node has its down compute, memory and storage
  • Client Applications – client applications can be the standard ETL, BI  and analytics tools
  • Internal Networking – All the nodes are internally connected through a 10g network enabling faster data transfer between the nodes. The compute nodes are also not exposed to client applications. Client applications always talk to the Leader Node.

Here are some key features of Amazon Redshift:

Columnar Storage

In row-wise database storage (typically used in OLTP databases), data blocks store values sequentially for consecutive columns that make up a single row. This works for OLTP applications where most transactions read/write most of the columns in a row. Amazon Redshift employs columnar storage where data blocks store values of a single column of multiple rows. This means that reading the same number of column field values for the same number of records requires less I/O operations when compared to row-wise storage. This provides increased I/O performance and savings in storage space.

MPP Architecture

Redshift employs a Massively Parallel Processing (MPP) architecture that can distribute SQL operations across all available resources (nodes) resulting in very high query performance. A Redshift cluster comprises of a Leader Node automatically provisioned whenever there is more than one compute node. The leader node parses and develops execution plans to carry out database operations, in particular, the series of steps necessary to obtain results for complex queries. The leader node compiles code for individual elements of the execution plan and assigns the code to individual compute nodes. The compute nodes execute the compiled code and send intermediate results back to the leader node for final aggregation.

Scalable

The number of nodes in a Redshift cluster can be dynamically changed through the AWS Management Console or the API. We can add more nodes to the cluster for increased performance or if we need more storage. We can start with a single 160GB DW2. Large node and scale all the way up to a petabyte. During the scaling activity, the cluster is placed in a read-only mode and all the data is copied to a new cluster. Once the new cluster is fully operational, the old cluster is terminated and this process is entirely transparent to the clients. During this activity, the query performance can be slower.

Compression

Data stored in Redshift is automatically (by default) compressed. Compressed data reduce disk usage and data is uncompressed after loading it into memory during query execution. Since Redshift employs columnar storage, Redshift can apply appropriate compression encodings that are tied to the column type.

Security

Redshift comes with loads of security features including:

  • Virtual Private Cloud: You can launch Redshift within VPC and control access to the cluster through the virtual networking environment
  • Encryption: Data stored in Redshift can be encrypted. This can be configured when creating the tables in Redshift
  • SSL: To encrypt connections between clients and Redshift, SSL encryption can be used
  • Data in transit encryption: Redshift uses hardware accelerated SSL while connecting to Amazon S3 or DynamoDB (during import, export, backup)

Fully Managed

From backups to monitoring to applying patches to upgrades, Redshift is fully managed by AWS. Data stored in Redshift is replicated in all the cluster nodes and automatically backed up as Snapshots and stored (for a user-defined time period) in S3.  Redshift continuously monitors the health of the cluster and automatically re-replicates data from failed drives and replaces nodes as necessary.

Avatar

Written by

47Line Technologies

47Line is building solutions solving critical business problems using “cloud as the backbone”. The team has been working in Cloud Computing domain for last 6 years and have proven thought leadership in Cloud, Big Data technologies.

Related Posts

Jeff Hyatt
Jeff Hyatt
— June 18, 2019

10 Steps for an Effective Reserved Instances Strategy

Amazon Web Services (AWS) offers three different ways to pay for EC2 Instances: On-Demand, Reserved Instances, and Spot Instances. This article will focus on effective strategies for purchasing Reserved Instances. While most of the major cloud platforms offer pre-pay and reservation dis...

Read more
  • AWS
  • EC2
Joe Nemer
Joe Nemer
— June 18, 2019

AWS Certification Practice Exam: What to Expect from Test Questions

If you’re building applications on the AWS cloud or looking to get started in cloud computing, certification is a way to build deep knowledge in key services unique to the AWS platform. AWS currently offers 11 certifications that cover major cloud roles including Solutions Architect, De...

Read more
  • AWS
  • AWS Certifications
Avatar
John Chell
— June 13, 2019

AWS Certified Solutions Architect Associate: A Study Guide

The AWS Solutions Architect - Associate Certification (or Sol Arch Associate for short) offers some clear benefits: Increases marketability to employers Provides solid credentials in a growing industry (with projected growth of as much as 70 percent in five years) Market anal...

Read more
  • AWS
  • AWS Certifications
Chris Gambino and Joe Niemiec
Chris Gambino and Joe Niemiec
— June 11, 2019

Moving Data to S3 with Apache NiFi

Moving data to the cloud is one of the cornerstones of any cloud migration. Apache NiFi is an open source tool that enables you to easily move and process data using a graphical user interface (GUI).  In this blog post, we will examine a simple way to move data to the cloud using NiFi c...

Read more
  • AWS
  • S3
Avatar
Chandan Patra
— June 11, 2019

Amazon DynamoDB: 10 Things You Should Know

Amazon DynamoDB is a managed NoSQL service with strong consistency and predictable performance that shields users from the complexities of manual setup.Whether or not you've actually used a NoSQL data store yourself, it's probably a good idea to make sure you fully understand the key ...

Read more
  • AWS
  • DynamoDB
Avatar
Andrew Larkin
— June 6, 2019

The 11 AWS Certifications: Which is Right for You and Your Team?

As companies increasingly shift workloads to the public cloud, cloud computing has moved from a nice-to-have to a core competency in the enterprise. This shift requires a new set of skills to design, deploy, and manage applications in cloud computing.As the market leader and most ma...

Read more
  • AWS
  • AWS Certifications
Sam Ghardashem
Sam Ghardashem
— May 15, 2019

Aviatrix Integration of a NextGen Firewall in AWS Transit Gateway

Learn how Aviatrix’s intelligent orchestration and control eliminates unwanted tradeoffs encountered when deploying Palo Alto Networks VM-Series Firewalls with AWS Transit Gateway.Deploying any next generation firewall in a public cloud environment is challenging, not because of the f...

Read more
  • AWS
Joe Nemer
Joe Nemer
— May 3, 2019

AWS Config Best Practices for Compliance

Use AWS Config the Right Way for Successful ComplianceIt’s well-known that AWS Config is a powerful service for monitoring all changes across your resources. As AWS Config has constantly evolved and improved over the years, it has transformed into a true powerhouse for monitoring your...

Read more
  • AWS
  • Compliance
Avatar
Francesca Vigliani
— April 30, 2019

Cloud Academy is Coming to the AWS Summits in Atlanta, London, and Chicago

Cloud Academy is a proud sponsor of the 2019 AWS Summits in Atlanta, London, and Chicago. We hope you plan to attend these free events that bring the cloud computing community together to connect, collaborate, and learn about AWS. These events are all about learning. You can learn how t...

Read more
  • AWS
  • AWS Summits
Paul Hortop
Paul Hortop
— April 2, 2019

How to Monitor Your AWS Infrastructure

The AWS cloud platform has made it easier than ever to be flexible, efficient, and cost-effective. However, monitoring your AWS infrastructure is the key to getting all of these benefits. Realizing these benefits requires that you follow AWS best practices which constantly change as AWS...

Read more
  • AWS
  • Monitoring
Joe Nemer
Joe Nemer
— April 1, 2019

AWS EC2 Instance Types Explained

Amazon Web Services’ resource offerings are constantly changing, and staying on top of their evolution can be a challenge. Elastic Cloud Compute (EC2) instances are one of their core resource offerings, and they form the backbone of most cloud deployments. EC2 instances provide you with...

Read more
  • AWS
  • EC2
Avatar
Nitheesh Poojary
— March 26, 2019

How DNS Works – the Domain Name System (Part One)

Before migrating domains to Amazon's Route53, we should first make sure we properly understand how DNS worksWhile we'll get to AWS's Route53 Domain Name System (DNS) service in the second part of this series, I thought it would be helpful to first make sure that we properly understand...

Read more
  • AWS