Some Assembly Required
Difficulty: Intermediate
Duration: 19m
Students: 404
Ratings: 5/5
Description

This course covers Amazon Redshift Spectrum, including what it is, what it does, how it works, and some points to take into consideration when using Redshift Spectrum.

Learning Objectives

  • How to manage cold data in Redshift using Amazon S3
  • What Amazon Redshift Spectrum is and does
  • How Spectrum queries work
  • Data formats supported by Spectrum
  • File optimization for Spectrum
  • Amazon Redshift Spectrum considerations

Intended Audience

This course is intended for people who want to learn more about Amazon Redshift Spectrum and how it can be used to run SQL queries on data stored in Amazon S3.

Prerequisites

To get the most from this course, you should have a basic understanding of Amazon Redshift, Amazon Athena, AWS Glue, and data analytics concepts.

Transcript

Redshift Spectrum is a feature of Amazon Redshift; it cannot be used as a standalone query engine. The Redshift cluster and the Amazon S3 data queried by the Spectrum nodes must be in the same AWS Region. The cluster also needs authorization to access external Data Catalogs in AWS Glue or Amazon Athena, and this is granted through IAM roles. For details on creating IAM roles for use with Amazon Redshift, AWS Glue, and Amazon Athena, refer to the AWS documentation.
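As a rough sketch of the IAM side, the role the cluster uses must trust the Amazon Redshift service so that Redshift can assume it. The Python below builds that trust policy as a dictionary and composes a role ARN from an account ID and role name. The account ID `123456789012`, the role name `MySpectrumRole`, and the helper names are hypothetical placeholders. This covers only the trust relationship: the role still needs permissions policies granting read access to the S3 data and access to the Data Catalog, as described in the AWS documentation.

```python
import json


def redshift_trust_policy() -> dict:
    """Trust policy that lets the Amazon Redshift service assume an IAM role.

    This is only the trust relationship; S3 and Data Catalog permissions
    must be attached to the role separately.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "redshift.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def role_arn(account_id: str, role_name: str) -> str:
    """Build an IAM role ARN of the form arn:aws:iam::<account>:role/<name>.

    Hypothetical helper for illustration; AWS account IDs are 12 digits.
    """
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    return f"arn:aws:iam::{account_id}:role/{role_name}"


if __name__ == "__main__":
    # Placeholder account ID and role name, not real resources.
    print(json.dumps(redshift_trust_policy(), indent=2))
    print(role_arn("123456789012", "MySpectrumRole"))
```

The ARN produced here is the value the external schema later references to authorize the cluster's access to Amazon S3.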

To query data with Spectrum, you must create both an external schema and an external table. The external schema references a database in the external data catalog and also supplies the IAM role ARN (Amazon Resource Name) that authorizes the cluster to access Amazon S3. The external database can be created in the Amazon Athena Data Catalog, the AWS Glue Data Catalog, or an Apache Hive metastore, such as one running on Amazon EMR. Once the external tables are created, they can be queried in Amazon Redshift with the same kinds of SQL statements used for other Redshift tables. One more point about the Data Catalog: even though this is an external database, there is no need to run crawlers on the data, because the external table definitions themselves describe its structure.
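A minimal sketch of the two statements described above, assuming the AWS Glue or Athena Data Catalog and Parquet files in S3. The Python helpers only assemble Redshift DDL strings that you would run from a SQL client. The helper names, the schema `spectrum_schema`, the catalog database `spectrum_db`, the table `sales`, the bucket `example-bucket`, and the role ARN are all hypothetical. The `CREATE EXTERNAL DATABASE IF NOT EXISTS` clause asks Redshift to create the catalog database if it does not already exist.

```python
from typing import List, Tuple


def external_schema_ddl(schema: str, catalog_db: str, iam_role_arn: str) -> str:
    """CREATE EXTERNAL SCHEMA that points at a Data Catalog database.

    No quoting or escaping is done: assumes trusted, simple identifiers.
    """
    return (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        "FROM DATA CATALOG\n"
        f"DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


def external_table_ddl(
    schema: str, table: str, columns: List[Tuple[str, str]], s3_location: str
) -> str:
    """CREATE EXTERNAL TABLE over Parquet files stored under an S3 prefix."""
    if not s3_location.startswith("s3://"):
        raise ValueError("LOCATION must be an s3:// URI")
    # One "name type" pair per line, comma-separated.
    cols = ",\n".join(f"  {name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n{cols}\n)\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{s3_location}';"
    )


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    arn = "arn:aws:iam::123456789012:role/MySpectrumRole"
    print(external_schema_ddl("spectrum_schema", "spectrum_db", arn))
    print(external_table_ddl(
        "spectrum_schema",
        "sales",
        [("salesid", "integer"), ("price", "decimal(8,2)")],
        "s3://example-bucket/sales/",
    ))
```

Once both statements have run, the external table can be queried like any other Redshift table, for example `SELECT COUNT(*) FROM spectrum_schema.sales;`. Building the DDL as strings keeps the sketch runnable without AWS credentials; in practice you would issue the statements directly against the cluster.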

About the Author
Students: 32407
Courses: 20
Learning Paths: 14

Stephen is the AWS Certification Specialist at Cloud Academy. His content focuses heavily on certification topics related to Amazon Web Services technologies. He loves teaching and believes that there are no shortcuts to certification, but that it is possible to find the right path and course of study.

Stephen has worked in IT for over 25 years in roles ranging from tech support to systems engineering. At one point, he taught computer network technology at a community college in Washington state.

Before coming to Cloud Academy, Stephen worked as a trainer and curriculum developer at AWS and brings a wealth of knowledge and experience in cloud technologies.

In his spare time, Stephen enjoys reading, sudoku, gaming, and modern square dancing.