Course Summary
Difficulty: Beginner
Duration: 1h 2m
Students: 17,347
Rating: 4.6/5
Description

This is the second course in a two-part series on database fundamentals for AWS. This course explores four different AWS database services — Amazon Redshift, Amazon QLDB, Amazon DocumentDB, and Amazon Keyspaces — and the differences between them. As well as getting a theoretical understanding of these, you will also watch guided demonstrations from the AWS platform showing you how to use each database service.

If you have any feedback relating to this course, please feel free to share your thoughts with us at support@cloudacademy.com. The first course in this two-part series covers Amazon RDS, Amazon DynamoDB, Amazon ElastiCache, and Amazon Neptune. If you're looking for more information on these AWS database services, you can find that course here.

Learning Objectives

  • Obtain a solid understanding of the following Amazon database services: Redshift, Quantum Ledger Database (QLDB), DocumentDB, and Keyspaces.
  • Create an Amazon Redshift cluster
  • Create a ledger using Amazon QLDB
  • Create an Amazon DocumentDB cluster
  • Create a keyspace and table in Amazon Keyspaces for Apache Cassandra

Intended Audience

  • Individuals responsible for designing, operating, and optimizing AWS database solutions
  • Anyone preparing to take the AWS Certified Database Specialty exam

Prerequisites

To get the most out of this course, you should have a basic understanding of database architectures and the AWS global infrastructure. For more information on this, please see our existing blog post here. You should also have a general understanding of the principles behind different EC2 Instance families.

Transcript

Hello and welcome to the final lecture of this two-part course series. In this course, we focused on learning some of the foundational concepts and features of the following AWS database services: Amazon Redshift, Amazon QLDB, Amazon DocumentDB, and Amazon Keyspaces.

I started off by looking at Amazon Redshift and during the lecture, we covered the following key points:

Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse designed for high-performance analysis of information. Access to this data is generally provided through your existing business intelligence tools using standard SQL.

Redshift itself is based upon PostgreSQL 8.0.2, but it contains a number of differences from PostgreSQL. A data warehouse is a very effective way to manage your reporting and data analysis needs at scale. Extract, Transform, and Load (ETL) jobs can be carried out on the data. Extraction is the process of retrieving data from one or more sources. Transformation is the process of mapping, reformatting, conforming, and adding meaning to the data so that it is more easily consumed.
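To make the three ETL stages concrete (extraction and transformation above, loading described next), here is a minimal sketch. It is not Redshift-specific: the sample CSV, the `orders` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for the target warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data; real sources could be files, APIs, or other databases.
RAW_CSV = """order_id,customer,amount_usd
1, alice ,19.99
2,BOB,5.00
3,alice,
"""


def extract(raw_text):
    """Extraction: read rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transformation: normalize names, convert types, drop incomplete rows."""
    cleaned = []
    for row in rows:
        amount = row["amount_usd"].strip()
        if not amount:
            continue  # skip rows that have no amount
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().lower(), float(amount))
        )
    return cleaned


def load(rows, conn):
    """Loading: insert the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, customer TEXT, amount_usd REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(raw_text=RAW_CSV):
    """Run all three stages against a stand-in target (SQLite, not Redshift)."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(raw_text)), conn)
    return conn
```

Calling `run_etl()` returns a connection whose `orders` table holds only the two complete, normalized rows; the row with a missing amount is dropped during transformation.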

Loading involves successfully inserting the transformed data into the target database, data store, or in this case, a data warehouse. A Redshift cluster can be considered the core component. A cluster is effectively a grouping of compute nodes, and each cluster contains at least one compute node. A leader node will be provisioned if there is more than one compute node. Each compute node has its own CPU, attached storage, and memory. The leader node coordinates communication between the compute nodes in your cluster and your external applications. It also creates execution plans containing code to return results from the database.

Queries referencing tables associated with compute nodes will be distributed to the corresponding compute nodes to obtain the required data, which is then sent to the leader node. A node slice is simply a partition of a compute node, where the node's memory and disk space are split. Each node slice then processes operations given by the leader node, allowing parallel operations to be performed.
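The leader-and-slice pattern described above can be sketched in a few lines. This is a toy model only: the hash-based row placement, the thread pool, and every function name are illustrative assumptions, not a description of how Redshift is implemented.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def assign_slice(key, slice_count):
    """Pick a slice for a row using a deterministic hash of its key."""
    return zlib.crc32(key.encode("utf-8")) % slice_count


def distribute(rows, slice_count):
    """Split (key, value) rows across the available slices."""
    slices = [[] for _ in range(slice_count)]
    for key, value in rows:
        slices[assign_slice(key, slice_count)].append((key, value))
    return slices


def slice_partial_sum(slice_rows):
    """Work done by one slice: sum only the rows it holds."""
    return sum(value for _, value in slice_rows)


def leader_total(rows, slice_count=4):
    """Leader role: fan the work out to slices, then combine partial results."""
    slices = distribute(rows, slice_count)
    with ThreadPoolExecutor(max_workers=slice_count) as pool:
        partials = list(pool.map(slice_partial_sum, slices))
    return sum(partials)
```

Each slice works only on its own share of the rows, and the leader combines the partial sums, which is the same division of labor the lecture describes for compute nodes and the leader node.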

Communication with your business intelligence tools will use industry-standard ODBC and JDBC drivers. Redshift supports the following performance features: massively parallel processing, columnar data storage, and result caching. Amazon Redshift also integrates with Amazon CloudWatch, allowing you to monitor the performance of your physical resources, such as CPU utilization and throughput.

Following Amazon Redshift, the next service I looked at was Amazon QLDB, and during this lecture, we learned that Amazon QLDB is a fully managed and serverless database service, which has been designed as a ledger database. It provides an immutable, transparent, and cryptographically verifiable way of maintaining a ledger database. It is configured as an append-only database. QLDB is owned and managed by a central and trusted authority. Because it is a serverless service, scaling is managed by AWS, which includes any read and write limitations of the database.

Use cases of QLDB include the insurance industry, where an immutable append-only framework prevents previous claim entries from being manipulated, which helps to prevent fraudulent activity. Another is human resources, where a clearly defined, verifiable employment history that is encrypted, trusted, reliable, and held centrally with all details of the individual makes Amazon QLDB a great fit.

Amazon QLDB is really about maintaining an immutable ledger with cryptographic abilities to enable the verifiable tracking of changes over time. Data for your QLDB database is placed into tables of Amazon Ion documents. Amazon Ion is an open-source, self-describing data serialization format, which is a superset of JSON, allowing you to store both structured and unstructured data. Any JSON document can also be classed as a valid Amazon Ion document.

Tables are composed of a group of Amazon Ion documents and their revisions. By design, QLDB maintains an audit history of all changes in addition to all previous revisions of the same Ion document. This creates a journal of transactional changes. The journal acts as an append-only transactional log and maintains the source of truth for that document and the entire history of changes to that document, ensuring that it remains immutable.
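The idea of an append-only journal that keeps every revision and can be verified cryptographically can be illustrated with a small hash chain. This is a toy sketch under stated assumptions: the `Journal` class, its methods, and the SHA-256 chaining scheme are hypothetical illustrations and do not describe QLDB's internal design or API.

```python
import hashlib
import json


class Journal:
    """Toy append-only journal; each entry stores the hash of the previous entry."""

    def __init__(self):
        self._entries = []

    def append(self, document):
        """Add a new revision; existing entries are never modified."""
        previous_hash = self._entries[-1]["hash"] if self._entries else "0" * 64
        payload = json.dumps(document, sort_keys=True)
        entry_hash = hashlib.sha256(
            (previous_hash + payload).encode("utf-8")
        ).hexdigest()
        self._entries.append(
            {"previous_hash": previous_hash, "payload": payload, "hash": entry_hash}
        )
        return entry_hash

    def history(self, doc_id):
        """Return every revision of a document, oldest first."""
        revisions = (json.loads(entry["payload"]) for entry in self._entries)
        return [doc for doc in revisions if doc.get("id") == doc_id]

    def verify(self):
        """Recompute the chain; any edited entry breaks verification."""
        previous_hash = "0" * 64
        for entry in self._entries:
            expected = hashlib.sha256(
                (previous_hash + entry["payload"]).encode("utf-8")
            ).hexdigest()
            if entry["previous_hash"] != previous_hash or entry["hash"] != expected:
                return False
            previous_hash = entry["hash"]
        return True
```

Appending two revisions of the same claim and calling `history` returns both in order, while altering a stored payload after the fact causes `verify` to return `False`.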

Amazon QLDB uses journal storage and indexed storage. Journal storage holds the history of changes made within the ledger database. Indexed storage is used to provision the tables and indexes within your ledger database and is optimized for querying. Amazon QLDB integrates with Amazon Kinesis through the use of QLDB streams. QLDB streams capture all changes that are made to the journal and feed them into an Amazon Kinesis data stream in near real time.

Now the third service we looked at in this course was Amazon DocumentDB with MongoDB Compatibility. The key points from this lecture were as follows: Amazon DocumentDB runs in a Virtual Private Cloud, or VPC. It is a non-relational, fully managed service, which again is highly scalable, very fast, and, like many other AWS services, designed for high availability. DocumentDB, as the name implies, is a document database, which provides the ability to quickly and easily store any JSON-like document, which can then be queried and indexed. Indexing enhances the speed of retrieving data within a database. DocumentDB can also scale compute and storage independently of each other.

Storage will automatically grow in increments of 10 GB, up to a maximum of 64 terabytes. Amazon DocumentDB is compatible with MongoDB, which is another document database. Migrating an existing MongoDB database to DocumentDB is a simple process using the AWS Database Migration Service. MongoDB compatibility means that, in most cases, you don't have to update the code in your applications or modify the toolsets that you are using.
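As a quick worked example of the storage growth described above, the sketch below rounds a data size up to the next 10 GB step. The function name is hypothetical, and two details are assumptions rather than statements from the lecture: 64 terabytes is treated as 64 * 1024 GB, and at least one increment is assumed to be allocated.

```python
import math

INCREMENT_GB = 10            # growth step named in the lecture
MAX_STORAGE_GB = 64 * 1024   # assumption: 64 terabytes expressed as 64 * 1024 GB


def allocated_storage_gb(data_gb):
    """Smallest multiple of the increment that holds data_gb, capped at the maximum."""
    if data_gb < 0:
        raise ValueError("data size cannot be negative")
    if data_gb > MAX_STORAGE_GB:
        raise ValueError("data exceeds the maximum cluster volume size")
    steps = max(1, math.ceil(data_gb / INCREMENT_GB))  # assume at least one step
    return min(steps * INCREMENT_GB, MAX_STORAGE_GB)
```

For instance, 95 GB of data would sit in 100 GB of allocated storage under these assumptions, and 10.5 GB would sit in 20 GB.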

The AWS Database Migration Service allows you to connect to a source database, read the source data, format the data for consumption by a target database, and then load the data into that target database. The architecture of Amazon DocumentDB is similar to Amazon Neptune in many ways. The core component of the database is the cluster. The cluster is composed of one or more database instances, up to 16 in total: one primary and up to 15 read replicas.

Clusters can span different Availability Zones within a single region. Within the cluster is a shared cluster storage volume that supports every instance within the cluster. There will only ever be a single primary database instance performing write operations in the cluster at any one time. Read replicas reduce the load on the primary database instance by processing read requests from clients. DocumentDB is able to process a very high volume of read requests with read replicas, and any read query results served by the replicas have minimal lag, typically down to single-digit milliseconds.

DocumentDB uses endpoints to connect to different components of your database. There are three different types of DocumentDB endpoints: the cluster endpoint, which is associated with the current primary DB instance of the cluster; the reader endpoint, which allows connectivity to any read replicas that you have configured within the region; and the instance endpoint, where each instance in the cluster has its own unique endpoint that points to itself. DocumentDB performs automatic backups. A feature of these automatic backups allows you to restore to any point in time during your retention period, known as point-in-time recovery.
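The three endpoint types described above suggest a simple routing rule on the client side: writes go to the cluster endpoint, reads go to the reader endpoint, and an instance endpoint is used only when one specific instance is targeted. The sketch below models that rule; the class, the function, and all hostnames are hypothetical placeholders, not real DocumentDB endpoint names or an AWS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterEndpoints:
    cluster: str      # follows the current primary instance
    reader: str       # connects to the read replicas
    instances: dict   # instance name -> that instance's own endpoint


def choose_endpoint(endpoints, operation, instance_name=None):
    """Return the endpoint a client would connect to for the given operation."""
    if instance_name is not None:
        return endpoints.instances[instance_name]  # target one specific instance
    if operation == "write":
        return endpoints.cluster                   # only the primary performs writes
    if operation == "read":
        return endpoints.reader                    # offload reads to the replicas
    raise ValueError(f"unknown operation: {operation!r}")


# Hypothetical hostnames, for illustration only.
EXAMPLE = ClusterEndpoints(
    cluster="my-cluster.cluster.example.internal",
    reader="my-cluster.cluster-ro.example.internal",
    instances={"instance-1": "instance-1.example.internal"},
)
```

With `EXAMPLE`, `choose_endpoint(EXAMPLE, "write")` returns the cluster hostname and `choose_endpoint(EXAMPLE, "read")` returns the reader hostname, so application code does not need to track which instance is currently the primary.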

Now, the final service discussed was Amazon Keyspaces. During this lecture I explained that Keyspaces is a serverless, fully managed service, designed to be highly scalable and highly available. It is compatible with Apache Cassandra, meaning you can use the same tools and code as you do currently with your existing Apache Cassandra database. Amazon Keyspaces offers virtually unlimited throughput. It is designed for massive-scale solutions, allowing you to service business-critical workloads requiring thousands of requests per second.

Using Amazon Keyspaces removes the need for you to manage typical Cassandra node architectures. It's a great choice if you are looking to build applications where low latency is essential, or if you are looking to manage your existing Cassandra databases in AWS without the burden of maintaining your own infrastructure. In Amazon Keyspaces, a keyspace is a logical grouping only; it does not carry the responsibility for managing replication that it would in a typical Cassandra deployment.

Tables are where your database writes are stored. Every table will have a primary key that consists of a partition key and one or more columns. Encryption at rest is automatically enabled and TLS is required when communicating with any clients. Amazon Keyspaces offers two different throughput capacity modes: on-demand and provisioned. Amazon Keyspaces uses CQL, the Cassandra Query Language. CQL is similar to SQL, the Structured Query Language, which helps to reduce the learning curve when moving from a relational database that uses SQL, such as MySQL. You can run CQL queries using the CQL editor, which returns as many as 1,000 records per query, using the cqlsh client, or programmatically using an Apache 2.0 licensed Cassandra client driver.
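To show how a primary key built from a partition key plus an additional column can organize rows, here is a toy in-memory model. It assumes the additional column acts as a Cassandra-style clustering column (rows sorted within a partition, and a second insert with the same primary key overwriting the first); the `ToyTable` class and its methods are hypothetical and are not part of Amazon Keyspaces or any Cassandra driver.

```python
import bisect
from collections import defaultdict


class ToyTable:
    """Rows grouped by partition key and kept sorted by a clustering column."""

    def __init__(self):
        # partition key -> list of (clustering key, row), sorted by clustering key
        self._partitions = defaultdict(list)

    def insert(self, partition_key, clustering_key, row):
        """Insert a row; an existing row with the same primary key is replaced."""
        partition = self._partitions[partition_key]
        keys = [key for key, _ in partition]
        index = bisect.bisect_left(keys, clustering_key)
        if index < len(partition) and partition[index][0] == clustering_key:
            partition[index] = (clustering_key, row)      # same primary key: overwrite
        else:
            partition.insert(index, (clustering_key, row))  # keep sorted order

    def select(self, partition_key):
        """Return all rows in one partition, ordered by clustering key."""
        return [row for _, row in self._partitions.get(partition_key, [])]
```

Reading a whole partition is a single lookup followed by an already-sorted scan, which is the access pattern this style of primary key is meant to make cheap.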

That now brings me to the end of this lecture and to the end of this course and this series. Across these two courses I covered the following database services from a high-level, foundational perspective to provide you with an understanding of some of the differences between them: Amazon RDS, Amazon DynamoDB, Amazon ElastiCache, Amazon Neptune, Amazon Redshift, Amazon Quantum Ledger Database (QLDB), Amazon DocumentDB, and Amazon Keyspaces.

Feedback on our courses here at Cloud Academy is valuable to both us as trainers and any students looking to take the same course in the future. If you have any feedback, positive or negative, it would be greatly appreciated if you could contact support@cloudacademy.com. Thank you for your time and good luck with your continued learning of cloud computing. Thank you.

Lectures

  • Course Introduction
  • Amazon Redshift
  • DEMO: Creating an Amazon Redshift Cluster
  • Amazon Quantum Ledger Database (QLDB)
  • DEMO: Creating a Ledger using Amazon QLDB
  • Amazon DocumentDB (With MongoDB Compatibility)
  • DEMO: Creating an Amazon DocumentDB Cluster
  • Amazon Keyspaces (for Apache Cassandra)
  • DEMO: Creating a Keyspace and Table in Amazon Keyspaces (for Apache Cassandra)

About the Author
Students: 228,694
Labs: 1
Courses: 215
Learning Paths: 178

Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.

To date, Stuart has created 150+ courses relating to Cloud reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.

Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.

He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.

In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.

Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.