This is the second course in a two-part series on database fundamentals for AWS. This course explores four different AWS database services — Amazon Redshift, Amazon QLDB, Amazon DocumentDB, and Amazon Keyspaces — and the differences between them. As well as getting a theoretical understanding of these, you will also watch guided demonstrations from the AWS platform showing you how to use each database service.
If you have any feedback relating to this course, please feel free to share your thoughts with us at support@cloudacademy.com. The first course in this two-part series covers Amazon RDS, Amazon DynamoDB, Amazon ElastiCache, and Amazon Neptune. If you're looking for more information on these AWS database services, you can find that course here.
Learning Objectives
- Obtain a solid understanding of the following Amazon database services: Redshift, Quantum Ledger Database (QLDB), DocumentDB, and Keyspaces.
- Create an Amazon Redshift cluster
- Create a ledger using Amazon QLDB
- Create an Amazon DocumentDB cluster
- Create a keyspace and table in Amazon Keyspaces for Apache Cassandra
Intended Audience
- Individuals responsible for designing, operating, and optimizing AWS database solutions
- Anyone preparing to take the AWS Certified Database Specialty exam
Prerequisites
To get the most out of this course, you should have a basic understanding of database architectures and the AWS global infrastructure. For more information on this, please see our existing blog post here. You should also have a general understanding of the principles behind the different EC2 instance families.
Hello, and welcome to this short lecture, in which we'll look at the final database service in this course series, Amazon Keyspaces for Apache Cassandra. First, let's answer a question that some people ask when they see this service: what is Apache Cassandra? To summarize it quickly, Wikipedia explains that "Apache Cassandra is a free, open-source, distributed, wide column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure."
Now that we have a high-level awareness of Apache Cassandra, let's see how Amazon Keyspaces fits in. Keyspaces is a serverless, fully managed service designed to be highly scalable, highly available, and, importantly, compatible with Apache Cassandra. This means you can use the same tools and application code that you already use with your existing Apache Cassandra databases.
As a serverless service, it removes the need for you to provision, patch, and manage instances yourself; AWS takes care of all of this on your behalf. Offering virtually unlimited throughput, Amazon Keyspaces is designed for massive-scale solutions, allowing you to serve business-critical workloads that require thousands of requests per second. Its key features are extreme performance, scalability, and elasticity: it grows with the demand from your applications, so you only pay for what you use.
Traditionally, Cassandra architectures consist of a cluster of nodes that you have to create, provision, manage, patch, and back up yourself. As your Cassandra database grows, so does the number of nodes, which increases the administrative effort needed to manage the infrastructure. Amazon Keyspaces removes the need for you to manage this infrastructure, so you can focus instead on the business logic of your database and the applications that interact with it, ensuring you get the best performance possible.
Amazon Keyspaces is a great choice if you're looking to build applications where low latency is essential, for example, route optimization or trade monitoring applications. It is also a good fit if you're looking for an easier way to run your existing Cassandra databases in the cloud without the burden of maintaining your own infrastructure.
To help understand the service in greater detail, let's look at some of the components of the service.
First, let me explain the difference between keyspaces and tables. In Cassandra, a keyspace is essentially a grouping of related tables that your applications use to read and write data. The keyspace in Cassandra also defines how your tables are replicated across the nodes in the cluster. However, because Amazon Keyspaces is fully managed and serverless, the entire storage layer is abstracted away from us as customers and is managed by AWS instead. As a result, a keyspace in Amazon Keyspaces exists only as a logical grouping of tables, and it does not leave us responsible for managing any kind of replication.
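To make the contrast concrete, here is a small sketch that builds the CQL `CREATE KEYSPACE` statement each way. In self-managed Cassandra the replication map carries real settings, such as the number of copies per data center. In Amazon Keyspaces it only names a strategy class, because AWS handles replication. The keyspace name `orders` and the data center names are hypothetical. The use of `'SingleRegionStrategy'` for Amazon Keyspaces is an assumption to verify against the current AWS CQL reference. The name check is a conservative illustrative rule, not the service's exact naming limit.

```python
import re

# Hypothetical, conservative name check: starts with a letter, then letters,
# digits or underscores, at most 48 characters in total.
_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,47}")


def _check_name(name: str) -> str:
    if not _NAME.fullmatch(name):
        raise ValueError(f"invalid keyspace name: {name!r}")
    return name


def self_managed_keyspace_cql(name: str, replicas_per_dc: dict[str, int]) -> str:
    """Self-managed Cassandra: the keyspace itself defines replication."""
    dcs = ", ".join(f"'{dc}': {n}" for dc, n in replicas_per_dc.items())
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {_check_name(name)} "
        f"WITH REPLICATION = {{'class': 'NetworkTopologyStrategy', {dcs}}};"
    )


def keyspaces_keyspace_cql(name: str) -> str:
    """Amazon Keyspaces: AWS manages replication, so the keyspace is only a
    logical grouping. Assumption: the service accepts 'SingleRegionStrategy'."""
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {_check_name(name)} "
        "WITH REPLICATION = {'class': 'SingleRegionStrategy'};"
    )


print(self_managed_keyspace_cql("orders", {"dc1": 3, "dc2": 3}))
print(keyspaces_keyspace_cql("orders"))
```

The only thing that changes between the two statements is the replication map. That is where the node-level responsibility moves from you to AWS.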
Tables are where your database writes are stored; effectively, they hold the data within your database. Each table has a primary key, which consists of a partition key made up of one or more columns, optionally followed by one or more clustering columns. When a new table is created, encryption at rest is automatically enabled, and any client that wants to connect to your tables must use a Transport Layer Security (TLS) connection so that data is encrypted in transit.
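The sketch below shows how the primary key is composed in a CQL `CREATE TABLE` statement. The inner parentheses hold the partition key columns, and any columns after them are clustering columns. `TableSpec` is a hypothetical helper, and the `trading.trades` table with its columns is an illustrative trade-monitoring example, not part of the course demo. Encryption at rest and TLS are applied by the service, so they do not appear in the statement.

```python
from dataclasses import dataclass, field


@dataclass
class TableSpec:
    """Hypothetical description of a table, rendered as CQL."""
    keyspace: str
    name: str
    columns: dict[str, str]          # column name -> CQL type
    partition_key: list[str]         # one or more columns
    clustering: list[str] = field(default_factory=list)  # optional

    def to_cql(self) -> str:
        key_cols = self.partition_key + self.clustering
        missing = [c for c in key_cols if c not in self.columns]
        if not self.partition_key or missing:
            raise ValueError(f"invalid primary key, missing columns: {missing}")
        cols = ", ".join(f"{n} {t}" for n, t in self.columns.items())
        # Inner parentheses mark the partition key; the rest are clustering columns.
        key = ", ".join(["(" + ", ".join(self.partition_key) + ")"] + self.clustering)
        return (f"CREATE TABLE IF NOT EXISTS {self.keyspace}.{self.name} "
                f"({cols}, PRIMARY KEY ({key}));")


trades = TableSpec(
    keyspace="trading",
    name="trades",
    columns={"symbol": "text", "trade_time": "timestamp", "price": "decimal"},
    partition_key=["symbol"],
    clustering=["trade_time"],
)
print(trades.to_cql())
```

Here all trades for one symbol share a partition, and `trade_time` orders the rows within that partition.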
In the next lecture, I will show you how to set up a keyspace and then a table that will reside within that keyspace. Much like Amazon DynamoDB, Amazon Keyspaces offers two throughput capacity modes for the reads and writes made to and from your tables. These options let you customize how your throughput is managed so that you can optimize it for your workloads.
The options available are on-demand and provisioned. On-demand throughput capacity is the default option when creating your tables and is capable of processing thousands of requests per second. Pricing for this option is based on the number of reads and writes your applications make against your tables, meaning you only pay for what you use.
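To show how "the number of reads and writes" becomes a billable quantity, here is a sketch that counts request units per row. The unit sizes are assumptions taken from AWS's description of read and write request units and are not stated in this lecture, so check them against the current Amazon Keyspaces documentation: one write request unit (WRU) covers a write of up to 1 KB, and one read request unit (RRU) covers one `LOCAL_QUORUM` read, or two `LOCAL_ONE` reads, of up to 4 KB. The function names and the 2.5 KB row are hypothetical. No prices are included.

```python
import math

# Assumed unit sizes (verify against current AWS docs).
WRU_BYTES = 1024      # 1 WRU per write of up to 1 KB
RRU_BYTES = 4 * 1024  # 1 RRU per LOCAL_QUORUM read (or 2 LOCAL_ONE reads) up to 4 KB


def write_units(row_bytes: int) -> int:
    """WRUs consumed by writing one row of the given size."""
    return max(1, math.ceil(row_bytes / WRU_BYTES))


def read_units(row_bytes: int, consistency: str = "LOCAL_QUORUM") -> float:
    """RRUs consumed by reading one row of the given size."""
    units = max(1, math.ceil(row_bytes / RRU_BYTES))
    if consistency == "LOCAL_QUORUM":
        return float(units)
    if consistency == "LOCAL_ONE":
        return units / 2  # two LOCAL_ONE reads share one RRU
    raise ValueError(f"unsupported consistency level: {consistency}")


# Hypothetical 2.5 KB row: written once, then read back at LOCAL_ONE.
print(write_units(2560), read_units(2560, "LOCAL_ONE"))  # prints: 3 0.5
```

Multiplying these per-request figures by your request volume gives the quantity that on-demand pricing is based on.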
As your workload fluctuates, on-demand mode can instantly accommodate throughput up to any level that your table has previously reached. If traffic rises beyond that previous peak, Amazon Keyspaces adapts quickly to meet the needs of your applications.
As a result, this can be a good selection for your throughput if you're dealing with unknown or unpredictable workloads.
Provisioned throughput capacity is a better choice if you are dealing with more predictable workloads. In this mode, you specify the number of reads and writes per second that you expect, which enables your tables to meet that throughput faster than on-demand mode would. You can also use automatic scaling to adjust your provisioned throughput between upper and lower limits that you define, whether your traffic fluctuates or your database grows naturally over time.
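Automatic scaling between a lower and an upper limit can be pictured as a target-tracking calculation. Provision enough capacity that current consumption sits at a chosen utilization percentage, then clamp the result to your limits. This is a simplified, hypothetical model (`desired_capacity`, the 70% target and the 5 to 5,000 limits are illustrative). It ignores the cooldowns and smoothing a real scaling service applies.

```python
import math


def desired_capacity(consumed_per_sec: float, target_pct: float,
                     min_capacity: int, max_capacity: int) -> int:
    """Capacity that keeps utilization near target_pct, clamped to [min, max].

    Simplified target-tracking model; not AWS's exact algorithm.
    """
    if not 0 < target_pct <= 100:
        raise ValueError("target_pct must be in (0, 100]")
    if min_capacity > max_capacity:
        raise ValueError("min_capacity must not exceed max_capacity")
    # Multiply before dividing to limit floating-point error from fractions like 0.7.
    needed = math.ceil(consumed_per_sec * 100 / target_pct)
    return max(min_capacity, min(max_capacity, needed))


for load in (1, 700, 10_000):
    print(load, desired_capacity(load, target_pct=70, min_capacity=5, max_capacity=5000))
```

A quiet table stays at the lower limit, a steady load gets headroom above its consumption, and a spike is capped at the upper limit you set.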
When working with Amazon Keyspaces, you use CQL, the Cassandra Query Language, to communicate with your keyspaces and tables. In many respects, CQL is similar to SQL (Structured Query Language), which helps to reduce the learning curve when moving from a relational database that uses SQL, such as MySQL.
There are a number of ways to run CQL queries. Firstly, from the Amazon Keyspaces dashboard in the AWS Management Console, you can use the CQL editor, which can return up to 1,000 records per query. If you need to query more than 1,000 records, you will need to run multiple queries. Alternatively, you can run queries from a cqlsh client (more information on this can be found here), or programmatically using an Apache 2.0-licensed Cassandra client driver (more info here).
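Reading more than one page of results comes down to a loop. Fetch a page, keep the token that marks where it ended, and ask for the next page until no token is returned. Cassandra client drivers expose a paging-state token for this purpose. In the sketch below, `fetch_page` is a hypothetical stand-in for a real driver call, and `make_fake_fetcher` serves in-memory data in pages of 1,000 rows so that the loop can run without a database.

```python
from typing import Any, Callable, Iterator, Optional, Sequence, Tuple

# A page of rows plus the token for the next page (None when there are no more).
Page = Tuple[Sequence[Any], Optional[str]]


def iterate_all(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Any]:
    """Yield every row by following paging tokens until none is returned."""
    token: Optional[str] = None
    while True:
        rows, token = fetch_page(token)
        yield from rows
        if token is None:
            return


def make_fake_fetcher(data: Sequence[Any], page_size: int = 1000):
    """Hypothetical in-memory fetcher; the token is simply the next offset."""
    def fetch(token: Optional[str]) -> Page:
        start = int(token) if token else 0
        end = start + page_size
        next_token = str(end) if end < len(data) else None
        return data[start:end], next_token
    return fetch


rows = list(iterate_all(make_fake_fetcher(list(range(2500)))))
print(len(rows))  # prints: 2500
```

With 2,500 rows and a 1,000-row page size, the loop makes three fetches, which is the "multiple queries" the console editor would otherwise need you to run by hand.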
In the next lecture, I'll be demonstrating how to create a keyspace and then a table within that keyspace, so let's take a look.
Lectures
- Course Introduction
- Amazon Redshift
- DEMO: Creating an Amazon Redshift Cluster
- Amazon Quantum Ledger Database (QLDB)
- DEMO: Creating a Ledger using Amazon QLDB
- Amazon DocumentDB (With MongoDB Compatibility)
- DEMO: Creating an Amazon DocumentDB Cluster
- DEMO: Creating a Keyspace and Table in Amazon Keyspaces (for Apache Cassandra)
- Course Summary
Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.
To date, Stuart has created 150+ courses relating to the cloud, reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.
Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.
He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.
In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.
Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.