The course is part of these learning paths
This is the second course in a two-part series on database fundamentals for AWS. This course explores four different AWS database services — Amazon Redshift, Amazon QLDB, Amazon DocumentDB, and Amazon Keyspaces — and the differences between them. As well as getting a theoretical understanding of these, you will also watch guided demonstrations from the AWS platform showing you how to use each database service.
If you have any feedback relating to this course, please feel free to share your thoughts with us at firstname.lastname@example.org. The first course in this two-part series covers Amazon RDS, Amazon DynamoDB, Amazon ElastiCache, and Amazon Neptune. If you're looking for more information on these AWS database services, you can find that course here.
- Obtain a solid understanding of the following Amazon database services: Redshift, Quantum Ledger Database (QLDB), DocumentDB, and Keyspaces.
- Create an Amazon Redshift cluster
- Create a ledger using Amazon QLDB
- Create an Amazon DocumentDB cluster
- Create a keyspace and table in Amazon Keyspaces for Apache Cassandra
- Individuals responsible for designing, operating, and optimizing AWS database solutions
- Anyone preparing to take the AWS Certified Database Specialty exam
To get the most out of this course, you should have a basic understanding of database architectures and the AWS global infrastructure. For more information on this, please see our existing blog post here. You should also have a general understanding of the principles behind different EC2 Instance families.
Hello and welcome to this lecture looking at Amazon DocumentDB. This database service runs in a Virtual Private Cloud and is a non-relational fully managed service, which again is highly scalable, very fast, and much like many other AWS services conforms to levels maintaining high availability. It will be of no surprise, as the name implies, that DocumentDB is a document database, which provides the ability to quickly and easily store any JSON-like document data which can then be queried and indexed. Indexing enhances the speed of retrieving data within a database thanks to an indexing data structure stored within the database.
The ability to scale within AWS is important across all services, and DocumentDB has the ability to scale both its compute and storage separately from each other. This decoupled approach creates a flexible scaling pattern allowing you to scale the resource as when you need to.
As your database grows in size, Amazon DocumentDB will automatically increase the size of your storage by 10G each time, up to a maximum of 64TB to ensure that you do not run out of storage space.
Amazon DocumentDB has full compatibility with MongoDB, which again is another document database, meaning that if required, you can easily migrate any existing MongoDB databases you might have into Amazon DocumentDB using the Database Migration Service . With this compatibility with MongoDB, it means you don't have to update any of your code in your applications or modify any toolsets that you are using, making this a simple transition in Amazon DocumentDB if you decide to migrate your database.
The AWS Database Migration Service allows you to connect to a source database, read the source data, format the data for consumption by a target database, and then load the data into that target database. The AWS Database Migration Service can migrate your data to and from commercial and open-source databases.
For more information on the AWS Database migration service, please see our existing course here.
Let me now explain the architecture of Amazon DocumentDB. Firstly, I want to cover the base architecture of the service, and If you are familiar with Amazon Neptune, then architecturally Amazon DocumentDB is similar in many ways.
The database itself is comprised of a core component, a cluster, and this cluster is composed of a single or multiple DB instances, up to 16 in total, which can span across different availability zones within a single region. Across this cluster is a shared cluster storage volume that supports every instance within the cluster, meaning that every instance sees the same storage volume.
The instances backing the DocumentDB cluster provide the processing power to support read and write requests against the cluster storage volume, and as I mentioned you can have up to 16 DB instances. There will only ever be a single primary DB instance performing write operations in the cluster at any one time. The remaining instances within the cluster if they have been configured will all be read replica instances, serving only read requests.
Read replicas, as we have seen in previous AWS database services in this course series helps to reduce the load on the primary database instances by processing read requests from clients. As a result, DocumentDB is able to process a very high-volume of these kinds of requests.
DocumentDB supports up to 15 read replicas across different availability zones within the same region, much like Amazon Neptune, and also shares the same underlying storage of the primary instance across a shared volume.
The Primary DocumentDB instance will be responsible for both read and write operations. However, the replicas will only process read requests to the cluster volume. As the replicas connect to the same source data as the primary, any read query results served by the replicas have minimal lag, typically down to single-digit milliseconds. Data is synchronized is maintained synchronously between both the primary DB instance and each replica in the region.
DocumentDB uses a principal of endpoints to connect to different components of your database. An endpoint is a URL address with an identified port that points to your infrastructure. There are three different types of DocumentDB endpoints, these being: Cluster endpoint, Reader endpoint, and Instance endpoint.
Cluster Endpoints: Each DocumentDB database will have a cluster endpoint that is associated with the current primary DB instance of the cluster. This endpoint should be used by any applications that require both read and write access to the database. Should a failure of your primary DB instance occur, then DocumentDB will promote either a read replica to the primary DB or if no read replica is configured, DocumentDB will create a new primary instance. When this happens, the cluster endpoint will then point to the new primary instance without any changes required by your applications accessing the database.
Reader Endpoints: A Reader endpoint allows connectivity to any read replicas that you have configured within the region. Applications can use this endpoint to access your database for read requests, typically when performing a query. Only a single reader endpoint will exist, even if you have multiple read replicas. as a result, DocumentDB will manage the forwarding of any read requests onto a specific read replica associated with the primary DB.
Instance endpoints: For every instance within your cluster, including your primary and read replica instances, they will each have their own unique instance endpoint that will point to its own host. This allows you to direct certain traffic to specific instances within the cluster, you might want to do this for load-balancing reasons across your applications reading from your replicas.
DocumentDB performs automatic backups for you based upon a schedule created during the creation of your database. A feature of these automatic backups allows you to restore back to any point in time during your retention period, known as point-in-time-recovery. As a part of the automated daily backup process, DocumentDB captures any transaction logs that have been created as and when updates to your Database were made, this is to ensure it can perform a point-in-time-recovery. These backups are automatically stored on Amazon S3 for durability and availability.
The automated backups themselves are performed daily, and the backup retention period determines how long DocumentDB will keep and maintain your backups for and can be set between 0 and 35 days. For automatic backups to take place, the retention period must be set to at least one day. If the retention is set to 0 then automatic backups will not take place and you will not be able to perform point-in-time restores. If you did have it configured, then your point-in-time restores can take place for any duration within the retention period.
The backup window allows you to define a time period in which the backup snapshot could be taken. This allows you to set it at a time when the database itself will be of low utilization to prevent the backup process impacting the performance of the database itself.
In the next lecture, I will be looking at how to create a new Document DBCluster, which will cover these backup settings in addition to further configurational changes, so let's move on and take a look.
Course Introduction - Amazon Redshift - DEMO: Creating an Amazon Redshift Cluster - Amazon Quantum Ledger Database (QLDB) - DEMO: Creating a Ledger using Amazon QLDB - - DEMO: Creating an Amazon DocumentDB Cluster - Amazon Keyspaces (for Apache Cassandra) - DEMO: Creating a Keyspace and Table in Amazon Keyspaces (for Apache Cassandra) - Course Summary
Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.
To date, Stuart has created 150+ courses relating to Cloud reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.
Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.
He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.
In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.
Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.