Amazon DocumentDB (With MongoDB Compatibility)


Course Introduction
RDS vs. EC2
RDS vs. EC2
DynamoDB Accelerator

The course is part of this learning path

Amazon DocumentDB (With MongoDB Compatibility)
4h 21m

This section of the AWS Certified Solutions Architect - Professional learning path introduces you to the AWS database services relevant to the SAP-C02 exam. We then understand the service options available and learn how to select and apply AWS database services to meet specific design scenarios relevant to the AWS Certified Solutions Architect - Professional exam. 

Want more? Try a Lab Playground or do a Lab Challenge

Learning Objectives

  • Understand the various database services that can be used when building cloud solutions on AWS
  • Learn how to build databases using Amazon RDS, DynamoDB, Redshift, DocumentDB, Keyspaces, and QLDB
  • Learn how to create ElastiCache and Neptune clusters
  • Understand which AWS database service to choose based on your requirements
  • Discover how to use automation to deploy databases in AWS
  • Learn about data lakes and how to build a data lake in AWS

Hello and welcome to this lecture looking at Amazon DocumentDB. This database service runs in a Virtual Private Cloud and is a non-relational fully managed service, which again is highly scalable, very fast, and much like many other AWS services conforms to levels maintaining high availability. It will be of no surprise, as the name implies, that DocumentDB is a document database, which provides the ability to quickly and easily store any JSON-like document data which can then be queried and indexed. Indexing enhances the speed of retrieving data within a database thanks to an indexing data structure stored within the database.

The ability to scale within AWS is important across all services, and DocumentDB has the ability to scale both its compute and storage separately from each other. This decoupled approach creates a flexible scaling pattern allowing you to scale the resource as when you need to.

As your database grows in size, Amazon DocumentDB will automatically increase the size of your storage by 10G each time, up to a maximum of 64TB to ensure that you do not run out of storage space.

Amazon DocumentDB has full compatibility with MongoDB, which again is another document database, meaning that if required, you can easily migrate any existing MongoDB databases you might have into Amazon DocumentDB using the Database Migration Service . With this compatibility with MongoDB, it means you don't have to update any of your code in your applications or modify any toolsets that you are using, making this a simple transition in Amazon DocumentDB if you decide to migrate your database.

The AWS Database Migration Service allows you to connect to a source database, read the source data, format the data for consumption by a target database, and then load the data into that target database. The AWS Database Migration Service can migrate your data to and from commercial and open-source databases.

For more information on the AWS Database migration service, please see our existing course here.

Let me now explain the architecture of Amazon DocumentDB. Firstly, I want to cover the base architecture of the service, and If you are familiar with Amazon Neptune, then architecturally Amazon DocumentDB is similar in many ways.

The database itself is comprised of a core component, a cluster, and this cluster is composed of a single or multiple DB instances, up to 16 in total, which can span across different availability zones within a single region. Across this cluster is a shared cluster storage volume that supports every instance within the cluster, meaning that every instance sees the same storage volume.

The instances backing the DocumentDB cluster provide the processing power to support read and write requests against the cluster storage volume, and as I mentioned you can have up to 16 DB instances. There will only ever be a single primary DB instance performing write operations in the cluster at any one time. The remaining instances within the cluster if they have been configured will all be read replica instances, serving only read requests.

Read replicas, as we have seen in previous AWS database services in this course series helps to reduce the load on the primary database instances by processing read requests from clients. As a result, DocumentDB is able to process a very high-volume of these kinds of requests.

DocumentDB supports up to 15 read replicas across different availability zones within the same region, much like Amazon Neptune, and also shares the same underlying storage of the primary instance across a shared volume.

The Primary DocumentDB instance will be responsible for both read and write operations. However, the replicas will only process read requests to the cluster volume. As the replicas connect to the same source data as the primary, any read query results served by the replicas have minimal lag, typically down to single-digit milliseconds. Data is synchronized is maintained synchronously between both the primary DB instance and each replica in the region.

DocumentDB uses a principal of endpoints to connect to different components of your database. An endpoint is a URL address with an identified port that points to your infrastructure. There are three different types of DocumentDB endpoints, these being: Cluster endpoint, Reader endpoint, and Instance endpoint.

Cluster Endpoints: Each DocumentDB database will have a cluster endpoint that is associated with the current primary DB instance of the cluster. This endpoint should be used by any applications that require both read and write access to the database. Should a failure of your primary DB instance occur, then DocumentDB will promote either a read replica to the primary DB or if no read replica is configured, DocumentDB will create a new primary instance. When this happens, the cluster endpoint will then point to the new primary instance without any changes required by your applications accessing the database.

Reader Endpoints: A Reader endpoint allows connectivity to any read replicas that you have configured within the region. Applications can use this endpoint to access your database for read requests, typically when performing a query. Only a single reader endpoint will exist, even if you have multiple read replicas. as a result, DocumentDB will manage the forwarding of any read requests onto a specific read replica associated with the primary DB.

Instance endpoints: For every instance within your cluster, including your primary and read replica instances, they will each have their own unique instance endpoint that will point to its own host. This allows you to direct certain traffic to specific instances within the cluster, you might want to do this for load-balancing reasons across your applications reading from your replicas.

DocumentDB performs automatic backups for you based upon a schedule created during the creation of your database. A feature of these automatic backups allows you to restore back to any point in time during your retention period, known as point-in-time-recovery. As a part of the automated daily backup process, DocumentDB captures any transaction logs that have been created as and when updates to your Database were made, this is to ensure it can perform a point-in-time-recovery. These backups are automatically stored on Amazon S3 for durability and availability.

The automated backups themselves are performed daily, and the backup retention period determines how long DocumentDB will keep and maintain your backups for and can be set between 0 and 35 days. For automatic backups to take place, the retention period must be set to at least one day. If the retention is set to 0 then automatic backups will not take place and you will not be able to perform point-in-time restores. If you did have it configured, then your point-in-time restores can take place for any duration within the retention period.

The backup window allows you to define a time period in which the backup snapshot could be taken. This allows you to set it at a time when the database itself will be of low utilization to prevent the backup process impacting the performance of the database itself.

In the next lecture, I will be looking at how to create a new Document DBCluster, which will cover these backup settings in addition to further configurational changes, so let's move on and take a look.


Course Introduction - Amazon Redshift - DEMO: Creating an Amazon Redshift Cluster - Amazon Quantum Ledger Database (QLDB) - DEMO: Creating a Ledger using Amazon QLDB - - DEMO: Creating an Amazon DocumentDB Cluster - Amazon Keyspaces (for Apache Cassandra) - DEMO: Creating a Keyspace and Table in Amazon Keyspaces (for Apache Cassandra) - Course Summary

About the Author
Learning Paths

Danny has over 20 years of IT experience as a software developer, cloud engineer, and technical trainer. After attending a conference on cloud computing in 2009, he knew he wanted to build his career around what was still a very new, emerging technology at the time — and share this transformational knowledge with others. He has spoken to IT professional audiences at local, regional, and national user groups and conferences. He has delivered in-person classroom and virtual training, interactive webinars, and authored video training courses covering many different technologies, including Amazon Web Services. He currently has six active AWS certifications, including certifications at the Professional and Specialty level.