Course Introduction
Amazon RDS
RDS vs. EC2
Amazon RDS Costs
Amazon RDS Performance Insights
DynamoDB Basics
DynamoDB
DynamoDB Accelerator
ElastiCache
Neptune
Redshift
Amazon DocumentDB
Amazon Keyspaces
Which database service should I use?
Using Automation to Deploy AWS Databases
Data Lakes in AWS
This section of the AWS Certified Solutions Architect - Professional learning path introduces you to the AWS database services relevant to the SAP-C02 exam. You will review the service options available and learn how to select and apply AWS database services to meet specific design scenarios relevant to the exam.
Learning Objectives
- Understand the various database services that can be used when building cloud solutions on AWS
- Learn how to build databases using Amazon RDS, DynamoDB, Redshift, DocumentDB, Keyspaces, and QLDB
- Learn how to create ElastiCache and Neptune clusters
- Understand which AWS database service to choose based on your requirements
- Discover how to use automation to deploy databases in AWS
- Learn about data lakes and how to build a data lake in AWS
In this lecture, we’ll continue exploring when to use nonrelational databases on AWS, specifically in-memory, graph, time-series, and ledger databases. Each of these is purpose-built, so their corresponding use cases are more straightforward.
Let’s take a look at in-memory databases first, as these are less specialized than the others. Often the question that leads someone to choose an in-memory database is:
“Do you need a caching layer?”
If the answer is yes, then you have your choice between Amazon ElastiCache and Amazon DynamoDB Accelerator (DAX).
If your main database layer is DynamoDB, then using DAX is the best option for a caching layer. However, DAX won’t be beneficial if your requests require strong consistency or transactions. In these cases, DAX passes the request through to DynamoDB and the results aren’t cached. But if your application can tolerate eventual consistency, then adding DAX can lower read latency from milliseconds to microseconds.
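To make that concrete, here’s a minimal Python sketch of reading a table through DAX with the amazondax client library; the cluster endpoint, table name, and key are hypothetical placeholders, not values prescribed by the service.

```python
# Minimal sketch: reading a DynamoDB table through a DAX cluster.
# The cluster endpoint, table, and key below are hypothetical placeholders.
from amazondax import AmazonDaxClient

# The DAX resource mirrors the boto3 DynamoDB resource API.
dax = AmazonDaxClient.resource(
    endpoint_url="daxs://my-dax.abc123.dax-clusters.us-east-1.amazonaws.com"
)
table = dax.Table("Orders")

# Eventually consistent read: served from the DAX cache at
# microsecond-scale latency on a cache hit.
item = table.get_item(Key={"order_id": "1234"})

# Strongly consistent read: DAX passes this straight through to
# DynamoDB, and the result is not cached.
fresh = table.get_item(Key={"order_id": "1234"}, ConsistentRead=True)
```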
DAX is only used with DynamoDB, so if you're using another database, such as RDS or DocumentDB, then you should use ElastiCache as your caching layer. ElastiCache comes in two flavors: Memcached and Redis.
You use Memcached when you need very simple caching with key-value lookups. It’s a good fit when you want an easy-to-use engine and your use case doesn’t require persistence, advanced data types like lists or sets, pub/sub, replication, or transactions.
If your use case requires any of those things, then Redis supports all of them. Redis, in general, has a wider command set and a broader range of use cases. The biggest differentiator is the level of persistence it provides in terms of replication and failover. And if you need transactions, then Redis is the only caching option that supports them.
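As a rough illustration of ElastiCache as a caching layer, here’s a sketch of the common cache-aside pattern using the redis-py client; the endpoint, key scheme, and the fetch_order_from_rds stand-in function are all illustrative assumptions.

```python
# Sketch of the cache-aside pattern against an ElastiCache for Redis node.
# The endpoint and fetch_order_from_rds() are hypothetical placeholders.
import json
import redis

cache = redis.Redis(
    host="my-cache.abc123.use1.cache.amazonaws.com",  # assumed endpoint
    port=6379,
)

def fetch_order_from_rds(order_id):
    """Stand-in for a query against the primary (e.g., RDS) database."""
    return {"order_id": order_id, "status": "shipped"}

def get_order(order_id):
    key = f"order:{order_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)           # cache hit: skip the database
    order = fetch_order_from_rds(order_id)  # cache miss: read the source
    cache.set(key, json.dumps(order), ex=300)  # cache for 5 minutes
    return order
```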
Using ElastiCache for Redis is the best option if you have another database and ultimately want to increase the speed of your responses. However, Redis is a very popular and powerful database that is much more than a caching layer. You could ultimately decide to use Redis as your primary database with Amazon MemoryDB. This gives you a fully managed Redis-compatible cluster with a higher degree of durability than ElastiCache provides. It leverages a transaction log that spans multiple Availability Zones, which enables fast database recovery and restarts without the risk of data loss. The idea is that you could potentially replace both your core database and your caching layer with MemoryDB.
Ultimately, Redis is very common in use cases like online gaming and leaderboards, machine learning, and analytics.
Now, the questions to ask when considering a graph database are: “Is my data highly connected and schema-free? And if so, do I need to query this data at fast speeds regardless of the size of the dataset?”
Amazon Neptune would be the best option here. For example, take a scenario where you’re trying to find connections between the different users on your site, as well as connections between the places they’ve visited, so you can recommend new places to visit to both those users and their friends. If you used a relational database with this type of highly connected data, your queries would quickly become complex, taking more time to write and performing more slowly as the dataset gets bigger. With a graph database, you query data from a starting point, making it easy to find connections from that point while traversing only part of the graph, even as the dataset grows.
Neptune is best used when the relationship of the data is as important as the data itself, such as in use cases like social media networks, recommendation engines, knowledge graphs, and fraud detection.
Neptune supports three main query languages: openCypher, Gremlin, and W3C SPARQL. It has a storage and durability model similar to Amazon Aurora’s, storing six copies of your data across three separate Availability Zones. It also supports up to 15 read replicas.
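To give a feel for what the recommendation traversal from the earlier example might look like, here’s a hedged Gremlin sketch using the gremlinpython driver; the Neptune endpoint, the 'user' vertex label, the 'visited' edge label, and the property names are all illustrative assumptions about the graph model.

```python
# Sketch: traversing a Neptune graph with Gremlin via the gremlinpython driver.
# The endpoint, vertex label ('user'), and edge label ('visited') are
# hypothetical -- adjust them to your own graph model.
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

conn = DriverRemoteConnection(
    "wss://my-neptune.cluster-abc123.us-east-1.neptune.amazonaws.com:8182/gremlin",
    "g",
)
g = traversal().withRemote(conn)

# Starting from one user, find places visited by other users who
# visited the same places -- a simple recommendation traversal.
recommendations = (
    g.V().has("user", "name", "alice")
     .out("visited")   # places this user has visited
     .in_("visited")   # other users who visited those places
     .out("visited")   # the places those users visited
     .dedup()
     .limit(10)
     .toList()
)
conn.close()
```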
Now, onto the next question, which is:
“Does your database need to be optimized to track and measure data over a period of time?”
If it does, then Amazon Timestream is your best option, as its entire purpose is to collect, store, and process time-series data at any scale. This is often important in IoT use cases. For example, if you have devices in the field that need to register temperature data, and you need to collect that data every 10 minutes, then a time-series database is a great option. Another example would be tracking the price of a stock over a period of time. Timestream provides a serverless database solution to store data like this. It’s optimized for this workload because it separates the ingestion, storage, and query layers, enabling them all to scale independently of one another.
The ingestion layer uses the timestamp to write the data to the database at nanosecond granularity. The storage layer is made up of two main tiers: an in-memory tier for recent data, and magnetic storage for data as it gets older. You can write rules to move data to the magnetic tier once it reaches a certain age. The query layer then uses a SQL engine that can query across both storage tiers.
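As a sketch of what ingestion looks like in practice, here’s a boto3 example that writes a single temperature reading from the IoT scenario above; the database, table, and dimension names are hypothetical.

```python
# Sketch: writing one IoT temperature reading into Timestream with boto3.
# Database, table, and dimension names are hypothetical placeholders.
import time
import boto3

write_client = boto3.client("timestream-write", region_name="us-east-1")

write_client.write_records(
    DatabaseName="iot",          # assumed database
    TableName="device_metrics",  # assumed table
    Records=[
        {
            "Dimensions": [
                {"Name": "device_id", "Value": "sensor-42"},
            ],
            "MeasureName": "temperature",
            "MeasureValue": "21.5",
            "MeasureValueType": "DOUBLE",
            # Timestamp for the ingestion layer, here at millisecond precision.
            "Time": str(int(time.time() * 1000)),
            "TimeUnit": "MILLISECONDS",
        }
    ],
)
```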
The next question is “What if I need an immutable, sequenced, cryptographically verifiable history of changes in my database?”
In this case, the best answer is Amazon Quantum Ledger Database, or QLDB. This is for when you need a database that can prove that every entry hasn’t been modified after the fact. It achieves this because it is immutable and append-only. This is seen in use cases such as systems of record, healthcare records, and even some supply chain tracking and financial systems. It uses PartiQL, which provides SQL-compatible access. By default, it’s deployed across multiple AZs with multiple copies per AZ, and it scales automatically to meet the needs of your data.
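For a rough idea of how you work with QLDB, here’s a sketch using the pyqldb driver to insert a document with a PartiQL statement; the ledger name, table, and document fields are illustrative assumptions.

```python
# Sketch: inserting an append-only record into QLDB with the pyqldb driver
# and a PartiQL statement. Ledger, table, and fields are hypothetical.
from pyqldb.driver.qldb_driver import QldbDriver

driver = QldbDriver(ledger_name="vehicle-registry")  # assumed ledger

def insert_registration(txn):
    # Every committed revision is sequenced and cryptographically chained,
    # so this write becomes part of the verifiable history.
    txn.execute_statement(
        "INSERT INTO Registrations ?",
        {"vin": "1HGCM82633A004352", "owner": "Alice"},
    )

driver.execute_lambda(insert_registration)
driver.close()
```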
To summarize, in this lecture we covered six of the more specialized database options: DAX, ElastiCache, MemoryDB, Neptune, Timestream, and Amazon QLDB. That’s it for this one - I’ll see you next time!
Danny has over 20 years of IT experience as a software developer, cloud engineer, and technical trainer. After attending a conference on cloud computing in 2009, he knew he wanted to build his career around what was still a very new, emerging technology at the time — and share this transformational knowledge with others. He has spoken to IT professional audiences at local, regional, and national user groups and conferences. He has delivered in-person classroom and virtual training, interactive webinars, and authored video training courses covering many different technologies, including Amazon Web Services. He currently has six active AWS certifications, including certifications at the Professional and Specialty level.