
Choosing a non-relational database on AWS - Part 2


In this lecture, we’ll continue exploring when to use non-relational databases on AWS, specifically in-memory, graph, time-series, and ledger databases. Each of these is purpose-built, so their corresponding use cases are more straightforward.

Let’s take a look at in-memory databases first, as these are less specialized than the others. Often the question that leads someone to choose an in-memory database is: 

“Do you need a caching layer?” 

If the answer is yes, then you have your choice between Amazon ElastiCache and Amazon DynamoDB Accelerator (DAX). 

If your main database layer is DynamoDB, then using DAX is the best option for a caching layer. However, DAX won’t be beneficial if your requests require strong consistency or transactions. In these cases, DAX will pass back the request to DynamoDB and the results won’t be cached. However, if your application can tolerate eventual consistency, then adding DAX will lower latency from milliseconds to microseconds. 
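That routing decision can be sketched in code. The helper below is hypothetical, assuming two client objects that expose DynamoDB's `get_item` call shape; with real AWS access you would pass a boto3 DynamoDB client and an `amazondax.AmazonDaxClient`. Strongly consistent reads go straight to DynamoDB, since DAX would pass them through uncached anyway:

```python
# Sketch only: route reads through DAX when eventual consistency is acceptable.
# The client objects are stand-ins; in practice you would pass a boto3 DynamoDB
# client and an amazondax.AmazonDaxClient, which share the get_item call shape.

def read_item(dynamodb, dax, table, key, strong=False):
    """Fetch an item, using DAX as the cache unless strong consistency is needed."""
    if strong:
        # DAX passes strongly consistent reads through to DynamoDB uncached,
        # so call DynamoDB directly and skip the cache layer entirely.
        return dynamodb.get_item(TableName=table, Key=key, ConsistentRead=True)
    # Eventually consistent read: DAX can answer from its cache in microseconds.
    return dax.get_item(TableName=table, Key=key)
```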

DAX is only used with DynamoDB, so if you're using another database, such as RDS or DocumentDB, then you should use ElastiCache as your caching layer. ElastiCache comes in two flavors: Memcached and Redis.

You use Memcached when you need very simple caching with key-value lookups. It's the right choice when you want an easy-to-use engine and your use case doesn’t require persistence, advanced data types like lists or sets, pub/sub, replication, or transactions.

If your use case requires any of those features, Redis supports them all. Redis, in general, has a wider command set and a broader range of use cases. The biggest differentiator is the level of persistence it provides through replication and failover. And if you need transactions, Redis is the only caching option that supports them.
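To make the caching-layer idea concrete, here is a minimal cache-aside read. The cache object is assumed to expose a redis-py-style `get`/`set` interface, as a client for ElastiCache for Redis would; the function name and the TTL value are illustrative:

```python
import json

def cache_aside_get(cache, fetch_from_db, key, ttl_seconds=300):
    """Cache-aside read: try the cache first; on a miss, load from the
    main database and populate the cache with a TTL."""
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)          # cache hit: skip the database
    value = fetch_from_db(key)             # cache miss: go to the main database
    cache.set(key, json.dumps(value), ex=ttl_seconds)  # redis-py style set with TTL
    return value
```

On the second read of the same key, the value comes back from the cache and the main database is never touched, which is exactly the latency win a caching layer buys you.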

Using ElastiCache with Redis is the best option if you have another database and ultimately want to speed up your responses. However, Redis is a very popular and powerful database that is much more than a caching layer. You could ultimately decide to use Redis as your primary database with Amazon MemoryDB, which provides a fully managed Redis cluster with a higher degree of durability than ElastiCache offers. It leverages a transaction log that spans multiple AZs, enabling fast database recovery and restarts without the risk of data loss. The idea is that you could potentially replace both your core database and your caching layer with MemoryDB.

Ultimately, Redis is very common in use cases like online gaming and leaderboards, machine learning, and analytics.  

Now the question to look at for when to use a graph database is “Is my data highly connected and schema-free? And if so, do I need to query this data at fast speeds regardless of the size of the dataset?” 

Amazon Neptune would be the best option here. For example, take a scenario where you’re trying to find connections between the different users on your site, as well as connections between the places they’ve visited, so you can recommend new places to both the users and their friends. If you used a relational database with this type of highly connected data, your queries would quickly get complex, taking more time to write and delivering slower performance as the dataset gets bigger. With a graph database, you can query data from a starting point, making it easy to find connections from there while traversing only part of the graph as the dataset grows.
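The shape of that traversal can be sketched with plain dicts standing in for the graph: from a starting user, hop to friends, then to the places they've visited, keeping only places the user hasn't seen. In Neptune itself you would write this as a Gremlin traversal; everything here, including the function name, is purely illustrative:

```python
def recommend_places(friends, visited, user):
    """From a starting vertex, traverse user -> friends -> places and
    return places the user has not already visited. A Gremlin traversal
    of roughly the same shape would be:
        g.V(user).out('friend').out('visited').dedup()
    (illustrative; plain dicts stand in for the graph here)."""
    already_seen = set(visited.get(user, ()))
    recommendations = set()
    for friend in friends.get(user, ()):        # first hop: user -> friend
        for place in visited.get(friend, ()):   # second hop: friend -> place
            if place not in already_seen:
                recommendations.add(place)
    return recommendations
```

Notice the work scales with the neighborhood of the starting vertex, not the whole dataset, which is why graph queries stay fast as the data grows.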

Neptune is best used when the relationship of the data is as important as the data itself, such as in use cases like social media networks, recommendation engines, knowledge graphs, and fraud detection.  

Neptune supports three main query languages: openCypher, Gremlin, and W3C SPARQL. It has a storage durability model similar to Amazon Aurora’s, storing six copies of your data across three separate Availability Zones, and it supports up to 15 read replicas.

Now, onto the next question, which is:

 “Does your database need to be optimized to track and measure data over a period of time?”

If it does, then using Amazon Timestream is your best option, as its entire purpose is to collect, store, and process time series data at any scale. This is often important in IoT use cases. For example, if you have devices in the field that need to register temperature data, and you need to collect that data every 10 minutes, then a time series database is a great option. Another example would be tracking the price of a stock over a period of time. Timestream provides a serverless database solution to store data like this. It’s optimized for this workload, as it separates the ingestion, storage, and query layers, enabling them all to scale independently of one another.

The ingestion layer uses the timestamp to write the data to the database at nanosecond granularity. The storage layer is made up of two main tiers: an in-memory tier for recent data, and magnetic storage for data as it ages. You can write rules to move data to the magnetic storage tier once it reaches a certain age. The query layer then uses a SQL engine and can query across both storage tiers.
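As an illustration of the write path, the helper below assembles a single temperature record in the shape the Timestream write API expects, stamped at nanosecond granularity. The helper itself is hypothetical; with real AWS access you would pass a batch of such records to boto3's `timestream-write` client via `write_records`:

```python
import time

def temperature_record(device_id, celsius, timestamp_ns=None):
    """Build one Timestream record dict (field names follow the Timestream
    write API; this helper itself is just a sketch)."""
    if timestamp_ns is None:
        timestamp_ns = time.time_ns()  # ingestion at nanosecond granularity
    return {
        "Dimensions": [{"Name": "device_id", "Value": device_id}],
        "MeasureName": "temperature",
        "MeasureValue": str(celsius),
        "MeasureValueType": "DOUBLE",
        "Time": str(timestamp_ns),
        "TimeUnit": "NANOSECONDS",
    }
```

A batch of these would then go to something like `boto3.client("timestream-write").write_records(DatabaseName=..., TableName=..., Records=[...])`.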

The next question is “What if I need an immutable, sequenced, cryptographically verifiable history of changes in my database?” 

In this case, the best answer is Amazon Quantum Ledger Database, or QLDB. This is the database to reach for when you need to prove that no entry has been modified after the fact, which it achieves by being immutable and append-only. This shows up in use cases such as systems of record, healthcare records, and even some supply chain tracking and financial systems. It uses PartiQL, which provides SQL-compatible access. By default, it’s deployed across multiple AZs with multiple copies per AZ, and it scales automatically to meet the needs of your data.
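To illustrate why an append-only, hash-chained journal makes tampering detectable, here is a toy version of the idea. QLDB's actual journal uses a Merkle-style digest and is accessed with PartiQL; this sketch only mirrors the chaining concept, and every name in it is made up:

```python
import hashlib
import json

def append_entry(ledger, entry):
    """Append an entry whose digest chains to the previous entry's digest,
    so changing any earlier entry breaks every digest after it (a toy
    version of the idea behind QLDB's verifiable journal)."""
    prev_digest = ledger[-1]["digest"] if ledger else b""
    payload = json.dumps(entry, sort_keys=True).encode()
    ledger.append({"entry": entry,
                   "digest": hashlib.sha256(prev_digest + payload).digest()})

def verify(ledger):
    """Recompute the chain and confirm no entry was modified after the fact."""
    prev_digest = b""
    for record in ledger:
        payload = json.dumps(record["entry"], sort_keys=True).encode()
        if record["digest"] != hashlib.sha256(prev_digest + payload).digest():
            return False
        prev_digest = record["digest"]
    return True
```

Editing any past entry, even by one character, breaks the chain from that point forward, which is the property that lets a ledger database prove its history is intact.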

To summarize, in this lecture we covered six of the more specialized database options: DAX, ElastiCache, MemoryDB, Neptune, Timestream, and Amazon QLDB. That’s it for this one - I’ll see you next time!

Difficulty
Intermediate
Duration
32m
Students
4
Description

This course covers the core learning objective to meet the requirements of the 'Designing Database solutions in AWS - Level 2' skill.

Learning Objectives:

  • Evaluate an appropriate AWS database based on specific design requirements
  • Analyze when caching is required to improve the performance of an AWS database
  • Evaluate an appropriate AWS database scaling strategy to meet both expected and unexpected traffic demands
About the Author

Alana Layton is an experienced technical trainer, technical content developer, and cloud engineer living out of Seattle, Washington. Her career has included teaching about AWS all over the world, creating AWS content that is fun, and working in consulting. She currently holds six AWS certifications. Outside of Cloud Academy, you can find her testing her knowledge in bar trivia, reading, or training for a marathon.