This course provides detail on the AWS Database services relevant to the AWS Certified Developer - Associate exam. This includes Amazon RDS, Aurora, DynamoDB, MemoryDB for Redis, and ElastiCache.
- Obtain a solid understanding of the following Amazon database services: Amazon RDS, Aurora, DynamoDB, MemoryDB for Redis, and ElastiCache
- Create an Amazon RDS database
- Create a DynamoDB database
- Create an ElastiCache cluster
In this lecture, we’ll continue exploring when to use nonrelational databases on AWS, specifically in-memory, graph, time-series, and ledger databases. Each of these is purpose-built, so their corresponding use cases are more straightforward.
Let’s take a look at in-memory databases first, as these are less specialized than the others. Often the question that leads someone to choose an in-memory database is:
“Do you need a caching layer?”
If the answer is yes, then you have your choice between Amazon ElastiCache and Amazon DynamoDB Accelerator (DAX).
If your main database layer is DynamoDB, then using DAX is the best option for a caching layer. However, DAX won’t be beneficial if your requests require strong consistency or transactions. In these cases, DAX passes the request through to DynamoDB and the results aren’t cached. If your application can tolerate eventual consistency, though, adding DAX can lower read latency from milliseconds to microseconds.
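To make that behavior concrete, here is a minimal sketch of a read-through cache in which strongly consistent reads bypass the cache. It is a toy model only: `ToyTable` and `ToyReadThroughCache` are hypothetical names invented for illustration, and nothing here calls the real DAX or DynamoDB APIs.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Stand-in for a DynamoDB table: a dict plus a read counter."""
    items: dict = field(default_factory=dict)
    reads: int = 0

    def get(self, key):
        self.reads += 1
        return self.items.get(key)


@dataclass
class ToyReadThroughCache:
    """Hypothetical model of the behavior described above, not the DAX API."""
    table: ToyTable
    cache: dict = field(default_factory=dict)

    def get_item(self, key, consistent_read=False):
        if consistent_read:
            # Strongly consistent reads go straight to the table,
            # and the result is not stored in the cache.
            return self.table.get(key)
        if key not in self.cache:
            # Cache miss: read from the table once, then remember the result.
            self.cache[key] = self.table.get(key)
        return self.cache[key]


table = ToyTable(items={"user#1": {"name": "Ana"}})
dax_like = ToyReadThroughCache(table)

dax_like.get_item("user#1")                        # miss: reads the table
dax_like.get_item("user#1")                        # hit: served from memory
dax_like.get_item("user#1", consistent_read=True)  # bypasses the cache
print(table.reads)  # number of reads that actually reached the table
```

Only the eventually consistent reads benefit from the cache here, which is why a workload made up mostly of strongly consistent reads gains little from a layer like this.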
DAX is only used with DynamoDB, so if you're using another database, such as RDS or DocumentDB, then you should use ElastiCache as your caching layer. ElastiCache comes in two flavors: Memcached and Redis.
You use Memcached when you need very simple caching with key-value lookups. It’s a good fit when you need an easy-to-use engine and your use case doesn’t require persistence, advanced data types like lists or sets, pub/sub, replication, or transactions.
If your use case requires any of those things, then Redis supports all of them. Redis, in general, has a wider command set and a broader set of use cases. The biggest factor is often the persistence and resilience it provides through replication and failover. And if you need transactions, then Redis is the only ElastiCache engine that supports them.
Using ElastiCache with Redis is the best option if you have another database and ultimately want to increase the speed of your responses. However, Redis is a very popular and powerful database that is much more than a caching layer. You could ultimately decide to use Redis as your primary database with Amazon MemoryDB. This provides you a fully managed Redis cluster with a higher degree of durability than ElastiCache provides. It leverages a transaction log that spans across multiple AZs, which provides fast database recovery and restarts without the risk of data loss. The idea is you could potentially replace both your core database and your caching layer with MemoryDB.
Ultimately, Redis is very common in use cases like online gaming and leaderboards, machine learning, and analytics.
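The choices covered so far can be sketched as a small decision helper. This is only a summary of the reasoning in this lecture: the function name and its parameters are made up for illustration, and a real decision would weigh more factors than these.

```python
def choose_in_memory_service(primary_db, needs_advanced_features=False,
                             redis_as_primary_db=False):
    """Return the in-memory option this lecture suggests.

    primary_db: name of the main database, e.g. "dynamodb" or "rds".
    needs_advanced_features: True if you need persistence, lists or sets,
        pub/sub, replication, or transactions.
    redis_as_primary_db: True if Redis should be the durable primary store.
    """
    if redis_as_primary_db:
        # Durable, multi-AZ transaction log: Redis as the core database.
        return "Amazon MemoryDB for Redis"
    if primary_db.lower() == "dynamodb":
        # DAX only works in front of DynamoDB.
        return "DynamoDB Accelerator (DAX)"
    if needs_advanced_features:
        return "ElastiCache for Redis"
    # Simple key-value caching with no extra requirements.
    return "ElastiCache for Memcached"


print(choose_in_memory_service("dynamodb"))
print(choose_in_memory_service("rds"))
print(choose_in_memory_service("documentdb", needs_advanced_features=True))
print(choose_in_memory_service("none", redis_as_primary_db=True))
```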
Now, the question to ask when considering a graph database is: “Is my data highly connected and schema-free? And if so, do I need to query this data at fast speeds regardless of the size of the dataset?”
Amazon Neptune would be the best option here. For example, take a scenario where you’re trying to find connections between the different users on your site, as well as connections between the places they’ve visited, so you can recommend both the users and their friends new places to visit. If you used a relational database with this type of highly connected data, your queries would quickly get complex, taking more time to write and ultimately performing more slowly as the dataset gets bigger. With a graph database, you can query data based on a starting point, making it easy to find connections from that starting point and traversing only the relevant part of the graph, even as the dataset gets larger.
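The idea of querying from a starting point can be illustrated with a small in-memory traversal. This is a plain-Python sketch over made-up sample data, not a Neptune query: the `friends` and `visited` dictionaries and the `recommend_places` function are all hypothetical, and a real graph database would express this in a query language such as Gremlin or openCypher.

```python
from collections import deque

# Hypothetical sample data: who is connected to whom, and where each user has been.
friends = {
    "ana": ["ben", "cy"],
    "ben": ["ana", "dee"],
    "cy": ["ana"],
    "dee": ["ben"],
    "eli": [],
}
visited = {
    "ana": {"Lisbon"},
    "ben": {"Lisbon", "Kyoto"},
    "cy": {"Oslo"},
    "dee": {"Lima"},
    "eli": {"Cairo"},
}


def recommend_places(start, max_hops=2):
    """Breadth-first walk from `start`, collecting places visited by nearby users."""
    seen = {start}
    queue = deque([(start, 0)])
    suggestions = set()
    while queue:
        user, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in friends.get(user, []):
            if friend in seen:
                continue
            seen.add(friend)
            suggestions |= visited.get(friend, set())
            queue.append((friend, hops + 1))
    # Only suggest places the starting user has not already visited.
    return suggestions - visited.get(start, set())


print(sorted(recommend_places("ana")))
```

Note that the walk never touches `eli`, who is not connected to the starting user. Only the reachable part of the graph is visited, no matter how many unrelated users exist.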
Neptune is best used when the relationship of the data is as important as the data itself, such as in use cases like social media networks, recommendation engines, knowledge graphs, and fraud detection.
Neptune supports three main query languages: openCypher, Gremlin, and W3C SPARQL. Its storage durability model is similar to Amazon Aurora’s: it stores six copies of your data across three separate Availability Zones. It also supports up to 15 read replicas.
Now, onto the next question, which is:
“Does your database need to be optimized to track and measure data over a period of time?”
If it does, then using Amazon Timestream is your best option, as its entire purpose is to collect, store, and process time series data at any scale. This is often important in IoT use cases. For example, if you have devices in the field that need to register temperature data, and you need to collect that data every 10 minutes, then a time series database is a great option. Another example would be tracking the price of a stock over a period of time. Timestream provides a serverless database solution to store data like this. It’s optimized for this workload, as it separates the ingestion, storage, and query layers, enabling them all to scale independently of one another.
The ingestion layer uses each record’s timestamp when writing data to the database, with up to nanosecond granularity. The storage layer is made up of two main tiers: an in-memory tier for recent data, and a magnetic tier for older data. You can configure retention rules that move data to the magnetic tier once it reaches a certain age. The query layer uses a SQL engine and can query across both storage tiers.
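As a rough illustration of the two storage tiers, the sketch below assigns temperature readings to a tier based on their age and then computes one average across both. It is a toy model in plain Python: the 24-hour retention value, the `storage_tier` function, and the sample readings are assumptions for illustration and do not use the Timestream API.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention rule: keep the most recent 24 hours in the memory tier.
MEMORY_RETENTION = timedelta(hours=24)


def storage_tier(record_time, now, memory_retention=MEMORY_RETENTION):
    """Return the tier a record would sit in under this toy retention rule."""
    age = now - record_time
    return "memory" if age <= memory_retention else "magnetic"


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)

# One made-up temperature reading every 10 minutes over the past 48 hours.
readings = [
    (now - timedelta(minutes=10 * i), 20.0 + (i % 6))
    for i in range(288)
]

tiers = {"memory": [], "magnetic": []}
for timestamp, temperature in readings:
    tiers[storage_tier(timestamp, now)].append(temperature)

# A single query can span both tiers, so the average uses all readings.
all_values = tiers["memory"] + tiers["magnetic"]
average = sum(all_values) / len(all_values)

print(len(tiers["memory"]), len(tiers["magnetic"]), average)
```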
The next question is “What if I need an immutable, sequenced, cryptographically verifiable history of changes in my database?”
In this case, the best answer is Amazon Quantum Ledger Database, or QLDB. Use it when you need a database that can prove that no entry has been modified after the fact. It achieves this because its journal is immutable and append-only. This is seen in use cases such as systems of record, healthcare records, and even some supply chain tracking and financial systems. It uses PartiQL, which provides SQL-compatible access. By default, it’s deployed across multiple AZs with multiple copies per AZ. It also scales automatically to meet the needs of your data.
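To see why an append-only, hash-linked history is verifiable, consider this minimal sketch of a hash chain. It is a simplified stand-in, not how QLDB is implemented: the `ToyLedger` class is hypothetical, and QLDB’s real journal and verification APIs are more involved.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


class ToyLedger:
    """Append-only list where each entry's hash covers the previous entry's hash."""

    def __init__(self):
        self._entries = []

    def append(self, document):
        previous_hash = self._entries[-1]["hash"] if self._entries else GENESIS_HASH
        payload = json.dumps(document, sort_keys=True)
        entry_hash = hashlib.sha256((previous_hash + payload).encode("utf-8")).hexdigest()
        self._entries.append(
            {"payload": payload, "previous_hash": previous_hash, "hash": entry_hash}
        )
        return entry_hash

    def verify(self):
        """Recompute every hash in order; any edited entry breaks the chain."""
        previous_hash = GENESIS_HASH
        for entry in self._entries:
            expected = hashlib.sha256(
                (previous_hash + entry["payload"]).encode("utf-8")
            ).hexdigest()
            if entry["previous_hash"] != previous_hash or entry["hash"] != expected:
                return False
            previous_hash = entry["hash"]
        return True


ledger = ToyLedger()
ledger.append({"vin": "ABC123", "owner": "Ana"})
ledger.append({"vin": "ABC123", "owner": "Ben"})
print(ledger.verify())  # chain is intact

# Simulate tampering with an earlier entry after the fact.
ledger._entries[0]["payload"] = json.dumps({"vin": "ABC123", "owner": "Mallory"})
print(ledger.verify())  # recomputed hash no longer matches the stored one
```

Because each hash depends on everything before it, changing any past entry is detectable without trusting whoever stores the data.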
To summarize, in this lecture we covered six of the more specialized database options: DAX, ElastiCache, MemoryDB, Neptune, Timestream, and Amazon QLDB. That’s it for this one - I’ll see you next time!
Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.
To date, Stuart has created 150+ courses relating to Cloud reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.
Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.
He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.
In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.
Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.