DynamoDB: An Inside Look Into NoSQL – Part 2

In our previous post, DynamoDB: An Inside Look Into NoSQL, we introduced you to NoSQL, spoke about the CAP theorem and certain assumptions that need to be made while designing NoSQL data stores. Let’s dive deeper!

Design Considerations

Traditional commercial systems and applications replicate data synchronously. The advantage of this approach is that data is always consistent. The downside is that the system itself might not be available, which is the trade-off the CAP theorem describes. To put it simply: the data is unavailable until it is absolutely certain that it has been replicated across all nodes correctly.

Alas! The Web world lives in its own perceived reality. 🙂 Systems go down and networks fail regularly. Availability is often the single largest factor that makes or breaks a company, so it is imperative that we handle such scenarios. Tools such as Netflix’s Chaos Monkey, which deliberately injects failures, help us architect products that take these failures into account. To ensure availability at all costs, optimistic asynchronous replication strategies can be put in place. The drawback, however, is that they can lead to conflicting changes to data, which must be detected and resolved. Conflict resolution raises two new questions: when to resolve conflicts and who resolves them. DynamoDB embraces the concept of an eventually consistent data store; that is, all updates reach all nodes eventually.

Deciding when to perform conflict resolution is a primary design consideration. We can perform it during the READ operation or the WRITE operation. Many legacy data stores chose to resolve conflicts during the WRITE operation. In such systems, WRITEs are rejected if the data cannot be replicated to all nodes. In large e-commerce companies such as Amazon, rejecting WRITEs is not an option, as it leads to revenue loss and a poor customer experience. Hence, DynamoDB does the complex conflict resolution during READs.
Let’s take an example to understand it better. Consider a system with 3 nodes: NODE1, NODE2 and NODE3. In a traditional system, a WRITE to NODE2 must be replicated to NODE1 and NODE3, and only then is the WRITE operation considered successful. This synchronous replication takes time to complete, during which the system is NOT available. Systems using DynamoDB, however, have the option to defer this update in exchange for higher availability. A WRITE to NODE2 is considered successful as long as NODE2 is able to honor that request, and NODE2 eventually replicates it to NODE1 and NODE3. DynamoDB usually reaches consistency across all nodes within a second.
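
To make the three-node example concrete, here is a minimal in-memory sketch in Python. It is a toy model, not DynamoDB's actual implementation: the `Node` and `Cluster` classes, the `pending` queue and the `replicate` method are all hypothetical names invented for illustration.

```python
from collections import deque


class Node:
    """A replica holding an in-memory key-value map."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class Cluster:
    """Toy cluster: a write is acknowledged by one node and replicated later."""

    def __init__(self, names):
        self.nodes = {name: Node(name) for name in names}
        self.pending = deque()  # replication messages not yet delivered

    def write(self, node_name, key, value):
        # The receiving node applies the write and acknowledges immediately.
        self.nodes[node_name].data[key] = value
        # Replication to the peers is queued instead of awaited.
        for peer in self.nodes:
            if peer != node_name:
                self.pending.append((peer, key, value))
        return "OK"

    def read(self, node_name, key):
        return self.nodes[node_name].data.get(key)

    def replicate(self):
        # Deliver every queued update; afterwards all replicas agree.
        while self.pending:
            peer, key, value = self.pending.popleft()
            self.nodes[peer].data[key] = value


cluster = Cluster(["NODE1", "NODE2", "NODE3"])
ack = cluster.write("NODE2", "cart", ["book"])
stale = cluster.read("NODE1", "cart")   # NODE1 has not seen the write yet
cluster.replicate()
fresh = cluster.read("NODE1", "cart")   # visible once replication has run
```

The write returns as soon as NODE2 accepts it, so a read from NODE1 in the window before `replicate()` runs sees stale data. That window is the price paid for keeping WRITEs available.
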

Note: If your product, like ours, needs a strongly consistent read, just set the ConsistentRead parameter to true on the read request.

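
As a rough illustration of the note above, the sketch below builds the parameters for a GetItem request with `ConsistentRead` enabled. The helper `build_get_item_request`, the table name `Orders` and the key `OrderId` are hypothetical; only `TableName`, `Key` and `ConsistentRead` are actual GetItem parameters.

```python
def build_get_item_request(table_name, key, strongly_consistent=False):
    """Build keyword arguments for a DynamoDB GetItem call.

    `table_name` and `key` are placeholders supplied by the caller.
    """
    return {
        "TableName": table_name,
        "Key": key,
        # False (the default) requests an eventually consistent read.
        "ConsistentRead": strongly_consistent,
    }


request = build_get_item_request(
    "Orders",                       # hypothetical table name
    {"OrderId": {"S": "1001"}},     # hypothetical key in DynamoDB's typed format
    strongly_consistent=True,
)

# With boto3 installed and AWS credentials configured, the call would be:
#   import boto3
#   response = boto3.client("dynamodb").get_item(**request)
```

Leaving `ConsistentRead` at its default keeps reads eventually consistent, so it makes sense to enable it only for the requests that really need the latest value.
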
Another very important design consideration is who performs the conflict resolution. It can be done either by the data store (DynamoDB in our case) or by the application. The data store usually employs simple policies and rules such as “last WRITE wins”, which works well in the majority of cases. If the application needs more complex rules, it is free to implement its own conflict resolution mechanism.
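
The difference between the two approaches can be sketched in a few lines of Python. This is an illustrative sketch only: the `Version` type, the timestamps and the shopping-cart merge rule are assumptions made for the example, not DynamoDB's internal data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """One replica's copy of a value, stamped with its write time."""
    value: frozenset
    timestamp: float


def last_write_wins(versions):
    # Simple store-side policy: keep the most recently written version.
    return max(versions, key=lambda v: v.timestamp)


def merge_carts(versions):
    # Application-side policy (illustrative): union the items so that
    # nothing added on any replica is lost.
    merged = frozenset().union(*(v.value for v in versions))
    newest = max(v.timestamp for v in versions)
    return Version(merged, newest)


conflicting = [
    Version(frozenset({"book"}), timestamp=100.0),
    Version(frozenset({"book", "pen"}), timestamp=101.0),
    Version(frozenset({"lamp"}), timestamp=102.0),
]

store_result = last_write_wins(conflicting)   # keeps only the newest cart
app_result = merge_carts(conflicting)         # keeps items from every cart
```

“Last WRITE wins” is cheap and needs no knowledge of the data, but it silently discards the older versions. An application-level merge can preserve them because it understands what the data means.
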

A few other design considerations are as follows:

  1. Incremental Scalability: The data store should be able to scale out one node at a time, with minimal or no impact on the system itself.
  2. Symmetry: All nodes in the data store are peers, i.e. all nodes are equal and share the same set of responsibilities.
  3. Decentralization: The most common problem with a central authority is that it becomes a “single point of failure”. Decentralization helps us mitigate this and keeps the system simple, more scalable and more available.
  4. Heterogeneity: Different nodes in the data store might have different configurations. Some nodes might be optimized for storage and some might be plain commodity hardware. The data store should take this heterogeneous mix of nodes into account and distribute tasks in proportion to each node's capabilities.
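
As a small illustration of the heterogeneity point, the sketch below splits a batch of tasks across nodes in proportion to a capacity weight. The `distribute` function, the integer weights and the largest-remainder rounding are assumptions chosen for this example; they are not a description of how DynamoDB assigns work.

```python
def distribute(total_tasks, capacities):
    """Split total_tasks across nodes in proportion to their capacity weights."""
    total_capacity = sum(capacities.values())
    # Start with the floor of each node's exact proportional share.
    shares = {
        node: total_tasks * weight // total_capacity
        for node, weight in capacities.items()
    }
    leftover = total_tasks - sum(shares.values())
    # Hand remaining tasks to the nodes with the largest fractional remainder.
    by_remainder = sorted(
        capacities,
        key=lambda node: (total_tasks * capacities[node]) % total_capacity,
        reverse=True,
    )
    for node in by_remainder[:leftover]:
        shares[node] += 1
    return shares


# Hypothetical weights: NODE1 is four times as capable as NODE3.
plan = distribute(10, {"NODE1": 4, "NODE2": 2, "NODE3": 1})
```

With these weights, the most capable node receives the largest share and every task is assigned exactly once.
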

In the next blog post, we will look into System Architecture.

Written by

47Line Technologies

47Line is building solutions that solve critical business problems using “cloud as the backbone”. The team has been working in the Cloud Computing domain for the last 6 years and has proven thought leadership in Cloud and Big Data technologies.
