Data Versioning With DynamoDB – NoSQL, Part 5

In DynamoDB: Replication and Partitioning – Part 4, we talked about partitioning and replication in detail. We introduced consistent hashing, virtual nodes, and the concepts of coordinator nodes and preference lists. In this article, we will discuss data versioning with DynamoDB.

Data Versioning with DynamoDB

DynamoDB provides eventual consistency, which allows updates to be propagated to all storage nodes asynchronously. A put operation may return before the update has reached every replica, so a subsequent get operation may return a value that does not reflect the latest changes. During network partitions or server outages, some replicas might not receive the latest updates for an extended period of time.

Some use cases do not need the latest data and can tolerate such inconsistencies. One example is “Add To Cart”, where a put operation should always succeed and a get operation that returns an older object is acceptable. If a user makes a change to an earlier version of the shopping cart, that change is still meaningful and should be preserved. At the same time, the latest version of the cart, which is currently unavailable, may carry its own updates, and these should be preserved too. Data versioning is what makes it possible to handle such scenarios.

In order to achieve such guarantees, DynamoDB treats the result of every modification as a new and immutable version of the data. This allows multiple versions of an object to be present in the system at the same time. If a newer version subsumes an earlier one, the system can determine the authoritative version automatically (syntactic reconciliation). In some cases, however, the versions conflict and the client must perform the reconciliation itself (semantic reconciliation).
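To make semantic reconciliation concrete, here is a minimal sketch of how a client might merge two conflicting shopping cart versions. The function name `merge_carts`, the cart layout (item name mapped to quantity), and the rule of keeping the larger quantity are assumptions made for illustration only; they are not part of any DynamoDB API.

```python
from typing import Dict


def merge_carts(cart_a: Dict[str, int], cart_b: Dict[str, int]) -> Dict[str, int]:
    """Merge two conflicting cart versions so that no added item is lost.

    Assumption: when an item appears in both versions, keep the larger quantity.
    """
    merged = dict(cart_a)
    for item, quantity in cart_b.items():
        merged[item] = max(merged.get(item, 0), quantity)
    return merged


# Two versions of the same cart that diverged on different replicas.
cart_on_replica_1 = {"book": 1, "pen": 2}
cart_on_replica_2 = {"book": 1, "lamp": 1}

print(merge_carts(cart_on_replica_1, cart_on_replica_2))
# {'book': 1, 'pen': 2, 'lamp': 1}
```

A union-style merge like this preserves every "Add To Cart" operation, but an item that was deleted in only one of the versions can reappear after the merge.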

Data Versioning (Credit)

DynamoDB uses vector clocks to capture causality between multiple versions of an object. A vector clock is a list of (node, counter) pairs, and one vector clock is associated with every version of every object. By comparing the vector clocks of two versions, you can determine whether the versions are causally ordered or lie on parallel branches. When a client wishes to perform an update, it must specify which version it is updating. This version can be obtained from an earlier get operation.
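The comparison described above can be sketched in a few lines. This is an illustrative sketch, not DynamoDB's implementation: the clock is represented as a dictionary from node name to counter instead of a list of pairs, and the names `descends` and `compare` are hypothetical.

```python
from typing import Dict

# Assumption: a vector clock is modeled as {node name: counter}.
VectorClock = Dict[str, int]


def descends(a: VectorClock, b: VectorClock) -> bool:
    """Return True if clock a has seen every update recorded in clock b."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify the relationship between two versions by their clocks."""
    if a == b:
        return "equal"
    if descends(a, b):
        return "a subsumes b"  # causal ordering: b can be discarded
    if descends(b, a):
        return "b subsumes a"  # causal ordering: a can be discarded
    return "concurrent"        # parallel branches: reconciliation needed


print(compare({"Sx": 2}, {"Sx": 1}))
# a subsumes b
print(compare({"Sx": 2, "Sy": 1}, {"Sx": 2, "Sz": 1}))
# concurrent
```

A missing node is treated as a counter of zero, which is why two clocks that each contain a node the other lacks come out as concurrent.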

Version evolution of an object (Credit)

Let’s walk through how vector clocks work. A client writes a new object. Node Sx handles the write and creates object D1 with the vector clock [(Sx, 1)]. The client then updates the object, and Sx again handles the request, producing D2 with the vector clock [(Sx, 2)]. D2 descends from D1 and therefore supersedes it. The client updates the object once more, and this time node Sy handles the request, producing D3 with the vector clock [(Sx, 2), (Sy, 1)]. Meanwhile, a different client reads D2 and updates it through node Sz, producing D4 with the vector clock [(Sx, 2), (Sz, 1)]. Neither D3 nor D4 descends from the other, so both versions must be kept. When a client later reads the object, it receives both D3 and D4 and reconciles them. If node Sx coordinates the resulting write, the new object D5 is created with the vector clock [(Sx, 3), (Sy, 1), (Sz, 1)].
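The evolution from D1 to D5 can be reproduced with a short sketch. The helpers `increment`, `merge`, and `descends` are hypothetical names, and the dictionary representation of a clock is an assumption made for readability.

```python
from typing import Dict

VectorClock = Dict[str, int]  # assumption: {node name: counter}


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a new clock in which `node` has coordinated one more write."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Combine two clocks by taking the highest counter seen for each node."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in sorted(a.keys() | b.keys())}


def descends(a: VectorClock, b: VectorClock) -> bool:
    """Return True if clock a has seen every update recorded in clock b."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())


d1 = increment({}, "Sx")   # first write, handled by Sx
d2 = increment(d1, "Sx")   # update, handled by Sx again
d3 = increment(d2, "Sy")   # update of D2, handled by Sy
d4 = increment(d2, "Sz")   # another client updates D2 through Sz

# D3 and D4 are on parallel branches: neither descends from the other.
print(descends(d3, d4), descends(d4, d3))
# False False

# The client reconciles D3 and D4, and Sx coordinates the write.
d5 = increment(merge(d3, d4), "Sx")
print(d5)
# {'Sx': 3, 'Sy': 1, 'Sz': 1}
```

Because D5 descends from both D3 and D4, the two conflicting versions can be discarded once D5 has been written.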

One possible issue with vector clocks is that their size may grow rapidly if many servers coordinate the writes to a particular object. In practice this is unlikely, because writes are normally handled by one of the nodes in the preference list. It is still desirable to limit the size of the vector clock, so DynamoDB implements the following clock truncation scheme: a timestamp, which indicates the last time that node updated the item, is stored along with each (node, counter) pair. When the number of (node, counter) pairs reaches a threshold (say 10), the oldest pair is removed from the clock.
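The truncation scheme can be sketched as follows. This is an illustration under stated assumptions: each clock entry is stored as node mapped to (counter, last update timestamp), the function name `truncate` is hypothetical, and the sketch trims the clock so that it holds at most `max_entries` pairs.

```python
from typing import Dict, Tuple

# Assumption: {node name: (counter, timestamp of that node's last update)}.
TimestampedClock = Dict[str, Tuple[int, float]]


def truncate(clock: TimestampedClock, max_entries: int) -> TimestampedClock:
    """Drop the least recently updated pairs until the clock fits the limit."""
    if len(clock) <= max_entries:
        return dict(clock)
    # Sort entries by timestamp, newest first, and keep only the newest ones.
    newest_first = sorted(clock.items(), key=lambda entry: entry[1][1], reverse=True)
    return dict(newest_first[:max_entries])


clock = {
    "Sx": (3, 100.0),
    "Sy": (1, 50.0),   # oldest entry
    "Sz": (1, 75.0),
}

# A tiny limit is used here only to keep the example short.
print(truncate(clock, max_entries=2))
# {'Sx': (3, 100.0), 'Sz': (1, 75.0)}
```

Dropping a pair discards causal information, so after truncation two versions may look concurrent even though one actually descends from the other.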

In the next article, we will look at how DynamoDB handles failures.

Written by

47Line Technologies

47Line is building solutions that solve critical business problems using “cloud as the backbone”. The team has been working in the cloud computing domain for the last 6 years and has proven thought leadership in cloud and big data technologies.

