In our previous post, DynamoDB: An Inside Look Into NoSQL, we introduced you to NoSQL, spoke about the CAP theorem and certain assumptions that need to be made while designing NoSQL data stores. Let’s dive deeper!
Design Considerations
Traditional commercial systems and applications perform data replication in a synchronized manner. The advantage of this approach is that data is always consistent. But the downside is that the system itself might not be available (CAP theorem). To put it simply: the data is unavailable until it is absolutely certain that it is replicated across all nodes correctly.
Alas! The Web world lives in its own perceived reality. 🙂 Systems go down and the network fails regularly. Availability is the single largest factor which makes/breaks a company. It is thus imperative that we handle such scenarios. Netflix’s Chaos Monkey helps us architect our product to take into account these failures. In order to ensure availability at all costs, optimistic asynchronous replication strategies can be put in place. The drawback, however, is that it leads to conflicting changes to data which must be detected and resolved. The process of conflict resolution introduces 2 new problems: when to resolve them and who resolves them. DynamoDB introduces the novel concept of an eventually consistent data store; that is all updates reach all nodes eventually.
Deciding when to perform the conflict resolution is a primary design consideration. We can perform it during the READ
operation or WRITE
operation. Many legacy data stores chose to do conflict resolution during the WRITE
operation. In such systems, WRITE
s will be rejected if data is not replicated across all nodes. In large e-commerce companies such as Amazon, rejecting WRITE
s is not an option as it leads to revenue loss and poor customer experience. Hence, DynamoDB does the complex conflict resolution during READ
s.
Let’s take an example to understand it better. Consider a system with 3 nodes: NODE1
, NODE2
and NODE3
. In a traditional system, a WRITE
to NODE2
must be replicated to NODE1
and NODE3
and only then is the WRITE
operation considered successful. This synchronized replication takes time to complete during which time the system is NOT available. But systems using DynamoDB have the option to defer this update in exchange for higher availability. So a WRITE
to NODE2
is considered successful as long as NODE2
is able to honor that request. NODE2
eventually replicates it to NODE1
and NODE3
. DynamoDB usually takes a second (or a maximum of a couple of seconds) to achieve consistency across all nodes.
Note: In case your product, like ours, needs a strongly consistent read just set the value of the attribute ConsistentRead
to true
.
Another very important design consideration is who performs the conflict resolution. It can either be done by the data store (DynamoDB in our case) or the application. The data store usually employs simple policies and rules such as “last WRITE
wins”, which is pretty good in the majority of the cases. If the application wishes to have complex rules and implement its own conflict resolution mechanisms, then it is free to do so.
A couple of other design considerations are as follows:
- Incremental Scalability: The data store should be able to scale-out 1 node at a time, with minimal or no impact on the system itself.
- Symmetry: All nodes in the data store are peers, i.e. all nodes are equal and share the same set of responsibilities.
- Decentralization: With a central authority, the most common problem faced is “single point of failure”. Decentralization helps us mitigate this and keep the system simple, more scalable and more available.
- Heterogeneity: Different nodes in the data store might have different configurations. Some nodes might be optimized for storage and some might be plain commodity hardware. The data store should take into account this heterogeneous mix of nodes to distribute tasks proportional to its capabilities.
In the next blog post, we will look into System Architecture.

