Membership and Failure Detection in DynamoDB: An Inside Look Into NoSQL, Part 7

This is a guest post from 47Line Technologies.

In our previous post, How to handle failures in DynamoDB – An inside look into NoSQL, we discussed handling failures via Hinted Handoff & Replica Synchronization. We also talked about the advantages of using a Sloppy Quorum and Merkle Trees.

In this last & final part of the series, we will look into Membership and Failure Detection.

Ring Membership

In any production environment, node outages are often transient. But it rarely signifies permanent failure and hence there is no need for repair or rebalancing the partition. On the other hand, manual errors might result in the unintentional startup of new DynamoDB nodes. A proper mechanism is essential for the addition and removal of nodes from the DynamoDB Ring. An administrator uses a tool to issue a membership change command to either add/remove a node. The node that picks up this request writes into its persistent store the membership change request and the timestamp.

A gossip-based protocol transfers the membership changes and maintains a consistent view of membership across all nodes. Each node contacts a peer chosen at random every second and the two nodes efficiently reconcile their persisted membership change histories. Partitioning & placement information also propagates via the gossip-based protocol and each storage node is aware of the token ranges its peers are responsible for. This allows each node to forward a key’s read/write operations to the right set of nodes directly.

Ring Membership (Credit)

External Discovery

It’s best to explain with an example: An administrator joins node A to the ring. He then joins node B to the ring. Nodes A and B consider itself as part of the ring, yet neither would be immediately aware of each other. To prevent these logical partitions, DynamoDB introduced the concept of seed nodes. Seed nodes are fully functional nodes that are discovered via an external mechanism (static configuration or a configuration service) and are known to all nodes. Since each node communicates with the seed node and gossip-based protocol transfer the membership changes, logical partitions are highly unlikely.

Failure Detection

Failure detection in DynamoDB is used to avoid attempts to communicate with unreachable peers during get() and put() operations and when transferring partitions and hinted replicas. For the purpose of avoiding failed attempts at communication, a purely local notion of failure detection is entirely sufficient: node A may consider node B failed if node B does not respond to node A’s messages (even if B is responsive to node C‘s messages). In the presence of a steady rate of client requests generating inter-node communication in the DynamoDB ring, a node A quickly discovers that a node B is unresponsive when B fails to respond to a message; Node A then uses alternate nodes to service requests that map to B‘s partitions; A also periodically retries B to check for the latter’s recovery. In the absence of client requests to drive traffic between two nodes, neither node really needs to know whether the other is reachable and responsive.

Conclusion

This exhaustive 7-part series detailing every component is sufficient to understand the design and architecture of any NoSQL system. Phew! What an incredible journey it has been these couple of months delving into the internals of DynamoDB. Having patiently read this far, you are amongst the chosen few who have this sort of deep NoSQL knowledge. You can be extremely proud of yourself!

Let’s eagerly await another expedition soon!

Article authored by Vijay Olety.

Avatar

Written by

47Line Technologies

47Line is building solutions solving critical business problems using “cloud as the backbone”. The team has been working in Cloud Computing domain for last 6 years and have proven thought leadership in Cloud, Big Data technologies.


Related Posts

Patrick Navarro
Patrick Navarro
— January 22, 2020

Top 5 AWS Salary Report Findings

At the speed the cloud tech space is developing, it can be hard to keep track of everything that’s happening within the AWS ecosystem. Advances in technology prompt smarter functionality and innovative new products, which in turn give rise to new job roles that have a ripple effect on t...

Read more
  • AWS
  • salary
Alisha Reyes
Alisha Reyes
— January 6, 2020

New on Cloud Academy: Red Hat, Agile, OWASP Labs, Amazon SageMaker Lab, Linux Command Line Lab, SQL, Git Labs, Scrum Master, Azure Architects Lab, and Much More

Happy New Year! We hope you're ready to kick your training in overdrive in 2020 because we have a ton of new content for you. Not only do we have a bunch of new courses, hands-on labs, and lab challenges on AWS, Azure, and Google Cloud, but we also have three new courses on Red Hat, th...

Read more
  • agile
  • AWS
  • Azure
  • Google Cloud Platform
  • Linux
  • OWASP
  • programming
  • red hat
  • scrum
Alisha Reyes
Alisha Reyes
— December 24, 2019

Cloud Academy’s Blog Digest: Azure Best Practices, 6 Reasons You Should Get AWS Certified, Google Cloud Certification Prep, and more

Happy Holidays from Cloud Academy We hope you have a wonderful holiday season filled with family, friends, and plenty of food. Here at Cloud Academy, we are thankful for our amazing customer like you.  Since this time of year can be stressful, we’re sharing a few of our latest article...

Read more
  • AWS
  • azure best practices
  • blog digest
  • Cloud Academy
  • Google Cloud
Avatar
Guy Hummel
— December 12, 2019

Google Cloud Platform Certification: Preparation and Prerequisites

Google Cloud Platform (GCP) has evolved from being a niche player to a serious competitor to Amazon Web Services and Microsoft Azure. In 2019, research firm Gartner placed Google in the Leaders quadrant in its Magic Quadrant for Cloud Infrastructure as a Service for the second consecuti...

Read more
  • AWS
  • Azure
  • Google Cloud Platform
Alisha Reyes
Alisha Reyes
— December 10, 2019

New Lab Challenges: Push Your Skills to the Next Level

Build hands-on experience using real accounts on AWS, Azure, Google Cloud Platform, and more Meaningful cloud skills require more than book knowledge. Hands-on experience is required to translate knowledge into real-world results. We see this time and time again in studies about how pe...

Read more
  • AWS
  • Azure
  • Google Cloud
  • hands-on
  • labs
Alisha Reyes
Alisha Reyes
— December 5, 2019

New on Cloud Academy: AWS Solution Architect Lab Challenge, Azure Hands-on Labs, Foundation Certificate in Cyber Security, and Much More

Now that Thanksgiving is over and the craziness of Black Friday has died down, it's now time for the busiest season of the year. Whether you're a last-minute shopper or you already have your shopping done, the holidays bring so much more excitement than any other time of year. Since our...

Read more
  • AWS
  • AWS solution architect
  • AZ-203
  • Azure
  • cyber security
  • FCCS
  • Foundation Certificate in Cyber Security
  • Google Cloud Platform
  • Kubernetes
Avatar
Cloud Academy Team
— December 4, 2019

Understanding Enterprise Cloud Migration

What is enterprise cloud migration? Cloud migration is about moving your data, applications, and even infrastructure from your on-premises computers or infrastructure to a virtual pool of on-demand, shared resources that offer compute, storage, and network services at scale. Why d...

Read more
  • AWS
  • Azure
  • Data Migration
Wendy Dessler
Wendy Dessler
— November 27, 2019

6 Reasons Why You Should Get an AWS Certification This Year

In the past decade, the rise of cloud computing has been undeniable. Businesses of all sizes are moving their infrastructure and applications to the cloud. This is partly because the cloud allows businesses and their employees to access important information from just about anywhere. ...

Read more
  • AWS
  • Certifications
  • certified
Avatar
Andrea Colangelo
— November 26, 2019

AWS Regions and Availability Zones: The Simplest Explanation You Will Ever Find Around

The basics of AWS Regions and Availability Zones We’re going to treat this article as a sort of AWS 101 — it’ll be a quick primer on AWS Regions and Availability Zones that will be useful for understanding the basics of how AWS infrastructure is organized. We’ll define each section,...

Read more
  • AWS
Avatar
Dzenan Dzevlan
— November 20, 2019

Application Load Balancer vs. Classic Load Balancer

What is an Elastic Load Balancer? This post covers basics of what an Elastic Load Balancer is, and two of its examples: Application Load Balancers and Classic Load Balancers. For additional information — including a comparison that explains Network Load Balancers — check out our post o...

Read more
  • ALB
  • Application Load Balancer
  • AWS
  • Elastic Load Balancer
  • ELB
Albert Qian
Albert Qian
— November 13, 2019

Advantages and Disadvantages of Microservices Architecture

What are microservices? Let's start our discussion by setting a foundation of what microservices are. Microservices are a way of breaking large software projects into loosely coupled modules, which communicate with each other through simple Application Programming Interfaces (APIs). ...

Read more
  • AWS
  • Docker
  • Kubernetes
  • Microservices
Nisar Ahmad
Nisar Ahmad
— November 12, 2019

Kubernetes Services: AWS vs. Azure vs. Google Cloud

Kubernetes is a popular open-source container orchestration platform that allows us to deploy and manage multi-container applications at scale. Businesses are rapidly adopting this revolutionary technology to modernize their applications. Cloud service providers — such as Amazon Web Ser...

Read more
  • AWS
  • Azure
  • Google Cloud
  • Kubernetes