This is a guest post from 47Line Technologies.
In our earlier posts (here and here), we introduced the Hadoop ecosystem & explained its various components using a real world example of the retail industry. We now possess a fair idea of the advantages of Big Data. NoSQL datastores are being used extensively in real-time & Big Data applications, which is why a look into its internals would help us make better design decisions in our applications.
NoSQL datastores provide a mechanism for retrieval and storage of data items that is modeled in a non-tabular manner. Simplicity of design, horizontal scalability and control over availability form the motivations for this approach. NoSQL is governed by the CAP theorem in the same way RDBMS is governed by ACID properties.
From the AWS stable, DynamoDB is the perfect choice if you are looking for a NoSQL solution. DynamoDB is a “fast, fully managed NoSQL database service that makes it simple and cost-effective to store and retrieve any amount of data, and serve any level of request traffic. Its guaranteed throughput and single-digit millisecond latency make it a great fit for gaming, ad tech, mobile and many other applications.” Since it is a fully managed service, you need not worry about provisioning & managing of the underlying infrastructure. All the heavy-lifting is taken care for you.
Majority of the documentation available on the Net are how-to-get-started guides with examples of DynamoDB API usage. Let’s look at the thought process and design strategies that went into the making of DynamoDB.
“DynamoDB uses a synthesis of well known techniques to achieve scalability and availability: Data is partitioned and replicated using consistent hashing, and consistency is facilitated by object versioning. The consistency among replicas during updates is maintained by a quorum-like technique and a decentralized replica synchronization protocol. DynamoDB employs a gossip based distributed failure detection and membership protocol. Dynamo is a completely decentralized system with minimal need for manual administration. Storage nodes can be added and removed from DynamoDB without requiring any manual partitioning or redistribution.” You must be wondering – “Too much lingo for one paragraph”. Fret not, why fear when I am here 🙂 Let’s take one step at a time, shall we!
Requirements and Assumptions
This class of NoSQL storage system has the following requirements –
- Query Model: A “key” uniquely identifies a data item. Read and write operations are performed on this data item. It must be noted that no operation spans across multiple data items. There is no need for relational schema and DynamoDB works best when a single data item is less than 1MB.
- ACID Properties: As mentioned earlier, there is no need for relational schema and hence ACID (Atomicity, Consistency, Isolation, Durability) properties are not required. The industry and the academia acknowledge that ACID guarantees lead to poor availability. Dynamo targets applications that operate with weaker consistency if it results in high availability.
- Efficiency: DynamoDB needs to run on commodity hardware infrastructure. Stringent SLA (Service Level Agreement) ensure that latency and throughput requirements are met for the 99.9% percentile of the distribution. But everything has a catch – the tradeoffs consist of performance, cost, availability and durability guarantees.
In subsequent articles, we will look into Design Considerations & System Architecture.
Article authored by Vijay Olety
Microservices Architecture: Advantages and Drawbacks
Microservices are a way of breaking large software projects into loosely coupled modules, which communicate with each other through simple Application Programming Interfaces (APIs).Microservices have become increasingly popular over the past few years. The modular architectural style,...
What Are Best Practices for Tagging AWS Resources?
There are many use cases for tags, but what are the best practices for tagging AWS resources? In order for your organization to effectively manage resources (and your monthly AWS bill), you need to implement and adopt a thoughtful tagging strategy that makes sense for your business. The...
How to Optimize Amazon S3 Performance
Amazon S3 is the most common storage options for many organizations, being object storage it is used for a wide variety of data types, from the smallest objects to huge datasets. All in all, Amazon S3 is a great service to store a wide scope of data types in a highly available and resil...
How to Optimize Cloud Costs with Spot Instances: New on Cloud Academy
One of the main promises of cloud computing is access to nearly endless capacity. However, it doesn’t come cheap. With the introduction of Spot Instances for Amazon Web Services’ Elastic Compute Cloud (AWS EC2) in 2009, spot instances have been a way for major cloud providers to sell sp...
What are the Benefits of Machine Learning in the Cloud?
A Comparison of Machine Learning Services on AWS, Azure, and Google CloudArtificial intelligence and machine learning are steadily making their way into enterprise applications in areas such as customer support, fraud detection, and business intelligence. There is every reason to beli...
How to Use AWS CLI
The AWS Command Line Interface (CLI) is for managing your AWS services from a terminal session on your own client, allowing you to control and configure multiple AWS services.So you’ve been using AWS for awhile and finally feel comfortable clicking your way through all the services....
AWS Summit Chicago: New AWS Features Announced
Thousands of cloud practitioners descended on Chicago’s McCormick Place West last week to hear the latest updates around Amazon Web Services (AWS). While a typical hot and humid summer made its presence known outside, attendees inside basked in the comfort of air conditioning to hone th...
From Monolith to Serverless – The Evolving Cloudscape of Compute
Containers can help fragment monoliths into logical, easier to use workloads. The AWS Summit New York was held on July 17 and Cloud Academy sponsored my trip to the event. As someone who covers enterprise cloud technologies and services, the recent Amazon Web Services event was an insig...
AWS Certification Practice Exam: What to Expect from Test Questions
If you’re building applications on the AWS cloud or looking to get started in cloud computing, certification is a way to build deep knowledge in key services unique to the AWS platform. AWS currently offers nine certifications that cover the major cloud roles including Solutions Archite...
Disadvantages of Cloud Computing
If you want to deliver digital services of any kind, you’ll need to compute resources including CPU, memory, storage, and network connectivity. Which resources you choose for your delivery, cloud-based or local, is up to you. But you’ll definitely want to do your homework first.Cloud ...
Choosing the Right AWS Certification for You and Your Team
As companies increasingly shift workloads to the public cloud, cloud computing has moved from a nice-to-have to a core competency in the enterprise. This shift requires a new set of skills to design, deploy, and manage applications in the cloud.As the market leader and most mature pro...
How to Encrypt an EBS Volume
Keeping data and applications safe in the cloud is one the most visible challenges facing cloud teams in 2018. Cloud storage services where data resides are frequently a target for hackers, not because the services are inherently weak, but because they are often improperly configured....