Understanding Queries and Scans
Difficulty: Intermediate
Duration: 1h 32m
Students: 20878
Ratings: 4.6/5
Description

Please note this course is outdated and has been replaced with the following courses:

 

This course provides an introduction to working with Amazon DynamoDB, a fully-managed NoSQL database service provided by Amazon Web Services. We begin with a description of DynamoDB and compare it to other database platforms. The course continues by walking you through designing tables, and reading and writing data, which is somewhat different than other databases you may be familiar with. We conclude with more advanced topics including secondary indexes and how DynamoDB handles very large tables.

Course Objectives

You will gain the following skills by completing this course:

  • How to create DynamoDB tables.
  • How to read and write data.
  • How to use queries and scans.
  • How to create and query secondary indexes.
  • How to work with large tables. 

Intended Audience

You should take this course if you have:

  • An understanding of basic AWS technical fundamentals.
  • Awareness of basic database concepts, such as tables, rows, indexes, and queries.
  • A basic understanding of computer programming. The course includes some programming examples in Python.

Prerequisites 

See the Intended Audience section.

This Course Includes

  • Expert-guided lectures about Amazon DynamoDB.
  • 1 hour and 31 minutes of high-definition video. 
  • Expert-level instruction from an industry veteran. 

What You'll Learn

Video Lecture | What You'll Learn
DynamoDB Basics | A basic and foundational overview of DynamoDB.
Creating DynamoDB Tables | How to create DynamoDB tables and understand key concepts.
Reading and Writing Data | How to use the AWS Console and API to read and write data.
Queries and Scans | How to use queries and scans with the AWS Console and API.
Secondary Indexes | How to work with Secondary Indexes.
Working with Large Tables | How to use partitioning in large tables.

If you have thoughts or suggestions for this course, please contact Cloud Academy at support@cloudacademy.com.

Transcript

In the last video, we showed how to do operations on single items in a DynamoDB table. Now, we're going to talk about how to search the database using two operations called queries and scans.

A DynamoDB query searches the table and loads the results that match a single partition key. On a table that has only a partition key, with no sort key, this means that a query returns at most one item. So a query on our Orders table would never return more than one item. On a table with a compound key, like our Order Line Items table, you can write a query to load all the records that match the partition key portion of the key. You can also write a query that limits its results to a single sort key value, or to a range of sort keys. For example, in our Line Items table we could query for the single item with line number 25, or for a range of line numbers, such as those greater than 9 or those less than 36. Sort keys don't have to be numbers, so you can also do this with strings. You could search for sort keys within a certain range of strings, or even strings that begin with a specific value.
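As a rough sketch of the key conditions described above, the snippet below builds the parameters for the low-level Query operation as plain dictionaries, so it runs without an AWS connection. The table name `OrderLineItems` and the attribute names `order_id` and `line_number` are assumptions made for illustration; the course's actual schema may differ.

```python
TABLE_NAME = "OrderLineItems"  # hypothetical table name


def query_params(order_id, sort_clause=None, sort_values=None):
    """Build keyword arguments for a DynamoDB Query on one partition key.

    `order_id` (partition key) and `line_number` (sort key) are
    hypothetical attribute names.
    """
    params = {
        "TableName": TABLE_NAME,
        "KeyConditionExpression": "order_id = :pk",
        "ExpressionAttributeValues": {":pk": {"S": order_id}},
    }
    if sort_clause:
        # A sort key condition is optional and is ANDed onto the
        # partition key condition.
        params["KeyConditionExpression"] += " AND " + sort_clause
        params["ExpressionAttributeValues"].update(sort_values or {})
    return params


# Every line item for one order.
all_lines = query_params("1234")

# The single item with line number 25.
one_line = query_params("1234", "line_number = :n", {":n": {"N": "25"}})

# Line numbers greater than 9.
above_nine = query_params("1234", "line_number > :n", {":n": {"N": "9"}})

# A bounded range of line numbers.
line_range = query_params(
    "1234",
    "line_number BETWEEN :lo AND :hi",
    {":lo": {"N": "10"}, ":hi": {"N": "35"}},
)
```

With boto3 installed and credentials configured, a dictionary like this could be passed along as `boto3.client("dynamodb").query(**line_range)`. For a string sort key, the same pattern would work with a `begins_with(...)` condition.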

When you write a query, you have the option to filter your results on any other attribute in the table. These filters operate after the data has been narrowed down to a single partition key. This can reduce the amount of data that's sent to your application, but these filters can still be costly in terms of read capacity units. Let's say you query for a partition key that has 10 items with different sort keys. If you use a filter to narrow the result down to a single item, DynamoDB still has to read all 10 items in order to decide which ones match the filter. That means that it's going to take longer than just reading a single item, and it's going to consume the read capacity needed to read all 10 items, not just the one that is returned. If this is something that you do often, you might want to add a secondary index instead. Secondary indexes will be covered later in this series.
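The order of operations can be illustrated with a small in-memory simulation. This is not DynamoDB code, only a sketch of the behavior described above; the attribute names are hypothetical, and the `Count` and `ScannedCount` keys mirror the fields of the same names in a real Query response.

```python
def query_with_filter(items, partition_value, keep):
    """Mimic how a filtered query is evaluated, using a plain list."""
    # Step 1: the key condition narrows the data to one partition key.
    read = [item for item in items if item["order_id"] == partition_value]
    # Step 2: the filter runs only on items that have already been read.
    returned = [item for item in read if keep(item)]
    return {
        "Items": returned,
        "Count": len(returned),      # items sent back to the application
        "ScannedCount": len(read),   # items that had to be read
    }


# Ten line items for order 1234, plus one item from another order.
items = [
    {"order_id": "1234", "line_number": n, "quantity": n % 10}
    for n in range(1, 11)
]
items.append({"order_id": "9999", "line_number": 1, "quantity": 0})

result = query_with_filter(items, "1234", lambda item: item["quantity"] == 0)
print(result["Count"], result["ScannedCount"])
```

Only one item passes the filter, but all ten items under the partition key were read, which is what the read capacity charge is based on.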

If you run a query that returns a few results from the same partition key, then you also have the option of returning the results in ascending or descending order by sort key. You can also limit your results, for example, just returning the top 50 or the bottom 10.
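A minimal sketch of how ordering and limiting might be expressed, again as plain Query parameters. `ScanIndexForward` and `Limit` are real Query parameters; the table and attribute names in the example are assumptions.

```python
def newest_first(params, limit):
    """Return a copy of Query parameters with descending order and a cap."""
    ordered = dict(params)
    ordered["ScanIndexForward"] = False  # False means descending sort key order
    ordered["Limit"] = limit             # stop after this many items
    return ordered


base = {
    "TableName": "OrderLineItems",  # hypothetical table name
    "KeyConditionExpression": "order_id = :pk",
    "ExpressionAttributeValues": {":pk": {"S": "1234"}},
}

# The ten items with the highest sort key values for this order.
bottom_ten = newest_first(base, 10)
```

One detail worth keeping in mind: `Limit` caps the number of items DynamoDB evaluates, so when a filter is also applied, fewer items than the limit may come back.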

Finally, when you execute a query, you have the choice of whether to make it strongly consistent or eventually consistent. If there's data in the result set that has been modified recently, a strongly consistent query will check all three replicas in the different availability zones and will make sure that it sends back the most recent version of the data. An eventually consistent query won't do that. It will just grab the data from a single zone, which might be slightly out of date compared to the last written version. Eventually consistent queries can be a little faster, and they're cheaper. They use half of the read capacity that strongly consistent queries do.
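To make the cost difference concrete, here is a rough estimator. It assumes the commonly documented sizing rule that a query adds up the sizes of all items it reads, rounds up to the next 4 KB, and charges one read capacity unit per 4 KB for strongly consistent reads and half that for eventually consistent reads. Treat it as an approximation, not a billing calculator.

```python
import math

READ_UNIT_BYTES = 4096  # assumed size covered by one read capacity unit


def query_read_capacity_units(item_sizes_bytes, strongly_consistent=True):
    """Estimate the read capacity consumed by one query.

    `item_sizes_bytes` lists the sizes of every item the query reads,
    including items later discarded by a filter.
    """
    total = sum(item_sizes_bytes)
    units = math.ceil(total / READ_UNIT_BYTES)
    # Eventually consistent reads cost half as much.
    return units if strongly_consistent else units / 2


ten_large_items = [4096] * 10
strong = query_read_capacity_units(ten_large_items)
eventual = query_read_capacity_units(ten_large_items, strongly_consistent=False)
print(strong, eventual)
```

In a real request, the choice is made with the `ConsistentRead` parameter of the Query operation.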

By contrast, a DynamoDB scan searches the entire table across all the partition keys. A scan is useful when you want to review all of the data in a table. The results of a scan can be filtered by attributes of the data, similar to how filters on queries work. But DynamoDB still has to scan your entire table in order to generate the results. This means that a filtered scan is going to be pretty expensive, because it's going to need enough read capacity to read the entire table. It's also probably going to be pretty slow.

The results of a table scan aren't guaranteed to be returned in a specific order. In practice, partition keys appear to be ordered randomly. Table scans use eventually consistent reads by default, so DynamoDB doesn't guarantee that a scan will retrieve the most recent copy of any record in the table. In practice, this probably won't affect you much, and the good news is that it makes scans considerably cheaper than they would be with strongly consistent semantics.

Finally, scans can be run in parallel from multiple threads in the same process, or even on multiple servers. It's easy to write code that summarizes an entire table in parallel running on an entire cluster of machines, similar to what you would do with Amazon Elastic MapReduce. In fact, if you use Elastic MapReduce to summarize data from a DynamoDB table, it will do this kind of parallel scan when it reads the data from DynamoDB.
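The parallel pattern can be sketched with a thread pool, where each worker scans one segment of the table. `Segment`, `TotalSegments`, `ExclusiveStartKey`, and `LastEvaluatedKey` are real Scan parameters and response fields; the `fake_scan_page` function is a stand-in invented here so the example runs without AWS access.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_segment(scan_page, segment, total_segments):
    """Read every page of one segment and return its items."""
    items = []
    kwargs = {"Segment": segment, "TotalSegments": total_segments}
    while True:
        page = scan_page(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return items  # no more pages in this segment
        kwargs["ExclusiveStartKey"] = last_key


def parallel_scan(scan_page, total_segments=4):
    """Scan all segments concurrently and combine the results."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(scan_segment, scan_page, segment, total_segments)
            for segment in range(total_segments)
        ]
        return [item for future in futures for item in future.result()]


def fake_scan_page(Segment, TotalSegments, ExclusiveStartKey=None):
    """Hypothetical stand-in for a scan call: serves 0..19, two per page."""
    mine = [n for n in range(20) if n % TotalSegments == Segment]
    start = 0 if ExclusiveStartKey is None else ExclusiveStartKey
    page = {"Items": mine[start:start + 2]}
    if start + 2 < len(mine):
        page["LastEvaluatedKey"] = start + 2
    return page


scanned = parallel_scan(fake_scan_page)
total = sum(scanned)
```

Against a real table, `scan_page` could be something like `lambda **kw: client.scan(TableName="Orders", **kw)` with a boto3 client, where the table name is again only an example.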

About the Author

Ryan is the Storage Operations Manager at Slack, a messaging app for teams. He leads the technical operations for Slack's database and search technologies, which use Amazon Web Services for global reach.

Prior to Slack, Ryan led technical operations at Pinterest, one of the fastest-growing social networks in recent memory, and at Runscope, a debugging and testing service for APIs.

Ryan has spoken about patterns for modern application design at conferences including Amazon Web Services re:Invent and O'Reilly Fluent. He has also been a mentor for companies participating in the 500 Startups incubator.