image
Queries and Scans in the AWS Console
Start course
Difficulty
Intermediate
Duration
1h 32m
Students
20878
Ratings
4.6/5
starstarstarstarstar-half
Description

Please note this course is outdated and has been replaced with the following courses:

 

This course provides an introduction to working with Amazon DynamoDB, a fully-managed NoSQL database service provided by Amazon Web Services. We begin with a description of DynamoDB and compare it to other database platforms. The course continues by walking you through designing tables, and reading and writing data, which is somewhat different than other databases you may be familiar with. We conclude with more advanced topics including secondary indexes and how DynamoDB handles very large tables.

Course Objectives

You will gain the following skills by completing this course:

  • How to create DynamoDB tables.
  • How to read and write data.
  • How to use queries and scans.
  • How to create and query secondary indexes.
  • How to work with large tables. 

Intended Audience

You should take this course if you have:

  • An understanding of basic AWS technical fundamentals.
  • Awareness of basic database concepts, such as tables, rows, indexes, and queries.
  • A basic understanding of computer programming. The course includes some programming examples in Python.

Prerequisites 

See the Intended Audience section.

This Course Includes

  • Expert-guided lectures about Amazon DynamoDB.
  • 1 hour and 31 minutes of high-definition video. 
  • Expert-level instruction from an industry veteran. 

What You'll Learn

Video Lecture What You'll Learn
DynamoDB Basics A basic and foundational overview of DynamoDB.
Creating DynamoDB Tables How to create DynamoDB tables and understand key concepts.
Reading and Writing Data How to use the AWS Console and API to read and write data.
Queries and Scans How to use queries and scans with the AWS Console and API.
Secondary Indexes How to work with Secondary Indexes.
Working with Large Tables How to use partitioning in large tables.

If you have thoughts or suggestions for this course, please contact Cloud Academy at support@cloudacademy.com.

Transcript

Let's try running some queries and scans in the AWS console.

We're here at the orders table looking at the items tab. We ignored it earlier, but at the top of this page, we can see that this view is doing a scan of the orders table. It must be a scan because there's data visible from many different partition keys. A scan is the only way to do that. A query has to be limited to a single partition key.

We can see the first 100 items on the screen and we can click the little arrow to scroll to the next 100 items. Right now our table only has 120 items, so we can tell when we have scrolled to the end. The data's not sorted in any obvious order, but DynamoDB knows when we've hit the end. The drop-downs at the top are what indicate that this is doing a scan of the entire table. To the right of the word scan, there's a box that lets us select which index we are scanning. Our table doesn't have any indexes except the primary key so the only thing we can do right now is to scan the whole table. Scans can be filtered which allow us to specify conditions for which rows should be included in the result. Let's say we only want to see orders being shipped to Massachusettes. We could click add filter, and then specify that the field shipping state must equal MA for Massachusettes. If we click start search, then after a moment Amazon will show us the orders that are being shipped to Massachusettes.

As we just discussed, these filters can be very expensive to execute. When you scan the full table, DynamoDB has to read every single record and determine if it meets the criteria for the filter. For a large table this can be very slow. Filters are more useful when you have already narrowed down the data to a relatively small subset, like in a query.

Now let's take a look at queries. We can construct a query here in the console by switching the drop-down on the left from scan to query. The options for a query are different than they were for a scan. In a query, you need to first specify the indexed field value to base the query around. We only have one index, our primary key. So our only option is to query for a single order ID. Let's query for order 672102. We can enter that order ID number into the text box here. And when we click start search we will quickly see the one order with that ID. Because we're using a query and not a filter, DynamoDB can find this record very quickly and uses only one read capacity unit for the one record that was returned.

If queries can only return one record, then why have all these other options like filters and sorts? Well, these are useful when you have a composite primary key. Let's switch over to our order line items table. Just to refresh, this table has a composite primary key where each record is uniquely identified by the combination of its order number and line number. The order number is the partition key or top level of the hierarchy and the line number within each order is the sort key. If we perform a query on the table you can see that we can query by both the partition key and the sort key.

Let's try it now. Let's look for the record with order ID 672102 and line number 3. If we enter those criteria and click start search, we quickly get back a result set with exactly one item, just like we expected. But we don't need to specify both a partition key and a sort key. As long as we narrow down the results to one partition key we can leave out the sort key. Let's take that out and click start search. Now we see nine items returned. In fact, they're sorted by line number. We could switch this to reverse order by changing the sort to descending.

This is also where a filter can come in handy. Let's say we want to find the unshipped items in this order. We could add a filter looking for records where status is unshipped. After we click start search we'll get the results. And we see that there are only two items that haven't been shipped. These kind of filters are not as slow or expensive as a filter on an entire table scan. The database performs this query by loading the 9 records with the partition key that matches 672102. And then filter in the results that are sent back to us. This will take nine read/compute units, but that's still a lot less than a full table scan.

Now because the partition key and sort key represent a hierarchy within your table, we can't query on the sort key alone. Let's see what happens if we remove the filter and remove the order ID and then try and send a query for all the records with line number 3. If we set up a query this way and then click start search, DynamoDB won't allow it because we haven't narrowed down our query to a single partition key.

Now that we've seen how to do basic queries and scans using the web console, the next video will discuss how to do these operations using the API.

About the Author

Ryan is the Storage Operations Manager at Slack, a messaging app for teams. He leads the technical operations for Slack's database and search technologies, which use Amazon Web Services for global reach.

Prior to Slack, Ryan led technical operations at Pinterest, one of the fastest-growing social networks in recent memory, and at Runscope, a debugging and testing service for APIs.

Ryan has spoken about patterns for modern application design at conferences including Amazon Web Services re:Invent and O'Reilly Fluent. He has also been a mentor for companies participating in the 500 Startups incubator.