image
Querying Secondary Indexes
Start course
Difficulty
Intermediate
Duration
1h 32m
Students
20862
Ratings
4.6/5
starstarstarstarstar-half
Description

This course provides an introduction to working with Amazon DynamoDB, a fully-managed NoSQL database service provided by Amazon Web Services. We begin with a description of DynamoDB and compare it to other database platforms. The course continues by walking you through designing tables, and reading and writing data, which is somewhat different than other databases you may be familiar with. We conclude with more advanced topics including secondary indexes and how DynamoDB handles very large tables.

Course Objectives

You will gain the following skills by completing this course:

  • How to create DynamoDB tables.
  • How to read and write data.
  • How to use queries and scans.
  • How to create and query secondary indexes.
  • How to work with large tables. 

Intended Audience

You should take this course if you have:

  • An understanding of basic AWS technical fundamentals.
  • Awareness of basic database concepts, such as tables, rows, indexes, and queries.
  • A basic understanding of computer programming. The course includes some programming examples in Python.

Prerequisites 

See the Intended Audience section.

This Course Includes

  • Expert-guided lectures about Amazon DynamoDB.
  • 1 hour and 31 minutes of high-definition video. 
  • Expert-level instruction from an industry veteran. 

What You'll Learn

Video Lecture What You'll Learn
DynamoDB Basics A basic and foundational overview of DynamoDB.
Creating DynamoDB Tables How to create DynamoDB tables and understand key concepts.
Reading and Writing Data How to use the AWS Console and API to read and write data.
Queries and Scans How to use queries and scans with the AWS Console and API.
Secondary Indexes How to work with Secondary Indexes.
Working with Large Tables How to use partitioning in large tables.

If you have thoughts or suggestions for this course, please contact Cloud Academy at support@cloudacademy.com.

Transcript

Now that we've seen how to create indexes, let's try querying those indexes. We'll do it first from the web console, and then from code using the API.

We'll start with our Orders table and see what we can do with the secondary index there. That index was on Customer ID that was supposed to allow us to find all the orders for a given customer. To work with the index, let's go to the Items tab. Right now, we see a scan of the entire table with all of its data visible to us. Let's change the drop-down and say we want to do a scan of Customer ID index. This will show us all of the data in the index. We'll need to click Start Search to switch to this view.

You can see that once we're viewing the Customer ID index, the customer ID and order date attributes are more prominent. They've been moved to the left here. All of the rest of the attributes in the table are available in this index as well, because we chose to project all of the table's attributes into the index when we created it.

Let's now query this index for the orders that match a specific customer. We'll switch from Scan to Query, and then we'll enter a customer ID of 49921. We don't need to specify an exact order date, since we want to see all of the orders for this customer. When we click Start Search, DynamoDB quickly returns two orders for this customer. They're sorted in ascending order by order date. The March 9 order comes before March 28. If we wanted to reverse that, we could've just sorted the query in descending order.

Because we used the index, DynamoDB only had to read two items, so it only used two read capacity units. If we had done a table scan, then filtered on customer ID, then DynamoDB would've had to scan through all of the items in the table, which would've cost us hundreds of read units.

Let's now try the same thing on the Line Items table. We can click on Order Line Items table in the left. When we open the drop-down here, we see three options. We can scan the full table, scan our Product ID index, which is the global secondary index, or scan our local secondary index, Status index. So let's say we wanted to find all of the orders that contained a particular product. We can select Query and pick Product ID index. We can enter a product ID, like 90805, and click Start Search. Ah-ha. There are seven line items that reference that product ID. You can see the shipped items appear above the unshipped items, but that we can see both of those in this view.

We could also provide a specific status value in the query. If we type "Unshipped" into the status field, and then click Start Search again, this time it only returns the two records that match both the product ID and the status of Unshipped. It uses only two read capacity units, since everything was done through the index structure.

Now, let's run the same queries and code using the Python environment that we worked with earlier. We'll start with a little boilerplate, just like before, and we'll create a Table object so that we can work with the Order Line Items table. Now we can do a full scan of the Product ID index by passing the index name as one of the parameters to the Scan API call. We've added the index name parameter to the Scan call, but otherwise it's just like the scans we did earlier. Let's run it and see what it returns. There are 722 items here. That's a lot of data. A full scan of the index isn't that useful, so let's do a query instead. Just like we did in the console, let's query for all the items with product ID 90805. This looks just like our earlier queries with a key condition expression, but now we've added an index name to it. When we run it, we see seven items that match the product ID, the same as we saw in the console. Just like we did in the console, we could also query for both a product ID and a shipping status. When we run that, we see two unshipped items just like we expected.

So that's all there is to it. Every scan and query option that was available for the main table you can also do with an index just by adding the index name parameter to the API request.

About the Author

Ryan is the Storage Operations Manager at Slack, a messaging app for teams. He leads the technical operations for Slack's database and search technologies, which use Amazon Web Services for global reach.

Prior to Slack, Ryan led technical operations at Pinterest, one of the fastest-growing social networks in recent memory, and at Runscope, a debugging and testing service for APIs.

Ryan has spoken about patterns for modern application design at conferences including Amazon Web Services re:Invent and O'Reilly Fluent. He has also been a mentor for companies participating in the 500 Startups incubator.