Getting Data In
Getting Data Out
The course is part of these learning paths
BigQuery is Google’s managed data warehouse in the cloud. BigQuery is incredibly fast. It can scan billions of rows in seconds. It’s also surprisingly inexpensive and easy to use. Querying terabytes of data costs only pennies and you only pay for what you use since there are no up-front costs.
This is a hands-on course where you can follow along with the demos using your own Google Cloud account or a trial account. You do not need any prior knowledge of Google Cloud Platform and the only pre-requisite is having some experience with databases.
- Load data into BigQuery using files or by streaming one record at a time
- Run a query using standard SQL and save your results to a table
- Export data from BigQuery using Google Cloud Storage
So far we've run queries on public data sets and on pre-existing data that we loaded into BigQuery. But there's another way to get data into BigQuery: streaming, where you add data one record at a time, instead of a whole table at a time. One example is if you need a real time dashboard that gives you an up-to-the-minute, or even up-to-the-second, view of your data.
One common way to stream data into BigQuery is to use cloud data flow, but that's a subject of another course. You can stream data directly into BigQuery by calling the API in you software. I'll show you an example using the API explorer, which lets you make an API call from a nice interface in your browser.
Before we do that, we need to create a table for the data to stream into. We should also create a new data set because we're not going to put public data in the table. So the public data set isn't the right place to put it. Let's call it streaming. Now create the table. This time we're going to select create empty table because we're not loading any data into it right now. Let's call the table stream.
For the schema, we can put in pretty much anything, since we're just going to stream some test data. But let's have at least two fields. Create a string field called greeting and a string field called target and click create table.
Okay, now go into the API explorer so we can stream some data into that table. Just so you don't have to type in a long URL, I'll show you how to get there. Click the question mark in the top, right corner and then click query reference. We're in the BigQuery documentation. Click on REST reference, then v2, then table data, then insert all.
On the right-hand side is a form you can to fill out to call the API. First put in your project ID, which will be different from mine, of course. There are a couple of ways to get your project ID. In BigQuery, if you put your mouse over the project name, it will tell you the project ID. Since that's quite a bit to type, if you have a Google cloud console open, you can go to the homepage and copy the project ID from there. Then put in streaming for the data set ID and stream for the table ID.
Now in the request body text box, click the plus sign and select rows. This saves you from having to type in everything yourself. It's especially helpful with the brackets because we're going to have quite a few nested brackets in it. Now click the plus sign under rows and select add item. It put in a couple more brackets. Then click the plus sign between those and select json. Now click the plus sign under json and select the asterisk.
Now we finally need to type something. This is where we say what data we want to put in the row we're adding to the table. If you'll recall, the first field is called greeting, so type that between the quotes. Then put your cursor after the colon and type "hello." Click the plus sign under greeting and select asterisk. Notice that it added a comma at the end of the first line, since there's now a second line. We call the second field target, so type target between the quotes. Then after the colon, type world in quotes.
Alright, that's a complete record, so the request body is complete. Click the execute button. It asks you to sign in, so select your account. Then it says that the Google API's explorer needs these three permissions to do what you want it to do. So click allow. Okay, the return code is 200 and it's colored green, which means the API call was successful. It also didn't return and error, so it looks like it worked. Let's see.
Go back to the BigQuery page and click on the stream table. Click the preview tab. It doesn't show any records. Maybe we need to refresh the page. That didn't work either. Well, it actually takes quite a while before streaming data shows up in the preview tab in the web interface, but there are other ways to verify that it's there. The easiest way is to run a query. Click compose query, then click the query table button. It helpfully fills in a basic query. All you have to do is type an asterisk after select and you're good to go. Now run the query. It warns you that you'll be billed for all the data in the table. That's not a concern because there's only one record in the table anyway. So click run query. Great, there's our hello world record.
Although that was a quick way to show you how streaming works, you likely won't be calling the API directly when you write your own streaming code. Instead, you should use the BigQuery client libraries.
Google provides client libraries for C#, Go, Java, Node.js, PHP, Python, and Ruby. There's a different way of doing a streaming insert for each language, but it's easier than writing the API call yourself. If you need to write code to stream into BigQuery, have look at the documentation for your language.
One more thing, remember in the pricing lesson when I said that there's a separate charge for streaming? Well, it's five cents per gig to do streaming inserts. That actually makes it the most expensive BigQuery operation. Loading data in any other way is free. Querying data costs half a cent per gig and storing data costs two cents pre gig, at most. But if you need to do up-to-the-minute analysis of streaming data, then five cents a gig is still pretty cheap.
That's it for this lesson.
About the Author
Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).