Getting Data In
Getting Data Out
The course is part of these learning paths
BigQuery is Google’s managed data warehouse in the cloud. BigQuery is incredibly fast. It can scan billions of rows in seconds. It’s also surprisingly inexpensive and easy to use. Querying terabytes of data costs only pennies and you only pay for what you use since there are no up-front costs.
This is a hands-on course where you can follow along with the demos using your own Google Cloud account or a trial account. You do not need any prior knowledge of Google Cloud Platform and the only pre-requisite is having some experience with databases.
- Load data into BigQuery using files or by streaming one record at a time
- Run a query using standard SQL and save your results to a table
- Export data from BigQuery using Google Cloud Storage
Now that you know how inexpensive it is to store data on BigQuery, you may want to save some of your query results there too.
It's very easy to do that. First, though, since you don't have a dataset to put tables in, you'll need to create one. Click the dropdown arrow to the right of your project name and then select Create New Dataset. You need to give it a name, so call it babynames. That's the only thing you have to specify in this dialogue box; but if you want to, you can set the location to either the US or EU, which you might want to do for compliance reasons. You can also set an expiration time; so that a certain number of days after a table has been created, it will be automatically deleted. You would do this if it's temporary data that you don't want to have to remember to delete later. Now click Okay and you should see your new dataset show up under your project name.
Great; now we're ready to run a query. Let's open one of our previous queries so we don't have to type a new one in. If the list of your previous queries isn't already showing, then click Query History. Click the Open Query button to the right of your first query to bring it up again.
Now before you rerun this query, click the Show Options button and then click the Select Table button next to Destination Table. The project and dataset names should already be set to what you want, so you just need to type a name for the table. If the table doesn't already exist, it'll create it. So you can type in whatever table name you want. Let's call it babynames_top10. Click Okay and then click Run Query.
Now if you click on the babynames dataset, you'll see that it created the babynames_top10 table. Click the Hide Options button so there's more room on the page. Now click the Preview tab, and you'll see the results from the query.
What if you decide to save the results to a table after you've run the query? That's easy too. Click Query History and open up the first query again. We didn't specify that the results should be saved to a table when we ran this query the first time, but you'll notice there's a Save As Table button. Click that and put in a different table name.
One more thing to be aware of is that if you don't specify a destination table, like when we ran this query the first time, it puts the results in a temporary table in cache. This temporary table stays in cache for about a day. So if you run the query again within 24 hours, it'll retrieve the cached copy; and you won't be charged for the query.
However, if you run a query again and specify a destination table, like we just did, then it won't read the data from cache. So let's run it again without the destination table option. You can just click the X here instead of having to open the Options pane. Now click Run Query.
Notice that it says cached in brackets. Also, the elapsed time was faster because it just retrieved the results from cache rather than running the actual query. Of course, since it was already showing the results even before we ran the query again; there may not seem to be a lot of point in rerunning it. That's probably true if you're using the web interface, but if you're using the BQ command, or the BigQuery API, then it might be useful in some cases.
Before we go, let's get rid of the tables we created. Put your mouse pointer over the dataset name and a down arrow will appear. Click that and select Delete Dataset. This will delete the dataset and all of the tables in it, so you have to type the name of the dataset before it will delete it.
Okay, that's it for this lesson.
About the Author
Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).