Getting Data In
Getting Data Out
BigQuery is Google’s managed data warehouse in the cloud. BigQuery is incredibly fast. It can scan billions of rows in seconds. It’s also surprisingly inexpensive and easy to use. Querying terabytes of data costs only pennies and you only pay for what you use since there are no up-front costs.
This is a hands-on course where you can follow along with the demos using your own Google Cloud account or a trial account. You do not need any prior knowledge of Google Cloud Platform and the only prerequisite is having some experience with databases.
- Load data into BigQuery using files or by streaming one record at a time
- Run a query using standard SQL and save your results to a table
- Export data from BigQuery using Google Cloud Storage
- Anyone who is interested in analyzing data on Google Cloud Platform
- Experience with databases
- Familiarity with writing queries using SQL is recommended
- A Google Cloud Platform account is recommended (sign up for a free trial at https://cloud.google.com/free/ if you don’t have an account)
The GitHub repository for this course is at https://github.com/cloudacademy/bigquery-intro.
To open BigQuery, go to the Google Cloud Platform console, then find BigQuery in the menu. Alternatively, you can type “bigquery” in the search bar, which is probably easier.
Suppose you wanted to see which US state had the most babies with the same name in one year. There’s a public dataset with baby name data available on BigQuery. If you look under bigquery-public-data, you’ll see one called “usa_names”. If you click on it, you’ll see two tables that are almost the same. We’ll use the first one.
When you click on the table, it brings up the schema. If you click on the Details tab, it’ll show you a description of the data in the table. Let’s make this bigger. If you click on the Preview tab, it’ll give you a sample of the data. If you click the “Query Table” button, it will even give you the skeleton of a SQL query.
However, the query that it put in isn’t complete. You can tell because the Validator circle at the bottom right is a red exclamation point, which means there’s a problem. To see why, click on it and open the Validator. It says the “SELECT list must not be empty”. Let’s “SELECT *” from the table, which, if you’re not familiar with SQL, means select everything. Now the exclamation point has turned into a green check mark, so it’s a proper query.
We should sort the results with the biggest number at the top, so use “ORDER BY number DESC” (for “descending”), and then, since we only need to see the top results, let’s put in a LIMIT of 10. This line is quite long now, so if you want to make it easier to read, select the “Format” option. That’s better.
Now click the “Run” button. It only takes a few seconds to run.
The top result is Robert in New York in 1947, with 10,025 occurrences. You might be wondering if we did something wrong with this query because all of the top 10 names are boy’s names. Let’s look only for girl’s names and see what happens. Add “WHERE gender = 'F'”. Remember to put quotes around the F.
Now it makes sense why we only saw boys' names before. The highest number of occurrences for a girl’s name was “Mary” in Pennsylvania in 1918, with 8,184 occurrences. Although that’s a lot of Marys, there were 9,054 Roberts in New York in 1951 and that was the 10th highest number of occurrences, so no girl’s names showed up in the top 10.
Before we run any more queries, let’s see how much this is costing us. I’ll cover that in the next lesson.
Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).