BigQuery is Google’s managed data warehouse in the cloud. It’s incredibly fast: it can scan billions of rows in seconds. It’s also surprisingly inexpensive and easy to use. Querying terabytes of data costs only pennies, and because there are no up-front costs, you pay only for what you use.
This is a hands-on course where you can follow along with the demos using your own Google Cloud account or a trial account. You do not need any prior knowledge of Google Cloud Platform and the only prerequisite is having some experience with databases.
Learning Objectives
- Load data into BigQuery using files or by streaming one record at a time
- Run a query using standard SQL and save your results to a table
- Export data from BigQuery using Google Cloud Storage
Intended Audience
- Anyone who is interested in analyzing data on Google Cloud Platform
Prerequisites
- Experience with databases
- Familiarity with writing queries using SQL is recommended
- A Google Cloud Platform account is recommended (sign up for a free trial at https://cloud.google.com/free/ if you don’t have an account)
The GitHub repository for this course is at https://github.com/cloudacademy/bigquery-intro.
I hope you enjoyed learning how to use BigQuery. Now you know how to load data, run queries and save the results, stream data one record at a time, and export data. Let’s do a quick review of what you learned.
BigQuery’s advantages over on-premises databases are ease of implementation and speed.
If you want to import data that’s already in another Google service, then there’s usually a way to get it into BigQuery, although sometimes it requires an intermediate step. It’s also possible to query data in certain Google services without importing it into BigQuery. However, the performance is usually slower when you query an external data source than if the data resides in BigQuery storage.
If your data isn’t in a Google service, then you can upload it to BigQuery through the web interface, the command line, or the API. One limitation of the web interface is that it can only upload files that are 10 megabytes or less in size. The command-line tool for all BigQuery operations is “bq”.
When you’re uploading data, BigQuery includes an option to automatically detect its schema, but it doesn’t always work, so you may have to specify the schema manually.
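As a rough sketch of the command-line route, the Python snippet below assembles two “bq load” commands: one that asks BigQuery to auto-detect the schema and one that supplies the schema inline. The dataset, table, file, and column names are made up for illustration, and the helper function is hypothetical rather than part of any Google tool. It only prints the commands; it doesn’t run them.

```python
import shlex


def build_bq_load_command(table, source, schema=None, source_format="CSV"):
    """Return the argument list for a `bq load` call.

    table  -- destination in "dataset.table" form
    source -- path or URI of the file to load
    schema -- inline schema such as "name:STRING,count:INTEGER";
              when omitted, BigQuery is asked to auto-detect the schema
    """
    cmd = ["bq", "load", f"--source_format={source_format}"]
    if schema is None:
        cmd.append("--autodetect")
    cmd += [table, source]
    if schema is not None:
        # An inline schema goes after the source as the last argument.
        cmd.append(schema)
    return cmd


# Hypothetical names, for illustration only.
auto = build_bq_load_command("mydataset.names", "./names.csv")
manual = build_bq_load_command(
    "mydataset.names",
    "./names.csv",
    schema="name:STRING,gender:STRING,count:INTEGER",
)

print(shlex.join(auto))
print(shlex.join(manual))
```

If auto-detection guesses a column type wrong, switching to the second form lets you state the types yourself.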
Another way to get data into BigQuery is streaming, where you add data one record at a time instead of a whole table at a time. This is most useful for real-time applications. Although there’s no cost to upload data to BigQuery in bulk, it does cost money to stream data into BigQuery. To add streaming code to your applications, it’s easiest to use Google’s BigQuery client libraries, which are available for many different languages.
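Here’s a minimal sketch of what streaming might look like with the Python client library, assuming the google-cloud-bigquery package is installed and your credentials are configured. The helper name stream_rows and the project, table, and field names in the comments are hypothetical. The insert_rows_json method is the client call that sends the rows, and it returns a list of errors that is empty when every row was accepted.

```python
def stream_rows(client, table_id, rows):
    """Stream a list of dict rows into a table; raise if any row is rejected.

    `client` is expected to be a google.cloud.bigquery.Client.
    """
    errors = client.insert_rows_json(table_id, rows)
    if errors:
        raise RuntimeError(f"Streaming insert failed: {errors}")
    return len(rows)


# Example call (hypothetical project, dataset, table, and fields):
#
#   from google.cloud import bigquery
#   client = bigquery.Client()
#   stream_rows(client, "my-project.mydataset.events",
#               [{"user": "alice", "action": "login"}])
```

Passing the client in as an argument keeps the helper easy to try out with a stand-in object before you point it at a real table.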
BigQuery stores data in tables, and each table must be part of a dataset. You can’t rename a dataset in BigQuery.
When you run a query without specifying a destination table, BigQuery puts the results in a temporary table. This temporary table stays in cache for about a day, so if you run the same query again within 24 hours, BigQuery retrieves the cached copy and you won’t be charged for the query.
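One way to see the cache at work is to check the job after a query finishes. The sketch below assumes the google-cloud-bigquery Python client; the helper name run_and_check_cache and the table in the comments are hypothetical. It runs a query, waits for the rows, and reports the job’s cache_hit flag.

```python
def run_and_check_cache(client, sql):
    """Run a query and report whether its results came from the cache.

    `client` is expected to be a google.cloud.bigquery.Client.
    """
    job = client.query(sql)      # starts the query job
    rows = list(job.result())    # waits for the job and fetches the rows
    return rows, bool(job.cache_hit)


# Example (hypothetical table name):
#
#   from google.cloud import bigquery
#   client = bigquery.Client()
#   sql = "SELECT name FROM `my-project.mydataset.names` LIMIT 10"
#   _, first_hit = run_and_check_cache(client, sql)
#   _, second_hit = run_and_check_cache(client, sql)
#
# If the cached results are still available, second_hit should be True.
```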
Cloud Storage is the only place you can export BigQuery data to. If you need to export more than 1 GB of data, you have to shard it into multiple files by including an asterisk in the destination filename. You can also use an asterisk as a wildcard when you’re uploading files.
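To make the sharding rule concrete, here’s a small sketch that picks a destination URI based on the size of the table and then assembles a “bq extract” command. The bucket, table, and helper names are hypothetical, and treating one gigabyte as 1024 ** 3 bytes is an assumption made for this example. The snippet only prints the command; it doesn’t run it.

```python
import shlex

ONE_GB = 1024 ** 3  # assumption: "one gig" treated as 2**30 bytes


def export_uri(bucket, name, table_bytes, ext="csv"):
    """Return a Cloud Storage URI, adding a wildcard when sharding is needed."""
    if table_bytes > ONE_GB:
        # The asterisk lets BigQuery split the export across several files.
        return f"gs://{bucket}/{name}-*.{ext}"
    return f"gs://{bucket}/{name}.{ext}"


def build_bq_extract_command(table, uri):
    """Return the argument list for a `bq extract` call."""
    return ["bq", "extract", table, uri]


# Hypothetical bucket and table names, for illustration only.
small = export_uri("my-bucket", "names", 50 * 1024 ** 2)
large = export_uri("my-bucket", "names", 5 * ONE_GB)

print(shlex.join(build_bq_extract_command("mydataset.names", small)))
print(shlex.join(build_bq_extract_command("mydataset.names", large)))
```

Notice that shlex.join puts quotes around the wildcard URI, which keeps your shell from trying to expand the asterisk itself.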
To learn more about BigQuery, you can read Google’s online documentation. You can also try one of the other BigQuery courses on Cloud Academy.
Please give this course a rating, and if you have any questions or comments, please let us know. Thanks!
Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).