
Managing BigData with Google BigQuery

Overview
Difficulty: Beginner
Duration: 27m
Students: 697
Description

Data Management is critical for all server infrastructures - whether cloud-based or earth-bound. Even small applications need to process, manage, and digest large quantities of data, either as files, objects, or items in a database. So it's not surprising that all the major cloud platforms provide multiple data management solutions.

This course, designed and produced by our expert Linux System Administrator David Clinton, offers a great overview of the data solutions available from Google. You will see Google's two managed DBMSs, its object storage service, and BigQuery - the Big Data service into which Google has poured its years of experience running analytics and queries on massive datasets.

Who should take this course

"Data Management on Google Cloud Platform" is a beginner level course, so it's available without any special prerequisites. Nevertheless, you might benefit by first going through our "Introduction to Google Cloud Platform" course for a more general view of the whole GCP family. Also, some experience with the Linux CLI might be helpful for those videos that include terminal operations.

After this course, you might want to try "Getting Started with Google Compute Engine" to round out your GCP knowledge. And after that, check out our Google quiz questions: test yourself and increase your understanding at the same time thanks to our exclusive learning technology.

Transcript

Hi, and welcome to CloudAcademy.com's video series on Google Cloud Platform Data Management. In this video we'll look at BigQuery, Google's super-fast service for running SQL-like queries against massive data sets.

How is Google BigQuery built?

We work with projects, data sets, tables, and jobs. That is, we work within Google Cloud projects, like the ones we've seen in many other videos using other services. Each project, however, can contain multiple data sets. The data set, obviously, is where the data you're going to query lives. Within the data sets you create tables, and then you launch jobs - which really means queries - against the data in those tables. Let's start by using the BigQuery browser interface to create a new data set within our current project. Let's click the down arrow next to "exploration", which is the name I gave to my project.
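
That hierarchy shows up whenever you refer to a table by its full name. Using the project, data set, and table names that come up later in this walkthrough, a fully qualified reference looks like this:

    futuregraph718:data9879.mytable    (project ID : data set . table)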

Create a BigQuery dataset from a CSV file

Let's create a new data set. We have to give it a name that's unique within the project. So let's say data9879. Let's see if Google accepts that. It seems that it did. Now let's click the down arrow and create a table within the data set. We can skip this part, loading previous jobs; we're going to create something entirely new. We're going to give our new table an ID. Let's call it mytable. Just about the only word you can't use is "table" - that's reserved.
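
For reference, the same data set (and an empty table inside it) could also be created with the bq tool we'll use later in this video. This is just a minimal sketch, assuming the Google Cloud SDK is installed and your project is already set as the default:

    # create the data set in the current project
    bq mk data9879
    # create an empty table inside it (a schema can be supplied here or added later)
    bq mk --table data9879.mytable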

Click on next. Let's select the data. We'll select data in the CSV format. You could also, of course, use JSON or an App Engine Datastore backup, but we're going to choose data in the CSV format. I found some data in that format which is large enough to make a query useful, and it's good for demonstration purposes. Let's choose a file from my own computer. I'll choose this CSV file, and then click on next. We now need to add the schema, and we want to paste the schema in. Again, let's go back to the beginning just to take a look at how it's built.

Firstname is the name of a column in my CSV file - it's the title of a column - and after the colon it is designated as a string; that is, its data type is string. Next we add a comma to separate this first element of the schema from the second element, or second column. Lastname is the next column, and it too is a string. Then, after another comma, companyname is a column name, also a string, and so on for the remaining columns of the data set we're going to use. Let's click on next; all of these default configurations are fine for us. Then we'll submit to see how Google handles this data. It seems to have worked.
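
Pasted as plain text, a schema along those lines would look something like the line below. Only firstname, lastname, and companyname are named in the video; city appears later in the query, and any remaining columns would simply continue the same pattern:

    firstname:string,lastname:string,companyname:string,city:string

The same CSV could also be loaded from the command line in one step with bq load - a rough sketch, assuming a local file called contacts.csv (the actual filename isn't given in the video):

    # CSV is the default source format; add --skip_leading_rows=1 if the file has a header row
    bq load data9879.mytable ./contacts.csv firstname:string,lastname:string,companyname:string,city:string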

How to compose a query on Google BigQuery

Let's click on compose query, and I'm going to paste the text of a query. Let's go back to the beginning, though. We will select city and lastname - those happen to be the titles, from the schema, of two of the columns: the city column and the lastname column. So that's what we're selecting from data9879, which is our data set, and .mytable, which is the table within the data set.

We're going to group by city and lastname, and order by city, which will return, hopefully, all the contents of the lastname and city columns, but ordered alphabetically by city. Let's run the query. It worked. It has ordered them, actually, in reverse alphabetical order, starting with York, then Yonkers. And we can page through the rest of the 501 rows, because the browser window is a little bit small, so it's only giving us five or six lines per page. We could obviously adjust that. But we have now created a data set, launched a query against that data set, and successfully returned data in the format we wanted.
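
Assembled from the pieces described above, the pasted query would look roughly like this (legacy BigQuery SQL, the default dialect in the web UI at the time this video was recorded):

    SELECT city, lastname
    FROM data9879.mytable
    GROUP BY city, lastname
    ORDER BY city

The reverse ordering seen in the results suggests a DESC was appended to the ORDER BY clause in the actual query.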

Now let's do the same thing from a terminal on my local computer using the Google Cloud SDK - specifically bq, the BigQuery command-line tool. Let's type bq ls futuregraph718:data9879 - futuregraph718 is the project ID of my project, and data9879 is the data set you'll remember we just created. It's there, and it's got a table called mytable. Now let's run bq show futuregraph718:data9879.mytable. And it returns the schema of the table - not all the data from the table, but the schema of the table.
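
Written out, those two commands - using the project and data set IDs from this walkthrough - look like this:

    # list the tables in the data set
    bq ls futuregraph718:data9879
    # show the table's schema and metadata, not its rows
    bq show futuregraph718:data9879.mytable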

Now let's actually run the same query against this data that we ran from the browser. Again, we're running it against project ID futuregraph718, selecting city and lastname from data9879.mytable - that is, the mytable table inside the data9879 data set. And then we're grouping the same way we did using the browser. And it's done - actually about 20 seconds quicker than the browser returned the data. So we can access and launch queries against our data both from the browser and from any terminal with the Google Cloud SDK installed.
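
Run from the terminal, that query would look roughly like the following - a sketch assuming the default legacy SQL dialect, with futuregraph718 set as the default project (otherwise add --project_id=futuregraph718 before query):

    bq query "SELECT city, lastname FROM data9879.mytable GROUP BY city, lastname ORDER BY city"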

About the Author
David Clinton
Linux SysAdmin
Students: 11905
Courses: 12
Learning Paths: 4

David taught high school for twenty years, worked as a Linux system administrator for five years, and has been writing since he could hold a crayon between his fingers. His childhood bedroom wall has since been repainted.

Having worked directly with all kinds of technology, David derives great pleasure from completing projects that draw on as many tools from his toolkit as possible.

Besides being a Linux system administrator with a strong focus on virtualization and security tools, David writes technical documentation and user guides, and creates technology training videos.

His favorite technology tool is the one that should be just about ready for release tomorrow. Or Thursday.