Integrating BigQuery with Python
Difficulty: Intermediate
Duration: 59m
Students: 178
Rating: 4/5
Description

This course explores geographic information system (GIS) topics and how to query and analyze GIS data within the database environment of BigQuery GIS. We'll start off by defining what GIS is, discussing some key GIS concepts, and introducing some common GIS data types.

We'll then move on to common types of maps and introduce the concept of map projections. You'll get an introduction to Google's BigQuery tool, and finally, we'll put all these topics together and look at how you can perform analysis and visualization using SQL and Python in conjunction with BigQuery GIS.

If you have a use case analyzing and mapping geospatial data or anticipate one in the future, especially if your data is in a relational format (or already in BigQuery), then this course is ideal for you!

Please feel free to contact us at support@cloudacademy.com with any feedback you may have related to this course. 

Learning Objectives

  • Learn about Google BigQuery GIS and its concepts, as well as common GIS data formats
  • Understand the common types of maps and the concept of map projections
  • Learn about spatial query functionality and spatial visualization
  • Understand how BigQuery GIS can be integrated with Python

Intended Audience

This course is intended for anyone who wants to:

  • Leverage BigQuery GIS for their geospatial analytics needs
  • Learn how to visualize data on maps

Prerequisites

To get the most out of this course, you should have basic familiarity with SQL, Python, and cloud computing, ideally Google Cloud Platform.

Transcript

So finally, we're going to talk about how you can take BigQuery and integrate it with the Python programming language. What's nice is that there are several libraries that already exist that allow you to integrate Python with BigQuery and BigQuery GIS using either Python scripts or Jupyter Notebooks.

So there's the BigQuery client library, and then the BigQuery Storage API client library. The client library lets you write and run queries from Python to access and manipulate BigQuery data. The Storage API client library gives you access to BigQuery managed storage, which is helpful for reading large datasets quickly.
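Here is a minimal sketch of querying BigQuery through the client library and pulling the results into pandas. It assumes the google-cloud-bigquery package (with its pandas extras) is installed and that Application Default Credentials are configured. `query_to_dataframe` is a hypothetical helper name, and the `client` parameter exists only so the function can be exercised without a live BigQuery connection.

```python
def query_to_dataframe(sql, client=None, use_storage_api=True):
    """Run `sql` in BigQuery and return the results as a pandas DataFrame.

    Hypothetical helper. `client` defaults to a real google.cloud.bigquery.Client;
    any object with a compatible .query() method can be passed instead.
    """
    if client is None:
        # Imported here so the function can be defined without the library installed.
        from google.cloud import bigquery

        # Picks up the project and Application Default Credentials from the environment.
        client = bigquery.Client()

    job = client.query(sql)  # submits the query job

    # to_dataframe() waits for the job to finish and downloads the rows.
    # With create_bqstorage_client=True the download uses the BigQuery
    # Storage Read API when google-cloud-bigquery-storage is installed,
    # which helps with large result sets.
    return job.to_dataframe(create_bqstorage_client=use_storage_api)
```

In practice you would call it without a client, for example `query_to_dataframe("SELECT 1 AS x")`. Passing `create_bqstorage_client` to `to_dataframe()` is how this sketch opts into the Storage API; check the documentation for the library version you have installed.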

When you use these libraries to pull BigQuery data into Python, you can load your query results into a pandas DataFrame. From there you can use Python to do further analysis on that DataFrame, just as you would if you had imported it from any other source, such as a CSV or spreadsheet file.
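To illustrate that a query result is just an ordinary DataFrame, the sketch below applies the same hypothetical helper, `zips_per_county`, to a frame read from CSV text and to one built directly from Python lists. It assumes pandas is installed; the rows and the `zip_code`/`county` column names are made up for illustration and are not a real table schema.

```python
import io

import pandas as pd


def zips_per_county(df: pd.DataFrame) -> pd.Series:
    """Count ZIP codes per county (hypothetical helper; works on any DataFrame source)."""
    return df.groupby("county")["zip_code"].count().sort_index()


# Hypothetical rows standing in for a query result.
csv_text = """zip_code,county
90001,Los Angeles County
90002,Los Angeles County
91901,San Diego County
"""

# Read ZIP codes as strings so leading zeros would be preserved.
from_csv = pd.read_csv(io.StringIO(csv_text), dtype={"zip_code": str})

# A DataFrame returned by to_dataframe() would be handled the same way.
from_rows = pd.DataFrame(
    {
        "zip_code": ["90001", "90002", "91901"],
        "county": ["Los Angeles County", "Los Angeles County", "San Diego County"],
    }
)

print(zips_per_county(from_csv))
```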

One note: these client libraries currently support only Python 3; Python 2 is no longer supported. If you're still using Python 2, some legacy versions of the libraries are available, but it might be a good time to consider switching over to Python 3.

For those of you who don't know, Jupyter Notebooks let you write interactive Python code: you write the code in code cells and see the output right under each cell. You can also use various kernels with Jupyter Notebooks.

So while Python is the most popular, you can use R or other kernels within Jupyter Notebooks. In addition, instead of code cells, you can set cells to be Markdown cells, so you can annotate your code and build a nice report as you write your interactive code.

So what's nice about integrating BigQuery into Jupyter Notebooks is that there's a magic command: you type %%bigquery as the first line of a code cell. Once you've done that, you can type SQL directly into the cell and run the query. You don't have to worry about any syntax related to the API; you can just focus on writing your SQL.

Another nice thing is that you can add a variable name after this magic command, which lets you reference the results of that cell's SQL query later in your code. In the example on this slide, the cell starts with %%bigquery la_zips, which saves the query output to a variable named la_zips. And, as I mentioned earlier, this variable is a pandas DataFrame.
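Cell magics only run inside IPython/Jupyter, so instead of executing them, this sketch builds a small notebook (using the standard library's `json` module and the nbformat 4 JSON layout) whose cells show the pattern described above: a Markdown cell, a cell that loads the BigQuery magic, a `%%bigquery la_zips` cell, and a cell that uses `la_zips` as a DataFrame. The helper names are hypothetical, and the extension name you load depends on your library version.

```python
import json


def make_cell(cell_type, source):
    """Build one nbformat-4 cell dict, either "markdown" or "code"."""
    cell = {"cell_type": cell_type, "metadata": {}, "source": source}
    if cell_type == "code":
        # Code cells also carry an execution counter and a list of outputs.
        cell["execution_count"] = None
        cell["outputs"] = []
    return cell


def make_bigquery_notebook(sql, var_name="la_zips"):
    """Return a minimal notebook dict that runs `sql` via the %%bigquery magic (hypothetical helper)."""
    cells = [
        make_cell("markdown", f"## ZIP code query\nResults are saved to `{var_name}`."),
        # Registers the magic. Depending on the version, newer releases ship it in
        # the separate bigquery-magics package: %load_ext bigquery_magics
        make_cell("code", "%load_ext google.cloud.bigquery"),
        # The first line is the cell magic; its argument names the output DataFrame.
        make_cell("code", f"%%bigquery {var_name}\n{sql}"),
        # Later cells use the variable like any other pandas DataFrame.
        make_cell("code", f"{var_name}.head()"),
    ]
    # nbformat_minor 4 avoids the per-cell "id" field required from 4.5 on.
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}
```

Writing the returned dict with `json.dump(nb, f, indent=1)` to a file with an `.ipynb` extension gives a notebook that Jupyter should be able to open; the SQL you pass in still needs a valid table reference from your own project.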

So that about covers the main course content. Before we wrap things up, let's do a quick recap. First, we learned about how to represent GIS data both as coordinates and visually on maps. We saw examples of many different types of maps and talked about the different ways to project 3D coordinates onto 2D maps.

Next, we discussed Google's BigQuery, a serverless relational database tool. And finally, we tied everything together with BigQuery GIS. We looked at examples of spatial functions built into BigQuery GIS, such as spatial measurements, spatial joins, spatial transformations, and even clustering of geospatial data. We also showed how to visualize our spatial analysis examples on maps with BigQuery Geo Viz. And last but not least, we discussed how to connect BigQuery GIS and Python.

Woo, that was a lot of material. We didn't have time to cover everything in depth. So I've included some links where you can find additional resources on many of the topics covered in this course, including Google's official documentation on both BigQuery and BigQuery GIS Spatial Functions:

As always, please feel free to send feedback and comments. The feedback that we get from you inspires case studies and helps us make future courses even better. So it's really good to hear from you. Thank you so much for your time and attention. I hope you learned a little something today about geospatial analysis, BigQuery GIS, and map visualization.

About the Author
Students: 22699
Labs: 31
Courses: 13
Learning Paths: 35

Calculated Systems was founded by experts in Hadoop, Google Cloud, and AWS. Calculated Systems enables code-free capture, mapping, and transformation of data in the cloud based on Apache NiFi, an open source project originally developed within the NSA. Calculated Systems accelerates time to market for new innovations while maintaining data integrity. With cloud automation tools, deep industry expertise, and experience productionizing workloads, development cycles are cut down to a fraction of their normal time. The ability to quickly develop large-scale data ingestion and processing decreases the risk companies face in long development cycles. Calculated Systems is one of the industry leaders in Big Data transformation and education of these complex technologies.