1. Home
  2. Training Library
  3. Google Cloud Platform
  4. Courses
  5. Spatial Analysis and Visualization with BigQuery GIS

BigQuery GIS: Working with Geography Data

Contents

keyboard_tab
Start course
Overview
Difficulty
Intermediate
Duration
59m
Students
68
Ratings
3/5
starstarstarstar-borderstar-border
Description

This course explores geographic information system (GIS) topics and how to query and analyze GIS data within the database environment of BigQuery GIS. We'll start off by defining what GIS is, discussing some key GIS concepts, and introducing some common GIS data types.

We'll then move on to common types of maps and introduce the concept of map projections. You'll get an introduction to Google's BigQuery tool, and finally, we'll put all these topics together and look at how you can perform analysis and visualization using SQL and Python in conjunction with BigQuery GIS.

If you have a use case analyzing and mapping geospatial data or anticipate one in the future, especially if your data is in a relational format (or already in BigQuery), then this course is ideal for you!

Please feel free to contact us at support@cloudacademy.com with any feedback you may have related to this course. 

Learning Objectives

  • Learn about Google BigQuery GIS and its concepts, as well as common GIS data formats
  • Understand the common types of maps and the concept of map projections
  • Learn about spatial query functionality and spatial visualization
  • Understand how BigQuery GIS can be integrated with Python

Intended Audience

This course is intended for anyone who wants to:

  • Leverage BigQuery GIS for their geospatial analytics needs
  • Learn how to visualize data on maps

Prerequisites

To get the most out of this course, you should have basic familiarity with SQL, Python, and cloud computing, ideally Google Cloud Platform.

Transcript

So now we can put it all together. We've talked about how to represent geospatial information, both as data and visually as maps. We've talked about BigQuery, and now we're gonna put it together and talk about BigQuery GIS.

What is special about BigQuery GIS is that it allows you to store vector geospatial data as a special geography data type. This geography data type is stored as a well known text or WKT object, and data can be converted into the geography type from several geospatial formats, including GeoJSON, well known text, well known binary or WKB or latitude longitude coordinate values.

BigQuery GIS supports points, lines, and polygons, as well as multiple points, multiple lines, or multiple polygons in a single geometry object. To make multiple points, multiple lines or multiple polygons a little more concrete, what we're talking about here is multiple geospatial objects in a single data field.

So, for example, let's say you wanted to map all of the gas stations in a particular ZIP code. You could have a database that has one line per gas station, so each geometry object would be a single point. Or you could have a table in that database that has one line per type of gas station. So, for example, Hess or BP, and each of those gas station types could actually have a multipoint, which has, in a single geometry object, the geospatial coordinates for every gas station of that type. So hopefully that makes, this make a little more sense.

In addition, BigQuery supports geometry collections. And remember, we talked a little bit about geometry collections when we were talking about GeoJSON formats and geometry collections are groups of geometry points grouped together. Unfortunately, Big Query GIS does not currently support 3D geometries, which means that you can't include well known text with elevation values, so no above or below the reference data. It also doesn't support specifying the linear reference system, but that's not really something that you should have to worry about.

Now that we understand the geography data type, let's look at an example of how you would create a geography data type field from raw geospatial data. This example uses the latitude longitude values of the Empire State Building, which has a longitude of negative 73.9857 and a latitude of 40.7484. The negative in the longitude represents west of the prime meridian and a positive latitude represents north of the equator. You should see these values in both the query editor and also in the results pane in the screenshot of the Web UI from BigQuery.

What I've done here is created a dataset within BigQuery called GIS example. And within that dataset, I've created a table called NYC Landmarks. In the query at the top of the query editor, loads values in a variety of different geospatial formats into the NYC landmarks table using an insert into statement. First just the latitude and longitude into separate columns as floating point values. Then these values combined into well known text format, and finally, these values represented as GeoJSON format.

The second query in the query editor uses functions in the select clause that allow me to convert from these three different types of data formats into the special geography data type within BigQuery GIS. So there's the ST_GEOG point function, which converts to a geography data type from a latitude-longitude pair.

The ST_GEOG from text function, which converts from a well known text and the ST_GEOFROMGEOJSON, which converts from the GeoJSON format. And as you can see in the results, all three of these give the same geography value, which is represented in that well known text format.

Now that we've seen some of these functions in action, let's take a step back and talk a little bit about constructors versus parsers. Constructors will build a new geography data type from coordinates or existing geography data, for example, a longitude latitude pair of coordinates.

So the ST_GEOFROMPOINT function that we saw in the last slide is a constructor function. Another constructor allows you to make a geography data type of line, or line string as they call it in BigQuery GIS from either a pair of geography points representing each endpoint of the line or an array of geography points representing the vertices of the line.

Finally, there are constructors for making polygons, or oriented polygons from line geography inputs, which are constructed to create a closed polygon ring. On the other hand, parsers take other geospatial data formats and simply convert them to BigQuery's special geography data type.

So in our previous example, the ST_GEOGFROMTEXT and the ST_GEOFROMGEOJSON functions were parser functions. While on the surface, there is not much difference between constructors and parsers for point geography data, there is a bigger difference for line and polygon data.

Parsers take the line or polygon data as is without needing to build these geography objects from the smaller constituent parts. Now that we understand constructors and parsers, let's look at a couple of examples that use these functions.

The first example will use the ST_MAKELINE constructor function to create a LineString geography object. This line represents the route of the Amtrak Northeast Corridor, which runs from Boston, Massachusetts to Washington, DC. I chose this because this is a route that I've traveled many times. I like train travel, and living in New York, I'm right in the middle so I can go north or south. So I thought this would be a bit of an interesting example and a little bit of fun.

I created a BigQuery table called Amtrak Stations in my existing GIS example dataset. Remember, we used that dataset for the NYC landmarks example a few slides ago. And in that table, I've listed 10 stations along the Northeast Corridor route with their corresponding latitude and longitude coordinates.

Once I had the coordinates for the stations in my table, I used an update query to populate the STATION_GEOMFIELD, which is of the BigQuery geography data type. The values for this field were created using the ST_GEOGPOINTCONSTRUCTOR function, which generates point geography objects for each station.

You can see the final table on this slide, which shows for each station, the station name, the station city, and the station state, the latitude and longitude values, as well as the geography value which is in the STATION_GEOMFIELD. And then the ID represents the order in which the stations occur on the Amtrak route. So the northernmost station, which is Boston South Station is ID number one and the southernmost station, which is Washington's Union Station is ID number 10.

Finally, we can pass these 10 geography point objects as an array to the ST_MAKELINE constructor, which will create a single geography LineString object. You can see this line visualized on the map stretching from Boston to Washington and hitting the eight stations in between. We'll be talking much more about visualizing these geography objects later in the course. But I thought it was good to give a little taste now, for you to be able to picture LineString objects.

You can see the SQL code that I used to create this geography line object. I used a subquery to select my STATION_GEOMFIELD from my Amtrak stations table, which remember, was my point geographies for each of my Amtrak stations. And then I turned that into an array using SQL's array function and fed that entire thing into the ST_MAKELINE constructor function.

For this next example, instead of using a constructor function, we're gonna use a parser function to create a polygon geography object. The coordinates of this polygon are in GeoJSON format, and this format is converted to the geography data type by using the ST_GEOGFROMGEOJSON function. The data for this example comes from Natural Earth, which, as I mentioned earlier, is a great source of GIS data on the web.

The polygon shown here is the outline of the country of Switzerland. Like with our previous example, we're showing the SQL code we used to create this polygon geography object, and here we're just feeding the GeoJSON text as is as the argument into the ST_GEOGFROMGEOJSON parser function.

The next class of geography functions is Formatter functions. These functions allow you to take a BigQuery geography data type and export it to other data types so you can use them within other programs. And this is essentially the reverse of the parser function.

In the parser function, we take a non-native geography data type or non-native geospatial data type and turn it into the geography data type. The Formatter function takes the geography data type and turns it into one of those other geography or geospatial data types. The export types include text, which would be the well known text format that we've seen before, GeoJSON, Binary, which is the well known binary or WKB and GeoHash.

So, as an example, the ST_ASGEOJSON function which is a Formatter function, will take your BigQuery GIS special geography data type and export it as a GeoJSON. So then it can be read by any other language or program that can parse GeoJSON files. So one use for this is to do visualization with that GeoJSON for visualization tools that read GeoJSON as their input.

Next, we're gonna talk about spatial accessors. These functions return properties of the BigQuery GIS geography data type. So the ST_DIMENSION, ST_ISCOLLECTION, ST_ISEMPTY, and ST_NUMPOINTS functions return metadata about the geography object. Metadata is data about data, in case you weren't aware of that. The remaining functions in this category separate out combined geography data into their component parts.

So, for example, the ST_X and ST_Y functions break out the longitude and latitude respectively from a geography point object. This next slide shows an example of using the ST_DIMENSION dimension function, which as I mentioned earlier, returns metadata about a geography object. This function will tell you what type of geometry object you have. It will return a zero if you have a point value which is a latitude longitude coordinate pair. It will return a one if you have a line, maybe representing a road or river, or the Amtrak train route we saw in the example earlier.

If your geography field is a polygon data type, such as the Switzerland example we just had, the ST underscore dimension function will return a value of two. The data we're using for this example is the zip codes table from the Google public datasets. It contains geographic information on all zip codes in the US and can be found in the GEO_US_BOUNDARIES dataset.

In the top portion of the Web UI you can see a query that uses the ST_DIMENSION function twice with two different arguments. The first is the INTERNAL_ POINT_ GEOM field, which is a single coordinate pair representing the centroid of the zip code. As we would expect, the ST_DIMENSION function returns the value of zero indicating a point data type.

The second field we use as an argument to the ST_DIMENSION function is the ZIP_CODE_ GEOMFIELD, which represents the boundaries of the specified zip code. And again, as expected, we get a value of two from the ST_ DIMENSION function indicating a polygon data type.

For our next example, let's go back to the New York City landmarks table we were building. We now have two landmarks in our table, the Empire State Building, which we inserted earlier, and now the Statue of Liberty. The landmark_Geo fields contains point geography values for each landmark, and we can use the ST_X and ST_Y functions to separate out the longitude and latitude values respectively. Along a similar vein, you can also use the ST_ DUMP function, which separates the geography components for more complex data types such as the vertices of a polygon data type.

About the Author
Avatar
Calculated Systems
Training Provider
Students
5734
Labs
31
Courses
13
Learning Paths
17

Calculated Systems was founded by experts in Hadoop, Google Cloud and AWS. Calculated Systems enables code-free capture, mapping and transformation of data in the cloud based on Apache NiFi, an open source project originally developed within the NSA. Calculated Systems accelerates time to market for new innovations while maintaining data integrity.  With cloud automation tools, deep industry expertise, and experience productionalizing workloads development cycles are cut down to a fraction of their normal time. The ability to quickly develop large scale data ingestion and processing  decreases the risk companies face in long development cycles. Calculated Systems is one of the industry leaders in Big Data transformation and education of these complex technologies.