This course explores geographic information system (GIS) topics and how to query and analyze GIS data within the database environment of BigQuery GIS. We'll start off by defining what GIS is, discussing some key GIS concepts, and introducing some common GIS data types.
We'll then move on to common types of maps and introduce the concept of map projections. You'll get an introduction to Google's BigQuery tool, and finally, we'll put all these topics together and look at how you can perform analysis and visualization using SQL and Python in conjunction with BigQuery GIS.
If you have a use case analyzing and mapping geospatial data or anticipate one in the future, especially if your data is in a relational format (or already in BigQuery), then this course is ideal for you!
Please feel free to contact us at support@cloudacademy.com with any feedback you may have related to this course.
Learning Objectives
- Learn about Google BigQuery GIS and its concepts, as well as common GIS data formats
- Understand the common types of maps and the concept of map projections
- Learn about spatial query functionality and spatial visualization
- Understand how BigQuery GIS can be integrated with Python
Intended Audience
This course is intended for anyone who wants to:
- Leverage BigQuery GIS for their geospatial analytics needs
- Learn how to visualize data on maps
Prerequisites
To get the most out of this course, you should have basic familiarity with SQL, Python, and cloud computing, ideally Google Cloud Platform.
Now that we've talked about a number of ways that you can analyze data with BigQuery GIS using the special geography data type, as promised, we're gonna move on and talk about how you can visualize this data. And this is pretty exciting because I personally really like data visualization of all types, and creating maps is often super-fun and super-informative, so I'm excited to get into this section.
So there are a number of options for visualization of geographic data. The one that I used for all of the examples in this course is BigQuery Geo Viz, which is a tool within the GCP suite that has a really easy Web UI. It's very simple to take a query from BigQuery from the Web UI for just general BigQuery and essentially copy and paste it into the interface for BigQuery Geo Viz and then create a map from there. It's very cool, and it's very simple.
You can also use the Google Earth engine, or you can use Jupyter notebooks, which you can combine Python and BigQuery GIS to create some really cool stuff. And within Jupyter notebooks, there's a Jupyter Labs extension called the GeoJSON extension, and this allows you to create really quick and easy maps from GeoJSON formatted data, which again, we can get out of BigQuery GIS using one of our formatter functions.
There's also a number of other packages that you can use in Python and Jupyter notebooks, including things like geopandas. For our first visualization example, we're gonna use the airports public dataset, which you may remember from way back at the very beginning of this course, which hopefully doesn't feel like too long ago.
So we're gonna use this data set to visualize all the airports in California using the BigQuery Geo Viz tool. And like I said before, this is a really neat tool because you can just copy your query directly into the Web interface for BigQuery Geo Viz and then visualize the data. The slide here shows the exact query that I used, which took from the FAA dataset in the public data the US underscore airports table. And for this query, we're filtering on just aerodromes, which are the traditional airports, and we're not including the heliports or the seaports, and we're also filtering on the state of California because remember, we said, we're just gonna map the airports in California, and each of the red dots on the map represents an airport, and you can see in the pop-up that there's additional information that we pulled into the query that you can see when you click on a particular airport point.
The airport I selected is the San Jose airport, which I picked because I've been there and because there's a song about knowing the way to San Jose. And you can see within this pop-up that we have the FAA identifier for this airport. It's SJC. We also have the latitude and the longitude and the airport geom information.
So this next example shows how we would incorporate data values into our visualization, and here, we're using the London bike share data from the Google public data sets. And we want to visualize not just where the bike share docking stations are within the city of London, but also how many bikes there are at each docking station.
So on the map, each purple dot, and I used purple because I wanted to show that you have the option to pick different colors. The red was the default for the previous map, but I thought I would change it up for here. So the purple dots represent a docking station, and the size of the dot represents the number of bikes at the docking station, so the larger the dot, the more bikes there are at that docking station, and if you look at the query, you can see that we've done this by using the bike count field within the public data set.
We've also done a bit of filtering here to ensure that we're only mapping docks that still exist by plotting rows that have the removal date as null, which means there's no removal date. They haven't been taken out yet. We're also querying to limit our results to where the bike count is greater than zero because we only care about docking stations that have bikes, because there's nothing worse than walking all the way to a docking station to find that there aren't any actual bikes available.
Now, we're gonna back to an example we used a few slides ago, where we looked at the crimes in Chicago by ZIP code, and we can visualize the number of crimes in 2019 for each ZIP code on a map. The reason that I'm using 2019 and not 2020, which is the most recent year, is because 2020 is not over yet as the time of this recording, and 2019 is a full year of crimes.
So the type of thematic map that is shown on this slide is called a choropleth map, and it's a very useful map, and you've probably seen lots of maps like this before and probably just didn't know what it was called. So the color scale here is continuous from bright green, which represents the fewest number of crimes per ZIP code, to bright red, which represents the most crimes, which are over 12,000 crimes committed in one year in that ZIP code, and this map allows you to clearly see the crime hotspots in the city of Chicago.
Our last map example is also one we've seen before, and this goes back to our tree clusters in San Francisco that we had a few slides ago, and here we're representing our clusters on a map. So for this map, every tree is color-coded, and each color represents a different cluster of trees, but unlike on the last map, where the colors represented the values, where the bright green was low values and the bright red were high values, these colors don't have any significance beyond visually grouping clusters together.
So as you can see here, some of the clusters have a nice straight line, while others are a bit more freeform, but you can see that we've clustered together trees that are in the same basic geographic area, and this is because if you remember, we specified that we wanted to create clusters of trees within 100 meter radius.
Calculated Systems was founded by experts in Hadoop, Google Cloud and AWS. Calculated Systems enables code-free capture, mapping and transformation of data in the cloud based on Apache NiFi, an open source project originally developed within the NSA. Calculated Systems accelerates time to market for new innovations while maintaining data integrity. With cloud automation tools, deep industry expertise, and experience productionalizing workloads development cycles are cut down to a fraction of their normal time. The ability to quickly develop large scale data ingestion and processing decreases the risk companies face in long development cycles. Calculated Systems is one of the industry leaders in Big Data transformation and education of these complex technologies.