Spatial Analysis and Visualization with BigQuery GIS
This course explores geographic information system (GIS) topics and how to query and analyze GIS data within the database environment of BigQuery GIS. We'll start off by defining what GIS is, discussing some key GIS concepts, and introducing some common GIS data types.
We'll then move on to common types of maps and introduce the concept of map projections. You'll get an introduction to Google's BigQuery tool, and finally, we'll put all these topics together and look at how you can perform analysis and visualization using SQL and Python in conjunction with BigQuery GIS.
If you have a use case analyzing and mapping geospatial data or anticipate one in the future, especially if your data is in a relational format (or already in BigQuery), then this course is ideal for you!
Please feel free to contact us at firstname.lastname@example.org with any feedback you may have related to this course.
- Learn about Google BigQuery GIS and its concepts, as well as common GIS data formats
- Understand the common types of maps and the concept of map projections
- Learn about spatial query functionality and spatial visualization
- Understand how BigQuery GIS can be integrated with Python
This course is intended for anyone who wants to:
- Leverage BigQuery GIS for their geospatial analytics needs
- Learn how to visualize data on maps
To get the most out of this course, you should have basic familiarity with SQL, Python, and cloud computing, ideally Google Cloud Platform.
Let's dive right into the course content. Before we get into the technical details of how we're going to store and manipulate geographic data within the BigQuery database, it's important to understand what we're actually trying to represent when we talk about geospatial data.
So the first thing we're going to talk about is how you can specify a point on Earth, in three-dimensional geographic space, as data. That means taking one point, say the location of your house, and representing it as data.
So how do you do this? You can fully represent a geographic point with just three pieces of information. The first is latitude, which is your angular distance north or south of the equator. The next is longitude, which is your angular distance east or west of the prime meridian. The final piece is your vertical elevation, measured with respect to a reference datum.
A reference datum is simply a reference surface of zero elevation. You'll often hear sea level used as an informal reference datum, with elevation given as an amount above or below sea level. But sea level varies, for example with the tides.
So the scientific community has established vertical control datums to use instead. One example is NAVD 88, the North American Vertical Datum of 1988. Conceptually, NAVD 88 or any other control datum works much like sea level: you have a baseline, and you state how far above or below that baseline your point is in order to place it precisely in space.
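To make the three pieces of information concrete, here is a minimal sketch of a point as a small Python record. The class name `GeoPoint` and its fields are hypothetical, chosen for illustration; the ranges checked are the standard ones (latitude from -90 to 90 degrees, longitude from -180 to 180 degrees), and elevation is optional and assumed to be in metres relative to whatever datum you choose. The `to_wkt` method emits Well-Known Text, which lists longitude (x) before latitude (y).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GeoPoint:
    """Hypothetical record for one geographic point in decimal degrees."""
    latitude: float                      # degrees north (+) or south (-) of the equator
    longitude: float                     # degrees east (+) or west (-) of the prime meridian
    elevation_m: Optional[float] = None  # assumed metres above (+) or below (-) a reference datum

    def __post_init__(self) -> None:
        # Reject coordinates outside the valid angular ranges.
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError(f"latitude out of range: {self.latitude}")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError(f"longitude out of range: {self.longitude}")

    def to_wkt(self) -> str:
        # WKT puts x (longitude) first, then y (latitude); a third value is the height.
        if self.elevation_m is None:
            return f"POINT({self.longitude} {self.latitude})"
        return f"POINT Z ({self.longitude} {self.latitude} {self.elevation_m})"


# An illustrative point in Manhattan, without and with an elevation.
print(GeoPoint(40.7484, -73.9857).to_wkt())         # POINT(-73.9857 40.7484)
print(GeoPoint(40.7484, -73.9857, 443.0).to_wkt())  # POINT Z (-73.9857 40.7484 443.0)
```

The longitude-first ordering is worth remembering when you move to BigQuery: `ST_GEOGPOINT` also takes longitude as its first argument and latitude as its second, so `ST_GEOGPOINT(-73.9857, 40.7484)` describes the same horizontal position.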
So, now that we understand how to represent geographic information using coordinates, how can we use this information? Well, that's where GIS comes in. GIS stands for Geographic Information Systems, which are data systems that allow for the storing and analysis of geospatial data.
There are two types of GIS data. The first, which is what we're mostly going to be talking about in this presentation, is vector data. Vector data is represented using three types of geospatial objects. The first is point data, which is a single geospatial point on a map. A point includes latitude and longitude values and can optionally include your elevation as well.
The next is line data, for example roads, rivers, or, as we'll see later in the presentation, train routes. Any linear feature on a map can be represented as line data, or, as BigQuery GIS calls it, LineString data. Finally, there's polygon data, which represents an area on a map.
For example, postal code boundaries, state boundaries, or the boundaries of a park or property would all be represented as polygon data. The other type of GIS data you'll see is raster data. Raster data is grid-based data made up of cells, or pixels, each holding a value.
For example, raster data might hold values such as elevation, soil type, or population density, with a value for each cell of the grid. These values are usually displayed on a map as different colors. This is an example of a map created using vector data.
Polygon vector data describes the land outlines, and hopefully you recognize some of these geographic features, Italy in particular. Major physical features such as Mount Etna and Mount Olympus are indicated using point vector data, and line vector data represents rivers within the land boundaries on this map.
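The three vector types in a map like this one each have a standard text encoding, Well-Known Text (WKT), which BigQuery's `ST_GEOGFROMTEXT` can parse into a `GEOGRAPHY` value. The helpers below are a hypothetical sketch that builds those strings from `(longitude, latitude)` pairs; the only rule they enforce beyond minimum sizes is that a polygon ring must be closed, meaning its last vertex repeats its first.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[float, float]  # (longitude, latitude): the x-then-y order WKT uses


def _coords(coords: Sequence[Coord]) -> str:
    # "lon lat" pairs separated by commas, e.g. "0 0, 1 0"
    return ", ".join(f"{lon} {lat}" for lon, lat in coords)


def point_wkt(coord: Coord) -> str:
    # A single location, such as a mountain peak.
    return f"POINT({coord[0]} {coord[1]})"


def linestring_wkt(coords: Sequence[Coord]) -> str:
    # An ordered path, such as a river or a train route.
    if len(coords) < 2:
        raise ValueError("a line needs at least two points")
    return f"LINESTRING({_coords(coords)})"


def polygon_wkt(ring: Sequence[Coord]) -> str:
    # An area, such as a land outline; close the ring if the caller did not.
    closed: List[Coord] = list(ring)
    if closed and closed[0] != closed[-1]:
        closed.append(closed[0])
    if len(closed) < 4:
        raise ValueError("a polygon ring needs at least three distinct vertices")
    return f"POLYGON(({_coords(closed)}))"


print(point_wkt((10, 20)))                                 # POINT(10 20)
print(linestring_wkt([(0, 0), (1, 1), (2, 1)]))            # LINESTRING(0 0, 1 1, 2 1)
print(polygon_wkt([(0, 0), (1, 0), (1, 1), (0, 1)]))       # POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))
```

The coordinates here are placeholders rather than real features; in practice you would load geometries from a dataset instead of typing them by hand.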
This next image shows several examples of maps using raster data. These maps show features such as land cover, shaded elevation, ocean water, and water drainage, which includes lake features in the water drainage map. These maps show continuous raster data, in which grid cells can take on any value within a specified range.
For example, zero to a hundred percent, or, if you're talking about elevation, perhaps negative one hundred to one hundred. By contrast, raster data can also be discrete, which means each cell in the grid can take on only one of a specified list of values. These are also known as categorical values.
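One common way to see the difference between continuous and discrete rasters is to derive a discrete raster from a continuous one by binning each cell's value into a class. The sketch below assumes a tiny hypothetical grid of percent values and two hypothetical class boundaries; `classify` is an illustrative helper, not a GIS library function. It uses `bisect_right`, so a value exactly equal to a boundary falls into the higher class.

```python
from bisect import bisect_right
from typing import List, Sequence


def classify(
    grid: Sequence[Sequence[float]],
    breaks: Sequence[float],
    labels: Sequence[str],
) -> List[List[str]]:
    """Convert a continuous raster into a discrete (categorical) raster.

    breaks: ascending class boundaries; labels: exactly one more entry than breaks.
    """
    if len(labels) != len(breaks) + 1:
        raise ValueError("need exactly one more label than breaks")
    # bisect_right gives the index of the class each cell value belongs to.
    return [[labels[bisect_right(breaks, value)] for value in row] for row in grid]


# Hypothetical continuous raster: a percentage (0-100) in each of 3x3 cells.
continuous = [
    [12.5, 40.0, 88.0],
    [5.0, 71.2, 55.5],
    [0.0, 33.3, 100.0],
]

categories = classify(continuous, breaks=[33.0, 66.0], labels=["low", "medium", "high"])
for row in categories:
    print(row)
# ['low', 'medium', 'high']
# ['low', 'high', 'medium']
# ['low', 'medium', 'high']
```

The continuous grid can hold any value between 0 and 100, while the classified grid can only hold one of the three labels, which is exactly the continuous versus discrete distinction.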
For example, low, medium, or high, or desert, marshland, and forest would be examples of discrete, or categorical, raster data. So what are some common GIS data formats? The first one is the shapefile, a vector data format developed by Esri.
Shapefiles are most often associated with ArcGIS, but they can be used by many other programs, including the open-source QGIS. There are also many online tools for converting shapefiles to other GIS formats and vice versa.
Another common format is GeoJSON, an open, JSON-based vector format. In GeoJSON, things like points, lines, and polygons are considered features, and when you group them together, that's called a feature collection. A third format is KML, or KMZ for the zipped version, which originated with Google Earth. KML stands for Keyhole Markup Language, an XML-based format. It's most closely associated with Google Earth, and again, there are lots of tools you can use to convert KML files to GeoJSON or shapefiles or vice versa.
This is by no means an exhaustive list of GIS data formats, but it gives you a good flavor, and these three are popular formats that you'll see often. So now that you know about geospatial data and you're familiar with a few GIS data formats, where can you find some of this data so you can start working with it? The good news is that there are many open repositories of GIS data on the web.
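To show what "feature" and "feature collection" look like on disk, here is a minimal sketch that builds a small GeoJSON document with Python's standard `json` module. The `feature` helper, the property names, and all coordinates are hypothetical; the structure (a `FeatureCollection` holding `Feature` objects, each with a `geometry` and `properties`) and the `[longitude, latitude]` position order follow the GeoJSON format.

```python
import json
from typing import Any, Dict


def feature(geometry_type: str, coordinates: Any, **properties: Any) -> Dict[str, Any]:
    """Wrap a geometry and its attributes in a GeoJSON Feature (hypothetical helper)."""
    return {
        "type": "Feature",
        "geometry": {"type": geometry_type, "coordinates": coordinates},
        "properties": properties,
    }


# Hypothetical features; every GeoJSON position is [longitude, latitude].
collection = {
    "type": "FeatureCollection",
    "features": [
        feature("Point", [-73.9857, 40.7484], name="viewpoint"),
        feature("LineString", [[-74.0, 40.70], [-73.98, 40.75]], name="route"),
        feature(
            "Polygon",
            # One closed outer ring: the first and last positions are identical.
            [[[-74.0, 40.70], [-73.95, 40.70], [-73.95, 40.75], [-74.0, 40.70]]],
            name="area",
        ),
    ],
}

text = json.dumps(collection, indent=2)
print(text)
```

A point, a line, and a polygon each become one feature, and the outer object groups them into a feature collection. If you later load such data into BigQuery, note that its `ST_GEOGFROMGEOJSON` function works on a geometry object, so you would pass each feature's `geometry` member rather than the whole feature.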
These datasets can be used as base maps for any of your geospatial visualizations. Map visualizations often consist of several layers of data placed one on top of another. The base map acts as the bottom layer, with recognizable geospatial features that help give context to your data.
For example, a base map might consist of political borders, roads, or points of interest. Alternatively, a base map might be satellite imagery. Say you want to map points of tourist interest in New York City. It would be really helpful to have a base map that includes the outline of Manhattan, to give you some context for where you are on the map.
Another example is mapping the track of a hurricane. You could use a base map that shows the coastline, including state borders within the US. You would then overlay a layer with the hurricane track data on top of the base map to produce your final map visualization.
Calculated Systems was founded by experts in Hadoop, Google Cloud and AWS. Calculated Systems enables code-free capture, mapping and transformation of data in the cloud based on Apache NiFi, an open source project originally developed within the NSA. Calculated Systems accelerates time to market for new innovations while maintaining data integrity. Its cloud automation tools, deep industry expertise, and experience productionalizing workloads cut development cycles down to a fraction of their normal time. The ability to quickly develop large-scale data ingestion and processing decreases the risk companies face in long development cycles. Calculated Systems is one of the industry leaders in Big Data transformation and education of these complex technologies.