This course explores geographic information system (GIS) topics and how to query and analyze GIS data within the database environment of BigQuery GIS. We'll start off by defining what GIS is, discussing some key GIS concepts, and introducing some common GIS data types.
We'll then move on to common types of maps and introduce the concept of map projections. You'll get an introduction to Google's BigQuery tool, and finally, we'll put all these topics together and look at how you can perform analysis and visualization using SQL and Python in conjunction with BigQuery GIS.
If you have a use case analyzing and mapping geospatial data or anticipate one in the future, especially if your data is in a relational format (or already in BigQuery), then this course is ideal for you!
Please feel free to contact us at support@cloudacademy.com with any feedback you may have related to this course.
Learning Objectives
- Learn about Google BigQuery GIS and its concepts, as well as common GIS data formats
- Understand the common types of maps and the concept of map projections
- Learn about spatial query functionality and spatial visualization
- Understand how BigQuery GIS can be integrated with Python
Intended Audience
This course is intended for anyone who wants to:
- Leverage BigQuery GIS for their geospatial analytics needs
- Learn how to visualize data on maps
Prerequisites
To get the most out of this course, you should have basic familiarity with SQL, Python, and cloud computing, ideally Google Cloud Platform.
So, now that we know all about geospatial data and maps, let's shift gears and talk about the BigQuery tool. First off, what is BigQuery? BigQuery is a database solution on Google Cloud Platform, also known as GCP. And it's serverless, which means you don't have to set up any infrastructure. You can just open the BigQuery tool and start creating datasets, importing your data into tables with those data sets, which is actually pretty neat.
BigQuery allows users to interact with data using standard ANSI SQL, which is the same SQL queries that you would use if you were using Postgres or MySQL. And users can connect to data in a variety of ways. GCP has a web UI that you can use to directly connect to BigQuery, which is pretty user-friendly and very convenient if you're already in the GCP console. You can also connect BigQuery to any program that uses ODBC or JDBC connections. There's also a command line tool, and finally, you can connect using APIs through programming language such as Python, which we'll talk about more later in this course.
So, what are some applications of BigQuery? There's BigQuery ML which allows you to perform machine learning using SQL data. There's BigQuery BI Engine which allows you to do interactive data analysis. There's Connected Sheets which allows you to view BigQuery data sets in Google Sheets. And what's kind of neat about this is you don't even need to know SQL for this. You can just manipulate data within a spreadsheet, which is great for less technical users. And then there's the focus of this course which is BigQuery GIS which allows you to analyze geospatial data.
One thing that's really nice about BigQuery is that there are a large number of public datasets that you can explore and use to test out BigQuery. This allows you to perform data analysis while you're actually using real data. And this is really neat because there's tons of really interesting data and things that you can use not just to perform full data analysis, but to enhance data analysis that you're performing with your own data.
So, for example, you might want to know what the zip code boundaries are and that's something that's in the public data set, and then you can use that to combine it with data that you have and make something really interesting. So, some examples of the public datasets are U.S. Census Data, Chicago Taxi Rides, and Stack Overflow posts archive. I'm sure you're all familiar with Stack Overflow. And of course, as I already mentioned, zip code data which we'll be using quite a bit in the examples later on.
For more information on the public datasets, you can go to cloud.google.com/publicdatasets, and you can view all the public data sets that are available. You can browse through them. You can see information about each public data set. And there's lots and lots of data there. So I encourage you to explore it.
Now that we understand a bit of what BigQuery is, let's dive right into an example. On this slide, you can see the web UI for BigQuery in the GCP console. So on the left-hand side, you have the navigation which allows you to view the query history, saved queries, job history, and lots of other information about what you've actually been running.
On the bottom of the left-hand side, you have your data sets plus the public data sets that you can dive into and even search if you have lots of data. And you can do the navigation and open up a data set and see what tables are underneath them, click on the table, and that will actually allow you to see the details of that table, including the schema and some sample data.
In addition, from the web UI, you can actually create new data sets and new tables within those data sets. On the right hand side, you can see at the top, we have our query editor which is where you can write and edit SQL queries. And on the bottom, you have the results of the queries that we've run.
In this course, we're assuming that you already are familiar with SQL, and you'll see SQL code throughout the examples. If you're not as comfortable with SQL or maybe it's just been a while, there were plenty of great resources on the web that you can go to brush up on your SQL skills.
The example on this slide is one of the public data sets from Google, which is the U.S. airports table from the FAA dataset. And we just did a select star on this dataset to get all the fields in the table. In the example, we're filtering on New York only. So we're showing all the airports that have the service city of New York. And you can see that it includes both traditional airports, which are labeled as airport-type aerodrome, which sounds kind of British to me, as well as seaports and heliports. And one of the interesting things about the data here is that Newark Airport is not listed, even though as a New Yorker, I would consider Newark an airport that services New York City.
So, the point is that while the public data sets are great, it's often helpful if you have outside information that you can bring in to validate them and make sure that the public data set makes sense and you understand the data that you're working with.
Calculated Systems was founded by experts in Hadoop, Google Cloud and AWS. Calculated Systems enables code-free capture, mapping and transformation of data in the cloud based on Apache NiFi, an open source project originally developed within the NSA. Calculated Systems accelerates time to market for new innovations while maintaining data integrity. With cloud automation tools, deep industry expertise, and experience productionalizing workloads development cycles are cut down to a fraction of their normal time. The ability to quickly develop large scale data ingestion and processing decreases the risk companies face in long development cycles. Calculated Systems is one of the industry leaders in Big Data transformation and education of these complex technologies.