Big Data & Machine Learning

Duration: 1h 46m


Google Cloud Platform: Fundamentals

If you’re going to work with modern software systems, then you can’t escape learning about cloud technologies. And that’s a rather broad umbrella. Across the three major cloud platform providers, there are a lot of different service options, and there’s a lot of value in them all.

However, the area where I think Google Cloud Platform excels is providing elastic, fully managed services. To me, Google Cloud Platform is the optimal cloud platform for developers. It provides so many services for building highly available, highly scalable web applications and mobile back-ends.

Google Cloud Platform has quickly become my personal favorite cloud platform. Opinions are subjective, but I’ll share why I like it so much.

I’ve worked as a developer for years, and for much of that time I was responsible for getting my code into production environments and keeping it running. I worked on a lot of smaller teams where there were no operations engineers.

So here’s what I like about Google Cloud Platform: because many of the service offerings are fully managed, it allows me to think about the code and the features I need to develop without worrying about the operations side.

Services such as App Engine allow me to write my code, test it locally, run it through the CI/CD pipeline, and then deploy it. Once it’s deployed, for the most part, unless I’ve introduced a software bug, I don’t have to think about it. Google’s engineers keep it up and running and highly available. And having Google as your ops team is really cool!

Another thing I really like is the ease of use of services such as BigQuery and the Machine Learning APIs. If you’ve ever worked with large datasets, you know that some queries take forever to run. BigQuery can query massive datasets in just seconds, which allows me to get the data I need quickly so I can move on to other things.

And with the Machine Learning APIs, I can use a REST interface to do things like language translation or speech-to-text with ease. That allows me to integrate these capabilities into my applications, which gives end users a better experience.

So for me personally, I love that I can focus on building out applications and spend my time adding value for the end users.

If you’re looking to learn the fundamentals of a platform that’s not only developer-friendly but also cost-friendly, then this is the right course for you!

Course Objectives

By the end of this course, you'll know:

  • The purpose and value of each product and service
  • How to choose an appropriate deployment environment
  • How to deploy an application to App Engine, Container Engine, and Compute Engine
  • The different storage options
  • The value of Cloud Datastore
  • How to get started with BigQuery

Intended Audience

This is an intermediate-level course because it assumes:

  • You have at least a basic understanding of the cloud
  • You’re at least familiar with building and deploying code

What You'll Learn

  • Intro: What will be covered in this course
  • Introducing Google Cloud Platform: An introduction to the Google Cloud Platform
  • Getting Started: A review of projects and permissions
  • App Engine and Cloud Datastore: An intro to the PaaS option for building web apps, and the NoSQL database that works so well with App Engine
  • Cloud Storage Options: What options exist for data storage?
  • Container Engine: How do we run Docker containers in the cloud?
  • Compute Engine: The IaaS option on Google Cloud
  • Big Data and Machine Learning: What options exist for data processing and machine learning?
  • Summary: A review of the course



Welcome back to Google Cloud Platform Fundamentals. I'm Ben Lambert, and I'll be your instructor for this lesson.

In this lesson, we'll talk about what services Google offers for big data and machine learning. We'll talk about BigQuery, Cloud Pub/Sub, Cloud Dataflow, Cloud Dataproc, and Cloud Datalab. And for machine learning, we'll cover Cloud Machine Learning, the Vision API, the Speech API, and the Translate API.

Plenty of companies know big data and machine learning, and Google ranks near the top of that list. These technologies have been part of Google's core business for some time, and through Google Cloud Platform, Google gives us the ability to use the same tools it does. The big data services are designed to scale the same way Google's internal services do, so you don't need to worry about traffic spikes causing problems; the services are designed to be elastic. They're also fully managed, so they don't require work from your operations team to keep them running. BigQuery is an analytics database that allows you to stream data in at about 100,000 rows per second. Pub/Sub is a scalable and flexible enterprise messaging service. Dataflow allows you to perform stream and batch processing, and Dataproc is a managed Hadoop, MapReduce, Spark, Pig, and Hive service.

Let's dive into each of these a bit more. BigQuery is a fully managed, petabyte-scale, low-cost analytics data warehouse. You query it with a familiar SQL-like syntax, which keeps the learning curve small. Let's check it out. We can query some publicly available datasets, so let's use the GitHub data and see how many projects there are per language. We'll write out some SQL, and now let's let this run and see what happens. Now let's change the ordering and switch it to the amount field. Look how fast that runs. Let's run another query, this time against the Wikipedia data, and check how many of the titles contain the word Nintendo. Nice, we have quite a few records returned. And now if we change it to cloud, we get a massive amount of data back in just a few seconds. BigQuery makes querying massive datasets very simple, which is a very cool thing.
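The transcript doesn't show the SQL typed during the demo, so the following is a minimal sketch of the same two query shapes: counting projects per language ordered by that count, and counting titles that contain a word. To stay runnable without a Google Cloud account, it uses Python's built-in `sqlite3` against two tiny, hypothetical tables (`github_projects` and `wikipedia_titles`) rather than BigQuery's public datasets, whose real table names and schemas differ. The `GROUP BY`, `ORDER BY`, and `LIKE` clauses are plain SQL that BigQuery's standard SQL also accepts.

```python
import sqlite3

# Hypothetical toy tables standing in for BigQuery's public GitHub and
# Wikipedia datasets; the real tables have different names and schemas.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE github_projects (repo_name TEXT, language TEXT)")
conn.execute("CREATE TABLE wikipedia_titles (title TEXT)")
conn.executemany(
    "INSERT INTO github_projects VALUES (?, ?)",
    [("a/web", "JavaScript"), ("b/api", "Python"),
     ("c/cli", "Go"), ("d/ml", "Python")],
)
conn.executemany(
    "INSERT INTO wikipedia_titles VALUES (?)",
    [("Nintendo",), ("Nintendo Switch",), ("Cloud computing",), ("Sega",)],
)

# Projects per language, largest count first (ties broken alphabetically).
projects_per_language = conn.execute(
    """
    SELECT language, COUNT(*) AS amount
    FROM github_projects
    GROUP BY language
    ORDER BY amount DESC, language
    """
).fetchall()


def count_titles_containing(word: str) -> int:
    """Count titles containing `word`, ignoring case by lowercasing both sides."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM wikipedia_titles WHERE LOWER(title) LIKE ?",
        (f"%{word.lower()}%",),
    ).fetchone()
    return count


print(projects_per_language)  # [('Python', 2), ('Go', 1), ('JavaScript', 1)]
print(count_titles_containing("Nintendo"))  # 2
print(count_titles_containing("cloud"))  # 1
```

In BigQuery you would reference a fully qualified `project.dataset.table` name instead; lowercasing both sides of `LIKE` keeps the match case-insensitive on either engine.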

Next up, we have Cloud Pub/Sub, which is a fully managed, real-time messaging service that allows you to send and receive messages between independent applications. You can use Pub/Sub to decouple systems and components hosted on Google Cloud Platform or elsewhere. Built on the same technology that Google uses, Pub/Sub is designed to provide at-least-once delivery with low latency, and on-demand scaling to up to one million messages per second. Pub/Sub allows you to publish messages and to subscribe to them. For example, you could publish a message that contains the IDs of objects that need to be invalidated in a distributed cache, and then have some code that subscribes to those messages and refreshes the cache for those IDs. Pub/Sub is one of those tools that's really multi-purpose, and I like to use it to decouple services. For example, if I have a website that accepts image uploads, I'll publish the location of the file on Cloud Storage to a resize topic, and my resize service can pick it up and process it. Then I can publish a message to a thumbnail topic to generate a thumbnail, and finally publish a message to a completion topic, so the notification service can inform the user that their image has finished processing. You can use it for many different things, but it's a very fast, highly available messaging option and a really good service to know, so I suggest you try it out.
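To make the decoupling pattern concrete, here is a minimal in-process sketch of the image-upload flow described above. It is not the Cloud Pub/Sub client library: `InMemoryBroker`, the service functions, the topic names `resize`, `thumbnail`, and `completion`, and the bucket path are all hypothetical. Delivery here is synchronous and exactly once, whereas Pub/Sub is asynchronous and at-least-once, so real subscribers should be written to tolerate duplicate messages.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class InMemoryBroker:
    """Toy topic-based message broker (hypothetical, not the Pub/Sub API)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # Deliver synchronously to every subscriber of the topic.
        for handler in list(self._subscribers[topic]):
            handler(message)


broker = InMemoryBroker()
notifications: List[str] = []


def resize_service(msg: dict) -> None:
    # A real service would resize the image here; this toy version
    # just forwards the message to the thumbnail step.
    broker.publish("thumbnail", {**msg, "resized": True})


def thumbnail_service(msg: dict) -> None:
    broker.publish("completion", {**msg, "thumbnail": msg["path"] + ".thumb"})


def notification_service(msg: dict) -> None:
    notifications.append(f"Your image {msg['path']} is ready")


broker.subscribe("resize", resize_service)
broker.subscribe("thumbnail", thumbnail_service)
broker.subscribe("completion", notification_service)

# The upload handler only knows about the first topic, not the downstream services.
broker.publish("resize", {"path": "gs://example-bucket/uploads/cat.jpg"})
print(notifications)
```

Because each service only knows the topic it reads from and the topic it writes to, you can add, replace, or scale a step without touching the others.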

Next up we have Cloud Dataflow, a service that allows you to create data pipelines. Dataflow provides a programming model for both batch and streaming data processing pipelines, so you can build ETL, batch computation, and continuous computation pipelines. It integrates with Cloud Storage, Cloud Pub/Sub, BigQuery, and Bigtable, and has SDKs for Java and Python.

Next we have Dataproc, which is a managed way to run Hadoop, Spark, Hive, and Pig on Google Cloud Platform. Dataproc allows you to quickly create clusters that are billed by the minute and can be scaled up and down as needed. With Dataproc, you can easily migrate on-premises Hadoop jobs to the cloud.

Next we have Datalab, an interactive tool that allows you to explore, transform, analyze, and visualize your data. It's currently in beta, though that shouldn't stop you from checking it out if you need big data processing. You can use Python, SQL, and JavaScript to interact with your data. Datalab is built on Jupyter and is deployed as an App Engine application. Let's check it out: we can load a notebook and run some code inside it, then save it and share it as needed. We can also interact with BigQuery, which lets us get the most out of Datalab. So between BigQuery, Pub/Sub, Dataflow, Dataproc, and Datalab, we have quite the set of services for big data.
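Batch jobs on Dataflow and the Hadoop MapReduce jobs you might migrate to Dataproc commonly follow a map, group-by-key, reduce pattern. As a minimal sketch of that pattern, here is the classic word count in plain Python. It uses neither the Dataflow (Apache Beam) SDK nor Hadoop; the function names are illustrative, and everything runs in a single process.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every word, like a MapReduce mapper."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, the step a framework performs between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Combine each key's values; for a word count, that is a sum."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    return reduce_phase(shuffle_phase(map_phase(lines)))


counts = word_count(["the quick fox", "The lazy dog", "the end"])
print(counts["the"])  # 3
```

On a real cluster, the framework runs many mappers and reducers in parallel and performs the shuffle across machines; your code supplies only the per-record map logic and the per-key reduce logic.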

Next up, let's talk about the services available for machine learning on Google Cloud Platform. As of this recording, the options include the Cloud Machine Learning Platform, the Vision API, the Speech API, and the Translate API. Let's go through these now, starting with the Machine Learning Platform. The Machine Learning Platform is currently in alpha, which means it's not available to everyone just yet. However, it will allow you to use your own data to train machine learning models. It provides a fully managed machine learning platform that integrates with other Google Cloud Platform services such as BigQuery and Cloud Storage. Possible use cases include customer churn analysis, content personalization, fraud detection, identifying styles in images, language identification, and more.

Next up, let's talk about the Vision API, which is a pre-trained model for analyzing images. You can use it for facial detection, logo detection, label detection, and more. It quickly classifies images into thousands of categories, and it can find and read printed words contained within images. You can analyze images uploaded in the request, or integrate with images stored in Google Cloud Storage. Let's actually check it out. I have uploaded a picture of an espresso-based beverage, and using the API Explorer we can test whether the Vision API recognizes it. Okay, let's run this. It looks like it's done a pretty good job: it has a few guesses, such as cappuccino, latte, and coffee, so it has correctly identified this as a coffee-based beverage. There's a lot more power baked in that we haven't shown, so check it out if you need to work with image recognition.
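Outside the API Explorer, a label-detection call is a POST to the Vision API's `images:annotate` REST method. The sketch below only builds the JSON request body, following the v1 request shape (`requests`, `image`, `features`, `type`, `maxResults`); confirm field names against the current Vision API reference before relying on it. Authentication and the HTTP call itself are omitted, and `build_label_request` and the `gs://example-bucket/...` path are hypothetical.

```python
import base64
import json
from typing import Optional

VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_label_request(
    gcs_uri: Optional[str] = None,
    image_bytes: Optional[bytes] = None,
    max_results: int = 5,
) -> dict:
    """Build an images:annotate body for a Cloud Storage image or inline bytes."""
    if (gcs_uri is None) == (image_bytes is None):
        raise ValueError("Pass exactly one of gcs_uri or image_bytes")
    if gcs_uri is not None:
        # Image already stored in Cloud Storage: reference it by URI.
        image = {"source": {"imageUri": gcs_uri}}
    else:
        # Image uploaded in the request: send it base64-encoded in "content".
        image = {"content": base64.b64encode(image_bytes).decode("ascii")}
    return {
        "requests": [
            {
                "image": image,
                "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
            }
        ]
    }


body = build_label_request(gcs_uri="gs://example-bucket/espresso.jpg")
print(VISION_ENDPOINT)
print(json.dumps(body, indent=2))
```

Other capabilities mentioned in the lesson use other feature types in the same request, such as `FACE_DETECTION`, `LOGO_DETECTION`, and `TEXT_DETECTION` for printed words.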

Next up, let's talk about the Speech API, which allows you to convert audio to text by applying powerful neural network models through an easy-to-use API. Currently it recognizes over 80 languages and variants. You can transcribe the speech of users dictating into an application's microphone, enable command and control through voice, or transcribe audio files, among other use cases. Let's use the API Explorer and see this in practice. We'll try to transcribe a simple audio file. The result is in, and it says: how old is the Brooklyn Bridge? This is a really easy-to-use API: we can pass in a file and have it return the text, along with a confidence score for how accurate it thinks the transcription is.

Next up we have the Translation API, which translates an arbitrary string into any supported language. The Translation API is highly responsive, so websites and applications can integrate with it for very fast, dynamic translation of source text. It also supports language detection for cases where you don't know the source language. Let's check out the API Explorer and test this out. First, let's have it detect some English text. I'll type in hello world. It guesses English, though it doesn't seem fully confident in that guess. Let's adjust the text and see if we can make it more confident. It isn't, but it's still able to determine that this is English. Now let's do some translation. We'll set the string to hello world again, set a target language and a source language, and there we have it. If we change the target language, we can re-run this and see that it translates again without issue.

These machine learning APIs are the same ones that Google uses for its own apps. If you've ever used voice search in the Google app, or image detection in Google Photos, then you've already interacted with some of these machine learning APIs.
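The API Explorer builds these requests for you. As a minimal sketch of what it sends, here are request bodies for the Speech API's synchronous `speech:recognize` method (v1) and the Translation API's v2 translate and detect methods, based on their public REST request shapes; confirm field names against the current API references before relying on them. Authentication (an API key or OAuth token) and the HTTP calls are omitted, and the helper names and the `gs://example-bucket/...` path are hypothetical.

```python
import json
from typing import Optional

SPEECH_ENDPOINT = "https://speech.googleapis.com/v1/speech:recognize"
TRANSLATE_ENDPOINT = "https://translation.googleapis.com/language/translate/v2"
DETECT_ENDPOINT = TRANSLATE_ENDPOINT + "/detect"


def build_speech_request(gcs_uri: str, language_code: str = "en-US",
                         sample_rate_hz: int = 16000) -> dict:
    """Body for a synchronous recognize call on an uncompressed (LINEAR16) audio file."""
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate_hz,
            "languageCode": language_code,
        },
        "audio": {"uri": gcs_uri},
    }


def build_translate_request(text: str, target: str,
                            source: Optional[str] = None) -> dict:
    """Body for a v2 translate call; leaving out source lets the service detect it."""
    body = {"q": text, "target": target, "format": "text"}
    if source is not None:
        body["source"] = source
    return body


def build_detect_request(text: str) -> dict:
    """Body for a v2 language-detection call."""
    return {"q": text}


speech_body = build_speech_request("gs://example-bucket/brooklyn.wav")
translate_body = build_translate_request("hello world", target="es", source="en")
detect_body = build_detect_request("hello world")

for endpoint, request_body in [(SPEECH_ENDPOINT, speech_body),
                               (TRANSLATE_ENDPOINT, translate_body),
                               (DETECT_ENDPOINT, detect_body)]:
    print(endpoint, json.dumps(request_body))
```

In the Speech response, each result carries a list of alternatives with a `transcript` and a `confidence` value, which corresponds to the accuracy guess shown in the demo.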

In our next lesson, we'll wrap up with a summary of what we've covered throughout this course. So, if you're ready, let's get started.

About the Author


Ben Lambert is the Director of Engineering and was previously the lead author for DevOps and Microsoft Azure training content at Cloud Academy. His courses and learning paths covered Cloud Ecosystem technologies such as DC/OS, configuration management tools, and containers. As a software engineer, Ben’s experience includes building highly available web and mobile apps.

When he’s not building the first platform to run and measure enterprise transformation initiatives at Cloud Academy, he’s hiking, camping, or creating video games.