Introduction to Google Cloud Machine Learning Engine

Deploying a Model on ML Engine



The course is part of these learning paths

Google Data Engineer Exam – Professional Certification Preparation


Machine learning is a hot topic these days, and Google has been one of the biggest newsmakers. Recently, Google’s AlphaGo program beat the world’s No. 1 ranked Go player. That’s impressive, but Google’s machine learning is also used behind the scenes every day by millions of people. When you search for an image on the web, use Google Translate on foreign-language text, or use voice dictation on your Android phone, you’re using machine learning. Now Google has launched Cloud Machine Learning Engine to give its customers the power to train their own neural networks.

If you look in Google’s documentation for Cloud Machine Learning Engine, you’ll find a Getting Started guide. It gives a walkthrough of the various things you can do with ML Engine, but it says that you should already have experience with machine learning and TensorFlow first. Those are two very advanced subjects, which normally take a long time to learn, but I’m going to give you enough of an overview that you’ll be able to train and deploy machine learning models using ML Engine.

This is a hands-on course where you can follow along with the demos using your own Google Cloud account or a trial account.

Learning Objectives

  • Describe how an artificial neural network functions
  • Run a simple TensorFlow program
  • Train a model using a distributed cluster on Cloud ML Engine
  • Increase prediction accuracy using feature engineering and both wide and deep networks
  • Deploy a trained model on Cloud ML Engine to make predictions with new data



  • Nov. 16, 2018: Updated 90% of the lessons due to major changes in TensorFlow and Google Cloud ML Engine. All of the demos and code walkthroughs were completely redone.


All of our examples so far have only shown how to train a model, but not how to save it and use it to make predictions when new data comes in. Since that’s usually why you want to train a model in the first place, it’s time we covered how to do that.


I have good news for you. Normally, it takes a surprising amount of work to save a model, but once again, the tf.estimator library will come to our rescue. You’ll recall that tf.estimator makes it easy to run distributed training jobs. Well, it also makes it easy to save a model.


The task.py script that we ran in the last lesson uses the tf.estimator.FinalExporter class to export, or save, the model. When you use the saved model to make predictions, you need to give it new data to process, and this data doesn’t have to be in the same format as the data you trained the model on. For example, the training data might be in CSV format, while future data will come from an application that sends it in JSON format.
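To make the CSV-versus-JSON point concrete, here is a minimal Python sketch that converts header-less CSV rows into newline-delimited JSON (one object per line), the kind of input a JSON-serving model reads. The column names and which of them are numeric are hypothetical, a small subset chosen for illustration; a real census model expects every feature column its model script defines.

```python
import csv
import io
import json

# Hypothetical subset of census columns, for illustration only.
COLUMNS = ["age", "workclass", "education", "hours_per_week"]
NUMERIC = {"age", "hours_per_week"}


def csv_to_json_instances(csv_text: str) -> str:
    """Convert header-less CSV rows into newline-delimited JSON instances."""
    lines = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        instance = {}
        for name, value in zip(COLUMNS, row):
            value = value.strip()
            # Numeric features are sent as JSON numbers, not strings.
            instance[name] = int(value) if name in NUMERIC else value
        lines.append(json.dumps(instance))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "25, Private, 11th, 40\n38, Private, HS-grad, 50\n"
    print(csv_to_json_instances(sample), end="")
```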


The model script has three different input functions for three different data formats: JSON, CSV, and EXAMPLE. You’re probably familiar with JSON and CSV, but EXAMPLE is a TensorFlow-specific format (serialized tf.Example records) that you probably won’t need to use. The script supports a command-line argument called “--export-format” that lets you choose which of these formats the saved model will accept. The default is JSON.
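The idea of choosing a serving format with a flag can be sketched in plain Python. This is an illustration under assumptions, not the course’s task.py: the parser functions are hypothetical stand-ins for the script’s TensorFlow input functions (which build serving graphs rather than return dicts), and the EXAMPLE branch is left unimplemented because decoding tf.Example records needs TensorFlow.

```python
import argparse
import csv
import io
import json

CSV_COLUMNS = ["age", "education"]  # hypothetical feature subset


def parse_json(record: str) -> dict:
    return json.loads(record)


def parse_csv(record: str) -> dict:
    row = next(csv.reader(io.StringIO(record)))
    return dict(zip(CSV_COLUMNS, (value.strip() for value in row)))


def parse_example(record: str) -> dict:
    # Serialized tf.Example protos need TensorFlow to decode; omitted here.
    raise NotImplementedError("EXAMPLE input requires TensorFlow")


# One (hypothetical) input handler per supported export format.
SERVING_PARSERS = {"JSON": parse_json, "CSV": parse_csv, "EXAMPLE": parse_example}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--export-format",
        choices=sorted(SERVING_PARSERS),
        default="JSON",
        help="Input format the exported model will accept",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--export-format", "CSV"])
    print(SERVING_PARSERS[args.export_format]("39, Bachelors"))
```

argparse turns “--export-format” into the attribute export_format, and the choices list rejects anything other than the three formats.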


Luckily, if you adapt the task and model scripts for your own data, you won’t need to change any of the code that’s related to saving the model.


All of this means that when you ran the training job in the last lesson, it also saved the model, so now you just need to deploy it.


First, you have to create a model in ML Engine. Wait, what? Didn’t we just create a model in the last lesson? Yes, but now we need to create an ML Engine model, which is something different: in ML Engine, a model is a container for different versions of a TensorFlow model. Use the “gcloud ml-engine models create” command. Let’s call the model “census”. Then add the “--regions” flag to specify the region where the model will be deployed.
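If you script the deployment, it can help to build each gcloud command as an argument list. The sketch below only prints the command unless you pass dry_run=False; the region us-central1 is an assumed example, and later gcloud releases moved these commands from `gcloud ml-engine` to `gcloud ai-platform`.

```python
import shlex
import subprocess
from typing import List


def models_create_cmd(model: str, region: str) -> List[str]:
    """argv for creating an ML Engine model, the container for versions."""
    return ["gcloud", "ml-engine", "models", "create", model,
            f"--regions={region}"]


def run(cmd: List[str], dry_run: bool = True) -> None:
    # Print the shell-quoted command; only execute it when dry_run is False.
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(models_create_cmd("census", "us-central1"))  # assumed region
```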


Now we have to create what ML Engine calls a version of the model. To do that, we first need the path of the TensorFlow model we saved, which is in Cloud Storage. Type “gsutil ls -r $BUCKET/census1/export”. In the output, look for the directory whose name is a number; that’s where the saved model is, and we’ll use its path in the next command.
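Rather than copying the path by hand, you could pick the highest-numbered export directory out of the listing. A minimal sketch, assuming the text was captured from `gsutil ls -r`; the bucket name and the exporter directory in the sample listing are hypothetical.

```python
import re
from typing import Optional

# A listing line's URL up to and including its first all-digit path segment.
_NUMBERED_DIR = re.compile(r"(gs://\S*?/(\d+))/")


def latest_export_dir(listing: str) -> Optional[str]:
    """Return the highest-numbered export directory in a gsutil listing."""
    found = {}
    for line in listing.splitlines():
        match = _NUMBERED_DIR.search(line)
        if match:
            found[int(match.group(2))] = match.group(1) + "/"
    return found[max(found)] if found else None


if __name__ == "__main__":
    # Hypothetical output of: gsutil ls -r $BUCKET/census1/export
    sample = """\
gs://my-bucket/census1/export/:
gs://my-bucket/census1/export/census/:
gs://my-bucket/census1/export/census/1542316051/:
gs://my-bucket/census1/export/census/1542316051/saved_model.pb
gs://my-bucket/census1/export/census/1542316051/variables/:
"""
    print(latest_export_dir(sample))
```

Only whole path segments made of digits count, so a directory such as census1 is not mistaken for an export.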


To create an ML Engine model version, use the “gcloud ml-engine versions create” command. Let’s call it “v1”. Use the “--model” flag to create this version in the census model, and the “--runtime-version” flag to specify the ML Engine runtime version to use when deploying it. Then use the “--origin” flag to tell it where to find the saved TensorFlow model: copy and paste the URL that ends with the numbered directory. It’ll take a few minutes to create the version.
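The version-creation step can be built the same way. The origin path is hypothetical, and the runtime version "1.10" is only an assumed example; use the runtime version that matches the TensorFlow version you trained with.

```python
import shlex
from typing import List


def versions_create_cmd(version: str, model: str, origin: str,
                        runtime_version: str) -> List[str]:
    """argv for deploying a saved model directory as a version of a model."""
    return [
        "gcloud", "ml-engine", "versions", "create", version,
        f"--model={model}",
        f"--origin={origin}",  # the numbered export directory in Cloud Storage
        f"--runtime-version={runtime_version}",
    ]


if __name__ == "__main__":
    cmd = versions_create_cmd(
        "v1", "census",
        "gs://my-bucket/census1/export/census/1542316051/",  # hypothetical path
        "1.10",  # assumed runtime version
    )
    print(shlex.join(cmd))
```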


When it’s done, you can ask it for a prediction by using the “gcloud ml-engine predict” command. Thankfully, it doesn’t require nearly as many arguments as the training command in the last lesson. You only need to tell it three things: the model name, the model version, and the path of the data file. Since our model expects JSON-formatted data, we’ll use the “--json-instances” flag. The test.json file is in the census directory, which is the parent of the estimator directory, so assuming you’re still in the estimator directory, start the path with “../”.
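The prediction request needs only the three values listed above. A minimal sketch that builds the command; the relative path assumes you run it from the estimator directory, as in the lesson.

```python
import shlex
from typing import List


def predict_cmd(model: str, version: str, instances_path: str) -> List[str]:
    """argv for an online prediction from a file of JSON instances."""
    return ["gcloud", "ml-engine", "predict",
            f"--model={model}",
            f"--version={version}",
            f"--json-instances={instances_path}"]  # one JSON object per line


if __name__ == "__main__":
    # Run from the estimator directory, so test.json is one level up.
    print(shlex.join(predict_cmd("census", "v1", "../test.json")))
```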


It came back very quickly, but the output is a little cryptic. It’s predicting that this person is in class 0, which means they make $50,000 or less. The probabilities show how likely it is that this data falls into class 0 or class 1. In other words, the model is extremely confident that this person makes $50,000 or less, because it assigned a 95.6% probability to class 0.
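To read the output programmatically, take the class with the highest probability. A minimal sketch; the class labels follow the explanation above, and the class-1 probability in the example is simply the remainder of the 95.6% quoted for class 0, since there are only two classes.

```python
from typing import List, Tuple

# Meaning of the census model's two classes, as explained in the lesson.
CLASS_LABELS = {0: "income of $50,000 or less", 1: "income over $50,000"}


def interpret(probabilities: List[float]) -> Tuple[int, float, str]:
    """Return (predicted class, its probability, human-readable label)."""
    predicted = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return predicted, probabilities[predicted], CLASS_LABELS[predicted]


if __name__ == "__main__":
    probs = [0.956, 1 - 0.956]  # class-0 probability from the lesson's output
    cls, p, label = interpret(probs)
    print(f"class {cls} ({label}), probability {p:.1%}")
```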


This form of model deployment is known as online prediction. The great thing about it is that it returns its predictions very quickly, so it’s typically used by applications that need a real-time response. For example, a website that makes product recommendations would need a fast response so there isn’t a delay in rendering the webpage.
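An application that needs real-time answers would typically call the prediction service over HTTPS rather than through gcloud. The sketch below only builds the request for the ML Engine v1 REST method projects.predict; the project name and instance fields are hypothetical, and actually sending the request requires an OAuth 2.0 access token (the Google API Client Library for Python wraps the same call via `discovery.build("ml", "v1")`).

```python
import json
from typing import Any, Dict, List, Optional

API_ROOT = "https://ml.googleapis.com/v1"


def predict_request(project: str, model: str,
                    instances: List[Dict[str, Any]],
                    version: Optional[str] = None) -> Dict[str, str]:
    """Build the method, URL, and JSON body for an online prediction call.

    Without a version, the request goes to the model's default version.
    """
    name = f"projects/{project}/models/{model}"
    if version is not None:
        name += f"/versions/{version}"
    return {
        "method": "POST",
        "url": f"{API_ROOT}/{name}:predict",
        "body": json.dumps({"instances": instances}),
    }


if __name__ == "__main__":
    req = predict_request("my-project", "census", [{"age": 25}], version="v1")
    print(req["method"], req["url"])
    print(req["body"])
```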


ML Engine also supports batch prediction. Running it is similar to running a training job: the command is “gcloud ml-engine jobs submit prediction”. Batch prediction is optimized for big jobs, but it takes longer to start up. Another difference is that predictions are written to files in Cloud Storage rather than sent back as a response to the requester.
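A batch prediction job, sketched as an argument list in the same style. The job name, bucket paths, and region are hypothetical, and “--data-format=text” is assumed to mean newline-delimited JSON instances; check the accepted values with `gcloud ml-engine jobs submit prediction --help` for your gcloud release.

```python
import shlex
from typing import List


def batch_predict_cmd(job_name: str, model: str, version: str,
                      input_paths: str, output_path: str,
                      region: str) -> List[str]:
    """argv for submitting an ML Engine batch prediction job."""
    return [
        "gcloud", "ml-engine", "jobs", "submit", "prediction", job_name,
        f"--model={model}",
        f"--version={version}",
        "--data-format=text",            # assumed: newline-delimited JSON
        f"--input-paths={input_paths}",  # Cloud Storage file(s) to score
        f"--output-path={output_path}",  # predictions are written here
        f"--region={region}",
    ]


if __name__ == "__main__":
    cmd = batch_predict_cmd(
        "census_batch_1", "census", "v1",       # hypothetical job name
        "gs://my-bucket/batch/instances.json",  # hypothetical paths
        "gs://my-bucket/batch/output/",
        "us-central1",                          # assumed region
    )
    print(shlex.join(cmd))
```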


A more subtle difference between the two methods is pricing. In both cases you pay for processing, but because the two methods are architected differently, the same number of predictions can use a different amount of processing time and therefore cost a different amount.


At the time of recording, the cost in the US is 5.6 cents per node hour. A node hour is basically a virtual machine running for an hour. There’s a minimum charge of 10 minutes.
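The pricing rule can be turned into a quick estimate. A minimal sketch using the 5.6 cents per node hour quoted above; it assumes the 10-minute minimum applies to a job’s total node time, which is an interpretation, and current prices may differ.

```python
PRICE_PER_NODE_HOUR = 0.056  # USD, US price quoted in the lesson
MINIMUM_MINUTES = 10         # minimum billable time (assumed: per job)


def prediction_cost(nodes: int, minutes: float,
                    price_per_node_hour: float = PRICE_PER_NODE_HOUR) -> float:
    """Estimated cost in USD for `nodes` nodes running for `minutes` minutes."""
    node_minutes = max(nodes * minutes, MINIMUM_MINUTES)
    return node_minutes / 60 * price_per_node_hour


if __name__ == "__main__":
    print(f"1 node, 3 min:   ${prediction_cost(1, 3):.4f}")   # billed as 10 min
    print(f"4 nodes, 30 min: ${prediction_cost(4, 30):.4f}")  # 2 node hours
```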


If you request a large number of predictions using the batch method, then the total processing time will likely be much smaller than if you request the same number of predictions using the online method. This is because online prediction requests are typically spread out over a longer period of time. Also, the online prediction service keeps your model in a ready state for a few minutes after servicing a request, and this counts toward your total node hours, even though no predictions were processed during that time.
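The idle-time effect is easiest to see by merging time windows: each online request keeps the node up while it works and then for an idle period afterward, and overlapping windows are counted once. A minimal single-node sketch; the busy and idle durations are hypothetical parameters, since the lesson only says the model stays ready for “a few minutes”.

```python
from typing import List


def online_node_minutes(request_times: List[float], busy: float,
                        idle: float) -> float:
    """Total minutes a single node stays up to serve the given requests.

    Each request (at time t, in minutes) keeps the node working for `busy`
    minutes and then ready but idle for `idle` more minutes. Overlapping
    windows are merged so shared time is counted only once.
    """
    windows = sorted((t, t + busy + idle) for t in request_times)
    total = 0.0
    start = end = None
    for w_start, w_end in windows:
        if end is None or w_start > end:
            # Gap before this window: close out the previous merged window.
            if end is not None:
                total += end - start
            start, end = w_start, w_end
        else:
            # Overlap: extend the current merged window.
            end = max(end, w_end)
    if end is not None:
        total += end - start
    return total


if __name__ == "__main__":
    hourly = [60 * h for h in range(24)]  # one request per hour for a day
    print(f"{online_node_minutes(hourly, busy=0.05, idle=10):.1f} node minutes")
```

With requests spread an hour apart, nearly all of the billed time is the idle window, whereas a batch job covering the same predictions is billed only for its own run time, subject to the minimum.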


Regardless of which method you use, it’s a very convenient service at a very reasonable price, in my opinion.


And that’s it for this lesson.

About the Author

Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).