Deploying a Model on AI Platform



Machine learning is a hot topic these days, and Google has been one of the biggest newsmakers. Google’s machine learning is used behind the scenes every day by millions of people. When you search for an image on the web, use Google Translate on foreign-language text, or use voice dictation on your Android phone, you’re using machine learning. Now Google has launched AI Platform to give its customers the power to train their own neural networks.

This is a hands-on course where you can follow along with the demos using your own Google Cloud account or a trial account.

Learning Objectives

  • Describe how an artificial neural network functions
  • Run a simple TensorFlow program
  • Train a model using a distributed cluster on AI Platform
  • Increase prediction accuracy using feature engineering and hyperparameter tuning
  • Deploy a trained model on AI Platform to make predictions with new data



  • December 20, 2020: Completely revamped the course due to Google AI Platform replacing Cloud ML Engine and the release of TensorFlow 2.
  • November 16, 2018: Updated 90% of the lessons due to major changes in TensorFlow and Google Cloud ML Engine. All of the demos and code walkthroughs were completely redone.

All of our examples so far have only shown how to train a model, not how to save it and use it to make predictions when new data comes in. Since that’s usually why you want to train a model in the first place, it’s time we covered how to do that.

To keep things simple, I’m going to use the iris model again, so go back to the iris directory. As you’ll recall, there was a line of code at the end of the training script that saved the model. Now we need to deploy that saved model as a service.

First, you have to create a model in AI Platform. Wait, what? Didn’t we just create a model in the last lesson? Yes, but now we need to create an AI Platform model, which is different. In AI Platform, a model is a resource where you put different versions of a trained model. Use the “gcloud ai-platform models create” command. Let’s call the model “iris”. Then add the “--regions” flag.
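Put together, the model-creation step might look like the following sketch; the region value is just an example, and you should pick one close to you:

```shell
# Create an AI Platform model resource named "iris".
# This is a container for versions of the trained model, not the
# trained model itself. --regions sets where it will be hosted.
gcloud ai-platform models create iris \
  --regions=us-central1
```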

Now we have to create what AI Platform calls a version of the model. To do that, use the “gcloud ai-platform versions create” command. Let’s call it “v1”. We want this version to be created in the iris model. Then specify the runtime version of AI Platform to use when deploying this model version. Next, add the region flag, and set it to “global”. Now, we have to tell it where to find the saved TensorFlow model, which is in the Cloud Storage bucket.
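As a sketch, the version-creation command could look like this, assuming the SavedModel was exported to a Cloud Storage path such as gs://your-bucket/iris (the bucket path and runtime version here are placeholders):

```shell
# Create version "v1" of the "iris" model from the SavedModel
# stored in Cloud Storage (--origin points at the export directory).
gcloud ai-platform versions create v1 \
  --model=iris \
  --runtime-version=2.3 \
  --region=global \
  --origin=gs://your-bucket/iris
```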

When it’s done, you can ask it for a prediction by using the “gcloud ai-platform predict” command. Thankfully, it doesn’t require as many arguments as the training command. You only need to tell it four things: the model name, the model version, the region, and the pathname of the data file that contains the new data we want to get a prediction for. There are a few different formats you can use for the data, but Google recommends you use a JSON request. I created a file called test.json in the iris directory that has the data for one flower in it. This was the first flower that we ran through the trained model at the end of the iris script. The model should predict that this flower is an iris setosa.
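A sketch of the prediction request, assuming test.json holds the four feature values (sepal and petal measurements) for one flower; the numbers shown in the comment are illustrative, not the actual file contents:

```shell
# test.json contains one instance in the JSON request format, e.g.:
# {"instances": [[5.1, 3.3, 1.7, 0.5]]}
gcloud ai-platform predict \
  --model=iris \
  --version=v1 \
  --region=global \
  --json-request=test.json
```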

It came back very quickly, but the output is a little cryptic. It shows the scores for the three classes of irises. The first one has the highest number, so the model is predicting that this flower is an iris setosa, which is correct. You would normally call this prediction service from an application that would understand these scores and convert them into a usable prediction so you wouldn’t have to interpret the results.

This form of model deployment is known as online prediction. The great thing about it is that it returns its predictions very quickly, so it’s typically used by applications that need a real-time response. For example, a website that makes product recommendations would need a fast response so there isn’t a delay in rendering the webpage.

AI Platform also supports batch prediction. Running it is similar to running a training job. The command is “gcloud ai-platform jobs submit prediction”. Batch prediction is optimized for big jobs and it takes longer to start up. Another difference is that predictions are written to files in Cloud Storage rather than sent as a response to the requestor.
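A batch prediction job might be submitted like this; the job name, bucket paths, and data format are placeholders, with --data-format=text assuming the input files contain one JSON instance per line:

```shell
# Submit a batch prediction job. Unlike online prediction, the
# results are written to files under --output-path in Cloud Storage.
gcloud ai-platform jobs submit prediction iris_batch_1 \
  --model=iris \
  --version=v1 \
  --region=us-central1 \
  --data-format=text \
  --input-paths=gs://your-bucket/inputs/* \
  --output-path=gs://your-bucket/outputs
```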

A more subtle difference between the two methods is cost. If you request a large number of predictions using the batch method, then the total processing time will likely be much smaller than if you request the same number of predictions using the online method. This is because online prediction requests are typically spread out over a longer period of time. Also, the online prediction service keeps your model in a ready state for a few minutes after servicing a request, and this counts toward your total processing hours, even though no predictions were processed during that time.

And that’s it for this lesson.

About the Author

Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).