Introduction to Google Cloud Machine Learning Engine

Training a Model with ML Engine



Machine learning is a hot topic these days and Google has been one of the biggest newsmakers. Recently, Google’s AlphaGo program beat the world’s No. 1 ranked Go player. That’s impressive, but Google’s machine learning is being used behind the scenes every day by millions of people. When you search for an image on the web or use Google Translate on foreign language text or use voice dictation on your Android phone, you’re using machine learning. Now Google has launched Cloud Machine Learning Engine to give its customers the power to train their own neural networks.

If you look in Google’s documentation for Cloud Machine Learning Engine, you’ll find a Getting Started guide. It gives a walkthrough of the various things you can do with ML Engine, but it says that you should already have experience with machine learning and TensorFlow. Those are two very advanced subjects, which normally take a long time to learn, but I’m going to give you enough of an overview that you’ll be able to train and deploy machine learning models using ML Engine.

This is a hands-on course where you can follow along with the demos using your own Google Cloud account or a trial account.

Learning Objectives

  • Describe how an artificial neural network functions
  • Run a simple TensorFlow program
  • Train a model using a distributed cluster on Cloud ML Engine
  • Increase prediction accuracy using feature engineering and both wide and deep networks
  • Deploy a trained model on Cloud ML Engine to make predictions with new data



  • Nov. 16, 2018: Updated 90% of the lessons due to major changes in TensorFlow and Google Cloud ML Engine. All of the demos and code walkthroughs were completely redone.


Now that you have some experience with TensorFlow scripts, it’s time to see how to run one in Cloud ML Engine.


If you haven’t already installed the Google Cloud SDK on your computer, then do that first. The installation instructions are at https://cloud.google.com/sdk. You’ll probably need to do that outside of the virtual Python environment, though, so it would be best to do it in another terminal.
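One quick way to confirm that the SDK is installed is to check whether the gcloud command is on your PATH. This is only a minimal sketch; SDK_STATUS is a variable name made up for this example.

```shell
# Check whether the gcloud CLI from the Cloud SDK is on the PATH.
# SDK_STATUS is a hypothetical variable name used only in this sketch.
if command -v gcloud >/dev/null 2>&1; then
  SDK_STATUS="found"
  gcloud --version   # lists the installed SDK components and their versions
else
  SDK_STATUS="missing"
  echo "gcloud not found; see https://cloud.google.com/sdk for installation"
fi
echo "Cloud SDK: $SDK_STATUS"
```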


To run a TensorFlow program in ML Engine, you have to put it in a Python package rather than leave it as an individual script file. Fortunately, it’s very easy to turn a script into a package. All you have to do is create a file called “__init__.py” in the directory where your script resides. You don’t need to put anything in the file, but it needs to be there. I’ve included that file in this directory, so you don’t need to create it yourself. OK, now you have to be in the parent directory of the package to run it, so go to the base directory if you’re not there already.
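If you were setting this up from scratch, the package layout could be created roughly like this. It’s a sketch that assumes the package directory is named “trainer” and the script is “iris.py” (the names used by the commands later in this lesson); the course files already include the empty file, so you only need this for your own projects.

```shell
# Create the package directory (if it doesn't exist yet) and the empty
# marker file that tells Python to treat "trainer" as a package.
mkdir -p trainer
touch trainer/__init__.py

# The directory should now hold __init__.py next to your script (e.g. iris.py).
ls -a trainer
```

Run it from the base directory, so that “trainer” ends up as a subdirectory of wherever you launch the training commands.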


The commands for ML Engine all start with “gcloud ml-engine”. Before running the TensorFlow program in the cloud, we should first test whether it will work with ML Engine. The way to do that is to put “local” after “gcloud ml-engine”. This runs your Python module on your own computer, but in an environment similar to the one it would run in on Google Cloud. You won’t be charged for anything you run locally, so it’s a good way to test your package before submitting a training job to the cloud.


After “local”, type “train” because you’re training a model. Next type “--module-name” and the name of your module, which is the directory name, “trainer”, then a dot, then the name of your script without the “.py” extension, so just “iris”. That makes the module name “trainer.iris”. Then type “--package-path” and the path of the package directory. Since the “trainer” directory is in the current directory, you can just say “trainer”, but if you were somewhere else, you’d need to put in the full path.
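Put together, the local test command looks like the body of the function below. The wrapper function and its name, train_locally, are just conveniences for this sketch so that nothing runs until you call it; you can equally type the gcloud command directly.

```shell
# Run the trainer package locally in an ML Engine-like environment.
# Call this from the parent directory that contains "trainer/".
# train_locally is a hypothetical helper name, not part of gcloud.
train_locally() {
  gcloud ml-engine local train \
    --module-name trainer.iris \
    --package-path trainer
}
```

To start the test run, type `train_locally` at the prompt.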


The local run should take about 10 seconds, depending on the speed of your computer. OK, we got the same result as before. Now, to run it in the cloud, first you need to have a Cloud Storage bucket so there’s a place to upload your package. If you don’t already have one that you can use, then you might want to create one that starts with your project ID, which you can find on the main Google Cloud console page. Copy it so you can paste it later.


Now go to the Cloud Storage console and click “Create Bucket”. Paste your project ID in the Name field and then add “-ml” for “machine learning”. Since Cloud Storage bucket names have to be globally unique across all Google Cloud customers, starting the bucket name with your project ID is a good way to make sure it’s a unique name.


You need to create the bucket in the same region where you’re going to run your ML Engine jobs. At the moment, ML Engine is only available in four regions (us-central1, us-east1, europe-west1, and asia-east1), so create your bucket in whichever of those regions is closest to you.
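If you’d rather stay in the terminal, the same bucket can be created with gsutil, which is installed with the Cloud SDK. This is an alternative to the console steps shown in the lesson, not what the demo does. The function name create_ml_bucket and the default region are assumptions made for this sketch.

```shell
# Command-line alternative to the console steps (not the method shown in
# the lesson). create_ml_bucket is a hypothetical helper name.
#   $1 = your project ID (required)
#   $2 = region for the bucket (defaults to us-central1 in this sketch)
create_ml_bucket() {
  gsutil mb -l "${2:-us-central1}" "gs://${1:?usage: create_ml_bucket PROJECT_ID [REGION]}-ml"
}
```

For example, `create_ml_bucket "$(gcloud config get-value project)" us-east1` would create a bucket named after your current project with “-ml” on the end.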


We’re going to use this bucket in multiple lessons, so to save yourself some typing, create an environment variable that holds the bucket name.


It will also be helpful to set an environment variable to the region where the bucket resides.
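Setting the two variables might look like this. The values are placeholders: replace “my-project-id” with your own project ID and use the region where you created your bucket. The “gs://” prefix is included on the assumption that the bucket will be passed to gcloud as a Cloud Storage URL.

```shell
# Placeholder values; substitute your own project ID and region.
export BUCKET="gs://my-project-id-ml"
export REGION="us-central1"

# Confirm the values before using them in later commands.
echo "Bucket: $BUCKET, region: $REGION"
```

These variables only last for the current terminal session, so you’ll need to set them again if you open a new one.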


Now that you have a bucket, you can submit your job. The command is “gcloud ml-engine jobs submit training”, then a job name. You can call it whatever you want, but you won’t be able to use the same job name again in the future. One way to ensure it’s always a unique name is to include a timestamp, but let’s just use a simple name for now, like “iris1”.
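If you do want timestamped names later on, one way to build them is shown below. It’s a sketch: the “iris_” prefix and the JOB_NAME variable are arbitrary choices, and the name sticks to letters, digits, and underscores on the assumption that job names don’t accept hyphens.

```shell
# Build a job name that is unique per second by appending a timestamp,
# e.g. iris_ followed by YYYYMMDD_HHMMSS.
JOB_NAME="iris_$(date +%Y%m%d_%H%M%S)"
echo "$JOB_NAME"
```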


Then add the same module-name and package-path arguments as before, then “--staging-bucket $BUCKET”, then “--region” and the name of the region where you created the bucket. If you’ve already set your default region (using the “gcloud config set compute/region” command) to the same region where you created your bucket, then you don’t need to include the --region flag.
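The full submit command is sketched in the function below. It assumes the BUCKET and REGION environment variables are already set and that you run it from the directory containing “trainer”. The function name submit_iris_job is made up for this example, so that nothing is submitted (or billed) until you call it.

```shell
# Submit the trainer package as an ML Engine training job.
# submit_iris_job is a hypothetical helper name, not part of gcloud.
#   $1 = job name, which must not have been used before (e.g. iris1)
# Assumes BUCKET (a gs:// URL) and REGION are set in the environment.
submit_iris_job() {
  gcloud ml-engine jobs submit training "${1:?usage: submit_iris_job JOB_NAME}" \
    --module-name trainer.iris \
    --package-path trainer \
    --staging-bucket "$BUCKET" \
    --region "$REGION"
}
```

To follow along with the lesson, you would type `submit_iris_job iris1`.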


This time it will take a lot longer than 10 seconds because ML Engine needs to spin up an environment to run your job. The output gives you two commands for checking on how your job’s doing. The first one tells you what state the job is in, among other things.


I usually prefer to use the second command, which streams the log entries, so you can always see what’s happening with your job. Right now, it’s waiting for the job to be provisioned, which will take a while, so I’m going to fast forward to when it’s done.
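For reference, the two monitoring commands are sketched below, assuming the gcloud output suggests “describe” and “stream-logs” for the job you submitted. The helper names job_status and job_logs are invented for this example.

```shell
# Two ways to check on a submitted job. The function names are
# hypothetical helpers; $1 is the job name, e.g. iris1.

# Show the job's current state along with its other details.
job_status() {
  gcloud ml-engine jobs describe "${1:?usage: job_status JOB_NAME}"
}

# Stream the job's log entries to the terminal as they arrive.
job_logs() {
  gcloud ml-engine jobs stream-logs "${1:?usage: job_logs JOB_NAME}"
}
```

For the job in this lesson, that would be `job_status iris1` or `job_logs iris1`.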


OK, it’s done. If you look at the timestamps of the log entries, you’ll see that it spent the vast majority of the time getting the environment and the job set up, and then it took about 6 seconds to actually run the TensorFlow script. So there’s a lot of overhead when you run an ML Engine job and it can take way longer than running it on your local machine. Not only that, but you have to pay for it too. So why would you run your training jobs in the cloud instead of on your own machine? Because most machine learning jobs take far longer to run than this one, and if you tried to run them on your own computer, it could take days.


In the next lesson, we’ll look at a more complex model.

About the Author


Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).