Google Prediction API: a Machine Learning Black Box for Developers

Google Prediction API provides a RESTful interface to build Machine Learning models

This is my third article on how to build Machine Learning models in the Cloud. I previously explored Amazon Machine Learning and Azure Machine Learning – relative newcomers in the cloud data market. Google Prediction API, on the other hand, was released all the way back in 2011 and offers a very stable and simple way to train Machine Learning models via a RESTful interface, although it might seem less friendly if you generally prefer browser interfaces.

I am not going to explore the wide range of services offered by Google Cloud Platform here: you can check out the Developers Console by yourself for free, sign up for the Google Free Trial ($300 in credit to use over 2 months), and take Cloud Academy’s courses on Google Cloud Platform Fundamentals.

Google Prediction API: Machine Learning Black Box

We can define Google’s approach as a “black box”, since you get no control over what happens under the hood: your model configuration is restricted to specifying “Classification” vs. “Regression,” or providing a preprocessing PMML (Predictive Model Markup Language) file and a set of weighting parameters in the case of categorical models. That’s it.

Let me clarify a few basic concepts that will help you specifically with Google Prediction API:

  • You need Regression whenever your target output is a continuous numerical variable, which may – or may not – span a specific range (e.g. the price of a car or the age of a person).
  • Classification is what you need whenever your target output can assume only a limited set of values, either numbers or strings, based on your application context.
  • Binary Classification is a special case in which your target output can assume only two values (let’s say True or False), for which simpler but more accurate models can be built. In some cases, building a set of binary models and combining their output might perform better than a single multi-class model.

On the other hand, your input features (your columns) can contain any type of data, although certain types are easier to work with (e.g. text analysis is clearly more complex than numerical regression).

The good news is that Google doesn’t impose any arbitrary constraints on your input data types or require any configuration process. All you need to do is format your dataset the right way. Think of it as a big table, where each row is an input vector and the first column is your target value.

You will need to upload a single CSV file, and Google Prediction API will take care of type detection, value normalization, feature selection, and so on.
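For instance, the first few values of a couple of rows might look like this (a hypothetical, heavily truncated excerpt, with the activity label in the first column and the sensor features after it):

5,0.2885,-0.0203,-0.1329, ...
1,0.2572,-0.0234,-0.0147, ...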

Google Prediction API first step: uploading a dataset

The only Google Cloud service we need in order to use the Google Prediction API is Cloud Storage, where we will store our dataset. You will not need to enable it on your Console, since it’s automatically enabled on every Google Cloud project.

First of all, we will create a new Project. You have to choose a name, an ID and, optionally, the data center location.

Just as we did for my previous articles on AmazonML and AzureML, we are going to train a model for HAR (Human Activity Recognition) using an open dataset built by UCI and freely available here.

The dataset is composed of more than 10,000 records, each one defined by 560 input features and one target column, which can take one of the following values: 1, 2, 3, 4, 5 and 6 (walking, walking upstairs, walking downstairs, sitting, standing, laying down).

Every record has been generated on a smartphone, using accelerometer and gyroscope data, and labelled manually based on the performer’s activity.

We are going to build a multi-class model to understand whether a given record of sensor data (generated in real time) can be definitively associated with walking, standing, sitting, laying, etc. Such a model might be useful for things like activity tracking and healthcare monitoring.

I have already gone through the process of manipulating the original dataset to create one single CSV file, since the original dataset has been split into smaller datasets (for training and testing) and input features have been separated from target values. You can find my Python script here.

The next step involves uploading the dataset file to Google Cloud Storage: you can do this from the Developers Console by clicking on “Storage > Cloud Storage > Storage Browser” in the side menu. Here you want to create a new bucket (i.e. a folder), select it, and then upload your file. It will take a while (the file contains about 90MB of data). If you’ve got a slow network connection, you might try to upload only a smaller portion of the dataset: the model accuracy should be pretty good with only 20% of our Ground Truth.
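If you prefer the command line over the Console, the gsutil tool should be able to do the same job (a minimal sketch, assuming the bucket is named machine-learning-dataset, as in the training call later in this article):

gsutil cp dataset.csv gs://machine-learning-dataset/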

How to use Google Prediction API

Unfortunately, Google Prediction API doesn’t provide any user-friendly Web interface, and almost every step beyond this point will be performed via API calls from Python scripts. If you really can’t stand coding, you might use the official APIs Explorer, but that’s not how you want to build your products, right?

Before making real API calls, we need to enable Google Prediction API on our project. You will find it by clicking on “APIs & auth > APIs”: it will be the second-to-last item in the Google Cloud APIs list. Enabling an API is quite straightforward, and you only need to do it once for each of your projects. A single click on “Enable API” will do the job.
One last step: you need to create a new OAuth2 Client ID. Most Google APIs use OAuth2 for authentication: you can either create a Service account key (for server-to-server applications) or a Web Application Client ID. In our case, since we don’t need to work with our users’ data, we are going to use a server-to-server key.

Alternatively, you could use a WebApp Client ID even for a server-to-server application, but your code would end up being slightly more complicated and you would need to go through the typical OAuth flow (either using your browser or copying and pasting OAuth codes into your terminal).

Let’s proceed: click on “APIs & auth > Credentials”, then “Create new Client ID”. Select “Service account” in the popup and confirm the creation. A JSON file containing the new Client ID data, including ‘client_email’ and ‘private_key’, will be downloaded automatically: we will open this file and use these two fields in our code.
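Here is a minimal sketch of how you might load those two fields and build an authorized Prediction API client with the official Python library (the filename client_id.json is an assumption, and SignedJwtAssertionCredentials comes from the oauth2client package bundled with the client):

#build an authorized Prediction API client (minimal sketch)
import json
import httplib2
from oauth2client.client import SignedJwtAssertionCredentials
from apiclient.discovery import build

#'client_id.json' is a hypothetical name for the downloaded JSON file
with open('client_id.json') as f:
    key = json.load(f)

credentials = SignedJwtAssertionCredentials(
    key['client_email'], key['private_key'],
    scope='https://www.googleapis.com/auth/prediction'
)
api = build('prediction', 'v1.6', http=credentials.authorize(httplib2.Http()))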

Google Prediction API: Model Training Phase

We are finally ready to use Google Prediction API. I am going to work with their excellent official Python client, even though the documentation can sometimes be misleading. Also, I will show small code segments to focus on each sub-task, but you can find the whole script here.

As you can see from the official documentation, you can either use a Hosted Model or train your own. In order to train a new model, we are going to use the Trainedmodels.insert API method.

Every Google Prediction API method takes your project ID as its first parameter. The Trainedmodels.insert method expects a body parameter containing a model ID (that you choose), your model type (classification or regression), and your dataset (either a Cloud Storage location or a set of instances).

#train a new classification model
api.trainedmodels().insert(project=project_id, body={
    'id': model_id,
    'storageDataLocation': 'machine-learning-dataset/dataset.csv',
    'modelType': 'CLASSIFICATION'
}).execute()

Optionally, you can specify the following parameters:

  • sourceModel: the ID of an existing model, in case you want to clone it.
  • storagePMMLLocation: a preprocessing file (PMML format).
  • utility: a weighting function for categorical models.

Of course, the training phase is asynchronous and you will need to check your model status using the Trainedmodels.get method. As long as your model’s trainingStatus property is not “DONE”, you won’t be able to use it.
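A minimal polling sketch (the 10-second interval is an arbitrary choice):

#wait until the training phase is completed
import time
status = api.trainedmodels().get(project=project_id, id=model_id).execute()
while status.get('trainingStatus') != 'DONE':
    time.sleep(10)
    status = api.trainedmodels().get(project=project_id, id=model_id).execute()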

Is your model good enough?

Now you can start generating new predictions, but first you might want to analyze your model to see what kind of accuracy you can expect. You can call the Trainedmodels.analyze method to get a lot of useful information about your model.

#retrieve the new model's analysis
analysis = api.trainedmodels().analyze(project=project_id, id=model_id).execute()

This API call returns insights about your dataset (dataDescription), providing three numerical statistics for each input feature: count, mean, and variance. These might be useful, but they aren’t anything special: after all, you could have computed them yourself without creating a new model.

What we really need is the modelDescription field: it contains a confusionMatrix structure. Although it’s not that easy to read in JSON format, this structure will tell you how your model behaves.

In order to compute it, Google had to split your dataset into two smaller sets: the first one was used to train the model, and the second one to evaluate it. If you do the math (the confusion matrix row totals below sum to 1,043 records, out of more than 10,000), you will notice that Google applied a 90/10 split.

{
    'confusionMatrix': {
        '1': {
            '1': '166.00',
            '2': '0.00',
            '3': '1.00',
            '4': '0.00',
            '5': '0.00',
            '6': '0.00'
        },
        '2': {
            '1': '0.00',
            '2': '164.00',
            '3': '0.00',
            '4': '0.00',
            '5': '0.00',
            '6': '0.00'
        },
        '3': {
            '1': '0.00',
            '2': '2.00',
            '3': '158.00',
            '4': '0.00',
            '5': '0.00',
            '6': '0.00'
        },
        '4': {
            '1': '0.00',
            '2': '0.00',
            '3': '0.00',
            '4': '161.00',
            '5': '5.00',
            '6': '1.00'
        },
        '5': {
            '1': '0.00',
            '2': '0.00',
            '3': '0.00',
            '4': '9.00',
            '5': '180.00',
            '6': '0.00'
        },
        '6': {
            '1': '0.00',
            '2': '0.00',
            '3': '0.00',
            '4': '0.00',
            '5': '0.00',
            '6': '196.00'
        }
    },
    'confusionMatrixRowTotals': {
        '1': '167.00',
        '2': '164.00',
        '3': '160.00',
        '4': '167.00',
        '5': '189.00',
        '6': '196.00'
    },
    'modelinfo': {
        'kind': 'prediction#training'
    }
}

I admit that percentages would have been easier to read, and some precision/recall statistics would also have been nice. You can always compute those by yourself (see the sketch below), but let’s say that, very intuitively, you should see a lot of zeros around, and higher numbers on the main diagonal. Any non-zero value N outside of the main diagonal means that your model wrongly classified N of your records.
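For instance, here is a minimal sketch that computes each class’s accuracy from the analyze response, assuming the structure shown above:

#per-class accuracy from the confusion matrix (minimal sketch)
description = analysis['modelDescription']
matrix = description['confusionMatrix']
totals = description['confusionMatrixRowTotals']
for actual in sorted(matrix):
    correct = float(matrix[actual][actual])
    print("Class %s: %.2f%%" % (actual, 100 * correct / float(totals[actual])))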

With this dataset and the applied data split, here is how the Confusion Matrix looks:

C. Matrix   Class 1   Class 2   Class 3   Class 4   Class 5   Class 6
Class 1     99.40%    0         0.60%     0         0         0
Class 2     0         100.00%   0         0         0         0
Class 3     0         1.25%     98.75%    0         0         0
Class 4     0         0         0         96.40%    3.00%     0.60%
Class 5     0         0         0         4.80%     95.20%    0
Class 6     0         0         0         0         0         100.00%

It is not too bad, considering the required effort and the total absence of configuration and data normalization. Google has successfully created a reliable model and we can now generate new predictions, based on new unlabelled data.

How to generate new Predictions

In order to simplify this demo, I am assuming that we have already computed every input feature on our smartphone, sent it to our server, and stored it into a local CSV file.

Therefore I am just reading the file and calling Trainedmodels.predict, which takes a csvInstance input, in the form of a simple list of values.

#activities dict
labels = {
	'1': 'walking', '2': 'walking upstairs', '3': 'walking downstairs',
	'4': 'sitting', '5': 'standing', '6': 'laying'
}
#read new record from local file
with open('record.csv') as f:
	record_str = f.readline()
#obtain new prediction
prediction = api.trainedmodels().predict(project=project_id, id=model_id, body={
	'input': {
		'csvInstance': record_str.split(',')
	},
}).execute()
#retrieve classified label and reliability measures
label = prediction.get('outputLabel')
stats = prediction.get('outputMulti')
#show results
print("You are currently %s (class %s)." % (labels[label], label) )
print(stats)

This API call is pretty fast (with respect to other ML services) and will return the following:

  • outputLabel: the predicted class (in our case the classified activity).
  • outputMulti: a list of reliability measures, one for each class.

If your model’s average accuracy is high enough, you can just take outputLabel as your prediction result. If you can’t blindly trust your model – or if you can make more advanced decisions based on your application context – you may want to inspect outputMulti and make your final decision based on each class’s reliability measure, as in the sketch below.
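A minimal sketch, assuming each outputMulti item contains a label and a score field, and using an arbitrary 0.8 threshold:

#trust the prediction only above a confidence threshold (minimal sketch)
best = max(stats, key=lambda s: float(s['score']))
if float(best['score']) >= 0.8:
    print("Confident prediction: %s" % labels[best['label']])
else:
    print("Low confidence: consider a fallback strategy")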

Google Prediction API: what’s next?

I believe Google’s black box has reached a pretty high level of abstraction for developers, although a more flexible dataset configuration and better analysis visualization would make the product easier to use for everyone, especially non-coders.

One nice feature is that you can always keep your model updated by adding new data on the fly, without going through the whole training phase again. This is especially handy for systems that span long periods of time, since you can easily adapt your model to new data and conditions without the need for a new modeling phase.
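This is done via the Trainedmodels.update method, which accepts one new labelled example at a time; a minimal sketch (new_record_str being a new comma-separated record, and '2' a hypothetical label):

#add a single new labelled example to the trained model (minimal sketch)
api.trainedmodels().update(project=project_id, id=model_id, body={
    'csvInstance': new_record_str.split(','),
    'label': '2'
}).execute()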

As far as speed and performance go, Google Prediction API seems like a great candidate for your real-time predictions. Compared with other ML services – and with this open dataset – it achieved the highest accuracy, taking only a couple of minutes for training and an average response time below 1.3 seconds for real-time predictions.


Written by

Alex Casalboni

Alex is a Software Engineer with a great passion for music and web technologies. He's experienced in web development and software design, with a particular focus on frontend and UX.

