Google Vision API: Image Analysis as a Service

Build powerful applications that see and understand the content of images with the Google Vision API

The Google Vision API was released last month, on December 2nd 2015, and it’s still in limited preview. You can request access to this limited preview program here and you should receive a very quick email follow-up.
I recently requested access with my personal Google Cloud Platform account in order to understand what types of analysis are supported. This also allows me to perform some tests.
Google Vision API Face Detection (Lena)

Image analysis and feature detection

The Google Vision API provides a RESTful interface that quickly analyses image content. This interface hides the complexity of continuously evolving machine learning models and image processing algorithms.
These models should improve overall system accuracy over time, especially for object detection, since new concepts will almost certainly be introduced into the system.
In more detail, the API lets you annotate images with the following six features.

  1. LABEL_DETECTION: executes Image Content Analysis on the entire image and provides relevant labels (i.e. keywords & categories).
  2. TEXT_DETECTION: performs Optical Character Recognition (OCR) and provides the extracted text, if any.
  3. FACE_DETECTION: detects faces, provides facial key points, main orientation, emotional likelihood, and the like.
  4. LANDMARK_DETECTION: detects geographic landmarks.
  5. LOGO_DETECTION: detects company logos.
  6. SAFE_SEARCH_DETECTION: determines image safe search properties on the image (i.e. the likelihood that an image might contain violence or nudity).

You can annotate all these features at once (i.e. with a single upload), although the API seems to respond slightly faster if you focus on one or two features at a time.

At this time, the API only accepts a series of base64-encoded images as input, but future releases will be integrated with Google Cloud Storage so that API calls won’t require image uploads at all. This will offer substantially faster invocation.
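As a rough sketch of what such a request body could look like, the Python snippet below base64-encodes raw image bytes and asks for several features in a single call. The `requests` / `image.content` / `features` layout follows the publicly documented `images:annotate` method and may differ in the limited preview. `build_annotate_request` and the placeholder bytes are hypothetical names used only for illustration, and no network call is made.

```python
import base64
import json


def build_annotate_request(image_bytes, features):
    """Build a JSON-serializable body for one image and several features.

    `features` maps a feature type (e.g. "LABEL_DETECTION") to the
    maximum number of results wanted for that feature.
    """
    return {
        "requests": [
            {
                # The image travels inside the request as base64-encoded text.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [
                    {"type": feature_type, "maxResults": max_results}
                    for feature_type, max_results in features.items()
                ],
            }
        ]
    }


# Placeholder bytes stand in for the contents of a real image file.
body = build_annotate_request(
    b"not-a-real-image",
    {"LABEL_DETECTION": 3, "FACE_DETECTION": 10},
)
payload = json.dumps(body)  # string that would be sent as the request body
```

Listing fewer feature types in the dictionary is all it takes to focus a call on one or two features at a time.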

Label Detection – Scenarios and examples

Label detection is definitely the most interesting annotation type. This feature adds semantics to any image or video stream by providing a set of relevant labels (i.e. keywords) for each uploaded image. Labels are selected among thousands of object categories and mapped to the official Google Knowledge Graph. This allows image classification and enhanced semantic analysis, understanding, and reasoning.

Technically, the actual detection is performed on the image as a whole, although an object extraction phase may be executed in advance on the client in order to extract a set of labels for each single object. In this case, each object should be uploaded as an independent image. However, this may lead to lower-quality results if the resolution isn’t high enough, or if, for the application’s purpose, the object’s context is more relevant than the object itself.

So what do label annotations look like?

"labelAnnotations": [
    {
        "score": 0.99989069,
        "mid": "/m/0ds99lh",
        "description": "fun"
    },
    {
        "score": 0.99724227,
        "mid": "/m/02jwqh",
        "description": "vacation"
    },
    {
        "score": 0.63748151,
        "mid": "/m/02n6m5",
        "description": "sun tanning"
    }
]

The API returns something very similar to the JSON structure above for each uploaded image. Each label is basically a string (the description field) and comes with a relevance score (0 to 1) and a Knowledge Graph reference.
You can specify how many labels the API should return at request time (3 in this case), and the labels will be sorted by relevance. I could have asked for 10 labels and then kept only those with a relevance score of at least 0.8, so that my application considers only highly relevant labels (in this case, only two labels would have been used).
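The thresholding step described here can be sketched in a few lines of Python. This is a minimal illustration applied to the three annotations shown above; `relevant_labels` and its `min_score` parameter are hypothetical names, not part of the API.

```python
def relevant_labels(label_annotations, min_score=0.8):
    """Return descriptions of labels whose score is at least `min_score`."""
    return [
        label["description"]
        for label in label_annotations
        if label["score"] >= min_score
    ]


# The three annotations from the response shown above.
label_annotations = [
    {"score": 0.99989069, "mid": "/m/0ds99lh", "description": "fun"},
    {"score": 0.99724227, "mid": "/m/02jwqh", "description": "vacation"},
    {"score": 0.63748151, "mid": "/m/02n6m5", "description": "sun tanning"},
]

print(relevant_labels(label_annotations))  # ['fun', 'vacation']
```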

Here is an example of the labels given by the Google Vision API for the corresponding image:
(Image: desk)
The returned labels are: “desk, room, furniture, conference hall, multimedia, writing.”
The first label – “desk” – had a relevance score of 0.97, while the last one – “writing” – had a score of 0.54.
I have programmatically appended the annotations to the input image (with a simple Python script). You can find more Label Detection examples on this public gist.

Personally, I found the detection accurate on every image I uploaded. In some cases, though, no labels were returned at all, and a handful of labels were misleading despite a relevance score above 0.5.

Text Detection – OCR as a Service

Optical Character Recognition is not a new problem in the field of image analysis, but it often requires high-resolution images, very little perspective distortion and an incredibly precise text extraction algorithm. In my personal experience, the character classification step is actually the easiest one, and there are plenty of techniques and benchmarks in the literature.

In the case of the Google Vision API, everything is encapsulated in a RESTful API that simply returns a string and its bounding box. As I would have expected from Google, the API is able to recognize multiple languages, and will return the detected locale together with the extracted text.
Here is an example of a perfect extraction:
Google Vision API - Text Detection (OCR)
The API response looks very similar to the following JSON structure:

"textAnnotations": [
    {
        "locale": "en",
        "description": "Sometimes\nwhen I Am\nAlone I Google\nMyself\n",
        "boundingPoly": {
            "vertices": [
                {"y": 208, "x": 184},
                {"y": 208, "x": 326},
                {"y": 314, "x": 326},
                {"y": 314, "x": 184}
            ]
        }
    }
]

Only one bounding box is detected, the detected locale is English, and the extracted text is even split by line breaks.

I have run a few more examples in which much more text was detected in different areas of the image, but it was all collapsed into a single (and big) bounding box, where each piece of text was separated by a line break. This doesn’t make the task of extracting useful information easy, and the Google team is already gathering feedback about this on the official limited preview Google Group (of which I’m proud to be a part).
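Given a single annotation like the one above, a client is left to split the text and measure the box itself. The sketch below shows one way to do that in Python; `bounding_box_size` and `text_lines` are hypothetical helper names, and the input is the sample response from this section.

```python
def bounding_box_size(bounding_poly):
    """Return (width, height) of the axis-aligned box around the vertices."""
    xs = [vertex["x"] for vertex in bounding_poly["vertices"]]
    ys = [vertex["y"] for vertex in bounding_poly["vertices"]]
    return max(xs) - min(xs), max(ys) - min(ys)


def text_lines(annotation):
    """Split the extracted text on line breaks, dropping empty lines."""
    return [line for line in annotation["description"].split("\n") if line]


# The annotation from the sample response shown above.
annotation = {
    "locale": "en",
    "description": "Sometimes\nwhen I Am\nAlone I Google\nMyself\n",
    "boundingPoly": {
        "vertices": [
            {"y": 208, "x": 184},
            {"y": 208, "x": 326},
            {"y": 314, "x": 326},
            {"y": 314, "x": 184},
        ]
    },
}

width, height = bounding_box_size(annotation["boundingPoly"])
lines = text_lines(annotation)
print(width, height)  # 142 106
print(lines)  # ['Sometimes', 'when I Am', 'Alone I Google', 'Myself']
```

Splitting on line breaks recovers the individual lines, but not where each one sits in the image, which is why a single collapsed bounding box limits what can be extracted.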

What about handwritten text and CAPTCHAs?

Apparently, the quality is not optimal for handwritten text, although I believe it’s more than adequate for qualitative analysis or generic tasks, such as document classification.
Here is an example:
Google Vision API - OCR handwritten
With the corresponding extracted text:

water ran down
her hair and clothes it ran
down into the toes Lof her
shoes and out again at the
heels. And het she said
that
Mwas a real princess

As I mentioned, it’s not perfect, but it would definitely help most of us a lot.
On the other hand, CAPTCHA recognition is not as easy. It seems that crowdsourcing is still a better option for now. 😉

Face Detection – Position, orientation, and emotions

Face detection aims at localizing human faces inside an image. It’s a well-known problem that can be categorized as a special case of a general object-class detection problem. You can find some interesting data sets here.

I would like to stress two important points:

  • It is NOT the same as Face Recognition, although the detection/localization task can be thought of as one of the first steps in the process of recognizing someone’s face. This typically involves many more techniques, such as facial landmarks extraction, 3D analysis, skin texture analysis, and others.
  • It usually targets human faces only (yes, I have tried primates and dogs with very poor results).

If you ask the Google Vision API to annotate your images with the FACE_DETECTION feature, you will obtain the following:

  • The face position (i.e. bounding boxes);
  • The landmark positions (e.g. eyes, eyebrows, pupils, nose, mouth, lips, ears, chin, etc.), which include more than 30 points;
  • The main face orientation (i.e. roll, pan, and tilt angles);
  • Emotional likelihoods (e.g. joy, sorrow, anger, surprise, etc.), plus some additional information (underexposure likelihood, blur likelihood, headwear likelihood, etc.).

Here is an example of face detection, where I have programmatically rendered the extracted information on the original image. In particular, I am framing each face with its bounding box, rendering each landmark as a red dot, and highlighting the main face orientation with a green arrow.
Google Vision API - Face Detection
As you can see, every face is correctly detected and well localized. The precision is pretty high even with numerous faces in the same picture, and the orientation is also accurate.

In this example, just looking at the data, we might infer that the picture contains 5 happy people who are most likely facing something or someone around the center of the image. If you complement this analysis with label detection, you would obtain “person” and “team” as most relevant labels, which would give your software a pretty accurate understanding of what is going on.
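This kind of inference can be sketched in Python by counting the faces whose joy likelihood is high enough. The snippet assumes the likelihoods come back as the enumerated strings of the publicly documented API (`VERY_UNLIKELY` through `VERY_LIKELY`) in a `joyLikelihood` field. The helper names and the sample annotations are made up for illustration and are not real API output.

```python
# Likelihood values ordered from least to most likely.
LIKELIHOOD_ORDER = ["VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]


def is_at_least(likelihood, threshold="LIKELY"):
    """True if `likelihood` ranks at or above `threshold`.

    Values outside the known scale (e.g. "UNKNOWN") count as False.
    """
    if likelihood not in LIKELIHOOD_ORDER:
        return False
    return LIKELIHOOD_ORDER.index(likelihood) >= LIKELIHOOD_ORDER.index(threshold)


def count_happy_faces(face_annotations):
    """Count faces whose joy likelihood is LIKELY or VERY_LIKELY."""
    return sum(
        1
        for face in face_annotations
        if is_at_least(face.get("joyLikelihood", "UNKNOWN"))
    )


# Made-up annotations for illustration only (not real API output).
faces = [
    {"joyLikelihood": "VERY_LIKELY", "panAngle": 12.5},
    {"joyLikelihood": "LIKELY", "panAngle": -8.0},
    {"joyLikelihood": "UNLIKELY", "panAngle": 1.0},
]

print(count_happy_faces(faces))  # 2
```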

You can find more face detection examples on this public gist.

Alpha testing conclusions

Although still in limited preview, the API is surprisingly accurate and fast: queries take just milliseconds to execute. The processing takes longer with larger images, mostly because of the upload time.
I am looking forward to the Google Cloud Storage integration, and to the many improvements already suggested on the Google Group by the many active alpha testers.

I haven’t focused much on the three remaining features yet (landmark detection, logo detection, and safe search detection), but their usage will probably be straightforward for most of you. Please feel free to reach out or drop a comment if you have doubts or test suggestions.
You can find all the Python utilities I used for these examples here.

If you are interested in Machine Learning technologies and Google Cloud Platform, you may want to have a look at my previous article about Google Prediction API as well.

Written by

Alex Casalboni

Alex is a Software Engineer with a great passion for music and web technologies. He's experienced in web development and software design, with a particular focus on frontend and UX.

