Google Cloud Natural Language Processing API, First Steps

A new natural language processing API: Google is working hard to provide cloud services that facilitate human-computer interaction through tools that are able to consume human language.

In a recent blog post, we discussed Google Cloud Speech API, a service to convert audio speech to text. Probably driven by a similar technology based on deep neural networks (in particular, Tensorflow), Google recently released the Beta version of the Google Cloud Natural Language Processing API, a further brick of their machine learning architecture. Such service extracts meaning and structure from unstructured text. Specifically, the engine recognizes entities (people, organizations, locations, etc.), their syntax, made up by their logic role (noun, verb, adverbs, adjectives) and relations (the name the adjective refers to), and document polarity (positive, neutral, negative).

You can get a first look at its services and play around with some of the features at cloud.google.com/natural-language.To use the Natural Language Processing API, you can apply for a free trial and create an API key from the Google Cloud Platform’s API Manager.
screenshot-cloud-google-com-2016-11-18-15-53-39

Google’s natural language processing API and Python

The API exposes a RESTful interface that can be accessed via common POST HTTP requests. However, a multitude of clients is already available for several programming languages. In our tests, we experimented with the Python client, whose code can be retrieved on GitHub. To use the client, simply add the following entry to your requirements file:

# requirements.txt
google-api-python-client==1.5.5

For instance, the following Python script defines a function to submit a text to the Google Cloud Natural Language Processing API (just set your own API keys). Google provides four different endpoints: analyzeEntities, analyzeSentiment, analyzeSyntax,and annotateText. After several trials, we realized that the annotated text method basically includes all the information returned by the other three single methods. For this reason, we suggest using this method for your initial experiments to get an overview of all the available analyses.

import httplib2
import sys
from googleapiclient import discovery
from googleapiclient.errors import HttpError
def test_language_api(content):
  discovery_url = 'https://{api}.googleapis.com/$discovery/rest?version={apiVersion}'
  service = discovery.build(
      'language', 'v1',
      http=httplib2.Http(),
      discoveryServiceUrl=discovery_url,
      developerKey=’__PUT_YOUR_KEY_HERE__’,
  )
  service_request = service.documents().annotateText(
      body={
          'document': {
              'type': 'PLAIN_TEXT',
              'content': content,
          },
          'features': {
              'extract_syntax': True,
              'extractEntities': True,
              'extractDocumentSentiment': True,
          },
          'encodingType': 'UTF16' if sys.maxunicode == 65535 else 'UTF32',
      })
  try:
      response = service_request.execute()
  except HttpError as e:
      response = {'error': e}
  return response

The response of this natural language processing API is a JSON that contains:

  • The language, which is automatically detected. Note that currently, sentiment detection only works with English (“en”), while also Spanish (“es”) and Japanese (“ja”) are supported for the entity analysis. This is athemain difference with the speech-to-text API, where the support for more than 80 different languages was a killer feature.
  • The list of sentences. The content is split into sentences, using punctuation.
  • The list of tokens. The content is split into tokens. Roughly speaking, a token usually corresponds to a word. Each token is enriched with syntactic information.
  • The list of detected entities. Entities detected in the document are listed with their properties.
  • The negative/positive polarity of the submitted content. We noticed that the API has changed over the past few days. While previous versions computed the polarity of the full submitted content, the latest version we experimented with offers a more granular analysis, returning the result for every single sentence.

Natural Language Processing API: Basics

The syntax analysis is based on linguistic concepts such as tokens, part of speech, a dependency tree, and entities. To better understand the output of the API, let’s take a closer look at each of these terms.

The token is the fundamental unit of textual analysis. It can be defined in various ways, but it often corresponds to 1 word = 1 token. The definition is not universal and it depends on the processing engine. For example, some engines treat “credit card” as a single token, while Google Natural Language Processing API splits it into two separate tokens (linked by a relationship). Each token has its own grammar type (e.g., noun, verb, adjective, etc.), usually denoted as Part of Speech (POS).

The POS is an important piece of information about the word itself and its role in the sentence, although it does not explicitly encode a relation with other tokens in the sentence. The NLP task that assigns each token with its appropriate POS is commonly called POS tagging.

One or more tokens can define an entity, i.e., either someone or something of interest mentioned in the text. In the case of proper nouns (e.g., Paris, New York, Madonna, Tom Cruise), the more specific term named entity is used. Typically, the Named Entity Recognition (NER) task detects proper nouns of People, Places, and Organizations in a given document. However, topic-specific entities such as dates, addresses, events, proteins, drug names, etc. can also be the target of entity extraction.

Finally, the structure encoding relations among words in the sentence is captured by a dependency tree. The tree includes syntactic/semantic relations such as:

  • Verb-subject and verb-object
    • John [subject] ← plays → [object] basketball
  • Adjectival modifier for a given noun
    • I see a yellow [adjectival modifier] ← ball
  • Appositions
    • Gandalf, the Wizard [apposition]

These relations constitute the “dependency edges” of the tree. The dependency concept encodes the word that each refers to. Let’s consider these two sentences:

  1. John watches a tree with a telescope
  2. John watches a tree with a nest

The tokens in the two sentences have the same grammar types, but their dependencies are different. In (1) “telescope” is linked via “with” to the verb “watches” (the telescope is what John uses to watch the tree) while in (2) “nest” is linked via “with” to “tree”, since the nest is an object placed on the tree and is not related to the “way John watches something”. Think of it this way: if you remove “tree” from (1), the telescope still has the “right” to be in the sentence, while this is not true for “nest” in (2).

Practical Demo with Natural Language

Let’s look at such definitions using the Google service. The sample text is extracted from ABC News and was passed to test_language_api() using the method defined above.

sample_content = """
Joshua Brown, 40, was killed in Florida in May when his Tesla failed to differentiate
between the side of a turning truck and the sky while operating in autopilot mode.
"""
test_language_api(sample_content)

The result is in JSON:

 'documentSentiment': {'magnitude': 0.5, 'polarity': -1, 'score': -0.5},
 'entities': [{'mentions': [{
                   'text': {'beginOffset': 0, 'content': 'Joshua Brown'},
                   'type': 'PROPER'}],
               'metadata': {},
               'name': 'Joshua Brown',
               'salience': 0.62338889,
               'type': 'PERSON'},
                  ...
               ]
 'language': 'en',
 'sentences': [{'sentiment': {'magnitude': 0.5, 'polarity': -1, 'score': -0.5},
                'text': {'beginOffset': 0, 'content': '...'}}],
 'tokens': [{'dependencyEdge': {'headTokenIndex': 1, 'label': 'NN'},
             'lemma': 'Joshua',
             'partOfSpeech': {'tag': 'NOUN', ...},
             'text': {'beginOffset': 0, 'content': 'Joshua'}},
            {'dependencyEdge': {'headTokenIndex': 6, 'label': 'NSUBJPASS'},
             'lemma': 'Brown',
             'partOfSpeech': {'tag': 'NOUN', ...},
             'text': {'beginOffset': 7, 'content': 'Brown'}},
              ...
             {'dependencyEdge': {'headTokenIndex': 6, 'label': 'P'},
              'lemma': '.',
              'partOfSpeech': {'tag': 'PUNCT', ...},
              'text': {'beginOffset': 167, 'content': '.'}}
            ]

Our first impression was that the engine is robust in recognizing entities (note that “Joshua Brown” is listed among the entities as a person), directly linking them to the related Wikipedia page, when available. However, currently, the returned set of metadata is not particularly rich (and only the Wikipedia URLs are provided, if available).  This is probably a work-in-progress feature. In addition, the engine computes the salience score, i.e., the importance of the entity in the sentence, even though it is not particularly accurate. In fact, in the example, it detected that Joshua Brown is relevant, but it scored “Tesla” and “autopilot mode” as less salient than Florida.

s for sentiment detection, it returned a negative polarity, as you can see in the returned field “documentSentiment”.
In accordance with the official documentation, the polarity is a value in the range [-1.0, 1.0], where larger numbers represent more positive sentiment. The magnitude is a non-negative number, which represents the absolute magnitude of sentiment regardless of polarity (positive or negative). In all of our experiments, the polarity assumed either the value 1 or -1, while magnitude was a positive, float number never greater than 1.0. From version v1, a third value is returned, “score”, that simply displays the product between the other two, in our case -0.5 (obtained multiplying 0.5 by -1.0).
screenshot-cloud-google-com-2016-11-18-16-17-01

How does the Natural Language Processing API handle edge cases?

We stressed the sentiment engine by applying small changes to the sample text, as follows.

First, we removed “killed”, a word with strong polarity. The system correctly recognized the sentence as negative, with a lower score.

sample_content = """
Joshua Brown, 40, was in Florida in May when his Tesla failed to differentiate
between the side of a turning truck and the sky while operating in autopilot mode."
"""
test_language_api(sample_content)
# output:
'documentSentiment': {
    'magnitude': 0.2,
    'polarity': -1,
    'score': -0.2
},

In a second trial, we removed “failed to”, a verb with strong negative polarity. The meaning of the sentence has significantly changed compared to the original one, and the engine correctly detected the positive polarity of the new sentence.

sample_content = """
Joshua Brown, 40, was in Florida in May when his Tesla differentiate
between the side of a turning truck and the sky while operating in autopilot mode."
"""
test_language_api(sample_content)
# output:
'documentSentiment': {
    'magnitude': 0.4,
    'polarity': 1,
    'score': 0.4
},

Finally, we removed other parts of the sentence, and the engine still correctly recognized a positive polarity.

sample_content = """
Joshua Brown was in Florida when his Tesla differentiate between
a turning truck and the sky.
"""
test_language_api(sample_content)
# output:
'documentSentiment': {
    'magnitude': 0.5,
    'polarity': 1,
    'score': 0.5
},

To conclude, let us report a final example to test the disambiguation capabilities of the engine. Let’s consider the two sentences:

  1. Michael Jordan scored the final points for the victory.
  2. Michael Jordan played in the most recent Fantastic Four movie.

We can observe that the sentences refer to two different Michael Jordans. Surprisingly, the engine is also able to distinguish between the two different entities. In fact, in the first case, it returns the entity Michael Jordan, while in the second case it returns Michael B. Jordan. This means that, to some extent, the tool is able to disambiguate on the basis of some context around the entity. Digging a bit deeper, it looks like a strong clue, i.e., the “Fantastic Four” mention, is required for the correct disambiguation. In fact, if we remove the “Fantastic Four” mention from the second sentence and turn it into something like:

  • Jordan played in a fantastic movie.

Michael Jordan is interpreted as the basketball player instead of the actor, despite the reference to “movie” and to the “cinema domain” in general. However, this isn’t an easy guess, and it’s not necessarily wrong: considering Michael Jordan’s role in Space Jam, we can turn a blind eye to this mistake 🙂

screenshot-cloud-google-com-2016-11-18-16-12-52

 

Conclusion

Our first impression of the Google Cloud Natural Language Processing APIs is a positive one. This is an easy-to-use tool for NLP basic features, and it can be easily integrated with any third party services and applications through the REST API. We are particularly impressed by the rich syntax (take a look at the large number of “Dependencies Labels“) and the accurate sentiment detection. The main issue is poor documentation. We hope that it will be enriched before a stable service is finally released. Also, the support for only a restricted set of languages is a strong limitation; we definitely expected wider support. One tip: Be careful when using the libraries as they are constantly being updated (also for versions no longer marked as Beta).

If we have aroused your curiosity, stay tuned over the next weeks for our new post, where we will discuss performance and further experiments on the Google Natural Language Processing APIs and other cloud services for NLP.

Avatar

Written by

David Santucci

Computer science engineer, passionate about machine learning and Lego Mindstorms.


Related Posts

Alisha Reyes
Alisha Reyes
— November 6, 2019

New on Cloud Academy: AZ-900 Exam Update; MS-100 Exam Prep; PRINCE2 Foundation; Azure, Kubernetes, and Google Hands-on Labs; and Much More

This month, our Content Team really kicked it into overdrive with tons of new content. If you're Team Azure, then you'll be amazed at the number of Azure Courses and Hands-on Labs we published this month alone!  At any time, you can find all of our new releases by going to our Training ...

Read more
  • AZ-900
  • Azure
  • Google Cloud Platform
  • Kubernetes
  • MS-100
  • New content
  • PRINCE2
  • Product Feature
Joe Nemer
Joe Nemer
— October 30, 2019

How to Get Hands-on Experience on AWS, Azure, and GCP: Lab Challenges

Meaningful cloud skills require more than book knowledge. Hands-on experience is required to translate knowledge into real-world results. We see this time and time again in studies about how kids and adults best learn — doing the actual learning task is key. Hands-on Labs and Lab Challe...

Read more
  • AWS Labs
  • Azure
  • Google Cloud Platform
  • Hands-on Labs
Avatar
Cloud Academy Team
— October 23, 2019

Which Certifications Should I Get?

As we mentioned in an earlier post, the old AWS slogan, “Cloud is the new normal” is indeed a reality today. Really, cloud has been the new normal for a while now and getting credentials has become an increasingly effective way to quickly showcase your abilities to recruiters and compan...

Read more
  • AWS
  • Azure
  • Certifications
  • Cloud Computing
  • Google Cloud Platform
Alisha Reyes
Alisha Reyes
— October 1, 2019

New on Cloud Academy: ITIL® 4, Microsoft 365 Tenant, Jenkins, TOGAF® 9.1, and more

At Cloud Academy, we're always striving to make improvements to our training platform. Based on your feedback, we released some new features to help make it easier for you to continue studying. These new features allow you to: Remove content from “Continue Studying” section Disc...

Read more
  • AWS
  • Azure
  • Google Cloud Platform
  • ITIL® 4
  • Jenkins
  • Microsoft 365 Tenant
  • New content
  • Product Feature
  • Python programming
  • TOGAF® 9.1
Joe Nemer
Joe Nemer
— September 6, 2019

Google Cloud Functions vs. AWS Lambda: The Fight for Serverless Cloud Domination

Serverless computing: What is it and why is it important? A quick background The general concept of serverless computing was introduced to the market by Amazon Web Services (AWS) around 2014 with the release of AWS Lambda. As we know, cloud computing has made it possible for users to ...

Read more
  • AWS
  • Azure
  • Google Cloud Platform
Joe Nemer
Joe Nemer
— September 3, 2019

Google Vision vs. Amazon Rekognition: A Vendor-Neutral Comparison

Google Cloud Vision and Amazon Rekognition offer a broad spectrum of solutions, some of which are comparable in terms of functional details, quality, performance, and costs. This post is a fact-based comparative analysis on Google Vision vs. Amazon Rekognition and will focus on the tech...

Read more
  • Amazon Rekognition
  • AWS
  • Google Cloud Platform
  • Google Vision
Alisha Reyes
Alisha Reyes
— August 30, 2019

New on Cloud Academy: CISSP, AWS, Azure, & DevOps Labs, Python for Beginners, and more…

As Hurricane Dorian intensifies, it looks like Floridians across the entire state might have to hunker down for another big one. If you've gone through a hurricane, you know that preparing for one is no joke. You'll need a survival kit with plenty of water, flashlights, batteries, and n...

Read more
  • AWS
  • Azure
  • Google Cloud Platform
  • New content
  • Product Feature
  • Python programming
Avatar
Andrew Larkin
— August 13, 2019

Content Roadmap: AZ-500, ITIL 4, MS-100, Google Cloud Associate Engineer, and More

Last month, Cloud Academy joined forces with QA, the UK’s largest B2B skills provider, and it put us in an excellent position to solve a massive skills gap problem. As a result of this collaboration, you will see our training library grow with additions from QA’s massive catalog of 500+...

Read more
  • AWS
  • Azure
  • content roadmap
  • Google Cloud Platform
Avatar
Andrew Larkin
— August 7, 2019

Disadvantages of Cloud Computing

If you want to deliver digital services of any kind, you’ll need to estimate all types of resources, not the least of which are CPU, memory, storage, and network connectivity. Which resources you choose for your delivery —  cloud-based or local — is up to you. But you’ll definitely want...

Read more
  • AWS
  • Azure
  • Cloud Computing
  • Google Cloud Platform
Joe Nemer
Joe Nemer
— August 6, 2019

Google Cloud vs AWS: A Comparison (or can they be compared?)

The "Google Cloud vs AWS" argument used to be a common discussion among our members, but is this still really a thing? You may already know that there are three major players in the public cloud platforms arena: Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP)...

Read more
  • AWS
  • Google Cloud Platform
  • Kubernetes
Luca Casartelli
Luca Casartelli
— April 19, 2019

4 Key Takeaways from Google Cloud Next ’19

Google Cloud Next ’19 was the flagship Google Cloud Platform developers conference, held in San Francisco’s Moscone Center. I was lucky enough to attend it with Cloud Academy, and got the chance to check out tons of breakout sessions and get great insight firsthand.   Next ’19 was my...

Read more
  • Google Cloud Platform
  • Kubernetes
  • Machine Learning
Avatar
Giacomo Marinangeli
— March 29, 2019

NEW: Custom Hands-On Labs for Azure and Google Cloud Platform

Harvard Business Review recently estimated that some 90% of corporate training never gets applied on the job. Given the $200B training industry, that is a staggering amount of waste. One reason for the disconnect? Lack of context. Cloud Academy’s platform was built to make it extraor...

Read more
  • Azure
  • Content Engine
  • Google Cloud Platform
  • Hands-on Labs