
What is Amazon Comprehend?



Using natural language processing (NLP) and machine learning, Amazon Comprehend allows you to gather valuable insights from text. This course explains how!

Learning Objectives

  • Learn the fundamentals of Amazon Comprehend
  • Learn the three main processing models used in Comprehend
  • Understand the features and benefits of the service

Intended Audience

This course is designed for those who are new to Amazon Comprehend and who want to learn how NLP and machine learning can be used to gain valuable business data to enhance their solutions.


Prerequisites

To get the most out of this course, it helps to have a basic awareness of machine learning and data analytics, but it's not essential.


In this lecture, I want to introduce you to Amazon Comprehend. For many, this will likely be a new service, as it sits outside of what is considered the core of the AWS services. So, what is it exactly? Well, Amazon Comprehend falls under the machine learning category of AWS, and it uses a pre-trained model that is continuously trained to identify and extract valuable insights from within the text of documents through the use of natural language processing, known as NLP.

Before I continue, let me just explain what NLP is. As stated on Wikipedia and at a high level, NLP is explained as a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language. The goal is a computer capable of understanding the contents of documents, including the contextual nuances of the language within them. So essentially, it's a great technology and process to understand language and its structure by reading texts at huge scales underpinned by machine learning, which can analyze documents far quicker than us mere mortals can.

So one of the key points of Amazon Comprehend is that it can produce insights for you, which is just a different way of saying meaningful data. You can use this data and knowledge to make changes and adjustments to your business and perhaps capitalize on the valuable data gained. For example, you could enhance your customer experience by detecting customer sentiment, which would allow you to determine what actions could lead to the most positive customer experience and outcomes.

So we know that Comprehend can scan documents at scale and understand their content. It can extract data such as key phrases, entities, sentiment, and more using a range of different APIs. So let's now take a look at these data classifications and some of the APIs to see how Comprehend defines them.

Key phrases. A key phrase is a combination of words that contains a noun phrase that describes something. A noun is a word used to identify people, places, or things. The noun phrase will, of course, contain a noun, but it will also include some modifiers that describe that noun. For example, "my new blue car" is a noun phrase. The noun is car, and new and blue are adjectives, which name attributes of the noun. For every key phrase that it detects, Comprehend issues a score, and this score indicates how confident Comprehend is that the string of text being referenced is a noun phrase. Your own applications can then use this score to decide whether a key phrase should be considered.
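As a rough sketch of how an application might use that score, the Python below keeps only the key phrases above a chosen confidence threshold. The function name, the 0.9 threshold, and the sample response are assumptions made for illustration. The sample is hand-written in the shape returned by the boto3 detect_key_phrases call, and its scores are invented rather than real output.

```python
def confident_key_phrases(response, min_score=0.9):
    """Return the text of each key phrase whose score is at least min_score."""
    return [
        phrase["Text"]
        for phrase in response["KeyPhrases"]
        if phrase["Score"] >= min_score
    ]


# With AWS credentials configured, a real response would come from:
#   import boto3
#   response = boto3.client("comprehend").detect_key_phrases(
#       Text="I love my new blue car.", LanguageCode="en")

# Hand-written sample in the same shape; the scores are invented.
sample_response = {
    "KeyPhrases": [
        {"Text": "my new blue car", "Score": 0.99, "BeginOffset": 7, "EndOffset": 22},
        {"Text": "love", "Score": 0.41, "BeginOffset": 2, "EndOffset": 6},
    ]
}

print(confident_key_phrases(sample_response))  # ['my new blue car']
```

The right threshold depends on the application: a lower value keeps more candidate phrases, while a higher value keeps only those Comprehend is most sure about.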

Sentiment. The sentiment relates to the emotional context of a block of text. Amazon Comprehend will try to determine the underlying sentiment, for example, whether the document being scanned is positive, negative, neutral, or even mixed. It generates a confidence score for each of these four sentiments to determine the overall sentiment of the document. A great use case for sentiment could be to read feedback comments about your products to determine if buyers were generally pleased with the product or not.
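To make the four scores concrete, here is a minimal Python sketch that reads a sentiment response and converts each score to a percentage. The boto3 detect_sentiment call returns each score as a number between 0 and 1. The function name and the sample values are assumptions for illustration, not real output from the service.

```python
def summarize_sentiment(response):
    """Return the overall sentiment label and each score as a percentage."""
    percentages = {
        label: round(score * 100, 1)
        for label, score in response["SentimentScore"].items()
    }
    return response["Sentiment"], percentages


# With AWS credentials configured, a real response would come from:
#   import boto3
#   response = boto3.client("comprehend").detect_sentiment(
#       Text="Great product, arrived quickly!", LanguageCode="en")

# Hand-written sample in the same shape; the scores are invented.
sample_response = {
    "Sentiment": "POSITIVE",
    "SentimentScore": {
        "Positive": 0.93,
        "Negative": 0.01,
        "Neutral": 0.05,
        "Mixed": 0.01,
    },
}

label, percentages = summarize_sentiment(sample_response)
print(label, percentages)
```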

Entities. An entity within Comprehend can be described as a reference to a person, a place, an event, or a specific date and time, in addition to commercial items and quantities. As an example, take the following text: "Stuart, Jorge, and Will visited Las Vegas in December, 2021, to attend the AWS re:Invent conference." Here, Stuart, Jorge, and Will might be referenced as people, Las Vegas would be recognized as a location, December, 2021, might be seen as a date, and re:Invent conference might be considered an event.
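The example above could be handled in code along these lines. This is a sketch only: the grouping function is hypothetical, and the sample response is hand-written to mirror the shape of the boto3 detect_entities result, with invented scores and with the offset fields left out for brevity.

```python
from collections import defaultdict


def group_entities_by_type(response, min_score=0.0):
    """Group detected entity text by entity type, skipping low-confidence hits."""
    grouped = defaultdict(list)
    for entity in response["Entities"]:
        if entity["Score"] >= min_score:
            grouped[entity["Type"]].append(entity["Text"])
    return dict(grouped)


# With AWS credentials configured, a real response would come from:
#   import boto3
#   response = boto3.client("comprehend").detect_entities(
#       Text=text, LanguageCode="en")

# Hand-written sample; types follow the lecture's example, scores are invented.
sample_response = {
    "Entities": [
        {"Text": "Stuart", "Type": "PERSON", "Score": 0.99},
        {"Text": "Jorge", "Type": "PERSON", "Score": 0.99},
        {"Text": "Will", "Type": "PERSON", "Score": 0.95},
        {"Text": "Las Vegas", "Type": "LOCATION", "Score": 0.98},
        {"Text": "December, 2021", "Type": "DATE", "Score": 0.97},
        {"Text": "re:Invent conference", "Type": "EVENT", "Score": 0.80},
    ]
}

grouped = group_entities_by_type(sample_response, min_score=0.9)
print(grouped)
```

With the 0.9 threshold used here, the lower-scoring event entry is dropped, which shows how the confidence score can filter out uncertain classifications.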

Again, each of these classifications is given a score that indicates how confident Comprehend is in its selection of the text as an entity and in its type. Here, you can see a list of all supported entities at the time of writing this course. For the most up-to-date list of entities, please see the following URL.

Personally identifiable information, PII. PII is any data that could identify you as a person. Sometimes this data is considered private, and so being able to detect data such as this can help you identify security risks.

Amazon Comprehend uses a large list of PII entities to help identify this data, as you can see in this table taken from the AWS documentation found here. As I said previously, PII contains sensitive data. As a result, Comprehend can do one of two things when it returns its results. It will either identify the PII, classify the PII entity type it has found, and present that data, or it can redact the PII that it has found from within the document. So for example, take this text that says, "Dear Stuart, The current balance of your account, 1234567890, can now be accessed online. A copy of your statement has also been emailed to 100 Cloud Street, Northampton, United Kingdom." In this example, Stuart would have a type of name, 1234567890 would have a type of bank account number, and 100 Cloud Street, Northampton, United Kingdom would have a type of address.
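The two options can be illustrated with a short Python sketch. The boto3 detect_pii_entities call returns each PII entity as a type, a score, and character offsets into the text. The code below then performs a simple local redaction by replacing each span with its type. This local replacement is an assumption made for illustration, and Comprehend's own redaction output may be formatted differently. The helper names and scores are hypothetical.

```python
def redact(text, pii_entities):
    """Replace each detected PII span with its entity type in square brackets."""
    # Work from the end of the text so earlier offsets remain valid.
    ordered = sorted(pii_entities, key=lambda e: e["BeginOffset"], reverse=True)
    for entity in ordered:
        placeholder = f"[{entity['Type']}]"
        text = text[: entity["BeginOffset"]] + placeholder + text[entity["EndOffset"] :]
    return text


def sample_entity(text, value, pii_type, score):
    """Hypothetical helper: build a sample entity by locating value in text."""
    begin = text.index(value)
    return {
        "Type": pii_type,
        "Score": score,
        "BeginOffset": begin,
        "EndOffset": begin + len(value),
    }


text = (
    "Dear Stuart, The current balance of your account, 1234567890, "
    "can now be accessed online."
)

# With AWS credentials configured, real entities would come from:
#   import boto3
#   entities = boto3.client("comprehend").detect_pii_entities(
#       Text=text, LanguageCode="en")["Entities"]

# Hand-written sample in the same shape; the scores are invented.
sample_entities = [
    sample_entity(text, "Stuart", "NAME", 0.99),
    sample_entity(text, "1234567890", "BANK_ACCOUNT_NUMBER", 0.99),
]

redacted = redact(text, sample_entities)
print(redacted)
```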

Now, if you decided to redact this information using Comprehend, it would return the text with the sensitive PII removed.

Language. Amazon Comprehend can understand a wide range of different languages. Based on the text being analyzed, it can determine the dominant language that the text was written in. Again, a confidence score indicates how certain Comprehend is about the language it has detected.
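Picking the dominant language from a response might look like the following Python sketch. The boto3 detect_dominant_language call returns a list of candidate languages, each with a language code and a score. The function name and the sample values are assumptions for illustration.

```python
def dominant_language(response):
    """Return the language code and score of the highest-scoring language."""
    best = max(response["Languages"], key=lambda lang: lang["Score"])
    return best["LanguageCode"], best["Score"]


# With AWS credentials configured, a real response would come from:
#   import boto3
#   response = boto3.client("comprehend").detect_dominant_language(
#       Text="Comprehend can detect the language of this sentence.")

# Hand-written sample in the same shape; the scores are invented.
sample_response = {
    "Languages": [
        {"LanguageCode": "fr", "Score": 0.02},
        {"LanguageCode": "en", "Score": 0.97},
    ]
}

code, score = dominant_language(sample_response)
print(code, score)  # en 0.97
```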

Syntax. When Amazon Comprehend analyzes your text documents, it parses each and every word in an effort to determine the syntactic function of the word. This allows Comprehend to build up a detailed understanding of the words in the document and their relationship to each other. It does this by classifying each word as a noun, adjective, verb, pronoun, et cetera. For a full list of the different syntax types, of which there are 17, please see the AWS URL here.
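As a small illustration of working with these classifications, the Python sketch below counts how often each part-of-speech tag appears in a syntax response. The boto3 detect_syntax call returns a list of tokens, each carrying a part-of-speech tag and a score. The function name is hypothetical, and the sample tokens, tags, and scores are hand-written assumptions rather than real output.

```python
from collections import Counter


def count_parts_of_speech(response):
    """Count how many tokens were assigned each part-of-speech tag."""
    return Counter(
        token["PartOfSpeech"]["Tag"] for token in response["SyntaxTokens"]
    )


# With AWS credentials configured, a real response would come from:
#   import boto3
#   response = boto3.client("comprehend").detect_syntax(
#       Text="my new blue car", LanguageCode="en")

# Hand-written sample in the same shape; tags and scores are invented.
sample_response = {
    "SyntaxTokens": [
        {"TokenId": 1, "Text": "my", "PartOfSpeech": {"Tag": "PRON", "Score": 0.99}},
        {"TokenId": 2, "Text": "new", "PartOfSpeech": {"Tag": "ADJ", "Score": 0.99}},
        {"TokenId": 3, "Text": "blue", "PartOfSpeech": {"Tag": "ADJ", "Score": 0.98}},
        {"TokenId": 4, "Text": "car", "PartOfSpeech": {"Tag": "NOUN", "Score": 0.99}},
    ]
}

counts = count_parts_of_speech(sample_response)
print(counts.most_common())
```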

Topic modeling. This helps you determine the common topics or themes that exist amongst a large corpus of text. For example, you could submit a large number of science fiction stories to Comprehend to analyze, and it might return topics such as time travel, teleportation, telekinesis, aliens, and space travel. Using a specific learning model, Amazon Comprehend is able to detect and analyze every word, its meaning, and its context.

As a result, if Comprehend detects that the same word is consistently used in the same context throughout the text, it will be used to determine a topic. So topic modeling is used to help you organize your documents into different categories.

So in short, Amazon Comprehend is a fully managed and continuously trained NLP service backed by machine learning, which is used to analyze and detect meaningful insights from any text in UTF-8 format, which is an encoding system for Unicode, or in a semi-structured document, such as a Word doc or a PDF file.

About the Author
Stuart Scott
AWS Content Director

Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.

To date, Stuart has created 150+ courses relating to Cloud reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.

Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.

He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.

In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.

Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.