What is Amazon Comprehend?

Difficulty: Beginner
Duration: 3h 46m
Description

Domain One of the AWS Solutions Architect Associate exam guide (SAA-C03) requires us to be able to design a multi-tier architecture solution, so that is our topic for this section.
We cover the need-to-know aspects of how to design multi-tier solutions using AWS services.


Learning Objectives

  • Learn some of the essential services for creating multi-tier architectures on AWS, including the Simple Queue Service (SQS) and the Simple Notification Service (SNS)
  • Understand data streaming and how Amazon Kinesis can be used to stream data
  • Learn how to design a multi-tier solution on AWS, and the important aspects to take into consideration when doing so
  • Learn how to design cost-optimized AWS architectures
  • Understand how to leverage AWS services to migrate applications and databases to the AWS Cloud
Transcript

In this lecture, I want to introduce you to Amazon Comprehend. And for many, this will likely be a new service that you may not have encountered yet, as it sits outside of what is considered the core of the AWS services. So, what is it exactly? Well, Amazon Comprehend falls under the machine learning category of AWS, and it uses a continuously pre-trained model to identify and extract valuable insights from within the text of documents through the use of natural language processing, known as NLP.

Before I continue, let me just explain what NLP is. As stated on Wikipedia and at a high level, NLP is explained as a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language. The goal is a computer capable of understanding the contents of documents, including the contextual nuances of the language within them. So essentially, it's a great technology and process to understand language and its structure by reading texts at huge scales underpinned by machine learning, which can analyze documents far quicker than us mere mortals can.

So one of the key points of Amazon Comprehend is that it can produce insights for you, and that's just a different way of saying meaningful data. This allows you to use this data and knowledge to make changes and adjustments to your business and perhaps capitalize on the valuable data gained. For example, you could enhance your customers' experience by detecting customer sentiment, which would allow you to determine what actions could lead to the most positive customer experience and outcomes.

So we know that Comprehend can scan documents at scale and understand their content. It can extract data such as key phrases, entities, sentiment, and more using a range of different APIs. So let's now take a look at this data classification and some of the APIs to see how Comprehend defines these.

Key phrases. A key phrase is a combination of words that contains a noun phrase that describes something. A noun is a word used to identify people, places, or things. The noun phrase will, of course, contain a noun, but it will also include some identifiers about that noun. For example, "my new blue car" is a noun phrase. The noun is car, and new and blue are adjectives, which name attributes of the noun. For every key phrase that is detected by Comprehend, it will issue a score, and this score indicates how confident Comprehend is that the string of text being referenced is a noun phrase. This scoring can then be used by your own applications to determine if it is a key phrase that should be considered.
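To make the scoring idea concrete, here is a minimal Python sketch of how an application might keep only the key phrases it trusts. It assumes the boto3 SDK and its `detect_key_phrases` operation; the 0.9 threshold, the helper names, and the sample response are illustrative assumptions rather than anything shown in the lecture.

```python
def confident_key_phrases(response, min_score=0.9):
    """Return the text of key phrases whose confidence is at least min_score."""
    return [
        phrase["Text"]
        for phrase in response["KeyPhrases"]
        if phrase["Score"] >= min_score
    ]


def detect_key_phrases(text, language_code="en"):
    """Call Amazon Comprehend (needs boto3 and AWS credentials to run)."""
    import boto3  # imported lazily so the filter above runs without AWS

    client = boto3.client("comprehend")
    return client.detect_key_phrases(Text=text, LanguageCode=language_code)


# Hypothetical response for "I washed my new blue car today",
# shaped like the output of detect_key_phrases.
sample_response = {
    "KeyPhrases": [
        {"Text": "my new blue car", "Score": 0.99, "BeginOffset": 9, "EndOffset": 24},
        {"Text": "today", "Score": 0.62, "BeginOffset": 25, "EndOffset": 30},
    ]
}

print(confident_key_phrases(sample_response))
```

Raising or lowering `min_score` is how your own application decides which phrases are worth considering.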

Sentiment. The sentiment relates to the emotional context of a block of text. Amazon Comprehend will try to determine the underlying sentiment, for example, whether the document being scanned was positive, negative, neutral, or even mixed. It will generate a percentage score rating for each of these four sentiments to determine the overall sentiment of the document. A great use case for sentiment could be to read feedback comments about your products to determine if buyers were generally pleased with the product or not.
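As a rough sketch of the product feedback use case, the Python below tallies the overall sentiment across several reviews and converts one set of scores to percentages. It assumes each response is shaped like the output of the boto3 `detect_sentiment` operation, which returns scores between 0 and 1; the helper names and the sample values are hypothetical.

```python
from collections import Counter


def to_percentages(response):
    """Convert the four 0-1 sentiment scores into percentages."""
    return {
        label: round(score * 100, 1)
        for label, score in response["SentimentScore"].items()
    }


def tally_sentiment(responses):
    """Count how many documents fall under each overall sentiment."""
    return Counter(response["Sentiment"] for response in responses)


# Hypothetical responses for three product reviews.
sample_responses = [
    {"Sentiment": "POSITIVE",
     "SentimentScore": {"Positive": 0.93, "Negative": 0.01, "Neutral": 0.05, "Mixed": 0.01}},
    {"Sentiment": "NEGATIVE",
     "SentimentScore": {"Positive": 0.02, "Negative": 0.90, "Neutral": 0.06, "Mixed": 0.02}},
    {"Sentiment": "POSITIVE",
     "SentimentScore": {"Positive": 0.88, "Negative": 0.02, "Neutral": 0.08, "Mixed": 0.02}},
]

print(tally_sentiment(sample_responses))
print(to_percentages(sample_responses[0]))
```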

Entities. An entity within Comprehend can be described as a reference to a person, a place, an event, or a specific date and time, in addition to commercial items and quantities. So as an example, take the following text: "Stuart, Jorge, and Will visited Las Vegas in December 2021 to attend the AWS re:Invent conference." So here, Stuart, Jorge, and Will might be referenced as people, Las Vegas would be recognized as a location, December 2021 might be seen as a date, and re:Invent conference might be considered an event.
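The example above can be sketched in Python by grouping detected entities by their type. This assumes a response shaped like the output of the boto3 `detect_entities` operation; the sample entities, their scores, and the 0.8 threshold are hypothetical, and the offset fields are left out for brevity.

```python
from collections import defaultdict


def group_entities_by_type(response, min_score=0.8):
    """Map each entity type to the texts detected with enough confidence."""
    grouped = defaultdict(list)
    for entity in response["Entities"]:
        if entity["Score"] >= min_score:
            grouped[entity["Type"]].append(entity["Text"])
    return dict(grouped)


# Hypothetical response for the sentence about the re:Invent trip.
sample_response = {
    "Entities": [
        {"Text": "Stuart", "Type": "PERSON", "Score": 0.99},
        {"Text": "Jorge", "Type": "PERSON", "Score": 0.99},
        {"Text": "Will", "Type": "PERSON", "Score": 0.97},
        {"Text": "Las Vegas", "Type": "LOCATION", "Score": 0.99},
        {"Text": "December 2021", "Type": "DATE", "Score": 0.98},
        {"Text": "re:Invent conference", "Type": "EVENT", "Score": 0.85},
    ]
}

for entity_type, texts in group_entities_by_type(sample_response).items():
    print(entity_type, texts)
```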

Again, each of these classifications will be attributed with a score to determine the confidence of Comprehend's selection of the text as an entity and its type. Here, you can see a list of all supported entities at the time of writing this course. For the most up-to-date list of entities, please see the following URL.

Personally identifiable information, PII. PII data contains anything that could identify you as a person. Sometimes this data is considered private, and so being able to identify data such as this can help you identify security risks.

Amazon Comprehend uses a large list of PII entities to help identify this data, as you can see in this table taken from the AWS documentation found here. As I said previously, PII contains sensitive data. As a result, Comprehend can do one of two things when it returns its results. It will either identify the PII, classify the PII entity type it has found, and present that data, or it can redact the PII that it has found from within the document. So for example, take this text that says, "Dear Stuart, The current balance of your account, an account number, can now be accessed online. A copy of your statement has also been emailed to 100 Cloud Street, Northampton, United Kingdom." In this example, Stuart would have a type of name, 1234567890 would have a type of bank account number, and 100 Cloud Street, Northampton, United Kingdom would have a type of address.
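Both behaviours can be sketched in a few lines of Python. The boto3 `detect_pii_entities` operation returns each PII entity's type, score, and character offsets rather than the matched text, so this sketch performs the redaction locally by replacing each span with its type label. That local replacement is an illustrative stand-in, not the service's own redaction output. The sample text places the account number from the example inline, and the helper names and scores are hypothetical.

```python
def redact_pii(text, response, min_score=0.5):
    """Replace each detected PII span with its type label, e.g. [NAME]."""
    redacted = text
    # Work from the end of the text so earlier offsets stay valid.
    entities = sorted(response["Entities"], key=lambda e: e["BeginOffset"], reverse=True)
    for entity in entities:
        if entity["Score"] >= min_score:
            label = "[" + entity["Type"] + "]"
            redacted = redacted[: entity["BeginOffset"]] + label + redacted[entity["EndOffset"]:]
    return redacted


def _span(text, value, pii_type, score):
    """Build a hypothetical PII entity by locating value inside text."""
    begin = text.index(value)
    return {"Type": pii_type, "Score": score,
            "BeginOffset": begin, "EndOffset": begin + len(value)}


text = ("Dear Stuart, The current balance of your account, 1234567890, "
        "can now be accessed online.")

# Hypothetical response, shaped like the output of detect_pii_entities.
sample_response = {
    "Entities": [
        _span(text, "Stuart", "NAME", 0.99),
        _span(text, "1234567890", "BANK_ACCOUNT_NUMBER", 0.99),
    ]
}

print(redact_pii(text, sample_response))
```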

Now, if you decided to redact this information using Comprehend, it would return the text as follows, removing the sensitive PII.

Language. Amazon Comprehend has a wide range of different languages that it can understand. And based on the text being analyzed, it can determine which is the most dominant language that the text was written in. Again, a percentage rating is used to determine the confidence level of Comprehend in its understanding of the text.
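As a small illustration of language detection, the Python below picks the dominant language from a response shaped like the output of the boto3 `detect_dominant_language` operation, which takes only the text and returns a list of candidate languages with scores. The helper name and the sample values are hypothetical.

```python
def dominant_language(response):
    """Return (language_code, score) for the highest-scoring language."""
    best = max(response["Languages"], key=lambda language: language["Score"])
    return best["LanguageCode"], best["Score"]


# Hypothetical response for a mostly English document.
sample_response = {
    "Languages": [
        {"LanguageCode": "fr", "Score": 0.02},
        {"LanguageCode": "en", "Score": 0.98},
    ]
}

code, score = dominant_language(sample_response)
print(code, score)
```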

Syntax. When Amazon Comprehend analyzes your text documents, it parses each and every word in an effort to determine the syntactic function of the word. This allows Comprehend to build up a detailed understanding of the words in the document and their relationship to each other. It does this by classifying each word as a noun, adjective, verb, pronoun, et cetera. For a full list of the different syntax types, of which there are 17, please see the AWS URL here.
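To show what this word-by-word classification looks like in practice, here is a short Python sketch that pulls out the words carrying a given part-of-speech tag. It assumes a response shaped like the output of the boto3 `detect_syntax` operation, where each token has a `PartOfSpeech` tag and score; the sample tokens, their tags, and the helper name are hypothetical, and the offset fields are left out for brevity.

```python
def words_with_tag(response, tag):
    """Return the words whose part-of-speech tag matches the given tag."""
    return [
        token["Text"]
        for token in response["SyntaxTokens"]
        if token["PartOfSpeech"]["Tag"] == tag
    ]


# Hypothetical response for the phrase "my new blue car".
sample_response = {
    "SyntaxTokens": [
        {"TokenId": 1, "Text": "my", "PartOfSpeech": {"Tag": "PRON", "Score": 0.99}},
        {"TokenId": 2, "Text": "new", "PartOfSpeech": {"Tag": "ADJ", "Score": 0.99}},
        {"TokenId": 3, "Text": "blue", "PartOfSpeech": {"Tag": "ADJ", "Score": 0.98}},
        {"TokenId": 4, "Text": "car", "PartOfSpeech": {"Tag": "NOUN", "Score": 0.99}},
    ]
}

print(words_with_tag(sample_response, "ADJ"))
print(words_with_tag(sample_response, "NOUN"))
```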

Topic modeling. This helps you determine the different common topics or themes that exist amongst a large corpus of text. For example, you could submit a large number of science fiction stories to Comprehend to analyze, and it might return the topics such as time travel, teleportation, telekinesis, aliens, and space travel. Using a specific learning model, Amazon Comprehend is able to detect and analyze every word, its meaning, and its context.

As a result, if Comprehend detects that the same word is consistently used in the same context throughout the text, it will be used to determine a topic. So topic modeling is used to help you organize your documents into different categories.
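Unlike the other features, topic modeling works on a whole collection of documents, so it is submitted as an asynchronous job that reads its input from Amazon S3 and writes its results back to S3. The Python below is a minimal sketch of preparing such a request for the boto3 `start_topics_detection_job` operation. The bucket names, role ARN, topic count, and helper names are all hypothetical placeholders.

```python
def build_topics_job_request(input_s3_uri, output_s3_uri, role_arn, number_of_topics=10):
    """Assemble the parameters for a topic modeling job."""
    return {
        # ONE_DOC_PER_FILE treats every file under the input prefix as one document.
        "InputDataConfig": {"S3Uri": input_s3_uri, "InputFormat": "ONE_DOC_PER_FILE"},
        "OutputDataConfig": {"S3Uri": output_s3_uri},
        # IAM role that lets Comprehend read the input and write the output.
        "DataAccessRoleArn": role_arn,
        "NumberOfTopics": number_of_topics,
    }


def start_topics_job(request):
    """Submit the job to Amazon Comprehend (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the request builder runs without AWS

    client = boto3.client("comprehend")
    return client.start_topics_detection_job(**request)["JobId"]


# Hypothetical locations for a corpus of science fiction stories.
request = build_topics_job_request(
    input_s3_uri="s3://example-bucket/sci-fi-stories/",
    output_s3_uri="s3://example-bucket/topics-output/",
    role_arn="arn:aws:iam::123456789012:role/ExampleComprehendRole",
    number_of_topics=5,
)
print(request)
```

Once the job completes, the results in the output location can be used to organize the documents into categories.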

So in short, Amazon Comprehend is a fully managed and continuously trained NLP service backed by machine learning, which is used to analyze and detect meaningful insights from any text in UTF-8 format, which is an encoding system for Unicode, or in a semi-structured document, such as a Word doc or a PDF file.

About the Author

Andrew is fanatical about helping business teams gain the maximum ROI possible from adopting, using, and optimizing Public Cloud Services. Having built 70+ Cloud Academy courses, Andrew has helped over 50,000 students master cloud computing by sharing the skills and experiences he gained during 20+ years leading digital teams in code and consulting. Before joining Cloud Academy, Andrew worked for AWS and for AWS technology partners Ooyala and Adobe.