Using Amazon Comprehend to Gain Valuable Insights from Text

Processing Models

Overview
Difficulty: Intermediate
Duration: 20m
Students: 19
Ratings: 5/5
Description

Using natural language processing (NLP) and machine learning, Amazon Comprehend allows you to gather valuable insights from text. This course explains how!

Learning Objectives

  • Learn the fundamentals of Amazon Comprehend
  • Learn the three main processing models used in Comprehend
  • Understand the features and benefits of the service

Intended Audience

This course is designed for those who are new to Amazon Comprehend and who want to learn more about how NLP and machine learning can be used to gain valuable business data to enhance their solutions.

Prerequisites

To get the most out of this course, you should have a basic awareness of machine learning and data analytics, but it's not essential.

Transcript

In this short lecture, I want to highlight that there are, essentially, three different processing models you can use when running Comprehend against your documents to detect insights. Your selection will depend on how many documents you have and how you want the results of the analysis to be returned. So, the three different processing models are: single document processing, multiple document synchronous processing, and asynchronous batch processing. Let me explain each of these in a little more detail, so you can understand the differences between them.

So, single document processing is run as a synchronous process, which allows Comprehend to send the results of any analysis straight back to the application requiring the data. As the name implies, this is for scenarios where you only need to work with a single document at any one time. The operations used for single document processing include: DetectDominantLanguage, DetectEntities, DetectKeyPhrases, DetectPiiEntities, ContainsPiiEntities, DetectSentiment, and DetectSyntax.
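As a rough illustration of single document processing, the sketch below chains three of the operations listed above. It assumes the boto3 method names `detect_dominant_language`, `detect_sentiment`, and `detect_key_phrases`, which mirror the API operations. The `analyze_document` helper and its return shape are made up for this example, and the client is passed in so the helper can be tried without AWS credentials.

```python
from typing import Any, Dict


def analyze_document(client: Any, text: str) -> Dict[str, Any]:
    """Run three single-document operations on one piece of text.

    `client` is expected to behave like boto3.client("comprehend").
    This helper is illustrative only and is not part of any AWS SDK.
    """
    # DetectDominantLanguage needs only the text and returns candidate
    # languages, each with a confidence score.
    languages = client.detect_dominant_language(Text=text)["Languages"]
    language_code = max(languages, key=lambda lang: lang["Score"])["LanguageCode"]

    # The remaining operations need a language code. Not every detected
    # language is supported by every operation, so real code should handle
    # a rejected language code.
    sentiment = client.detect_sentiment(Text=text, LanguageCode=language_code)
    phrases = client.detect_key_phrases(Text=text, LanguageCode=language_code)

    # Each call is synchronous: the result comes straight back in the response.
    return {
        "language": language_code,
        "sentiment": sentiment["Sentiment"],
        "key_phrases": [phrase["Text"] for phrase in phrases["KeyPhrases"]],
    }
```

With real credentials configured, the client would typically be created with `boto3.client("comprehend")` and passed in as the first argument.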

Next, we have multiple document synchronous processing. Of course, this is used when you need to run Comprehend across multiple documents at a time. To do this, you must use the Batch operations, which include the following synchronous operations: BatchDetectDominantLanguage, BatchDetectEntities, BatchDetectKeyPhrases, BatchDetectSentiment, and BatchDetectSyntax. Using these operations, you can analyze and detect insights in up to 25 documents at a time, with each document being analyzed individually and receiving its own result.
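To make the 25-document limit concrete, here is a minimal sketch that splits a longer list into batches and maps each result back to its position in the original list. It assumes the boto3 method `batch_detect_sentiment`, whose response carries a `ResultList` and an `ErrorList` keyed by the index within the batch. The `chunked` and `batch_sentiment` helpers are hypothetical names used only for this example.

```python
from typing import Any, Dict, Iterator, List

MAX_BATCH_SIZE = 25  # documents per Batch* request, as described above


def chunked(items: List[str], size: int = MAX_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_sentiment(client: Any, documents: List[str],
                    language_code: str = "en") -> Dict[int, str]:
    """Return {position in `documents`: sentiment} for each document that succeeded.

    `client` is expected to behave like boto3.client("comprehend").
    """
    sentiments: Dict[int, str] = {}
    for batch_number, batch in enumerate(chunked(documents)):
        # "Index" in the response is relative to the batch, so shift it back
        # to the position in the full list.
        offset = batch_number * MAX_BATCH_SIZE
        response = client.batch_detect_sentiment(TextList=batch,
                                                 LanguageCode=language_code)
        for item in response["ResultList"]:
            sentiments[offset + item["Index"]] = item["Sentiment"]
        # Documents are analyzed individually, so one can fail while the
        # rest of the batch succeeds.
        for error in response["ErrorList"]:
            print(f"document {offset + error['Index']} failed: {error['ErrorCode']}")
    return sentiments
```

Under these assumptions, a list of 60 documents would be sent as three requests of 25, 25, and 10 documents.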

Interestingly, calling a Batch operation is identical to calling the single document operation for each document in the request. However, using the Batch operations can result in better performance overall.

And finally, we have asynchronous batch processing. Generally, you would use asynchronous batch processing when you need to analyze large documents or a large quantity of documents. In addition to asynchronous versions of the operations I've already mentioned, it comes with an extra set of operations, specifically for topic modeling.

When running an asynchronous batch job with Comprehend, your source data should be stored in Amazon S3, and you must run the job from within the same region as your source data. Another prerequisite is that your source text documents must be UTF-8 encoded, and you must select one of two input formats to submit your job in, one document per file or one document per line, as you can see in this table. So, which option you select really depends on what you're trying to process.

One point to mention is that Amazon Comprehend will be accessing your Amazon S3 bucket, so it will, of course, require permissions, which are granted through the use of IAM roles. This role can be created at the time of your asynchronous batch processing job, or you can select an existing role if you already have one with the correct permissions.

The operations supported by asynchronous batch processing include the following: StartDominantLanguageDetectionJob, StartEntitiesDetectionJob, StartEventsDetectionJob, StartKeyPhrasesDetectionJob, StartPiiEntitiesDetectionJob, StartSentimentDetectionJob, and StartTopicsDetectionJob.
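The asynchronous flow described above (source data in S3, a role that grants Comprehend access, and a Start operation) might be sketched as follows. It assumes the boto3 methods `start_sentiment_detection_job` and `describe_sentiment_detection_job`. The job name, S3 URIs, and role ARN passed in are placeholders, and the two helper functions are hypothetical names for this example only.

```python
import time
from typing import Any, Dict


def build_job_request(input_s3_uri: str, output_s3_uri: str, role_arn: str,
                      one_doc_per_line: bool = True,
                      language_code: str = "en") -> Dict[str, Any]:
    """Build the parameters for a StartSentimentDetectionJob call."""
    return {
        "JobName": "example-sentiment-job",  # placeholder name
        "InputDataConfig": {
            "S3Uri": input_s3_uri,
            # The two input formats: one document per line, or one per file.
            "InputFormat": "ONE_DOC_PER_LINE" if one_doc_per_line else "ONE_DOC_PER_FILE",
        },
        "OutputDataConfig": {"S3Uri": output_s3_uri},
        # Role that lets Comprehend read the input and write the output in S3.
        "DataAccessRoleArn": role_arn,
        "LanguageCode": language_code,
    }


def run_sentiment_job(client: Any, request: Dict[str, Any],
                      poll_seconds: float = 30.0) -> str:
    """Start the job and poll until it reaches a final status, then return it.

    `client` is expected to behave like boto3.client("comprehend"), created
    in the same region as the S3 bucket that holds the source data.
    """
    job_id = client.start_sentiment_detection_job(**request)["JobId"]
    while True:
        properties = client.describe_sentiment_detection_job(
            JobId=job_id)["SentimentDetectionJobProperties"]
        if properties["JobStatus"] in ("COMPLETED", "FAILED", "STOPPED"):
            return properties["JobStatus"]
        time.sleep(poll_seconds)
```

Unlike the synchronous operations, the results are not returned in the response. Once the job completes, they are written to the output location in S3, so the application reads them from there.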

About the Author
Stuart Scott
AWS Content Director
Students: 187352
Labs: 1
Courses: 158
Learning Paths: 115

Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.

To date, Stuart has created 150+ courses relating to Cloud reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.

Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.

He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.

In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.

Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.