
Image Processing - API


The course is part of these learning paths:

  • Solutions Architect – Professional Certification Preparation for AWS
  • Applying Machine Learning and AI Services on AWS
Overview

Difficulty: Beginner
Duration: 1h 11m
Students: 1184
Rating: 4.8/5

Description

In this lecture we dive into the Amazon Rekognition service and how it works specifically for processing images. We'll cover each of the main image processing features: Facial Analysis, Face Comparison, Celebrity Detection, Text Extraction, Content Moderation, and Feature Extraction.

Transcript

Welcome back. In this lecture, we'll now start diving into the Amazon Rekognition service and how it works specifically for processing images. Let's start off by listing the main Rekognition services available for processing images. As can be seen on the slide, the main functions available are:

  • Facial Analysis. This allows you to detect faces within an image and, for each detected face, provide facial details such as the location of the eyes and nose. Facial Analysis can also provide details about the emotional state of the face: happy, sad, or angry, etc.
  • Face Comparison. This allows you to compare faces and determine if they are of the same person.
  • Celebrity Detection allows you to determine whether a picture of someone is of a known celebrity, and who that celebrity is.
  • Text Extraction allows you to extract text from within an image and provide it back in textual form.
  • Content Moderation allows you to determine if an image contains inappropriate or objectionable content.
  • Feature Extraction allows you to determine objects and features captured within an image.

In the following slides, we will cover each of these functions and how you go about interacting with the respective API operations.

Facial Analysis provides the capability to analyze an image to detect the position of faces and facial features within it. The DetectFaces API operation will return a list of the 100 largest faces detected within an image. For each detected face within the list, the facial composition is also provided as a set of attributes. Images are submitted to the Rekognition service in one of two ways. The more commonly used approach is to store the image file within an S3 bucket and then provide the S3 location of the image to the Rekognition service. The second approach is to base64-encode the image data and supply this as an input parameter to the API operation.
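
As a rough illustration of these two submission mechanisms, here is a minimal sketch using the AWS SDK for Python (boto3); the bucket, key, and file names are placeholder values, not ones from the course. Note that when calling through an SDK you pass the raw image bytes and the SDK takes care of the encoding on the wire.

  import boto3

  rekognition = boto3.client('rekognition')

  # Option 1: reference an image already stored in S3.
  response = rekognition.detect_faces(
      Image={'S3Object': {'Bucket': 'my-images-bucket', 'Name': 'photo.jpg'}},
      Attributes=['ALL']  # 'DEFAULT' returns a smaller subset of attributes
  )

  # Option 2: supply the image bytes directly as an input parameter.
  with open('photo.jpg', 'rb') as f:
      response = rekognition.detect_faces(
          Image={'Bytes': f.read()},
          Attributes=['ALL']
      )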

Assuming the image seen within the slide is submitted to the DetectFaces API operation, the expected response would be returned with the bounding box locating the face within the image, as well as a set of attributes giving the location of the eyes, nose, and mouth, etc. Additionally, the response will contain information regarding the emotional state of the face. For example, is the person happy, sad, or angry? The response will also return whether the person is male or female.

Other features can be detected and returned, such as whether the person has a beard or mustache or is wearing sunglasses. For each of these detections, a confidence score is provided. The first of the two demonstrations we give at the end of this course will use the DetectFaces API operation to receive a photo taken directly within the browser and return an AJAX response back to the webpage with all facial features detected within it.

The following example demonstrates the structure and content required for a request to the DetectFaces API action. Here the image is located within an S3 bucket called cloudacademy-detectfaces. The response for the previous request confirms detection and bounding box location of the face or faces within the image. Additionally, the response confirms that the face of the person is male, is happy, and isn't wearing sunglasses.
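
To sketch how that request and response might look in code, the snippet below uses the cloudacademy-detectfaces bucket named above; the object key is a placeholder, and the fields read follow the DetectFaces response structure.

  import boto3

  rekognition = boto3.client('rekognition')

  response = rekognition.detect_faces(
      Image={'S3Object': {'Bucket': 'cloudacademy-detectfaces', 'Name': 'photo.jpg'}},
      Attributes=['ALL']
  )

  for face in response['FaceDetails']:
      box = face['BoundingBox']                 # face position as ratios of image size
      gender = face['Gender']['Value']          # e.g. 'Male', with its own confidence score
      sunglasses = face['Sunglasses']['Value']  # True/False, with its own confidence score
      # Emotions is a list; pick the highest-confidence entry, e.g. 'HAPPY'.
      top_emotion = max(face['Emotions'], key=lambda e: e['Confidence'])
      print(gender, top_emotion['Type'], sunglasses, box)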

The next capability in the Rekognition service is facial comparison: is the person in the first image the same as the person in the second image? The CompareFaces API operation will return an ordered list of the 100 largest faces detected within the target image, ranked by how closely they match the face in the source image. A similarity threshold can be applied to the request to control the behavior of the matching algorithm.

For each matching face within the returned list, the facial composition is also provided as a set of attributes. Additionally, a list of non-matching faces found within the submitted image is also returned. Both the source and target images are submitted to the CompareFaces API operation using the same two mechanisms detailed for the DetectFaces API action: either by storing both images in an S3 bucket and providing the S3 locations of source and target, or by base64-encoding the images being used.

The following example demonstrates the structure and content required for a request to the CompareFaces API operation. Here the images, both target and source, are provided in-line using base64-encoding. The following partial response would have been returned for the previous request and confirms that the person in the source and target photos is the same person. Additionally, a similarity confidence level is provided, together with the bounding box attributes of where the source face was detected.
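
The sketch below shows roughly what an equivalent in-line request looks like through boto3; the file names are placeholders, and with an SDK you supply raw bytes rather than base64-encoding them yourself.

  import boto3

  rekognition = boto3.client('rekognition')

  with open('source.jpg', 'rb') as src, open('target.jpg', 'rb') as tgt:
      response = rekognition.compare_faces(
          SourceImage={'Bytes': src.read()},
          TargetImage={'Bytes': tgt.read()},
          SimilarityThreshold=80  # only report matches of 80% similarity or higher
      )

  print('Source face box:', response['SourceImageFace']['BoundingBox'])
  for match in response['FaceMatches']:
      print('Similarity:', match['Similarity'])
      print('Matched face box:', match['Face']['BoundingBox'])
  print('Non-matching faces:', len(response['UnmatchedFaces']))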

Next up is the ability to perform celebrity detection. The RecognizeCelebrities API operation will return the 100 largest faces detected within an image, divided into those determined to be known celebrities and the remainder, which are determined not to be of any known celebrity. For each returned celebrity face, the operation provides the celebrity's name, an ID, and a list of URLs where extra information about the celebrity in question can be found.

Again, the image on which celebrity detection is being performed can be submitted using the same two previously discussed approaches: either by storing the image in an S3 bucket or by base64-encoding the image and supplying that. The following example demonstrates the structure and content required for a request to the RecognizeCelebrities API action. Here the image of a known celebrity is provided in-line using base64-encoding. The following partial response would have been returned for the previous request, and confirms that the person in the submitted image is indeed a celebrity. The name and URLs for the celebrity in question are given, together with the bounding box attributes of where the celebrity's face was detected within the image.
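
As a rough boto3 equivalent of that request (the file name is a placeholder; the fields read follow the RecognizeCelebrities response structure):

  import boto3

  rekognition = boto3.client('rekognition')

  with open('celebrity.jpg', 'rb') as f:
      response = rekognition.recognize_celebrities(Image={'Bytes': f.read()})

  for celeb in response['CelebrityFaces']:
      # Name, ID, and URLs where extra information about the celebrity can be found.
      print(celeb['Name'], celeb['Id'], celeb['Urls'])
      print('Bounding box:', celeb['Face']['BoundingBox'])
  print('Unrecognized faces:', len(response['UnrecognizedFaces']))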

Next up is the ability to perform text extraction, that is, to detect and return text found within an image. The DetectText API operation allows you, for example, to extract the text used within a marketing brochure, the text from an image of a business receipt, or the flight numbers listed within an image of an arrivals board at an airport. Depending on the layout of the text within the image, a list of single words and/or lines of equally spaced words is returned.

Again, the image on which text extraction is being performed can be submitted using the same two previously discussed approaches: either by storing the image in an S3 bucket or by base64-encoding the image and supplying that.

The following example demonstrates the structure and content required for a request to the DetectText API action. Here the image containing a driver's license is provided in-line using base64-encoding. The following partial response would have been returned for the previous request and provides the text extracted from within the sample driver's license.
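
A minimal boto3 sketch of the same call (the file name is a placeholder):

  import boto3

  rekognition = boto3.client('rekognition')

  with open('drivers-license.jpg', 'rb') as f:
      response = rekognition.detect_text(Image={'Bytes': f.read()})

  for detection in response['TextDetections']:
      # Type is either 'LINE' or 'WORD'; WORD entries carry a ParentId
      # linking them back to the LINE they belong to.
      print(detection['Type'], detection['DetectedText'], detection['Confidence'])
      print('Bounding box:', detection['Geometry']['BoundingBox'])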

The details extracted from the driver's license include the holder's name, address, and date of birth, amongst many others. For each detected single word and/or line of words, the geometry and bounding box are provided, indicating whereabouts the text was detected within the image.

Next up is the ability to perform content moderation to determine whether the image in question contains content that could be considered inappropriate. The DetectModerationLabels API operation allows you to submit an image for content moderation analysis. The operation will in turn respond with labels representing those features within the image that are deemed to be objectionable.

The moderation labels returned follow a two-level hierarchy. At the top level there are two categories: Explicit Nudity and Suggestive. Beneath this level are more granular labels.

Again, the image on which content moderation is being performed can be submitted using the same two previously discussed approaches: either by storing the image in an S3 bucket or by base64-encoding the image and supplying that.

The following example demonstrates the structure and content required for a request to the DetectModerationLabels API action. Here the image containing a person in swimwear is provided in-line using a base64-encoding. The following partial response would have been returned for the previous request and provides the detected moderation labels. Each moderation label provided has a confidence score providing a degree of certainty about the label.
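
A minimal boto3 sketch of the same call (the file name and confidence threshold are placeholder choices):

  import boto3

  rekognition = boto3.client('rekognition')

  with open('swimwear.jpg', 'rb') as f:
      response = rekognition.detect_moderation_labels(
          Image={'Bytes': f.read()},
          MinConfidence=50  # suppress labels detected below 50% confidence
      )

  for label in response['ModerationLabels']:
      # ParentName is empty for a top-level category such as 'Suggestive';
      # second-level labels report their parent category here.
      print(label['Name'], label['ParentName'], label['Confidence'])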

Next up is the ability to perform feature extraction, to detect what interesting features exist within an image. The DetectLabels API operation is concerned with discovering features and providing them back as labels with a confidence score. For example, using DetectLabels on the image seen here results in the detection of glasses, a computer, and electronics as the top three labels based on confidence. Overall, 13 features were detected and returned. Again, the image on which feature extraction is being performed can be submitted using the same two previously discussed approaches: either by storing the image in an S3 bucket or by base64-encoding the image and supplying that.

The following example demonstrates the structure and content required for a request to the DetectLabels API action. Here the image containing a laptop and mobile phone is provided in-line using base64-encoding. The following response would have been returned for the previous request and provides the detected labels. Each feature label provided has a confidence score providing a degree of certainty.
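
And a minimal boto3 sketch for feature extraction (the file name and thresholds are placeholder choices):

  import boto3

  rekognition = boto3.client('rekognition')

  with open('desk.jpg', 'rb') as f:
      response = rekognition.detect_labels(
          Image={'Bytes': f.read()},
          MaxLabels=13,      # cap the number of labels returned
          MinConfidence=50   # drop low-confidence detections
      )

  for label in response['Labels']:
      print(label['Name'], label['Confidence'])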

That concludes our lecture on Rekognition image processing APIs. In the next lecture, we'll review the Rekognition video processing APIs. Here we'll learn about the async processing pattern used and how the individual API operations are called to perform video analysis. Go ahead and close this lecture and we'll see you shortly in the next one.

About the Author

Jeremy is the DevOps Content Lead at Cloud Academy where he specializes in developing technical training documentation for DevOps.

He has a strong background in software engineering, and has been coding with various languages, frameworks, and systems for the past 20+ years. In recent times, Jeremy has been focused on DevOps, Cloud, Security, and Machine Learning.

Jeremy holds professional certifications for both the AWS and GCP cloud platforms.