Introduction to Amazon Rekognition

Image Processing - API

The course is part of these learning paths:

Solutions Architect – Professional Certification Preparation for AWS
Applying Machine Learning and AI Services on AWS

Overview

Difficulty: Beginner
Duration: 1h 11m
Students: 532
Rating: 4.8/5

Description

In this lecture we dive into the Amazon Rekognition service and how it works specifically for processing images. We'll cover each of the main image processing features: Facial Analysis, Face Comparison, Celebrity Detection, Text Extraction, Content Moderation, and Feature Extraction.

Transcript

- [Instructor] Welcome back. In this lecture we'll now start diving into the Amazon Rekognition service and how it works specifically for processing images. Let's start off by listing the main Rekognition features available for processing images. As can be seen on the slide, the main functions available are: Facial Analysis, which allows you to detect faces within an image and, for each detected face, provide facial details such as the location of the eyes and nose.

Facial Analysis can also provide details as to the emotional state of each face: happy, sad, or angry, etc. Face Comparison allows you to compare faces and determine whether they are of the same person. Celebrity Detection allows you to find out whether the picture of someone is of a known celebrity, and who that celebrity is. Text Extraction allows you to extract text from within an image and provide it back in textual form.

Content Moderation allows you to determine whether an image contains inappropriate or objectionable content. Feature Extraction allows you to determine the objects and features captured within an image. In the following slides we will cover each of these functions and how you go about interacting with the respective API operations. Facial Analysis provides the capability to perform an analysis on an image to detect the position of faces and facial features within it.

The DetectFaces API action will return a list of the 100 largest faces detected within an image. For each detected face within the list, the facial composition is also provided as a set of attributes. Images are submitted to the Rekognition service in one of two ways: the more often used approach is to store the image file within an S3 bucket and then provide the S3 location of the image to the Rekognition service.

The second approach is to base64-encode the image data and supply this as an input parameter to the API operation. Assuming the image seen within the slide is submitted to the DetectFaces API operation, the expected response would be returned with the location, or bounding box, of the face within the image, as well as a set of attributes determining the location of the eyes, nose, and mouth, etc. Additionally, the response will contain information regarding the emotional state of the face. For example, are they happy, sad, or angry? The response will also return whether the person is male or female.

Other features can be detected and returned, such as whether the person has a beard or mustache, or is wearing sunglasses or not. For each of these detections, a confidence score is provided. The first of two demonstrations that we give at the end of this course will use the DetectFaces API operation to receive a photo taken directly within the browser and return an AJAX response back to the webpage with all facial features detected within it.

The following example demonstrates the structure and content required for a request to the DetectFaces API action. Here the image is located within an S3 bucket called cloudacademy-detectfaces. The response for the previous request confirms the detection and bounding box location of the face or faces within the image. Additionally, the response confirms that the face of the person is male, is happy, and isn't wearing sunglasses.
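Since the slide's request and response aren't reproduced in this transcript, here is a minimal boto3 (Python) sketch of an equivalent DetectFaces call. The bucket name cloudacademy-detectfaces comes from the lecture; the object key face.jpg and the region are assumptions.

import boto3

# Rekognition client; the region is an assumption for this sketch.
rekognition = boto3.client("rekognition", region_name="us-east-1")

# DetectFaces request using the S3 approach: the image stays in S3 and
# only its location is passed to Rekognition. Attributes=["ALL"] requests
# the full attribute set (emotions, gender, beard, sunglasses, etc.)
# rather than the default subset.
response = rekognition.detect_faces(
    Image={"S3Object": {"Bucket": "cloudacademy-detectfaces", "Name": "face.jpg"}},
    Attributes=["ALL"],
)

# FaceDetails lists up to the 100 largest faces detected in the image.
for face in response["FaceDetails"]:
    print("BoundingBox:", face["BoundingBox"])
    print("Gender:", face["Gender"]["Value"])
    print("Sunglasses:", face["Sunglasses"]["Value"])
    # Emotions is a list; each entry carries its own confidence score.
    top_emotion = max(face["Emotions"], key=lambda e: e["Confidence"])
    print("Emotion:", top_emotion["Type"], top_emotion["Confidence"])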

The next capability in the Rekognition service is the ability to compare and recognize, that is, the ability to perform a facial comparison between two faces: is the person in the first image the same as the person in the second image? The CompareFaces API operation will return an ordered list of the 100 largest faces detected within the target image, ordered by how closely they match the source face in similarity. A similarity threshold can be applied to the request to control the behavior of the matching algorithm. For each matching face within the returned list, the facial composition is also provided as a set of attributes.

Additionally, a list of non-matching faces found within the submitted image is also returned. Both the source and target images are submitted to the CompareFaces API operation using the same two mechanisms detailed for the DetectFaces API action: either by using an S3 bucket and providing the S3 locations of both the source and target images, or by using a base64-encoding of the images being used.

The following example demonstrates the structure and content required for a request to the CompareFaces API operation. Here the images, both target and source, are provided in-line using base64-encoding. The following partial response would have been returned for the previous request and confirms that the person in the source and target photos is the same person. Additionally, a confidence level of similarity is provided, together with the bounding box attributes of where the source face was detected.
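As a companion illustration, here is a minimal boto3 (Python) sketch of the same CompareFaces call using the in-line approach; the file names and the similarity threshold value are assumptions.

import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")

# In-line approach: read the raw image bytes and pass them directly.
# boto3 handles the base64-encoding when it serializes the request.
# The file names here are assumptions for this sketch.
with open("source.jpg", "rb") as f:
    source_bytes = f.read()
with open("target.jpg", "rb") as f:
    target_bytes = f.read()

response = rekognition.compare_faces(
    SourceImage={"Bytes": source_bytes},
    TargetImage={"Bytes": target_bytes},
    SimilarityThreshold=80,  # only return matches scoring 80% similarity or above
)

# FaceMatches holds target faces matching the source face, ordered by similarity.
for match in response["FaceMatches"]:
    print("Similarity:", match["Similarity"])
    print("BoundingBox:", match["Face"]["BoundingBox"])

# UnmatchedFaces lists faces in the target image that didn't match the source.
print("Non-matching faces:", len(response["UnmatchedFaces"]))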

Next up is the ability to perform Celebrity Detection. The RecognizeCelebrities API operation will return the 100 largest faces detected within an image. The 100 largest faces are divided into those whose detected face is determined to be that of a known celebrity, and the remainder, which are determined not to be of any known celebrity. For each returned celebrity-matching face, the name of the celebrity is provided, together with an ID and a list of URLs where extra information can be consulted for the celebrity in question.

Again, the image on which celebrity detection is being performed can be submitted using the same two previously discussed approaches: either by storing the image in an S3 bucket, or by performing a base64-encoding of the image and supplying that. The following example demonstrates the structure and content required for a request to the RecognizeCelebrities API action. Here the image of a known celebrity is provided in-line using a base64-encoding. The following partial response would have been returned for the previous request, and confirms that the person in the submitted image is indeed a celebrity. The name of and URLs for the celebrity in question are given, together with the bounding box attributes of where the celebrity's face was detected within the image.
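Again as an illustration, a minimal boto3 (Python) sketch of the RecognizeCelebrities call; the file name is an assumption.

import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")

# The file name is an assumption for this sketch.
with open("celebrity.jpg", "rb") as f:
    image_bytes = f.read()

response = rekognition.recognize_celebrities(Image={"Bytes": image_bytes})

# CelebrityFaces holds faces recognized as known celebrities...
for celeb in response["CelebrityFaces"]:
    print("Name:", celeb["Name"])
    print("Id:", celeb["Id"])
    print("Urls:", celeb["Urls"])
    print("BoundingBox:", celeb["Face"]["BoundingBox"])

# ...while UnrecognizedFaces holds detected faces not matched to any celebrity.
print("Unrecognized faces:", len(response["UnrecognizedFaces"]))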

Next up is the ability to perform Text Extraction, that is, to detect and return text found within an image. The DetectText API operation allows you, for example, to extract the text used within a marketing brochure, the text from an image of a business receipt, or the flight numbers listed within an image of an arrivals board at an airport.

Depending on how the text within the image is laid out, a list of single words and/or lines of equally spaced words is returned. Again, the image on which text extraction is being performed can be submitted using the same two previously discussed approaches: either by storing the image in an S3 bucket, or by performing a base64-encoding of the image and supplying that.

The following example demonstrates the structure and content required for a request to the DetectText API action. Here an image of a driver's license is provided in-line using a base64-encoding. The following partial response would have been returned for the previous request and provides the text extracted from within the sample driver's license.

The details extracted from the driver's license include the holder's name, address, and date of birth, amongst many others. For each detected single word and/or line of words, the geometry and bounding box are provided as to whereabouts the text was detected within the image.
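As an illustration of this call, a minimal boto3 (Python) sketch of DetectText; the file name is an assumption.

import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")

# The file name is an assumption for this sketch.
with open("drivers-license.jpg", "rb") as f:
    image_bytes = f.read()

response = rekognition.detect_text(Image={"Bytes": image_bytes})

# TextDetections mixes LINE entries (runs of equally spaced words) and
# WORD entries; each WORD references its parent LINE via ParentId, and
# every detection carries its geometry/bounding box.
for detection in response["TextDetections"]:
    print(detection["Type"], repr(detection["DetectedText"]),
          "box:", detection["Geometry"]["BoundingBox"])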

Next up is the ability to perform Content Moderation, that is, to determine whether the image in question contains content that could be considered inappropriate. The DetectModerationLabels API operation allows you to submit an image for content moderation analysis. The operation will in turn respond with labels representing those features within the image that are deemed to be objectionable. The moderation labels returned follow a two-level hierarchy: at the top level there are two possible labels, Explicit Nudity and Suggestive, and beneath this level are more granular labels.

Again, the image on which Content Moderation is being performed can be submitted using the same two previously discussed approaches: either by storing the image in an S3 bucket, or by performing a base64-encoding of the image and supplying that. The following example demonstrates the structure and content required for a request to the DetectModerationLabels API action.

Here the image containing a person in swimwear is provided in-line using a base64-encoding. The following partial response would have been returned for the previous request and provides the detected moderation labels. Each moderation label provided has a confidence score providing a degree of certainty about the label.
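Here too, a minimal boto3 (Python) sketch of the DetectModerationLabels call; the file name and the MinConfidence value are assumptions.

import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")

# The file name is an assumption for this sketch.
with open("swimwear.jpg", "rb") as f:
    image_bytes = f.read()

response = rekognition.detect_moderation_labels(
    Image={"Bytes": image_bytes},
    MinConfidence=50,  # suppress labels scoring below 50% confidence
)

# ParentName exposes the two-level hierarchy: second-level labels report
# their top-level parent, while top-level labels have an empty ParentName.
for label in response["ModerationLabels"]:
    print(label["Name"], "| parent:", label["ParentName"],
          "| confidence:", label["Confidence"])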

Next up is the ability to perform Feature Extraction, to detect what interesting features exist within an image. The DetectLabels API operation is concerned with discovering features and providing them back as labels with a confidence score. For example, using DetectLabels on the image seen here results in the detection of glasses, a computer, and electronics as the top three labels based on confidence.

Overall, 13 features were detected and returned. Again, the image on which feature extraction is being performed can be submitted using the same two previously discussed approaches: either by storing the image in an S3 bucket or by performing a base64-encoding of the image and supplying that. The following example demonstrates the structure and content required for a request to the DetectLabels API action.

Here the image containing a laptop and mobile phone is provided in-line using a base64-encoding. The following response would have been returned for the previous request and provides the detected labels. Each feature label provided has a confidence score providing a degree of certainty.
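Finally, a minimal boto3 (Python) sketch of the DetectLabels call; the file name and the thresholds are assumptions.

import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")

# The file name is an assumption for this sketch.
with open("desk.jpg", "rb") as f:
    image_bytes = f.read()

response = rekognition.detect_labels(
    Image={"Bytes": image_bytes},
    MaxLabels=13,      # cap the number of labels returned
    MinConfidence=50,  # drop labels scoring below 50% confidence
)

# Labels come back ordered by confidence, highest first.
for label in response["Labels"]:
    print(label["Name"], label["Confidence"])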

That concludes our lecture on the Rekognition image processing APIs. In the next lecture we'll review the Rekognition video processing APIs. There we'll learn about the async processing pattern used and how the individual API operations are called to perform video analysis. Go ahead and close this lecture and we'll see you shortly in the next one.

About the Author

Students: 11272
Labs: 28
Courses: 65
Learning paths: 15

Jeremy is the DevOps Content Lead at Cloud Academy where he specializes in developing technical training documentation for DevOps.

He has a strong background in software engineering, and has been coding with various languages, frameworks, and systems for the past 20+ years. In recent times, Jeremy has been focused on DevOps, Cloud, Security, and Machine Learning.

Jeremy holds professional certifications for both the AWS and GCP cloud platforms.