Amazon SageMaker Ground Truth



This is a short refresher on the seven AWS machine learning services announced at re:Invent 2018. It covers:

  • Amazon SageMaker Ground Truth
  • Amazon Forecast
  • Amazon Comprehend Medical
  • Amazon Textract
  • Amazon Personalize
  • Amazon SageMaker RL
  • AWS DeepRacer

Learning Objective

  • This course aims to provide an awareness of what each of the ML services is used for and the benefits it can bring to your organization

Intended Audience

  • This course would be beneficial to anyone who is responsible for implementing, managing, and securing machine learning services within AWS


  • You should have a basic understanding of machine learning concepts and principles to help you understand how each of these services fits into the AWS landscape

Related Training Content

Introduction to Machine Learning on AWS

Applying Machine Learning and AI Services on AWS

AWS Machine Learning - Specialty Certification Preparation


The truth is out there, well, most of the time. In cases where it's not, use Amazon SageMaker Ground Truth. Let's understand why.

Labeled data is an essential ingredient for particular forms of machine learning, specifically supervised learning algorithms. During the training phase, the supervised learning algorithm measures the accuracy of the model by generating predictions and comparing them to the known label associated with the data. A typical example of this is image classification. When training an image classification model, labeled images are used, where each image carries one or more labels indicating what it contains: for example, a person, car, dog, or cat.

MNIST, CIFAR-10, and ImageNet are all examples of publicly available datasets that have already been labeled and are often used for training. During the training phase, checks can be performed to see whether the predicted classification for an image matches the associated label. Iterations, or epochs, of training continue until the predictions reach a desired level of accuracy.
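The comparison of predictions against known labels can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `accuracy` and `epochs_until_target` and the sample data are hypothetical, and a real training loop would update a model between epochs rather than read predictions from a list.

```python
from typing import Optional, Sequence


def accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of predictions that match the known labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one labeled example is required")
    matches = sum(p == y for p, y in zip(predictions, labels))
    return matches / len(labels)


def epochs_until_target(
    per_epoch_predictions: Sequence[Sequence[str]],
    labels: Sequence[str],
    target: float,
) -> Optional[int]:
    """Return the 1-based epoch at which accuracy first reaches the target.

    Returns None if the target is never reached.
    """
    for epoch, predictions in enumerate(per_epoch_predictions, start=1):
        if accuracy(predictions, labels) >= target:
            return epoch
    return None


# Hypothetical labels and per-epoch predictions for four images.
labels = ["cat", "dog", "car", "person"]
history = [
    ["dog", "dog", "cat", "car"],
    ["cat", "dog", "cat", "person"],
    ["cat", "dog", "car", "person"],
]
first_good_epoch = epochs_until_target(history, labels, target=0.75)
```

Without the labels there is nothing to compare against, which is why the quality of the labeled dataset matters so much for supervised learning.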

To date, the process of labeling has been time-consuming, with limited tooling to aid the job. To help expedite and improve the experience, Amazon SageMaker Ground Truth has been added to the SageMaker portfolio. Amazon SageMaker Ground Truth is a labeling service that provides both automatic and human workforce labeling features. With Ground Truth, you upload your unlabeled datasets into an S3 bucket, then create a manifest file with pointers to each of the images, and place the manifest file within the same S3 bucket.
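As a rough illustration of the manifest step, the sketch below builds a manifest in JSON Lines form, with one `source-ref` entry per image. The bucket name, object keys, and the helper name `build_manifest` are hypothetical, and the code only produces the text; uploading it to S3 is left out.

```python
import json
from typing import Iterable


def build_manifest(bucket: str, keys: Iterable[str]) -> str:
    """Return manifest text with one JSON object per line.

    Each object points at one image through a "source-ref" S3 URI.
    """
    lines = [json.dumps({"source-ref": f"s3://{bucket}/{key}"}) for key in keys]
    return "".join(line + "\n" for line in lines)


# Hypothetical bucket and image keys.
manifest_text = build_manifest(
    "example-bucket",
    ["images/cat-001.jpg", "images/dog-002.jpg"],
)
```

The resulting text can be saved as a file and placed in the same bucket as the images.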

Using the Ground Truth console, create a Labeling Workforce. A Labeling Workforce represents the human workforce that performs the labeling itself. There are currently three options: Public, a team of global on-demand workers powered by Amazon Mechanical Turk; Private, a team of workers from your organization; and Vendor, a selection of experienced vendors that specialize in providing data labeling services.

Finally, we are ready to create a labeling job. A labeling job represents the actual labeling exercise that you need performed. The key configuration settings to specify are:

  • Job Name
  • Input Dataset Location: the S3 location of the manifest file
  • Output Dataset Location: an S3 location to receive the labeling data
  • Dataset Object Selection: whether to label the entire dataset, a random sample, or a filtered selection of the data
  • Task Type: chosen from a list that includes Image Classification, Bounding Box, Text Classification, and Semantic Segmentation, or your own Custom Task Type
  • Workers: the human workforce required to perform the job
  • Bounding Box Labeling Tool: the UI labeling tool that the workers will use, including helper text in the form of instructions and guidance
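The same settings can also be supplied programmatically. The sketch below only assembles a request dictionary and makes no AWS calls. The field names reflect the request shape of the SageMaker `create_labeling_job` API as best understood here and should be checked against the current API reference; every ARN, URI, and default value is a placeholder, and the helper name `labeling_job_request` is hypothetical.

```python
def labeling_job_request(
    job_name: str,
    manifest_uri: str,
    output_uri: str,
    role_arn: str,
    workteam_arn: str,
    ui_template_uri: str,
    label_categories_uri: str,
    pre_task_lambda_arn: str,
    consolidation_lambda_arn: str,
    workers_per_object: int = 3,
) -> dict:
    """Assemble a labeling job request from the key configuration settings."""
    return {
        "LabelingJobName": job_name,
        # Key under which each label is stored in the output records.
        "LabelAttributeName": "label",
        # Input Dataset Location: where the manifest file lives.
        "InputConfig": {
            "DataSource": {"S3DataSource": {"ManifestS3Uri": manifest_uri}}
        },
        # Output Dataset Location: where labeling data is written.
        "OutputConfig": {"S3OutputPath": output_uri},
        "RoleArn": role_arn,
        "LabelCategoryConfigS3Uri": label_categories_uri,
        "HumanTaskConfig": {
            # Workers: the team that performs the job.
            "WorkteamArn": workteam_arn,
            # Labeling tool: the UI template shown to workers.
            "UiConfig": {"UiTemplateS3Uri": ui_template_uri},
            "PreHumanTaskLambdaArn": pre_task_lambda_arn,
            "TaskTitle": "Classify each image",
            "TaskDescription": "Pick the category that best matches the image.",
            "NumberOfHumanWorkersPerDataObject": workers_per_object,
            "TaskTimeLimitInSeconds": 300,
            "AnnotationConsolidationConfig": {
                "AnnotationConsolidationLambdaArn": consolidation_lambda_arn
            },
        },
    }


# All values below are placeholders, not real resources.
request = labeling_job_request(
    job_name="example-image-classification",
    manifest_uri="s3://example-bucket/input.manifest",
    output_uri="s3://example-bucket/output/",
    role_arn="arn:aws:iam::111122223333:role/ExampleGroundTruthRole",
    workteam_arn="arn:aws:sagemaker:us-east-1:111122223333:workteam/private-crowd/example",
    ui_template_uri="s3://example-bucket/template.liquid",
    label_categories_uri="s3://example-bucket/categories.json",
    pre_task_lambda_arn="arn:aws:lambda:us-east-1:111122223333:function:example-pre",
    consolidation_lambda_arn="arn:aws:lambda:us-east-1:111122223333:function:example-acs",
)
```

With valid values in place, a dictionary like this could be passed to `boto3.client("sagemaker").create_labeling_job(**request)`.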

Okay, so now that the labeling job has been created, the chosen workforce will be invited to begin the process of labeling. Notifications are provided in the form of an email containing the URL to the Ground Truth labeling tool. If automatic labeling has been enabled for your job, Ground Truth will analyze the data and perform the labeling. Otherwise, the configured human workforce will use the Ground Truth tooling to perform the labeling activity.

When the labeling job has been completed, the job owner or requester can visualize each image with its assigned label within the Ground Truth console. Finally, each image label is serialized into an output manifest file in the output S3 location, against the corresponding image.
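Once the labels have been serialized, they can be read back with a short script. The sketch below assumes a labeled manifest in JSON Lines form where each record holds a `source-ref` plus a `<label attribute>-metadata` object containing a `class-name`; that layout is an assumption based on image classification jobs and may differ for other task types. The sample records and the helper name `read_labels` are hypothetical.

```python
import json
from typing import Dict, Optional


def read_labels(manifest_text: str, label_attribute: str) -> Dict[str, Optional[str]]:
    """Map each image URI to its class name, or None if no class is recorded."""
    labels: Dict[str, Optional[str]] = {}
    for line in manifest_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        metadata = record.get(f"{label_attribute}-metadata", {})
        labels[record["source-ref"]] = metadata.get("class-name")
    return labels


# Hypothetical labeled records, one JSON object per line.
sample = "\n".join(
    [
        json.dumps(
            {
                "source-ref": "s3://example-bucket/images/cat-001.jpg",
                "label": 0,
                "label-metadata": {"class-name": "cat", "human-annotated": "yes"},
            }
        ),
        json.dumps(
            {
                "source-ref": "s3://example-bucket/images/dog-002.jpg",
                "label": 1,
                "label-metadata": {"class-name": "dog", "human-annotated": "yes"},
            }
        ),
    ]
)
image_labels = read_labels(sample, label_attribute="label")
```

A mapping like this is a convenient starting point for feeding the labeled images into a training job.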




About the Author

Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.

To date, Stuart has created 150+ courses relating to cloud, reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.

Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.

He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.

In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.

Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.