Get Started with Amazon SageMaker Data Wrangler, Data Pipeline, Feature Store and Ground Truth
Introduction to SageMaker Data Wrangler

Get started with the latest Amazon SageMaker services, Data Wrangler, Data Pipeline and Feature Store, released at re:Invent in December 2020. We also learn about SageMaker Ground Truth and how it can help us sort and label data.

Get a head start in machine learning by learning how these services can reduce the effort and time required to load and prepare data sets for analysis and modeling. Data scientists will often spend 70% or more of their time cleaning, preparing, and wrangling their data into a state where it’s suitable for training machine learning algorithms. It’s a lot of work, and these new SageMaker services provide an easier way.


Hello and welcome, I'm Andy Larkin, and in this fast-track course, I'm going to introduce you to the new Amazon SageMaker Data Wrangler, the SageMaker Pipelines and the SageMaker Feature Store. Now, these three SageMaker services are real game changers for budding data engineers and data scientists. So I'll show you how you can get started using these three services, the Data Wrangler, the Data Pipelines, and the Feature Store, within Amazon SageMaker Studio.

So what are they? Data Wrangler is a way to fast-track the loading and normalizing of data sets. Data Pipelines enables you to integrate that cleansing and normalization process with modeling and combine them into a workflow that can be shared across teams in a very visual interface. The SageMaker Feature Store enables you to save all of this process, the data loading, selection, cleansing, exploration, and visualization processes, as a library so they can be used and reused by other team members.

These three services make the job of data engineers and data scientists much easier. They reduce some of the heavy lifting and repetition that we tend to get stuck with in importing and cleansing data. These three new services are available within SageMaker Studio. Data Wrangler requires a little bit of one-time configuration. For example, you have to use a specific EC2 instance for your notebook, but once you get that set up and going, the process is a game changer in the way that you load and normalize data sets.

So the Data Wrangler service lets you complete each step of a data preparation workflow, so you can have data ready for modeling sooner. You can do the four steps of data preparation from this one place. What I like most about that preparation stage is that the SageMaker visualization tool allows you to preview the data that you've loaded and normalized, just to check its readiness and completeness before you begin any modeling or pass it to another part of a team to run analysis on.
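To make that concrete, here is a minimal sketch of the kind of manual load, check, clean and normalize work that Data Wrangler is meant to take off your hands. This is plain Python and not the Data Wrangler API; the sample data, the column names and the helper functions are all hypothetical and only there for illustration.

```python
import csv
import io

# Hypothetical sample data: the columns and values are made up for illustration.
RAW_CSV = """age,income
34,52000
,61000
45,
29,48000
"""

def load_rows(text):
    """Load CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))

def completeness(rows, column):
    """Fraction of rows that have a non-empty value in the given column."""
    filled = sum(1 for row in rows if row[column].strip())
    return filled / len(rows)

def drop_incomplete(rows):
    """Keep only the rows where every column has a value."""
    return [row for row in rows if all(v.strip() for v in row.values())]

def min_max_normalize(rows, column):
    """Rescale a numeric column to the range 0..1 (min-max normalization)."""
    values = [float(row[column]) for row in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal
    return [(v - lo) / span for v in values]

rows = load_rows(RAW_CSV)
print("age completeness:", completeness(rows, "age"))
clean = drop_incomplete(rows)
print("rows kept after cleaning:", len(clean))
print("normalized age:", min_max_normalize(clean, "age"))
```

Even on a four-row data set this takes a handful of helper functions, and it has to be repeated for every new source, which is the repetition these services aim to reduce.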

Okay, so we're gonna look at these features and get familiar with how to use them, so you can start using them to prepare data for modeling and for visualization.


Getting Started with Data Wrangler - Setting Up SageMaker to Run Data Wrangler - Using Data Wrangler - Introduction to SageMaker Ground Truth - Service and Cost Review

About the Author

Andrew is fanatical about helping business teams gain the maximum ROI possible from adopting, using, and optimizing Public Cloud Services. Having built 70+ Cloud Academy courses, Andrew has helped over 50,000 students master cloud computing by sharing the skills and experiences he gained during 20+ years leading digital teams in code and consulting. Before joining Cloud Academy, Andrew worked for AWS and for AWS technology partners Ooyala and Adobe.