
SageMaker Studio - Getting Started with Data Wrangler

The course is part of these learning paths

Start Modelling Data with Amazon SageMaker
AWS Machine Learning – Specialty Certification Preparation

Get started with the latest Amazon SageMaker services (Data Wrangler, Data Pipeline, and Feature Store), released at re:Invent in December 2020. We also learn about SageMaker Ground Truth and how it can help us sort and label data.

Get a head start in machine learning by learning how these services can reduce the effort and time required to load and prepare data sets for analysis and modeling. Data scientists often spend 70% or more of their time cleaning, preparing, and wrangling their data into a state where it's suitable for training machine learning algorithms. It's a lot of work, and these new SageMaker services provide an easier way.


When you look at the console, it's quite difficult to tell where to get started with these new services. There are some steps you need to complete in SageMaker Studio before you can use, or even access, the Data Wrangler tool. The first step is to provision SageMaker Studio, if you haven't done this already.

If you need to provision SageMaker Studio for the first time, I'll show you how to do that now. Otherwise, skip to the next lecture, which covers setting up SageMaker Data Wrangler. Okay, so let's walk through setting up SageMaker Studio. There are two options: you can use the quick start, or you can set up the account to be run as a team account. If you're just starting out, the quick start is the best choice.

So open the SageMaker console, choose SageMaker Studio from the top left-hand side of the page, and on the Studio setup page, under Get started, choose Quick start. Okay, let's create a name for our Studio. We can keep the default name or make up our own. The name can be up to 63 characters long and can contain letters, numbers, and hyphens.
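If you want to check a name before submitting the form, the rule can be expressed as a regular expression. This is a minimal sketch that assumes the pattern published in the SageMaker API reference for domain and user profile names, `^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}`, plus the 63-character length limit; in practice this means the name must also start and end with a letter or number. `is_valid_studio_name` is a hypothetical helper, not part of any AWS SDK.

```python
import re

# Assumed naming pattern (from the SageMaker API reference for domain and
# user profile names): starts and ends with a letter or digit, with hyphens
# allowed in between.
_NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}")
_MAX_LENGTH = 63


def is_valid_studio_name(name: str) -> bool:
    """Hypothetical helper: check a Studio name against the assumed rules."""
    # The pattern alone does not cap hyphens, so enforce the length limit too.
    if len(name) > _MAX_LENGTH:
        return False
    # fullmatch requires the entire name, not just a prefix, to fit the pattern.
    return _NAME_PATTERN.fullmatch(name) is not None


for candidate in ["sagemaker-demo", "-leading-hyphen", "has_underscore", "a" * 64]:
    print(candidate[:20], is_valid_studio_name(candidate))
```

Using `fullmatch` rather than `match` is the conservative choice here: it rejects names with trailing hyphens or stray characters after an otherwise valid prefix.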

Next, we need to choose an execution role for SageMaker. You can either choose an existing role from the role selector, enter the ARN of your own IAM role, or create a new role. The role must have the AmazonSageMakerFullAccess policy attached to it. If you choose to create a new role, the Create an IAM role dialog appears, and from here we can set what we want the role to allow. Again, make sure the AmazonSageMakerFullAccess policy is attached to it.
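If you would rather create the execution role yourself than use the console dialog, the same result can be sketched with boto3. This assumes the role only needs a trust policy that lets `sagemaker.amazonaws.com` assume it, plus the AWS managed AmazonSageMakerFullAccess policy. `create_execution_role` and the role name are hypothetical; the IAM client is passed in so the sketch can be exercised without AWS credentials.

```python
import json

# AWS managed policy the lecture says the execution role must have.
SAGEMAKER_FULL_ACCESS_ARN = "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess"


def sagemaker_trust_policy() -> dict:
    """Trust policy allowing the SageMaker service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sagemaker.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_execution_role(iam, role_name: str) -> str:
    """Hypothetical helper: create a role and attach AmazonSageMakerFullAccess.

    `iam` is expected to be a boto3 IAM client, e.g. boto3.client("iam").
    Returns the new role's ARN.
    """
    response = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(sagemaker_trust_policy()),
        Description="Execution role for SageMaker Studio",
    )
    iam.attach_role_policy(RoleName=role_name, PolicyArn=SAGEMAKER_FULL_ACCESS_ARN)
    return response["Role"]["Arn"]
```

With credentials configured, usage would look like `create_execution_role(boto3.client("iam"), "my-studio-role")`, and the returned ARN is what you would enter in the execution role field.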

Now, you might find that it errors out the first time you try to create the role. If so, go back in and do it again. You'll notice that the SageMaker full access policy has now been created, if you didn't already have it. The next step is to specify the S3 buckets we're going to use. If you don't want to grant access to any additional buckets, just choose None.
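Choosing specific buckets amounts to granting the role access to just those buckets. The console generates its own policy for this; as an illustration only, a policy document scoped to named buckets could be built as shown here. The bucket names and the chosen S3 actions are assumptions, and `s3_access_policy` is a hypothetical helper.

```python
import json
from typing import List, Optional


def s3_access_policy(bucket_names: List[str]) -> Optional[dict]:
    """Hypothetical helper: IAM policy document limited to specific buckets.

    Returns None when no buckets are given, matching the console's "None"
    option, where no extra bucket access is granted.
    """
    if not bucket_names:
        return None
    bucket_arns = [f"arn:aws:s3:::{name}" for name in bucket_names]
    object_arns = [f"{arn}/*" for arn in bucket_arns]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level permission: list the objects in each bucket.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": bucket_arns,
            },
            {
                # Object-level permissions: read, write, and delete objects.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": object_arns,
            },
        ],
    }


print(json.dumps(s3_access_policy(["my-training-data"]), indent=2))
```

Bucket-level and object-level actions are split into two statements because `s3:ListBucket` applies to the bucket ARN while object actions apply to `bucket/*`. A document like this could be attached as an inline policy with the IAM `put_role_policy` call.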

Okay, so now we create the role, and SageMaker creates a new IAM role. As I mentioned, there are two setup options: the quick setup we're using, or a team setup, which is mainly for projects. The team setup lets us set SageMaker Studio permissions, which are required if we're going to use the Projects feature. SageMaker Studio will now provision itself. This can take quite a long time. And as you can see here, we've created SageMaker sesh demo.
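The quick start does this provisioning for you. For reference, a rough programmatic equivalent using the SageMaker `create_domain` and `create_user_profile` API calls is sketched here. It assumes IAM authentication and that you supply a VPC ID and subnet IDs yourself (the quick start picks networking for you). The helper names and all IDs are placeholders, and the client is passed in so the sketch runs without AWS credentials.

```python
from typing import List, Tuple


def create_studio(
    sagemaker,
    domain_name: str,
    user_name: str,
    role_arn: str,
    vpc_id: str,
    subnet_ids: List[str],
) -> Tuple[str, str]:
    """Hypothetical helper: create a Studio domain and one user profile.

    `sagemaker` is expected to be a boto3 SageMaker client, e.g.
    boto3.client("sagemaker"). Returns (domain_id, user_profile_arn).
    """
    domain = sagemaker.create_domain(
        DomainName=domain_name,
        AuthMode="IAM",
        DefaultUserSettings={"ExecutionRole": role_arn},
        SubnetIds=subnet_ids,
        VpcId=vpc_id,
    )
    # The domain ID is the last segment of the ARN, e.g. ".../domain/d-abc123".
    domain_id = domain["DomainArn"].split("/")[-1]
    profile = sagemaker.create_user_profile(
        DomainId=domain_id,
        UserProfileName=user_name,
    )
    return domain_id, profile["UserProfileArn"]
```

In a real account the domain must reach a ready state before the user profile can be used, so this call would normally be followed by a wait.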

The status is Ready. The execution role is created, and the authentication method is set. We can see the settings we've got here, and this is where we can enable projects if we want to. This is very useful for Studio projects, which we'll walk through a little later. Now that Studio has provisioned, we can access it. But remember, it can be quite a while before it actually starts up, so don't be alarmed if you end up waiting five minutes or so for Studio to finish provisioning.
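Because provisioning can take several minutes, scripted setups usually poll until the domain is ready. This is a minimal sketch built on the SageMaker `describe_domain` call, assuming that the console's ready state corresponds to the API status `InService`. The function name, poll interval, and timeout are illustrative choices, and `sleep` is injectable so the loop can be tested without waiting.

```python
import time

# describe_domain statuses that mean provisioning will not complete on its own.
_FAILED_STATES = {"Failed", "Update_Failed", "Delete_Failed"}


def wait_for_domain(
    sagemaker,
    domain_id: str,
    poll_seconds: float = 30.0,
    timeout_seconds: float = 1800.0,
    sleep=time.sleep,
) -> str:
    """Hypothetical helper: poll describe_domain until the domain is InService.

    `sagemaker` is expected to be a boto3 SageMaker client. Raises
    RuntimeError on a failed state and TimeoutError if the domain is still
    not ready after roughly timeout_seconds.
    """
    waited = 0.0
    while True:
        status = sagemaker.describe_domain(DomainId=domain_id)["Status"]
        if status == "InService":
            return status
        if status in _FAILED_STATES:
            raise RuntimeError(f"Domain {domain_id} entered state {status}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"Domain {domain_id} still {status} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

A 30-second poll interval keeps API calls low during a wait that, as noted above, can easily run to five minutes.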


Introduction to SageMaker Data Wrangler - Setting Up SageMaker to Run Data Wrangler - Using Data Wrangler - Introduction to SageMaker Ground Truth - Service and Cost Review

About the Author
Andrew Larkin
Head of Content

Andrew is an AWS certified professional who is passionate about helping others learn how to use and gain benefit from AWS technologies. Andrew has worked for AWS and for AWS technology partners Ooyala and Adobe. His favorite Amazon leadership principle is "Customer Obsession", as everything AWS starts with the customer. Outside work, his passions are cycling and surfing, and having a laugh about the lessons learnt trying to launch two daughters and a few start-ups.