Building a Model

This course takes an introductory look at using the SageMaker platform, specifically in the context of preparing data and building and deploying machine learning models.

During this course, you'll gain a practical understanding of the steps required to build and deploy these models. You'll also learn how SageMaker can simplify this process by handling a lot of the heavy lifting on both the model management and data manipulation sides, and by providing general quality-of-life tools.


Learning Objectives

  • Obtain a foundational understanding of SageMaker
  • Prepare data for use in a machine learning project
  • Build, train, and deploy a machine learning model using SageMaker

Intended Audience

This course is ideal for data scientists, data engineers, or anyone who wants to get started with Amazon SageMaker.


To get the most out of this course, you should have some experience with data engineering and machine learning concepts, as well as familiarity with the AWS platform.


Now, if you remember, the next step in the workflow is building the model. If you go back to the hub, AKA the SageMaker dashboard, you'll notice that notebooks are next. If you're not familiar with them, SageMaker notebooks are basically managed Jupyter Notebook setups. Jupyter Notebook is what IPython Notebook was rebranded as a few years back. It also competes with projects such as Apache Zeppelin, but SageMaker uses a pre-installed, managed version of Jupyter.

Now, just know that although you're gonna build your notebook in SageMaker, Jupyter Notebook is actually an open-source application that you can download and run yourself, or run on your in-house servers separate from SageMaker, so you're not getting locked in.

If you're unfamiliar with it, a notebook basically lets you run Python code on a server (in this case SageMaker, or one you set up yourself), with the code accessible and programmable through your web browser but executed in the remote environment. This is ideal because your laptop doesn't need to do all the heavy lifting: SageMaker or your own servers provide the compute resources.
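
To make the "runs on the server" point concrete, here is a minimal Python sketch that reports facts about whatever machine executes it. Pasted into a cell of a hosted notebook, it would describe the remote notebook instance rather than the laptop whose browser is showing the page. The function name `describe_runtime` is just an illustrative choice.

```python
import os
import platform
import sys


def describe_runtime() -> dict:
    """Collect facts about the machine this code is executing on.

    In a hosted notebook these values describe the remote instance,
    not the computer running the web browser.
    """
    return {
        "hostname": platform.node(),
        "os": platform.system(),
        "python": sys.version.split()[0],
        "cpu_count": os.cpu_count(),  # may be None if it can't be determined
    }


if __name__ == "__main__":
    for key, value in describe_runtime().items():
        print(f"{key}: {value}")
```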

So when you click into Notebooks, you're gonna be greeted by one of two things: either there are already some notebooks set up, or there are none at all. Remember, SageMaker is shared by everybody using this AWS account, not just your IAM user; that means the overarching account. So if anybody has taken this class or played with it before, you might see some notebooks there. But even if you do, and especially if you don't, you can get started by clicking the bright orange Create notebook instance button in the top right.
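
The same list of notebooks can also be read programmatically. This is a sketch assuming the AWS SDK for Python (boto3) and its SageMaker client, whose `list_notebook_instances` call returns results in pages linked by a `NextToken`. The helper name is hypothetical, and the client is passed in so the function can be tried without AWS credentials. Note that a client only sees notebook instances in its own region.

```python
from typing import Any, Dict, List


def list_notebook_instances(client: Any) -> List[Dict[str, str]]:
    """Return the name and status of every notebook instance visible to `client`.

    `client` is expected to behave like boto3.client("sagemaker"). Results are
    paged, so keep requesting until no NextToken comes back.
    """
    results: List[Dict[str, str]] = []
    kwargs: Dict[str, Any] = {}
    while True:
        response = client.list_notebook_instances(**kwargs)
        for item in response.get("NotebookInstances", []):
            results.append(
                {
                    "name": item["NotebookInstanceName"],
                    "status": item["NotebookInstanceStatus"],
                }
            )
        token = response.get("NextToken")
        if not token:
            return results
        kwargs = {"NextToken": token}  # ask for the next page
```

With real credentials, `list_notebook_instances(boto3.client("sagemaker"))` would return something like a list of name/status pairs; an empty list means you're starting from scratch.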

When creating an instance, after setting basic things such as the name, there's actually a little bit of nuance. When selecting the notebook instance type, you're able to choose more and more powerful hardware, along with Elastic Inference, which allows you to attach GPU-powered (graphics processing unit) acceleration to your notebook setup.

Now, remember that on the cloud, more powerful hardware costs more, but it also performs better, so you're not sitting there as long. It makes sense to pick more powerful hardware if you're running more intensive jobs, but do keep in mind the cost implications of selecting big, heavy-duty servers. You're also able to select things such as more fine-tuned security roles and network permissions, and even attach Git repositories to your instance.
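
If you'd rather script these choices than click through them, the console options map onto parameters of SageMaker's CreateNotebookInstance API. The sketch below assumes boto3, uses a hypothetical helper name and placeholder values, and only assembles the request. `ml.t3.medium` is one small instance type picked as an example, not a recommendation from the course.

```python
from typing import Any, Dict, List, Optional


def build_notebook_request(
    name: str,
    role_arn: str,
    instance_type: str = "ml.t3.medium",
    subnet_id: Optional[str] = None,
    security_group_ids: Optional[List[str]] = None,
    git_repository: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for CreateNotebookInstance.

    Name, instance type, and IAM role are required; the optional settings are
    included only when supplied, so the service defaults apply otherwise.
    """
    request: Dict[str, Any] = {
        "NotebookInstanceName": name,
        "InstanceType": instance_type,  # bigger types run faster but cost more
        "RoleArn": role_arn,  # the security role the notebook runs as
    }
    if subnet_id is not None:
        request["SubnetId"] = subnet_id  # network placement in your own VPC
    if security_group_ids:
        request["SecurityGroupIds"] = list(security_group_ids)
    if git_repository is not None:
        # A Git URL, or the name of a repository registered in SageMaker.
        request["DefaultCodeRepository"] = git_repository
    return request
```

Passing the result to `boto3.client("sagemaker").create_notebook_instance(**request)` would submit it; the role ARN must name an IAM role that SageMaker is allowed to assume.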

Don't worry about this too much. Larger organizations might have policies, and your DevOps team, if you have a separate one, may enforce strict permissions, but in general the default options are pretty good, especially if you're in an isolated environment. After you've selected everything, hit Create notebook instance, AKA the big button in the bottom right.

After creating the instance, you of course can't connect immediately. Under Status, you'll see that it's pending while it deploys and gets set up. But after a short while, you'll see that it's InService, and you can click to open either Jupyter or JupyterLab.
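
The wait-then-open step can be sketched the same way. Assuming a boto3 SageMaker client, `describe_notebook_instance` reports the status (`Pending` while it deploys, `InService` once it's ready), and `create_presigned_notebook_instance_url` returns a temporary sign-in link. The polling helper and its defaults are illustrative, and the `sleep` parameter exists only so the loop can be tested without waiting.

```python
import time
from typing import Any, Callable


def wait_until_in_service(
    client: Any,
    name: str,
    poll_seconds: float = 30.0,
    timeout_seconds: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a notebook instance until it is InService, then return a sign-in URL.

    `client` is expected to behave like boto3.client("sagemaker").
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        info = client.describe_notebook_instance(NotebookInstanceName=name)
        status = info["NotebookInstanceStatus"]
        if status == "InService":
            # A presigned URL opens the instance's Jupyter UI directly.
            response = client.create_presigned_notebook_instance_url(
                NotebookInstanceName=name
            )
            return response["AuthorizedUrl"]
        if status == "Failed":
            reason = info.get("FailureReason", "unknown reason")
            raise RuntimeError(f"notebook instance {name!r} failed: {reason}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{name!r} still {status} after {timeout_seconds}s")
        sleep(poll_seconds)  # still Pending (or Updating); check again later
```

boto3 also ships a ready-made waiter for this, `client.get_waiter("notebook_instance_in_service")`, which is usually the simpler choice; the hand-rolled loop just makes the states visible.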

Basically, JupyterLab has a richer interface, while Jupyter itself is just the classic Jupyter Notebook environment. Clicking Jupyter brings you into the default Jupyter Notebook environment.

At first, you're gonna see no files and nothing running, but over time, as you create more notebooks, you'll see this fill up, along with some details describing your notebooks.

Now, of course, you could just launch an empty notebook and get straight to coding, but SageMaker goes one step further and provides a full list of example notebooks. In fact, these examples are so broad and so numerous that you can use them as templates to get started with many of the most common machine learning models. This set of examples, plus Ground Truth, means you can get help every step of the way, and at no point do you really need to start from zero with a blank development interface and write Python from the ground up. This is one of SageMaker's greatest strengths.

For example, let's say I have a classification problem and I want to test out k-nearest neighbors. Simply by clicking Use, we get a notebook that looks just like this. Notice that it's not only annotated, it actually has sample data and sample code built into it. In this example, it uses US Geological Survey data along with additional data from the US Forest Service, and it walks through how to interpret that data and build a machine learning model around it.
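
For readers new to the algorithm, k-nearest neighbors classifies a point by a majority vote among the k training points closest to it. The following is a from-scratch sketch on made-up 2-D points, not the code from the SageMaker example notebook, and it assumes plain Euclidean distance.

```python
import math
from collections import Counter
from typing import Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (feature vector, class label)


def knn_predict(train: Sequence[Example], query: Sequence[float], k: int = 3) -> int:
    """Predict the label of `query` by majority vote of its k nearest neighbors."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort training points by Euclidean distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # On a tied vote, most_common keeps first-seen order, so the closer label wins.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    toy = [
        ((1.0, 1.0), 0), ((1.2, 0.8), 0), ((0.9, 1.1), 0),
        ((5.0, 5.0), 1), ((5.2, 4.9), 1), ((4.8, 5.1), 1),
    ]
    print(knn_predict(toy, (1.1, 1.0)))
```

Because every prediction scans the whole training set, this brute-force version only suits small datasets.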

This example is really good because it shows several key parts of how you can use a Jupyter Notebook. It shows how you can have annotations built in, and of course you can write your own. It starts with a bash interpreter running some Linux commands to load the data. And finally, it has some Python that actually processes the data.
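
In a Jupyter notebook, a cell that starts with `%%bash` runs as a bash script and a line prefixed with `!` runs a single shell command, which is how a notebook can mix Linux commands with Python. Once a bash cell has downloaded the data, a Python cell might parse and split it roughly as sketched below. The file layout (comma-separated numeric features with an integer class label in the last column) and the helper names are assumptions for illustration.

```python
import csv
import io
import random
from typing import List, Sequence, Tuple

Row = Tuple[List[float], int]  # (features, label)


def parse_rows(text: str) -> List[Row]:
    """Parse CSV text whose last column is an integer class label."""
    rows: List[Row] = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        *features, label = record
        rows.append(([float(value) for value in features], int(label)))
    return rows


def split_rows(
    rows: Sequence[Row], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Row], List[Row]]:
    """Shuffle with a fixed seed, then hold out a fraction of rows for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)
```

Fixing the seed makes the split reproducible each time the notebook is re-run.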

Now, for those of you who are a little more on the creative side, you might notice you can use SageMaker notebooks to do your own data labeling and skip right over Ground Truth, especially if you already have fairly clean data, or if you'd rather label and clean your data in Python. So do keep that in mind. The main takeaway, though, is that Jupyter Notebooks can hold annotations, Python, and machine learning code, and can even run bash scripts with Linux commands to load data.
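
As a rough illustration of labeling and cleaning in Python instead of Ground Truth, the sketch below normalizes text, drops empty, duplicate, or unrecognized rows, and maps label names to integer class ids. The record fields (`text`, `label`) and the label mapping are hypothetical.

```python
from typing import Dict, List


def clean_and_label(
    records: List[Dict[str, str]], label_map: Dict[str, int]
) -> List[Dict[str, object]]:
    """Normalize text, drop unusable rows, and map label names to class ids."""
    cleaned: List[Dict[str, object]] = []
    seen = set()
    for record in records:
        text = (record.get("text") or "").strip()
        label_name = (record.get("label") or "").strip().lower()
        # Skip empty text, unknown labels, and duplicates (after normalizing).
        if not text or label_name not in label_map or text in seen:
            continue
        seen.add(text)
        cleaned.append({"text": text, "label": label_map[label_name]})
    return cleaned
```

Rules like these are quick to write when the data is already fairly clean; messier data is where a labeling service earns its keep.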

About the Author

Calculated Systems was founded by experts in Hadoop, Google Cloud, and AWS. Calculated Systems enables code-free capture, mapping, and transformation of data in the cloud based on Apache NiFi, an open-source project originally developed within the NSA. Calculated Systems accelerates time to market for new innovations while maintaining data integrity. With cloud automation tools, deep industry expertise, and experience productionizing workloads, development cycles are cut down to a fraction of their normal time. The ability to quickly develop large-scale data ingestion and processing decreases the risk companies face in long development cycles. Calculated Systems is one of the industry leaders in Big Data transformation and education of these complex technologies.