Preparing the Training Data


1h 23m

Learn how to operate machine learning solutions at cloud scale using the Azure Machine Learning SDK. This course teaches you to leverage your existing knowledge of Python and machine learning to manage data ingestion, data preparation, model training, and model deployment in Microsoft Azure.

If you have any feedback related to this course, please contact us at

Learning Objectives

  • Create an Azure Machine Learning workspace using the SDK
  • Run experiments and train models using the SDK
  • Optimize and manage models using the SDK
  • Deploy and consume models using the SDK

Intended Audience

This course is designed for data scientists with existing knowledge of Python and machine learning frameworks, such as scikit-learn, PyTorch, and TensorFlow, who want to build and operate machine learning solutions in the cloud.

Prerequisites


  • Fundamental knowledge of Microsoft Azure
  • Experience writing Python code to work with data using libraries such as NumPy, pandas, and Matplotlib
  • Understanding of data science, including how to prepare data and train machine learning models using common machine learning libraries, such as scikit-learn, PyTorch, or TensorFlow


The GitHub repo for this course, containing the code and datasets used, can be found here: 


We can use local data files to train a model, but when running training workloads automatically on cloud-based compute, it makes more sense to store the data centrally in the cloud and ingest it into the training script wherever it happens to be running. So next, we will upload the training data to a datastore and define a dataset that can be used to access the data from a training script.

For simplicity, we'll upload the data to the default datastore of our Azure Machine Learning workspace. This is the Azure Storage blob container that was created when we provisioned the workspace. But in a real solution, we are more likely to register a datastore that references the cloud location where we typically store our data. We'll then create a tabular dataset that references the CSV files we uploaded.
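As a rough sketch of this first step, assuming the Azure Machine Learning SDK v1 (the azureml-core package) and a workspace object that has already been loaded, the default datastore can be retrieved as follows. The helper name get_default_datastore_summary is hypothetical and is used only for illustration.

```python
def get_default_datastore_summary(ws):
    """Fetch the workspace's default datastore and print its name and type.

    `ws` is assumed to be an azureml.core.Workspace (SDK v1).
    Returns the datastore object so later steps can reuse it.
    """
    default_ds = ws.get_default_datastore()
    print(f"Default datastore: {default_ds.name} ({default_ds.datastore_type})")
    return default_ds


# Usage sketch (needs azureml-core installed and a workspace config.json):
#   from azureml.core import Workspace
#   ws = Workspace.from_config()
#   default_ds = get_default_datastore_summary(ws)
```

Returning the datastore object keeps the later upload and dataset steps independent of how the workspace was obtained.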

We start by importing our Dataset class. We get hold of our default datastore. We check for the existence of our diabetes dataset, and then we upload the files if we don't have a dataset registered with that name. It's possible we may have the files uploaded but not necessarily registered, so overwrite is set to True here, which will replace existing files that have the same name as what we are uploading.
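The check-then-upload logic described here might look something like the sketch below, again assuming the SDK v1 API. The dataset name, the target folder, and the helper name upload_if_unregistered are illustrative assumptions, not values confirmed by the transcript.

```python
def upload_if_unregistered(ws, files,
                           dataset_name="diabetes dataset",   # assumed name
                           target_path="diabetes-data/"):     # assumed folder
    """Upload local files to the default datastore unless a dataset
    with `dataset_name` is already registered in the workspace.

    Returns True if an upload was performed, False otherwise.
    """
    default_ds = ws.get_default_datastore()

    # ws.datasets is a dict of registered datasets keyed by name
    if dataset_name in ws.datasets:
        print("Dataset already registered; skipping upload.")
        return False

    default_ds.upload_files(
        files=files,              # local paths to the CSV files
        target_path=target_path,  # folder path inside the datastore
        overwrite=True,           # replace existing files of the same name
        show_progress=True,
    )
    return True


# Usage sketch (file paths are assumptions):
#   upload_if_unregistered(ws, ["./data/diabetes.csv", "./data/diabetes2.csv"])
```

Setting overwrite=True covers the case where the files were uploaded earlier but the dataset was never registered.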

Next, we create a tabular dataset from a path on the datastore. Now, this may take a short while. And when that's complete, we can register our tabular dataset.
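A minimal sketch of creating and registering the tabular dataset, assuming the SDK v1 Dataset class, could look like this. The path pattern, dataset name, description, and tags are assumptions chosen for illustration, and register_tabular_dataset is a hypothetical helper name.

```python
def register_tabular_dataset(ws, datastore,
                             path_pattern="diabetes-data/*.csv",  # assumed
                             name="diabetes dataset"):            # assumed
    """Create a tabular dataset from CSV files on `datastore` and
    register it in workspace `ws`. Returns the registered dataset.
    """
    # Imported here so the function can be defined without the SDK installed
    from azureml.core import Dataset

    # Build the dataset from a (datastore, relative path) tuple;
    # this may take a short while
    tab_data_set = Dataset.Tabular.from_delimited_files(
        path=(datastore, path_pattern)
    )

    # Register it so training scripts can look it up by name
    return tab_data_set.register(
        workspace=ws,
        name=name,
        description="diabetes data",   # assumed description
        tags={"format": "CSV"},        # assumed tag
        create_new_version=True,
    )


# Usage sketch:
#   default_ds = ws.get_default_datastore()
#   dataset = register_tabular_dataset(ws, default_ds)
```

With create_new_version=True, registering again under the same name adds a new version rather than failing.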

About the Author

Kofi is a digital technology specialist in a variety of business applications. He stays up to date on business trends and technology and is an early adopter of powerful and creative ideas.
His experience covers a wide range of topics including data science, machine learning, deep learning, reinforcement learning, DevOps, software engineering, cloud computing, business & technology strategy, design & delivery of flipped/social learning experiences, blended learning curriculum design and delivery, and training consultancy.