Creating Scripts for Pipeline Steps



Learn how to operate machine learning solutions at cloud scale using the Azure Machine Learning SDK. This course teaches you to leverage your existing knowledge of Python and machine learning to manage data ingestion, data preparation, model training, and model deployment in Microsoft Azure.

If you have any feedback related to this course, please contact us at

Learning Objectives

  • Create an Azure Machine Learning workspace using the SDK
  • Run experiments and train models using the SDK
  • Optimize and manage models using the SDK
  • Deploy and consume models using the SDK

Intended Audience

This course is designed for data scientists with existing knowledge of Python and machine learning frameworks, such as Scikit-Learn, PyTorch, and TensorFlow, who want to build and operate machine learning solutions in the cloud.

Prerequisites


  • Fundamental knowledge of Microsoft Azure
  • Experience writing Python code to work with data using libraries such as NumPy, pandas, and Matplotlib
  • Understanding of data science, including how to prepare data and train machine learning models using common machine learning libraries, such as Scikit-Learn, PyTorch, or TensorFlow


The GitHub repo for this course, containing the code and datasets used, can be found here: 


Now we're ready to start work on our pipeline. Pipelines consist of one or more steps, which can be Python scripts or specialized steps such as an AutoML training estimator or a data transfer step that copies data from one location to another. Each step can run in its own compute context.

So let's proceed to build a simple pipeline that contains two steps: an estimator step that trains a model, and a Python script step that registers the trained model. We start by creating a folder to contain the scripts for each step. After creating our experiment folder, we'll create the script that will be used by our estimator.
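
As a minimal sketch of the folder creation step, something like the following would work. The folder name `diabetes_pipeline` is an assumption for illustration; the course may use a different name.

```python
import os

# Hypothetical folder name (an assumption, not taken from the course).
experiment_folder = "diabetes_pipeline"

# exist_ok=True makes the call safe to repeat if the folder already exists.
os.makedirs(experiment_folder, exist_ok=True)

print(experiment_folder, "folder ready:", os.path.isdir(experiment_folder))
```

The scripts for both pipeline steps would then be saved into this folder.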

First, we import the necessary libraries. We then set up argparse, which allows us to get parameters passed to the script from our estimator. Next, we get the experiment run context. We then load the diabetes data, separate the features and labels, and split the data into training and test sets.
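
To illustrate how a step script can pick up its parameters, here is a small argparse sketch. The parameter name `--output_folder` and its default value are assumptions chosen for the example, not confirmed by the course.

```python
import argparse


def parse_args(argv=None):
    """Parse the parameters that the pipeline step passes to the script."""
    parser = argparse.ArgumentParser(description="Training step parameters")
    # Hypothetical parameter: the folder where the trained model will be saved.
    parser.add_argument(
        "--output_folder",
        type=str,
        dest="output_folder",
        default="diabetes_model",
        help="folder in which to save the trained model",
    )
    return parser.parse_args(argv)


# Simulate the arguments a pipeline step might pass on the command line.
args = parse_args(["--output_folder", "outputs/model"])
print(args.output_folder)
```

Inside a real step script, `parse_args()` would be called with no arguments so that it reads from the command line.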

We then train a decision tree model. We calculate the accuracy and log it. We also calculate the area under the curve (AUC) and log it. Finally, we save the model and complete the run.

The script for the second step of the pipeline will load the model from where it was saved and then register it in the workspace. It takes a single model folder parameter that contains the path to the folder where the model was saved by the previous step.
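
The core of the training script described above (split, train, score, save) could be sketched as follows. This is a simplified illustration under stated assumptions: it uses synthetic data from `make_classification` as a stand-in for the diabetes dataset, the 70/30 split and the `model.pkl` file name are illustrative choices, and the Azure ML run logging is only indicated in comments.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def train_and_save(output_folder):
    # Stand-in data (assumption): the course script loads the diabetes data instead.
    X, y = make_classification(n_samples=500, n_features=8, random_state=0)

    # Split into training and test sets (70/30 is an illustrative choice).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=0
    )

    # Train a decision tree model.
    model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

    # Accuracy: fraction of test predictions that match the true labels.
    y_hat = model.predict(X_test)
    accuracy = float(np.average(y_hat == y_test))

    # AUC: computed from the predicted probability of the positive class.
    y_scores = model.predict_proba(X_test)
    auc = float(roc_auc_score(y_test, y_scores[:, 1]))

    # In the pipeline step, both metrics would be logged to the run context here.

    # Save the model so that the next step can load it.
    os.makedirs(output_folder, exist_ok=True)
    model_path = os.path.join(output_folder, "model.pkl")
    joblib.dump(value=model, filename=model_path)
    return accuracy, auc, model_path


# Write to a temporary folder so the example leaves no files behind in the project.
accuracy, auc, model_path = train_and_save(
    os.path.join(tempfile.mkdtemp(), "diabetes_model")
)
print(f"Accuracy: {accuracy:.3f}, AUC: {auc:.3f}, saved to {model_path}")
```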

We import the necessary libraries: argparse, joblib, and the Workspace, Model, and Run classes. We use argparse to get the parameters. We get the experiment run context. We load our previously saved model, register it in our workspace, and complete the run.
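
A rough sketch of the registration script might look like the following. The parameter name `--model_folder`, the file name `model.pkl`, the registered model name, and the tag are assumptions made for illustration. The Azure ML and joblib imports are placed inside the function only so that the argument handling can be tried without the SDK installed; a real step script would typically import them at the top.

```python
import argparse
import os


def parse_args(argv=None):
    """Read the single model folder parameter passed to this step."""
    parser = argparse.ArgumentParser(description="Model registration step")
    # Hypothetical parameter name and default value.
    parser.add_argument(
        "--model_folder",
        type=str,
        dest="model_folder",
        default="diabetes_model",
        help="folder where the previous step saved the model",
    )
    return parser.parse_args(argv)


def register_model(model_folder):
    """Load the saved model and register it in the workspace of the current run."""
    # Imported here so the rest of the sketch runs without the Azure ML SDK.
    import joblib
    from azureml.core import Model, Run

    # Get the experiment run context.
    run = Run.get_context()

    # Load the previously saved model to confirm that the file is readable.
    model_file = os.path.join(model_folder, "model.pkl")
    joblib.load(model_file)

    # Register the model file in the workspace that owns this run.
    Model.register(
        workspace=run.experiment.workspace,
        model_path=model_file,
        model_name="diabetes_model",  # assumed name
        tags={"Training context": "Pipeline"},  # assumed tag
    )

    run.complete()


# Only the argument parsing is exercised here; register_model needs a real run.
args = parse_args(["--model_folder", "diabetes_model"])
print("Model folder:", args.model_folder)
```

When the script runs as a pipeline step, it would call `register_model(parse_args().model_folder)`.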

About the Author

Kofi is a digital technology specialist in a variety of business applications. He stays up to date on business trends and technology and is an early adopter of powerful and creative ideas.
His experience covers a wide range of topics including data science, machine learning, deep learning, reinforcement learning, DevOps, software engineering, cloud computing, business & technology strategy, design & delivery of flipped/social learning experiences, blended learning curriculum design and delivery, and training consultancy.