Creating and Running a Pipeline
1h 23m

Learn how to operate machine learning solutions at cloud scale using the Azure Machine Learning SDK. This course teaches you to leverage your existing knowledge of Python and machine learning to manage data ingestion, data preparation, model training, and model deployment in Microsoft Azure.

If you have any feedback related to this course, please contact us at

Learning Objectives

  • Create an Azure Machine Learning workspace using the SDK
  • Run experiments and train models using the SDK
  • Optimize and manage models using the SDK
  • Deploy and consume models using the SDK

Intended Audience

This course is designed for data scientists with existing knowledge of Python and machine learning frameworks, such as scikit-learn, PyTorch, and TensorFlow, who want to build and operate machine learning solutions in the cloud.


Prerequisites

  • Fundamental knowledge of Microsoft Azure
  • Experience writing Python code to work with data using libraries such as NumPy, Pandas, and Matplotlib
  • Understanding of data science, including how to prepare data and train machine learning models using common machine learning libraries, such as scikit-learn, PyTorch, or TensorFlow


The GitHub repo for this course, containing the code and datasets used, can be found here: 


We're now ready to define and run the pipeline, but first we need to define the steps of the pipeline and any data references that we need to pass between them. In this walkthrough, the first step must write the model to a folder that the second step can read from.

Because the steps run on remote compute, and, as we mentioned earlier, each step could run on a different computer, the folder path must be passed as a data reference to a location in a datastore within the workspace. The PipelineData object is a special kind of data reference used to pass data from the output of one pipeline step to the input of another, which creates a dependency between them.
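A minimal sketch of that data reference using the v1 Azure Machine Learning SDK (`azureml-pipeline-core`), assuming `ws` is a `Workspace` you have already loaded and that the workspace's default datastore should hold the intermediate folder. The function name `create_model_folder` and the reference name `model_folder` are illustrative, not taken from the course code.

```python
def create_model_folder(ws, name="model_folder"):
    """Return a PipelineData reference for the intermediate model folder.

    ws is assumed to be an azureml.core.Workspace loaded earlier,
    for example with Workspace.from_config().
    """
    # Deferred import so the function can be defined without the SDK installed;
    # calling it requires the azureml-pipeline-core package (SDK v1).
    from azureml.pipeline.core import PipelineData

    # Backing the reference with the workspace's default datastore lets steps
    # running on different compute nodes resolve the same location.
    return PipelineData(name, datastore=ws.get_default_datastore())
```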

So we'll create one and use it as the output of the first step and the input of the second step. Notice that we also need to pass it as a script argument, so that our code can access the datastore location referenced by the data reference. We therefore also import the PythonScriptStep and EstimatorStep classes, as well as the generic Estimator class.
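Inside the scripts themselves, the data reference arrives as an ordinary command-line argument that resolves to a folder path on the compute node. The fragments below are hypothetical: the argument names `--output_folder` and `--model_folder`, the file name `model.pkl`, and the use of `pickle` for serialization are assumptions for illustration. When run locally, a temporary directory stands in for the datastore folder.

```python
import argparse
import os
import pickle
import tempfile


def save_model(model, argv=None):
    """Training-step fragment (hypothetical): write the model into the given folder."""
    parser = argparse.ArgumentParser()
    # At run time the pipeline replaces the data reference in the step's
    # argument list with a real path on the compute node.
    parser.add_argument("--output_folder", type=str, required=True)
    args = parser.parse_args(argv)

    os.makedirs(args.output_folder, exist_ok=True)
    model_path = os.path.join(args.output_folder, "model.pkl")
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    return model_path


def load_model(argv=None):
    """Registration-step fragment (hypothetical): read the model the first step wrote."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_folder", type=str, required=True)
    args = parser.parse_args(argv)

    with open(os.path.join(args.model_folder, "model.pkl"), "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Local dry run: a temporary directory stands in for the datastore folder.
    with tempfile.TemporaryDirectory() as folder:
        save_model({"coef": [0.5, 1.5]}, ["--output_folder", folder])
        print(load_model(["--model_folder", folder]))  # {'coef': [0.5, 1.5]}
```

The only contract between the two steps is the folder path, which is why the same data reference is passed to both scripts.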

First we get the training dataset, and then we create a PipelineData object, which is a temporary data reference for the model folder. For our generic estimator, we specify the source directory; the compute target, which is our pipeline cluster; the environment definition, which is the environment from our pipeline run configuration; and finally the entry script.
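The dataset lookup and generic estimator can be sketched as follows with the v1 SDK, assuming `ws`, a compute target `pipeline_cluster`, and a `RunConfiguration` named `pipeline_run_config` already exist. The dataset name, source folder, and `train.py` entry script are hypothetical placeholders. Note that `Estimator` (from `azureml-train-core`) has since been deprecated in favor of `ScriptRunConfig`, so this matches the older SDK style the walkthrough describes.

```python
def make_estimator(ws, pipeline_cluster, pipeline_run_config,
                   dataset_name="training dataset",
                   source_directory="pipeline_scripts",
                   entry_script="train.py"):
    """Return (training dataset, Estimator) for the training step.

    All default names are hypothetical placeholders.
    """
    # Deferred imports: calling this needs azureml-core and azureml-train-core (SDK v1).
    from azureml.core import Dataset
    from azureml.train.estimator import Estimator

    # Get the registered training dataset from the workspace.
    training_ds = Dataset.get_by_name(ws, name=dataset_name)

    estimator = Estimator(
        source_directory=source_directory,  # folder holding the scripts
        compute_target=pipeline_cluster,    # the pipeline's compute cluster
        environment_definition=pipeline_run_config.environment,
        entry_script=entry_script,          # script the estimator runs
    )
    return training_ds, estimator
```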

Step one runs the estimator to train the model, and step two runs the model registration script. Now we're ready to build a pipeline from the steps we've defined and run it as an experiment. This may take a while, because the compute cluster must be started and configured with a Python environment before the scripts can run.
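A sketch of the two steps with the v1 SDK, assuming the training dataset, estimator, PipelineData reference, compute target, and run configuration described above are passed in as arguments. The step names, the named input `training_data`, the argument names, and the `register.py` script name are hypothetical. Declaring the same PipelineData object as an output of the first step and an input of the second is what creates the dependency between them.

```python
def make_steps(training_ds, estimator, model_folder, pipeline_cluster,
               pipeline_run_config, source_directory="pipeline_scripts",
               register_script="register.py"):
    """Return (train_step, register_step); names are hypothetical."""
    # Deferred import: calling this needs azureml-pipeline-steps (SDK v1).
    from azureml.pipeline.steps import EstimatorStep, PythonScriptStep

    # Step 1: run the estimator to train the model. model_folder is an output,
    # and its resolved path is passed to the script as --output_folder.
    train_step = EstimatorStep(
        name="Train Model",
        estimator=estimator,
        estimator_entry_script_arguments=["--output_folder", model_folder],
        inputs=[training_ds.as_named_input("training_data")],
        outputs=[model_folder],
        compute_target=pipeline_cluster,
        allow_reuse=True,
    )

    # Step 2: run the registration script. Listing model_folder as an input
    # makes this step wait for the training step's output.
    register_step = PythonScriptStep(
        name="Register Model",
        source_directory=source_directory,
        script_name=register_script,
        arguments=["--model_folder", model_folder],
        inputs=[model_folder],
        compute_target=pipeline_cluster,
        runconfig=pipeline_run_config,
        allow_reuse=True,
    )
    return train_step, register_step
```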

To build the pipeline, we import the Experiment and Pipeline classes, along with RunDetails. To construct the pipeline, we make a list of the steps we're running, the training step and the registration step, and pass that list, together with the workspace, to the Pipeline class.
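Building the pipeline object itself is short. This sketch assumes `ws` and the two step objects already exist; the function name `build_pipeline` is illustrative.

```python
def build_pipeline(ws, train_step, register_step):
    """Assemble the two steps into a Pipeline (SDK v1)."""
    # Deferred import: calling this needs azureml-pipeline-core.
    from azureml.pipeline.core import Pipeline

    # The list order is for readability; execution order is driven by the
    # PipelineData dependency between the steps.
    pipeline_steps = [train_step, register_step]
    return Pipeline(workspace=ws, steps=pipeline_steps)
```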

Next, we create an experiment and submit the pipeline to it. We then pass the run object returned by the submission to RunDetails, where we can observe the outputs of the run. You can also monitor pipeline runs on the Experiments page in Azure Machine Learning studio.
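A sketch of submitting the pipeline and monitoring it, assuming `ws` and a built `Pipeline` object; the experiment name `training-pipeline` is hypothetical. `RunDetails` comes from the `azureml-widgets` package and only renders inside a Jupyter environment.

```python
def run_pipeline(ws, pipeline, experiment_name="training-pipeline"):
    """Submit the pipeline as an experiment run and wait for it to finish."""
    # Deferred imports: calling this needs azureml-core and azureml-widgets (SDK v1).
    from azureml.core import Experiment
    from azureml.widgets import RunDetails

    experiment = Experiment(workspace=ws, name=experiment_name)

    # regenerate_outputs=True disallows reuse of earlier step outputs,
    # so every step runs again.
    pipeline_run = experiment.submit(pipeline, regenerate_outputs=True)

    # In a notebook, this shows a live widget with the status of each step.
    RunDetails(pipeline_run).show()

    # Block until the pipeline run has completed.
    pipeline_run.wait_for_completion()
    return pipeline_run
```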

When the pipeline has finished, a new model should be registered with a training context tag indicating that it was trained in the pipeline, and you can verify this by listing the registered models and their tags. As you can see from the tags, the training context was a pipeline.
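One way to check the result is to list the registered models and filter on their tags. This sketch assumes the registration script tagged the model with the key `Training context` and the value `Pipeline`; both are assumptions, so adjust them to whatever tags your script actually passes to `Model.register`. The filtering lives in a plain function so it can be tried offline with stand-in objects.

```python
from types import SimpleNamespace


def pipeline_trained(models, tag="Training context", value="Pipeline"):
    """Return (name, version) pairs for models tagged as pipeline-trained.

    The default tag key and value are assumptions, not confirmed course values.
    """
    return [(m.name, m.version) for m in models
            if (m.tags or {}).get(tag) == value]


def list_pipeline_models(ws):
    """List pipeline-trained models registered in workspace ws (SDK v1)."""
    from azureml.core import Model  # deferred import: needs azureml-core
    return pipeline_trained(Model.list(ws))


if __name__ == "__main__":
    # Offline check with stand-in objects that mimic Model's name/version/tags.
    fake = [
        SimpleNamespace(name="my-model", version=3,
                        tags={"Training context": "Pipeline"}),
        SimpleNamespace(name="my-model", version=2,
                        tags={"Training context": "Notebook"}),
    ]
    print(pipeline_trained(fake))  # [('my-model', 3)]
```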

About the Author

Kofi is a digital technology specialist in a variety of business applications. He stays up to date on business trends and technology and is an early adopter of powerful and creative ideas.
His experience covers a wide range of topics including data science, machine learning, deep learning, reinforcement learning, DevOps, software engineering, cloud computing, business & technology strategy, design & delivery of flipped/social learning experiences, blended learning curriculum design and delivery, and training consultancy.