Running an Experiment on a Remote Compute Target
1h 23m

Learn how to operate machine learning solutions at cloud scale using the Azure Machine Learning SDK. This course teaches you to leverage your existing knowledge of Python and machine learning to manage data ingestion, data preparation, model training, and model deployment in Microsoft Azure.

If you have any feedback related to this course, please contact us at

Learning Objectives

  • Create an Azure Machine Learning workspace using the SDK
  • Run experiments and train models using the SDK
  • Optimize and manage models using the SDK
  • Deploy and consume models using the SDK

Intended Audience

This course is designed for data scientists with existing knowledge of Python and machine learning frameworks, such as Scikit-Learn, PyTorch, and TensorFlow, who want to build and operate machine learning solutions in the cloud.

Prerequisites

  • Fundamental knowledge of Microsoft Azure
  • Experience writing Python code to work with data using libraries such as NumPy, Pandas, and Matplotlib
  • Understanding of data science, including how to prepare data and train machine learning models using common machine learning libraries, such as Scikit-Learn, PyTorch, or TensorFlow


The GitHub repo for this course, containing the code and datasets used, can be found here: 


In some scenarios, your local compute resources may not be sufficient to run a complex or long-running experiment that needs to process a large volume of data. In those cases, you may want to take advantage of the ability to dynamically create and use compute resources in the cloud.

Azure ML supports a range of compute targets, which you can define in your workspace and use to run experiments, paying for the resources only when using them. In our case, we will run a diabetes training experiment on a compute cluster with a unique name of our choosing.

So let's verify that it exists and, if not, create it, so we can use it to run our training experiments. To do this, we need to import the ComputeTarget and AmlCompute classes, and we also need ComputeTargetException. The unique cluster name I've chosen here is qa-azureml-sdk. In the try block, we check for an existing compute target. If the target exists, it gets used. If not, an exception is thrown, and in the except block we create and provision our compute target.

We specify the VM size and the maximum number of nodes required. We then pass these configuration details when creating the compute target, which gives us a training cluster. Once provisioning completes, we are ready to run the experiment on the compute target we just created. We do this by specifying the compute target parameter on the estimator.
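The check-then-create steps described above can be sketched roughly as follows. This is a minimal sketch against the Azure ML SDK v1 (azureml-core), not the exact notebook code from the video. The helper name get_or_create_cluster and the default vm_size and max_nodes values are assumptions chosen for illustration, since the narration does not state them.

```python
def get_or_create_cluster(ws, cluster_name, vm_size="STANDARD_DS2_V2", max_nodes=2):
    """Return the named compute target, provisioning it if it does not exist.

    vm_size and max_nodes defaults are illustrative assumptions.
    """
    # Imported inside the function so this sketch loads even where the
    # SDK is not installed.
    from azureml.core.compute import AmlCompute, ComputeTarget
    from azureml.core.compute_target import ComputeTargetException

    try:
        # Look up an existing compute target by name; this raises
        # ComputeTargetException when no such target exists.
        cluster = ComputeTarget(workspace=ws, name=cluster_name)
        print(f"Found existing cluster '{cluster_name}', using it.")
    except ComputeTargetException:
        # Describe the cluster: VM size and the upper bound on node count.
        config = AmlCompute.provisioning_configuration(
            vm_size=vm_size, max_nodes=max_nodes
        )
        cluster = ComputeTarget.create(ws, cluster_name, config)
        # Block until provisioning has finished.
        cluster.wait_for_completion(show_output=True)
    return cluster
```

With a workspace loaded (for example via Workspace.from_config()), calling get_or_create_cluster(ws, "qa-azureml-sdk") would return the cluster used in this lesson, creating it on the first call only.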

So let's set up our estimator. We import the Estimator class, and we also need Environment, Experiment, and RunDetails. We get the registered environment we need, set up the script parameters, get our training data, and then set up our generic estimator.

Note that this time our compute target is the cluster that we just created, so we're no longer running locally; we're going to run the experiment on the remote compute target. Next, we create an experiment, and then we run it and make sure that we show the run details while it is running.
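As a rough illustration of the estimator setup and submission just described, the sketch below uses the SDK v1 Estimator class. The helper name submit_remote_training, the "--regularization" script parameter, and the "diabetes" input name are hypothetical. The environment, dataset, experiment, and script names are passed in as arguments because the narration does not give them.

```python
def submit_remote_training(ws, compute_target, env_name, dataset_name,
                           experiment_name, source_directory, entry_script,
                           reg_rate=0.1):
    """Configure a generic estimator for a remote target and submit it.

    The '--regularization' parameter and the 'diabetes' input name are
    hypothetical; use whatever the training script actually expects.
    """
    # Imported inside the function so this sketch loads even where the
    # SDK is not installed.
    from azureml.core import Environment, Experiment
    from azureml.train.estimator import Estimator

    # Reuse an environment that was registered in the workspace earlier.
    registered_env = Environment.get(ws, env_name)

    # ws.datasets behaves like a dict of registered datasets keyed by name.
    training_data = ws.datasets.get(dataset_name)
    if training_data is None:
        raise ValueError(f"Dataset '{dataset_name}' is not registered")

    estimator = Estimator(
        source_directory=source_directory,
        inputs=[training_data.as_named_input("diabetes")],
        script_params={"--regularization": reg_rate},
        compute_target=compute_target,  # the remote cluster, not "local"
        environment_definition=registered_env,
        entry_script=entry_script,
    )

    # Create (or reuse) the experiment and start the run.
    experiment = Experiment(workspace=ws, name=experiment_name)
    return experiment.submit(config=estimator)
```

In a notebook, passing the returned run to RunDetails(run).show() from azureml.widgets and then calling run.wait_for_completion() would display progress while the run executes. The only change from a local run is the compute_target argument.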

Please note that the experiment will take quite a lot longer, because the container image must be built with the conda environment, and then the cluster nodes must be started and the image deployed before the script can run. For a simple experiment like the diabetes training script, this may seem inefficient, but imagine you needed to run a more complex experiment with a large volume of data that would take several hours on your local workstation.

Dynamically creating more scalable compute may reduce the overall time significantly. Now, while we are waiting for the experiment to run, you can check on the status of the compute in the widget. Please note that after some time the widget may stop updating. You'll be able to tell that the experiment run has completed by the information displayed immediately below the widget, and by the fact that the kernel indicator at the top right of your notebook window has changed.

After the experiment has finished, you can get the metrics and files generated by the experiment run. We can do so with code that retrieves the logged metrics and also the names of the output files.
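A minimal sketch of that retrieval step is shown here. It assumes run is the completed Run object returned when the experiment was submitted. The helper name summarize_run is hypothetical, while get_metrics and get_file_names are methods of the SDK v1 Run class.

```python
def summarize_run(run):
    """Print and return the metrics and output file names of a finished run."""
    # Metrics logged by the training script, as a dict of name -> value.
    metrics = run.get_metrics()
    for name, value in metrics.items():
        print(name, value)

    # Names of the files the run produced (logs, outputs, saved models).
    file_names = run.get_file_names()
    for file_name in file_names:
        print(file_name)

    return metrics, file_names
```

Returning the values as well as printing them makes it easier to compare runs later, for example when deciding which model to register.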

About the Author

Kofi is a digital technology specialist in a variety of business applications. He stays up to date on business trends and technology and is an early adopter of powerful and creative ideas.
His experience covers a wide range of topics including data science, machine learning, deep learning, reinforcement learning, DevOps, software engineering, cloud computing, business & technology strategy, design & delivery of flipped/social learning experiences, blended learning curriculum design and delivery, and training consultancy.