Training a Model from a Datastore
1h 23m

Learn how to operate machine learning solutions at cloud scale using the Azure Machine Learning SDK. This course teaches you to leverage your existing knowledge of Python and machine learning to manage data ingestion, data preparation, model training, and model deployment in Microsoft Azure.

If you have any feedback related to this course, please contact us at

Learning Objectives

  • Create an Azure Machine Learning workspace using the SDK
  • Run experiments and train models using the SDK
  • Optimize and manage models using the SDK
  • Deploy and consume models using the SDK

Intended Audience

This course is designed for data scientists with existing knowledge of Python and machine learning frameworks, such as scikit-learn, PyTorch, and TensorFlow, who want to build and operate machine learning solutions in the cloud.

Prerequisites


  • Fundamental knowledge of Microsoft Azure
  • Experience writing Python code to work with data using libraries such as NumPy, pandas, and Matplotlib
  • Understanding of data science, including how to prepare data and train machine learning models using common machine learning libraries, such as scikit-learn, PyTorch, or TensorFlow


The GitHub repo for this course, containing the code and datasets used, can be found here: 


When we uploaded the files earlier, we noted that the code returned a data reference. A data reference provides a way to pass the path to a folder in a datastore to a script, regardless of where the script is being run, so that the script can access data in the datastore location.

The code below gets a reference to the diabetes data folder where we uploaded the diabetes CSV files, and it specifically configures the data reference for download. In other words, it can be used to download the contents of the folder to the compute context where the reference is being used.

Downloading data works well for small volumes of data that will be processed on local compute, but if we're working with remote compute, we can also configure a data reference to mount the datastore location and read data directly from the data source.
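The two access modes described above can be sketched roughly as follows. This assumes the v1 Azure Machine Learning SDK (azureml-core), where a datastore's path() method returns a data reference that offers as_download() and as_mount(). The helper name make_data_references, the folder name diabetes-data, and the path_on_compute value are assumptions for illustration, not names taken from the course code.

```python
def make_data_references(datastore, folder="diabetes-data"):
    """Return (download_ref, mount_ref) for a folder in a datastore.

    `datastore` is expected to behave like an azureml.core Datastore:
    path() returns a data reference with as_download() and as_mount().
    The folder name is an assumption; use the folder you uploaded to.
    """
    # Download mode: the folder contents are copied to the compute
    # context before the script runs (suits small data, local compute).
    download_ref = datastore.path(folder).as_download(path_on_compute="diabetes_data")

    # Mount mode: remote compute reads directly from the datastore.
    mount_ref = datastore.path(folder).as_mount()
    return download_ref, mount_ref


# Typical use with the SDK (requires a configured workspace):
#   from azureml.core import Workspace
#   ws = Workspace.from_config()
#   download_ref, mount_ref = make_data_references(ws.get_default_datastore())
```

The helper only relies on the methods it calls, so it can be tried out with a stand-in object before pointing it at a real datastore.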

To use the data reference in a training script, we must define a parameter for it, and we're going to run a few instructions to do that. First, we need to set up a folder named "diabetes training". After setting up our folder, we need a script that trains the classification model by using the training data in all the CSV files in the folder referenced by the data reference parameter passed to it.

Let's import the libraries we need to work with. We need os, and we need argparse to pass information to our script. We need Run, pandas, and numpy, and we need joblib for pipelining and parallelism. We also need train_test_split, LogisticRegression, and metrics.

So here we use argparse to set up and get our parameters. The parameters in question are the regularization rate and the data folder. Next, we get our run context, and then we load our data from the data reference. Next, we separate features and labels: the feature columns are stored in X, and the label, which is whether the patient is diabetic or not, is stored in y as our target.
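A minimal sketch of these first steps of the script might look like the following. The argument names, the default regularization rate of 0.01, and the label column name Diabetic are assumptions made for illustration; the real column list is whatever the course's CSV files contain. Inside an Azure ML run, the run context would come from Run.get_context() in azureml.core, which is left as a comment here so the sketch runs without the SDK.

```python
import argparse
import glob
import os

import pandas as pd


def parse_args(argv=None):
    """Read the regularization rate and the data folder from the command line."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--regularization", type=float, dest="reg_rate",
                        default=0.01, help="regularization rate (assumed default)")
    parser.add_argument("--data-folder", type=str, dest="data_folder",
                        help="path supplied by the data reference")
    return parser.parse_args(argv)


def load_folder(data_folder):
    """Concatenate every CSV file in data_folder into one DataFrame."""
    files = sorted(glob.glob(os.path.join(data_folder, "*.csv")))
    return pd.concat((pd.read_csv(f) for f in files), ignore_index=True)


def split_features_labels(df, label_col="Diabetic"):
    """Return (X, y): every column except label_col as features, label_col as target."""
    X = df.drop(columns=[label_col]).values
    y = df[label_col].values
    return X, y


# In the training script itself (SDK required):
#   from azureml.core import Run
#   args = parse_args()
#   run = Run.get_context()
#   X, y = split_features_labels(load_folder(args.data_folder))
```

Treating every non-label column as a feature is a simplification; if the files carry an identifier column, it would need to be dropped as well.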

Next, we split the data into training and test sets, and then we proceed to train our classification model, passing in the regularization parameter we defined. Then we run a prediction and log the resulting accuracy. We do the same for the area under the curve (AUC) by invoking predict_proba: we pass the probabilities gathered from that to our AUC score function to calculate the value, and then we log that.
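The training and evaluation steps can be sketched as below. This is an illustration on synthetic data, not the course script: the 30% test split, the random seed, the liblinear solver, and the use of C = 1 / regularization rate are assumptions. The log argument stands in for run.log from the Azure ML run context, so the sketch runs without the SDK.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def train_and_evaluate(X, y, reg_rate, log):
    """Train a logistic regression model and log accuracy and AUC via log(name, value)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=0)

    # scikit-learn's C is the inverse of the regularization strength.
    model = LogisticRegression(C=1 / reg_rate, solver="liblinear")
    model.fit(X_train, y_train)

    # Accuracy: fraction of test rows whose predicted class is correct.
    y_hat = model.predict(X_test)
    log("Accuracy", float(np.mean(y_hat == y_test)))

    # AUC: computed from the predicted probability of the positive class.
    y_scores = model.predict_proba(X_test)
    log("AUC", float(roc_auc_score(y_test, y_scores[:, 1])))
    return model


# Synthetic stand-in for the diabetes data: the label depends on two features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)

metrics = {}  # collects what run.log would record
model = train_and_evaluate(X_demo, y_demo, reg_rate=0.1, log=metrics.__setitem__)
print(metrics)
```

In the real script, passing run.log as the log argument would record the same two metrics against the experiment run.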

Finally, we save our model and complete the run. The script will load the training data from the data reference passed to it as a parameter, so now we just need to set up the script parameters to pass the data reference when we run the experiment.
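Saving the model could look something like this sketch, which uses joblib to write the trained object to disk. The outputs folder and the file name diabetes_model.pkl are assumptions; in the v1 SDK, files written to a run's outputs folder are typically uploaded with the run record, which is why that folder name is used here.

```python
import os

import joblib


def save_model(model, output_dir="outputs", filename="diabetes_model.pkl"):
    """Serialize a trained model to output_dir/filename and return the path."""
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, filename)
    joblib.dump(value=model, filename=path)
    return path


# At the end of the training script (SDK required for `run`):
#   save_model(model)
#   run.complete()
```

The saved file can later be restored with joblib.load(path).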

So to set up an experiment, we import the SKLearn estimator, the Experiment class, and the RunDetails widget, and then we set up the parameters that will be passed into the script. Here we're creating an SKLearn estimator and passing it the following parameters: the source directory, the entry script we created earlier, and the script parameters, which supply the inputs we want to go into the script. We also specify our compute target, which is local.
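As a hedged sketch, the estimator setup might be assembled like this. It assumes the v1 SDK's SKLearn estimator from azureml.train.sklearn (since deprecated in later SDK releases). The helper names, the folder and script names, and the regularization value of 0.1 are illustrative assumptions. The SDK import is deferred so the argument-building part can be inspected without the SDK installed.

```python
def estimator_kwargs(data_ref, source_directory="diabetes-training",
                     entry_script="diabetes_training.py", reg_rate=0.1):
    """Build the keyword arguments described above for an SKLearn estimator."""
    return {
        "source_directory": source_directory,   # folder containing the script
        "entry_script": entry_script,           # script to run
        "script_params": {                      # passed to the script's argparse
            "--regularization": reg_rate,
            "--data-folder": data_ref,          # the data reference
        },
        "compute_target": "local",
    }


def make_estimator(data_ref, **overrides):
    """Create the estimator; requires the v1 azureml-train packages."""
    from azureml.train.sklearn import SKLearn  # imported lazily on purpose
    return SKLearn(**estimator_kwargs(data_ref, **overrides))
```

The keys of script_params have to match the argument names that the training script's argparse setup expects.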

So let's create an experiment. It's called "diabetes training", and we pass that name, along with the workspace object, to Experiment. With the experiment object, we then run the experiment and set it up in such a way that it shows the run details while it's running. You can see some of the output of the run below.
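The submit-and-monitor step can be sketched as follows, assuming the v1 SDK's Experiment.submit and Run.wait_for_completion methods and the RunDetails widget from azureml.widgets. The helper name submit_and_wait and the hyphenated experiment name diabetes-training are assumptions. The SDK objects are passed in rather than created here, so the flow can be followed without a workspace.

```python
def submit_and_wait(experiment, estimator, show_details=None):
    """Submit the estimator to the experiment, optionally show progress, then block."""
    run = experiment.submit(config=estimator)
    if show_details is not None:
        show_details(run)      # e.g. display the RunDetails widget in a notebook
    run.wait_for_completion()
    return run


# Typical use in a notebook (requires a workspace `ws` and an `estimator`):
#   from azureml.core import Experiment
#   from azureml.widgets import RunDetails
#   experiment = Experiment(workspace=ws, name="diabetes-training")
#   run = submit_and_wait(experiment, estimator,
#                         show_details=lambda r: RunDetails(r).show())
```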

Please note that, as before, the first time the experiment is run, it may take some time to set up the Python environment. Subsequent runs will be quicker. When the experiment has completed, you can view the azureml-logs in the widget to verify that the files were downloaded before the experiment script was run.

About the Author

Kofi is a digital technology specialist in a variety of business applications. He stays up to date on business trends and technology and is an early adopter of powerful and creative ideas.
His experience covers a wide range of topics including data science, machine learning, deep learning, reinforcement learning, DevOps, software engineering, cloud computing, business & technology strategy, design & delivery of flipped/social learning experiences, blended learning curriculum design and delivery, and training consultancy.