Training a Model from a Tabular Dataset

Difficulty: Intermediate
Duration: 1h 23m
Students: 1202
Description

Learn how to operate machine learning solutions at cloud scale using the Azure Machine Learning SDK. This course teaches you to leverage your existing knowledge of Python and machine learning to manage data ingestion, data preparation, model training, and model deployment in Microsoft Azure.

If you have any feedback related to this course, please contact us at support@cloudacademy.com.

Learning Objectives

  • Create an Azure Machine Learning workspace using the SDK
  • Run experiments and train models using the SDK
  • Optimize and manage models using the SDK
  • Deploy and consume models using the SDK

Intended Audience

This course is designed for data scientists with existing knowledge of Python and machine learning frameworks, such as Scikit-Learn, PyTorch, and TensorFlow, who want to build and operate machine learning solutions in the cloud.

Prerequisites

  • Fundamental knowledge of Microsoft Azure
  • Experience writing Python code to work with data using libraries such as NumPy, Pandas, and Matplotlib
  • Understanding of data science, including how to prepare data and train machine learning models using common machine learning libraries, such as Scikit-Learn, PyTorch, or TensorFlow

Resources

The GitHub repo for this course, containing the code and datasets used, can be found here: https://github.com/cloudacademy/using-the-azure-machine-learning-sdk 

Transcript

Now that we have datasets, we're ready to train models from them. We can pass a dataset to a script as an input of the estimator that runs the script. So we start by creating a folder called diabetes_training_from_tab_dataset. We then need a script that trains the classification model using a tabular dataset that is passed to it as an input.
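As a rough sketch, creating that folder from a notebook cell could look like the following. Only the folder name comes from the walkthrough; the variable name experiment_folder is an assumption.

```python
import os

# Folder that will hold the training script (name taken from the walkthrough).
experiment_folder = "diabetes_training_from_tab_dataset"

# exist_ok=True keeps the cell safe to re-run if the folder already exists.
os.makedirs(experiment_folder, exist_ok=True)
print(experiment_folder, "ready:", os.path.isdir(experiment_folder))
```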

So to get going with that, we import the libraries we'll need. And note we're using our magic command as well, %%writefile, to create our script. We set our regularization hyperparameter. We next get the run context, and then we load the diabetes data passed as an input dataset. We then separate features and labels: we take the feature columns as X and the label column as y. We then split the data into training and test sets.
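The first half of such a script might look roughly like this. It is a sketch under assumptions, not the course's exact code: the argument name --regularization, the input name "diabetes", the column names, and the 70/30 split are all illustrative, and a small synthetic data frame stands in for the dataset so the example runs without a workspace.

```python
import argparse

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def parse_args(argv=None):
    """Read the regularization rate from the command line."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--regularization", type=float, dest="reg_rate",
                        default=0.01, help="regularization rate")
    return parser.parse_args(argv)


# An explicit argument list is used here so the sketch runs anywhere.
args = parse_args(["--regularization", "0.1"])
reg = args.reg_rate

# In the real script the data would come from the run context, roughly:
#   run = Run.get_context()
#   diabetes = run.input_datasets["diabetes"].to_pandas_dataframe()
# A synthetic frame stands in for it here (column names are assumed).
rng = np.random.default_rng(0)
diabetes = pd.DataFrame({
    "PlasmaGlucose": rng.normal(110, 30, 100),
    "BMI": rng.normal(30, 6, 100),
    "Age": rng.integers(21, 80, 100),
    "Diabetic": rng.integers(0, 2, 100),
})

# Separate the feature columns (X) from the label column (y).
feature_columns = ["PlasmaGlucose", "BMI", "Age"]
X = diabetes[feature_columns].values
y = diabetes["Diabetic"].values

# Hold out 30% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0)
print("Train rows:", X_train.shape[0], "Test rows:", X_test.shape[0])
```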

Next, we train our logistic regression model. We calculate the accuracy and then store that in the run log by calling log on the run. We do the same for the area under the curve as well. And finally, we save the model and complete the run. We can now create an estimator to run the script and define a named input for the training dataset, which is read by the script.
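The training and logging steps could be sketched as follows. The StandInRun class is a hypothetical substitute for the Azure ML run object (it only records what is logged), the data is synthetic, and the regularization value and output file name are assumptions.

```python
import os

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


class StandInRun:
    """Hypothetical stand-in for the run object; it only records metrics."""

    def __init__(self):
        self.metrics = {}

    def log(self, name, value):
        self.metrics[name] = value

    def complete(self):
        print("Run complete:", self.metrics)


run = StandInRun()
reg = 0.1  # regularization rate (assumed value)

# Synthetic, roughly separable data so the example runs on its own.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0)

# scikit-learn's C is the inverse of the regularization strength.
model = LogisticRegression(C=1 / reg, solver="liblinear").fit(X_train, y_train)

# Accuracy: share of test rows predicted correctly.
y_hat = model.predict(X_test)
acc = float(np.mean(y_hat == y_test))
run.log("Accuracy", acc)

# AUC is computed from the predicted probability of the positive class.
y_scores = model.predict_proba(X_test)[:, 1]
auc = float(roc_auc_score(y_test, y_scores))
run.log("AUC", auc)

# Save the trained model to disk, then mark the run as finished.
os.makedirs("outputs", exist_ok=True)
joblib.dump(value=model, filename="outputs/diabetes_model.pkl")
run.complete()
```

In the actual script, the run object would come from the run context rather than a stand-in class, and the log calls would be the same shape.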

Note that the Dataset class is defined in the azureml-dataprep package, which is installed with the SDK. This package includes optional support for pandas, which is used by the to_pandas_dataframe method that we've been using. So you need to include this package in the environment where the training experiment will run.

So to set up our estimator, we're using a specific estimator here, SKLearn, so we import that. We also import the Experiment class and the RunDetails widget. We set up the script parameters, which will be used to supply our hyperparameter values. We then get the training dataset and create our estimator.

So as before, our estimator requires the source directory, the entry script we've already created, our script parameters, and our compute target. We then need to supply the dataset object as an input. We also need to specify the pip packages that are required. And as I said earlier, we need the data prep package.
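Put together, the estimator setup might look something like the sketch below. It assumes the SDK v1 SKLearn estimator that this course uses (later SDK releases deprecate estimators in favor of ScriptRunConfig). The dataset name, script file name, input name, and local compute target are assumptions, and ws is expected to be an existing Workspace object. The SDK imports sit inside the function so the file loads even where the SDK is not installed.

```python
def build_estimator(ws, experiment_folder="diabetes_training_from_tab_dataset"):
    """Sketch: an SKLearn estimator that receives a tabular dataset as an input."""
    # Imported lazily; requires the Azure ML SDK (v1) at call time.
    from azureml.train.sklearn import SKLearn

    # Hyperparameter values handed to the script as command-line arguments.
    script_params = {"--regularization": 0.1}

    # Assumed: a tabular dataset registered under this name in the workspace.
    diabetes_ds = ws.datasets.get("diabetes dataset")

    return SKLearn(
        source_directory=experiment_folder,
        entry_script="diabetes_training.py",  # assumed script file name
        script_params=script_params,
        compute_target="local",               # assumed compute target
        # The name given here is the key the script reads the dataset by.
        inputs=[diabetes_ds.as_named_input("diabetes")],
        # Data prep package with pandas support, needed for data frames.
        pip_packages=["azureml-dataprep[pandas]"],
    )
```

Calling build_estimator(ws) with a real workspace would return the estimator to submit. The name passed to as_named_input has to match the key the script uses when it reads its input datasets.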

Next we create an experiment. We run our experiment and show the run details while it is running. So do note that the first time the experiment is run, it may take some time to set up the Python environment. Subsequent runs will be quicker.
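Submitting the estimator could be sketched like this. The experiment name is an assumption, and ws and estimator are expected to be an existing Workspace and estimator. The RunDetails widget only renders inside a notebook.

```python
def run_training(ws, estimator, experiment_name="diabetes-training"):
    """Sketch: submit an estimator as an experiment run and wait for it."""
    # Imported lazily; requires the Azure ML SDK (v1) and its widgets package.
    from azureml.core import Experiment
    from azureml.widgets import RunDetails

    # Create (or reuse) the experiment in the workspace.
    experiment = Experiment(workspace=ws, name=experiment_name)

    # Submit the estimator; this starts the run.
    run = experiment.submit(config=estimator)

    # Show live run details in the notebook while the run is in progress.
    RunDetails(run).show()

    # Block until the run finishes, then hand the run back to the caller.
    run.wait_for_completion()
    return run
```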

When the experiment has completed, in the widget, you can view the Azure ML logs and the metrics that were generated by the run.
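The same information can also be read from the run object in code. The sketch below relies on the run's get_metrics and get_file_names methods; the StandInRun class and its values are made up purely so the example runs without a workspace.

```python
def summarize_run(run):
    """Collect and print the metrics and file names recorded for a run."""
    metrics = run.get_metrics()
    files = list(run.get_file_names())
    for name, value in metrics.items():
        print(name, value)
    for file_name in files:
        print(file_name)
    return metrics, files


class StandInRun:
    """Hypothetical stand-in for a completed run (values are made up)."""

    def get_metrics(self):
        return {"Accuracy": 0.5, "AUC": 0.5}

    def get_file_names(self):
        return ["outputs/diabetes_model.pkl"]


metrics, files = summarize_run(StandInRun())
```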

About the Author
Students: 1203
Courses: 1

Kofi is a digital technology specialist in a variety of business applications. He stays up to date on business trends and technology and is an early adopter of powerful and creative ideas.
His experience covers a wide range of topics including data science, machine learning, deep learning, reinforcement learning, DevOps, software engineering, cloud computing, business & technology strategy, design & delivery of flipped/social learning experiences, blended learning curriculum design and delivery, and training consultancy.