Running an Experiment & Viewing Results

Difficulty: Intermediate
Duration: 1h 23m
Students: 1254
Description

Learn how to operate machine learning solutions at cloud scale using the Azure Machine Learning SDK. This course teaches you to leverage your existing knowledge of Python and machine learning to manage data ingestion, data preparation, model training, and model deployment in Microsoft Azure.

If you have any feedback related to this course, please contact us at support@cloudacademy.com.

Learning Objectives

  • Create an Azure Machine Learning workspace using the SDK
  • Run experiments and train models using the SDK
  • Optimize and manage models using the SDK
  • Deploy and consume models using the SDK

Intended Audience

This course is designed for data scientists with existing knowledge of Python and machine learning frameworks, such as Scikit-Learn, PyTorch, and TensorFlow, who want to build and operate machine learning solutions in the cloud.

Prerequisites

  • Fundamental knowledge of Microsoft Azure
  • Experience writing Python code to work with data using libraries such as NumPy, Pandas, and Matplotlib
  • Understanding of data science, including how to prepare data and train machine learning models using common machine learning libraries, such as Scikit-Learn, PyTorch, or TensorFlow

Resources

The GitHub repo for this course, containing the code and datasets used, can be found here: https://github.com/cloudacademy/using-the-azure-machine-learning-sdk 

Transcript

A key task we perform as data scientists is to create and run experiments that process and analyze data. We will take a look at how to use an Azure ML experiment to run Python code and record values extracted from a dataset that contains the details of patients who have been tested for diabetes.

We start by importing the Experiment class along with the Pandas and Matplotlib libraries, which we'll use to visualize and explore our dataset. To create an experiment, we need a workspace object and a name for our experiment. Then, to get access to our run object, we invoke start_logging on our experiment object.
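As a rough sketch of that setup (the experiment name and the assumption that a saved workspace config file exists are mine, not from the course; the later snippets in this transcript continue from this cell):

```python
from azureml.core import Workspace, Experiment
import pandas as pd
import matplotlib.pyplot as plt

# Load the workspace from a saved config file (assumes config.json is present)
ws = Workspace.from_config()

# Create an experiment in the workspace (the name here is illustrative)
experiment = Experiment(workspace=ws, name="diabetes-experiment")

# Start an interactive logging session and get the run object
run = experiment.start_logging()
```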

Now, as we know, a run object represents a single trial of an experiment. Runs are used to monitor the asynchronous execution of a trial, log metrics, and store the output of the trial. From the run object, we can also analyze the results and assess the artifacts generated by the trial.

We then import the diabetes data file into a Pandas DataFrame from a data folder. Next, we need to count and log the number of rows we have within the dataset. Here we're using the log function of our run object to record our observation: the number of rows within the dataset.
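A minimal sketch of that step; the file path is an assumption, since the transcript only says the data comes from a data folder:

```python
# Load the diabetes data into a Pandas DataFrame (path is assumed)
data = pd.read_csv('data/diabetes.csv')

# Count the rows and log the observation count as a scalar metric
row_count = len(data)
run.log('observations', row_count)
```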

Now, the run object has multiple log functions: we can use it to log scalar values, lists, and images, and we'll see examples of each shortly. Next, we plot and log the count of diabetic versus non-diabetic patients. We're using the value_counts function here to get the split between diabetic and non-diabetic patients. We then visualize that information and use the run object's log_image function to log the figure we've just created.
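That plotting and logging step could look roughly like this; the 'Diabetic' column name is an assumption on my part:

```python
# Count diabetic vs. non-diabetic patients ('Diabetic' column name is assumed)
diabetic_counts = data['Diabetic'].value_counts()

# Plot the counts as a bar chart
fig = plt.figure(figsize=(6, 6))
ax = fig.gca()
diabetic_counts.plot.bar(ax=ax)
ax.set_title('Patients with Diabetes')
ax.set_xlabel('Diagnosis')
ax.set_ylabel('Patients')

# Log the figure itself to the run
run.log_image(name='label distribution', plot=fig)
```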

Next, we need to log the distinct pregnancy counts. We're using the unique function here to get that information, and then we use the log_list function to log it. We can also log summary statistics for numeric columns.
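Something like the following, assuming the column is named 'Pregnancies':

```python
# Get the distinct pregnancy counts and log them as a list
pregnancies = data['Pregnancies'].unique()
run.log_list('pregnancy categories', pregnancies.tolist())
```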

To get the subset of the data that we want to focus on, we define the med_columns list, which specifies the columns of interest, from PlasmaGlucose through BMI. We then use that list to filter out the specific columns that we want to extract from the dataset, use the describe function to get summary statistics for that subset, and convert the result to a dictionary object.

We then iterate through that dictionary, summary_stats, and use the log_row function of the run object to log the values we're interested in: the summary statistics for the numeric columns. Let's now save a sample of the data and upload it to the experiment output.
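A sketch of those two steps together; the transcript names PlasmaGlucose and BMI, while the three column names in between are my assumptions:

```python
# Columns of interest (the middle three names are assumed)
med_columns = ['PlasmaGlucose', 'DiastolicBloodPressure',
               'TricepsThickness', 'SerumInsulin', 'BMI']

# Compute summary statistics for the subset and convert to a dictionary
summary_stats = data[med_columns].describe().to_dict()

# Iterate through the dictionary and log each statistic as a table row
for col in summary_stats:
    keys = list(summary_stats[col].keys())
    values = list(summary_stats[col].values())
    for index in range(len(keys)):
        run.log_row(col, stat=keys[index], value=values[index])
```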

So, we use upload_file from the run object, passing in the name we want the file saved under in the experiment output, as well as the path or stream of the local file. We then finish off our run. Now that our experiment is complete, we can use the run object to get information about the run and its outputs. So let's import JSON, and then, to get the run details, we invoke get_details on the run object. We can also get the metrics from the run, as well as the files that were generated as a result, and then use JSON to dump them in a readable format.
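Putting that end of the run into code (the sample size and file names here are illustrative):

```python
import json

# Save a sample of the data and upload it to the experiment output
data.sample(100).to_csv('sample.csv', index=False, header=True)
run.upload_file(name='outputs/sample.csv', path_or_stream='./sample.csv')

# Mark the run as complete
run.complete()

# Retrieve details, metrics, and generated files from the finished run
details = run.get_details()
print(details)
print(json.dumps(run.get_metrics(), indent=2))
print(json.dumps(run.get_file_names(), indent=2))
```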

Looking at the output below, you can see the run details: the run ID, when the run completed, the start time, and also the observation we logged, the number of rows. In Jupyter notebooks, you can use the RunDetails widget to get a better visualization of the run details while the experiment is running, or after it has finished.

To get this done, we import the RunDetails class, pass our run object to RunDetails, and then invoke show, which gives us an interactive view of the details of our run.
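In a notebook, that amounts to just two lines (this assumes the azureml-widgets package is installed):

```python
# Render an interactive view of the run inside the notebook
from azureml.widgets import RunDetails

RunDetails(run).show()
```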

About the Author

Kofi is a digital technology specialist with experience across a variety of business applications. He stays up to date on business trends and technology and is an early adopter of powerful and creative ideas.
His experience covers a wide range of topics including data science, machine learning, deep learning, reinforcement learning, DevOps, software engineering, cloud computing, business & technology strategy, design & delivery of flipped/social learning experiences, blended learning curriculum design and delivery, and training consultancy.