Deploying a Model

Machine learning is a notoriously complex subject that usually requires a great deal of advanced math and software development skills. That’s why it’s so amazing that Azure Machine Learning lets you train and deploy machine learning models without any coding, using a drag-and-drop interface. With this web-based software, you can create applications for predicting everything from customer churn rates to image classifications to compelling product recommendations.

In this course, you will learn the basic concepts of machine learning and then follow hands-on examples of choosing an algorithm, running data through a model, and deploying a trained model as a predictive web service.

Learning Objectives

  • Create an Azure Machine Learning workspace
  • Train a machine learning model using the drag-and-drop interface
  • Deploy a trained model to make predictions based on new data

Intended Audience

  • Anyone who is interested in machine learning


Prerequisites

  • General technical knowledge
  • A Microsoft Azure account is recommended (sign up for a free trial if you don’t have an account)


The GitHub repository for this course is at


So far we’ve only been training models. Now we’re going to deploy a model so you can use it to make predictions on new data that comes in.

If you don’t see the Create inference pipeline button up here, then you may need to refresh your browser. Click on it. It gives you a choice of creating a real-time inference pipeline or a batch inference pipeline. The names are pretty self-explanatory. With a real-time pipeline, you submit a new row of data, such as the features of a particular automobile in this case, and it will respond with a prediction in real time. With a batch pipeline, you can submit lots of new rows of data, and it will take a while to process them, but it will return the predictions for all of them at the same time. It usually costs less to use a batch pipeline than a real-time pipeline if you need to get lots of predictions.

We’ll create a real-time pipeline. It automatically makes some changes to our graph. It adds a “Web Service Input” module at the top because we need to allow data to be input from the web. It also adds a “Web Service Output” module at the bottom because it needs to return its predictions to the person or program that requested them.

It also removes the modules that were needed for training but aren’t needed for a predictive service. For example, it doesn’t need to split the data into training and test datasets anymore. It also removes the Decision Forest and Train Model modules and replaces them with the trained model we got from running the experiment. Our data cleaning module looks different now, too, but it’s still performing the same function.

Now that the new graph is ready, we have to click Submit again because the graph is different from what it was before. Select our auto-price experiment, and click Submit.

While that’s running, we’ll create a deployment target, which is a compute cluster that will run the inference pipeline. Click on Compute. Then go to the Inference clusters tab. Click the Create new inference cluster button. Let’s call it inference1. Select a region close to you.

For the virtual machine size, choose Standard_A2. This is a relatively small VM, and it wouldn’t be suitable for production use unless we had at least 6 of these VMs in the cluster, but we’re going to change the cluster purpose to Dev-test. This will allow us to use a low-powered cluster. In fact, we’re only going to have one VM in the cluster, so we’ll leave the number of nodes at 1. It takes quite a while to create, so I’ll fast-forward.

You’ll probably need to click Refresh to see when it’s done. Now go back to the Designer. Then click on the real-time inference pipeline. Now click the Deploy button. Select the inference cluster and click Deploy. This can take a while as well. Click on Endpoints, and if it’s not there, click Refresh periodically until the endpoint appears. 

Once it’s there, click on it. The deployment state says Transitioning, which means the service is still being deployed, so I’ll have to wait a while and keep refreshing my browser.

Okay, the deployment state is Healthy, so it’s ready. Now go to the Test tab. It has a field for each of the columns in the automobile dataset. You can either fill them in with new data or use the values that have been filled in automatically, which are simply taken from the first row of the dataset.

Isn’t it weird that you need to fill in the price, too? After all, the whole point of this is that it’s supposed to predict the price, so it doesn’t make sense for us to enter it. It doesn’t actually use it, though. You can put any value in there, and it’ll just ignore it.
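Since the service ignores the price, any placeholder value will do. As a rough sketch of how a request body for a Designer real-time endpoint could be built in Python (the input name "WebServiceInput0" and the column names here are assumptions; copy the exact names from your own endpoint's Consume tab):

```python
import json

def build_payload(car_features: dict) -> str:
    """Wrap one row of feature values in the envelope the web service expects."""
    row = dict(car_features)
    row["price"] = 0.0  # required by the schema, but the model ignores it
    # "WebServiceInput0" is the Designer's default input name -- an assumption here
    return json.dumps({"Inputs": {"WebServiceInput0": [row]}})

payload = build_payload({"make": "toyota", "horsepower": 102, "curb-weight": 2414})
print(payload)
```

Because the value is ignored, hard-coding 0.0 for the price keeps calling code from having to supply a number it doesn't know.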

Now click the Test button. It comes back really quickly. It responds with all of the values that we submitted plus a scored label at the bottom. That’s its prediction for the price of the automobile. 
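To give a sense of what comes back, here is a hedged sketch of pulling the prediction out of a response. The output name ("WebServiceOutput0"), the "Scored Labels" column, and the sample values are assumptions based on the Designer's defaults; check the actual response from your own endpoint:

```python
import json

# Hypothetical response shaped like the Designer's defaults -- not real output.
sample_response = json.dumps({
    "Results": {"WebServiceOutput0": [{"make": "audi", "Scored Labels": 13950.0}]}
})

def extract_prediction(response_text: str) -> float:
    """Return the scored label (the predicted price) from a response body."""
    result = json.loads(response_text)
    row = result["Results"]["WebServiceOutput0"][0]
    return row["Scored Labels"]

print(extract_prediction(sample_response))  # → 13950.0
```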

Once you’re satisfied that the test works, then you can find out how to call this endpoint from your applications by going to the Consume tab. It gives you the URL of the endpoint and a couple of different ways to authenticate. You can use either a key or a token. It also gives you some sample code in a few different languages.
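As a rough illustration of key-based authentication, here is a minimal Python sketch using only the standard library. The URL and key below are placeholders, and the payload shape is an assumption; the real values and official sample code come from the Consume tab:

```python
import json
import urllib.request

SCORING_URI = "http://<your-endpoint>/score"  # placeholder -- copy from the Consume tab
API_KEY = "<your-key>"                        # placeholder -- copy from the Consume tab

def make_request(uri: str, key: str, body: dict) -> urllib.request.Request:
    """Build an authenticated POST request for the scoring endpoint."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",  # key-based auth uses a Bearer header
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(uri, data=data, headers=headers, method="POST")

req = make_request(SCORING_URI, API_KEY,
                   {"Inputs": {"WebServiceInput0": [{"make": "audi"}]}})
# response = urllib.request.urlopen(req)  # uncomment once real values are filled in
print(req.get_header("Authorization"))
```

Token-based authentication works similarly, except the token is short-lived and must be refreshed periodically, whereas a key stays valid until you regenerate it.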

And that’s it for deploying a model.

About the Author

Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).