Model Evaluation

This course explores the Azure Custom Vision service and how you can use it to create and customize vision recognition solutions. You'll get some background info on what the service is before looking at the various steps for creating image classification and object detection models, uploading and tagging images, and then training and deploying your models.

Learning Objectives

  • Use the Custom Vision portal to create new Vision models
  • Create both Classification and Object Detection models
  • Upload and tag images according to your requirements
  • Train and deploy the configured models

Intended Audience

  • Developers or architects who want to learn how to use Azure Custom Vision to tailor a vision recognition solution to their needs


Prerequisites

To get the most out of this course, you should have:

  • Basic Azure experience, at least with topics such as subscriptions and resource groups
  • Basic knowledge of Azure Cognitive Services, especially vision-related services
  • Some developer experience, including familiarity with terms such as REST API and SDK

Now that we have created and configured our model, uploaded and tagged our pictures, and successfully trained an iteration, it's time to check how well the model is predicting our landmarks. After watching the last demo, you might be wondering, "How can Custom Vision give us those performance metrics for the model without us telling the service which pictures were correct?" Well, we actually did tell it: the tags we applied to the pictures are the correct answers, and machine learning systems use them to evaluate performance.

When you train ML models, the algorithm splits the data into two sets. First, we have the training set. This is the larger set, with anywhere between 60 and 90% of the data, and the ML system uses it to train the model, for example, to differentiate between the Empire State Building and the Eiffel Tower. Then, we have the test set. This is the remaining data, and it is used to evaluate the model that was trained.
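As a rough illustration of the split described above, here is a minimal Python sketch. It is not how Custom Vision implements the split internally (the service handles that for you); the `split_dataset` helper, the 80% fraction, and the file names are all assumptions made up for this example.

```python
import random


def split_dataset(items, train_fraction=0.8, seed=42):
    """Shuffle the items and split them into (training set, test set)."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(items)
    # A fixed seed keeps the split reproducible between runs
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical file names, for illustration only
images = [f"empire_state_{i}.jpg" for i in range(10)]
images += [f"eiffel_tower_{i}.jpg" for i in range(10)]

train_set, test_set = split_dataset(images, train_fraction=0.8)
print(len(train_set), len(test_set))  # 16 4
```

The model only ever learns from `train_set`; the pictures in `test_set` are held back so their known tags can be compared against the model's predictions.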

For example, if the trained model believes that a picture from the test set is of the Empire State Building, but it's actually of the Eiffel Tower, this is considered a false positive for the Empire State Building class, and it will influence the performance metrics. The table of true positives, true negatives, false positives, and false negatives is called the confusion matrix, and it's quite common in ML systems. However, as Custom Vision is meant for developers with less data science experience, it uses slightly simpler metrics: precision, recall, and mean average precision. Let's see what these metrics mean.
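To make the four confusion matrix outcomes concrete, here is a small Python sketch that counts them for one class. The `confusion_counts` helper and the four-picture test set are invented for illustration; Custom Vision does not expose these counts in this form.

```python
from collections import Counter

EMPIRE = "Empire State Building"
EIFFEL = "Eiffel Tower"


def confusion_counts(actual, predicted, positive):
    """Count TP, FP, FN, and TN, treating `positive` as the class of interest."""
    counts = Counter()
    for truth, guess in zip(actual, predicted):
        if guess == positive:
            # The model said "positive": right (TP) or wrong (FP)
            counts["TP" if truth == positive else "FP"] += 1
        else:
            # The model said "not positive": missed one (FN) or right (TN)
            counts["FN" if truth == positive else "TN"] += 1
    return {key: counts[key] for key in ("TP", "FP", "FN", "TN")}


# A tiny made-up test set of four pictures
actual = [EMPIRE, EMPIRE, EIFFEL, EIFFEL]
predicted = [EMPIRE, EMPIRE, EMPIRE, EIFFEL]

print(confusion_counts(actual, predicted, positive=EMPIRE))
# {'TP': 2, 'FP': 1, 'FN': 0, 'TN': 1}
```

The third picture is the false positive: the model called it the Empire State Building, but its tag says Eiffel Tower.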

Let's suppose that the test set for this model had 10 pictures, five of the Empire State Building and five of the Eiffel Tower. Let's assume that the model guessed that the first six pictures are of the Empire State Building, even though one of them is actually of the Eiffel Tower, so that one was incorrectly predicted. The other four were identified correctly as the Eiffel Tower. The precision metric answers the question: of all the pictures the model predicted as a given class, what percentage did it get right?

For example, the model predicted that six images were of the Empire State Building, but only five of them actually were. Therefore, precision will be five divided by six, or approximately 83.3%. The recall metric answers a different question: of all the pictures that actually belong to a class, what percentage did the model correctly identify? In this example, there are five images of the Empire State Building, and the model identified all of them correctly, which means that the recall for the Empire State Building is 100%. The recall for the Eiffel Tower is 80%, as it identified 4 out of 5 Eiffel Tower pictures, which gives an overall recall for this model of 90%. Both model-level and class-level metrics appear on the performance page, so you can see whether a specific class is bringing your model's performance down. Finally, mean average precision is an overall measure of performance that summarizes precision and recall at different thresholds in a single, easy-to-read value.
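The arithmetic in this example can be checked with a short Python sketch. The `precision` and `recall` helpers are written just for this illustration, and taking the overall recall as the plain average of the two class recalls is an assumption; with five pictures per class, it matches the 90% quoted above.

```python
EMPIRE = "Empire State Building"
EIFFEL = "Eiffel Tower"

# Ten test pictures: five of each landmark
actual = [EMPIRE] * 5 + [EIFFEL] * 5
# The model labels the first six as Empire State (the sixth is really Eiffel)
predicted = [EMPIRE] * 6 + [EIFFEL] * 4


def precision(actual, predicted, label):
    """Of the pictures predicted as `label`, the fraction that really are."""
    truths = [a for a, p in zip(actual, predicted) if p == label]
    if not truths:
        return 0.0
    return truths.count(label) / len(truths)


def recall(actual, predicted, label):
    """Of the pictures that really are `label`, the fraction the model found."""
    guesses = [p for a, p in zip(actual, predicted) if a == label]
    if not guesses:
        return 0.0
    return guesses.count(label) / len(guesses)


for label in (EMPIRE, EIFFEL):
    p = precision(actual, predicted, label)
    r = recall(actual, predicted, label)
    print(f"{label}: precision={p:.1%} recall={r:.1%}")
# Empire State Building: precision=83.3% recall=100.0%
# Eiffel Tower: precision=100.0% recall=80.0%

# Assumption: overall recall taken as the simple average of the class recalls
overall_recall = (recall(actual, predicted, EMPIRE) + recall(actual, predicted, EIFFEL)) / 2
print(f"overall recall={overall_recall:.1%}")  # overall recall=90.0%
```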

Another important setting that is worth mentioning is the probability threshold. Remember that for all Custom Vision projects, each prediction comes with a probability score, which tells you how sure the model is that it got the correct class. The default threshold is 50%, which means that if the model is at least 50% confident that the picture belongs to the Empire State Building, it will mark that picture as the Empire State Building class. But you can slide the probability threshold higher or lower, according to your needs.

Increasing the probability threshold will tend to increase the precision of the model. That makes sense, right? After all, that means that the model needs to be much more confident about the prediction before it considers that it's right. However, this option tends to decrease the recall. Since it's only tagging an image when it's highly certain of it, many images will tend to go undetected. Decreasing the probability threshold will go in the opposite direction by decreasing the precision of the model and increasing the recall. 
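The trade-off can be seen in a small Python sketch. The probability scores below are made up for illustration, and `precision_recall` is a hypothetical helper, not part of the Custom Vision SDK.

```python
# Each entry: (model's probability that the picture is the Empire State
# Building, whether it really is). Made-up scores, for illustration only.
scored = [
    (0.95, True), (0.90, True), (0.80, True), (0.65, False),
    (0.60, True), (0.45, True), (0.40, False), (0.20, False),
]


def precision_recall(scored, threshold):
    """Return (precision, recall) when tagging every score >= threshold."""
    tp = sum(1 for prob, is_pos in scored if prob >= threshold and is_pos)
    fp = sum(1 for prob, is_pos in scored if prob >= threshold and not is_pos)
    fn = sum(1 for prob, is_pos in scored if prob < threshold and is_pos)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


for threshold in (0.3, 0.5, 0.7):
    p, r = precision_recall(scored, threshold)
    print(f"threshold={threshold:.0%} precision={p:.0%} recall={r:.0%}")
# threshold=30% precision=71% recall=100%
# threshold=50% precision=80% recall=80%
# threshold=70% precision=100% recall=60%
```

With this sample data, raising the threshold from 30% to 70% pushes precision up from 71% to 100% while recall falls from 100% to 60%. Real data will not always move this neatly, which is why the text says "tend to".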

How you use this threshold will depend heavily on the needs of your application. Let's use an extreme example, and suppose that the vision solution will be used for cancer screening. Would you prefer a higher probability threshold, at the risk of many possible tumors going undetected, or a lower probability threshold that indicates to the doctor that further exams might be needed? That's not a difficult call in this case, right?

Once you've evaluated the model and you're happy with the results, it's time to publish it. Publishing a model in Custom Vision is very easy. You just need to click on the Publish button. Once you do that, a new dialog box will appear, asking you to do two things. First, choose a name for the model; the default is IterationX, where X is the number of the latest version of the model. Second, select a prediction resource in Azure, which is the resource that you configured in our first demo.

Let's jump into a quick demo to see how these concepts work in practice.

About the Author

Emilio Melo has been involved in IT projects in over 15 countries, with roles ranging across support, consultancy, teaching, project and department management, and sales, mostly focused on Microsoft software. After 15 years of on-premises experience in infrastructure, data, and collaboration, he became fascinated by Cloud technologies and the incredible transformation potential they bring. His passion outside work is to travel and discover the wonderful things this world has to offer.