Introduction & Overview
Creating Custom Vision Resources and Models
Consuming and Exporting Models
This course explores the Azure Custom Vision service and how you can use it to create and customize vision recognition solutions. You'll get some background on what the service is before walking through the steps for creating image classification and object detection models, uploading and tagging images, and then training and deploying your models.
In this course, you will learn how to:
- Use the Custom Vision portal to create new Vision models
- Create both Classification and Object Detection models
- Upload and tag images according to your requirements
- Train and deploy the configured models
This course is intended for:
- Developers or architects who want to learn how to use Azure Custom Vision to tailor a vision recognition solution to their needs
To get the most out of this course, you should have:
- Basic Azure experience, at least with topics such as subscriptions and resource groups
- Basic knowledge of Azure Cognitive Services, especially vision-related services
- Some developer experience, including familiarity with terms such as REST API and SDK
Now that we have the proper resources in place, it's time to cover the creation of the models in Azure Custom Vision. Regardless of whether you're creating a classification or object detection model, the process essentially consists of the following steps: create the project in Azure Custom Vision; upload and tag the images; train the model; evaluate and test the trained model, improving the predictions as needed; and finally, deploy the model to be used by your apps for prediction. Let's drill down on each one of these steps.
The first step is to create the project on Custom Vision. For that, you need to provide the following information: the name of the project (plus, optionally, a quick description), the Azure resource that you want to attach to this project, such as the one we just created in the last video, and finally, three decisions that can significantly influence your model's response time, performance, and capabilities: the project type (classification or object detection), the classification type (multilabel or multiclass), and the domain. I want to spend a bit more time on each of these definitions, so let's see them on individual slides.
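If you go the programmatic route, project creation boils down to one call against the training API. As a minimal sketch, here is a helper that assembles the URL and query parameters for such a call; the API version and parameter names mirror the Custom Vision REST training API but should be treated as illustrative, so check the current reference before relying on them:

```python
def build_create_project_request(endpoint, name, description="",
                                 domain_id=None, classification_type=None):
    """Assemble the URL and query parameters for a Custom Vision
    'create project' call. API version and parameter names are
    illustrative; verify them against the current REST reference."""
    url = f"{endpoint}/customvision/v3.3/training/projects"
    params = {"name": name, "description": description}
    if domain_id:
        params["domainId"] = domain_id  # e.g. the Landmarks domain's ID
    if classification_type:
        # "Multiclass" (one tag per image) or "Multilabel" (many tags)
        params["classificationType"] = classification_type
    return url, params

# Hypothetical endpoint and project name, for illustration only.
url, params = build_create_project_request(
    "https://westeurope.api.cognitive.microsoft.com",
    "landmark-classifier",
    description="Recognizes famous landmarks",
    classification_type="Multiclass",
)
print(url)
```

The same decisions you make in the portal (domain, classification type) simply become parameters here, which is what makes the API route easy to fold into automation later.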
Let's start with project and classification types. As you can probably remember from a previous slide, Custom Vision can do both classification and object detection, with object detection also giving the coordinates of each object found in the image, the where on top of the what. If you choose to build a classification model, however, you also need to decide if this will be a multilabel or multiclass model. With multiclass, every image belongs to just one class. You're probably asking something like, "Is this a picture of a cat, or of a dog?" With multilabel, each image can have multiple classes in it. You're more likely to be asking, "Do you have any cats or dogs in this picture?"
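The multiclass versus multilabel distinction shows up most clearly in how you interpret the prediction scores. As a small sketch with made-up scores: a multiclass consumer picks the single best tag, while a multilabel consumer keeps every tag above some confidence threshold:

```python
def top_prediction(scores):
    """Multiclass reading: every image belongs to exactly one class,
    so take the single highest-scoring tag."""
    return max(scores, key=scores.get)

def labels_above(scores, threshold=0.5):
    """Multilabel reading: an image can carry several tags,
    so keep every tag whose probability clears the threshold."""
    return sorted(tag for tag, p in scores.items() if p >= threshold)

# Hypothetical prediction scores for one image.
scores = {"cat": 0.92, "dog": 0.64, "bird": 0.07}
print(top_prediction(scores))  # → cat
print(labels_above(scores))    # → ['cat', 'dog']
```

In other words, "Is this a cat or a dog?" maps to `top_prediction`, and "Are there any cats or dogs in here?" maps to `labels_above`.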
You also need to decide on the domain that your solution belongs to. A domain optimizes a model for a specific set of images. For example, if you're building a tourism app that should recognize the Eiffel Tower or the Empire State Building, your model will work more efficiently if you choose the Landmarks domain. There are several image classification domains available to choose from.
The first is General, which is optimized for a broad range of image classification tasks. The General domain also has two alternatives, A1 and A2, depending on your requirements for training and response time, accuracy, and model complexity. Then we have Food, for pictures of dishes, fruits, vegetables, and so on. We also have Landmarks, which you can use for both natural landmarks, such as Niagara Falls, and artificial ones, such as the Statue of Liberty. And then we have Retail, which is ideal for shopping catalogs and e-commerce websites.
Object detection has slightly different domains to choose from: General (and its A1 alternative) for broad applications; Logo, for detecting logos and brands; and Products on Shelves, for detecting products in retail applications. Finally, you can also have compact domains. These domains tend to be smaller, and therefore easier to export to edge devices, such as an iPhone, Android, or IoT device, but they tend to be a bit less precise than their non-compact counterparts. For the full and up-to-date list of domains available, you can visit this site in the Azure Documentation.
Now that your project is created and configured, the next step is to upload and tag the images. This is probably the least exciting part of working with Custom Vision, as you need to upload and tag the pictures manually. If it helps, try to keep in mind that you're doing this task so that you never have to do manual inspections again. I once heard of an engineer whose main job was to look, for several hours a day, at footage taken inside water pipes, trying to identify cracks and blockages. A Custom Vision project would definitely bring him some job satisfaction back.
The Custom Vision portal makes this process much easier, though. For example, the Smart Labeler feature can automatically generate suggested tags for your images. Sometimes, all you need to do is to accept the suggested tag. There's a small cost associated with this, but it's more than justifiable, given the productivity boost it offers. Keep in mind that this feature depends on a trained model, and can only identify tags that were created before the last training run. That being said, if you already have the data properly classified, for example, all images for each class are in separate folders, you can automate the whole thing through code, using the training API.
The coding route is especially relevant if you have a large number of images to tag during model training, or if you want the upload, tagging and training to be part of a DevOps strategy. For object detection projects, this might be a bit trickier, as you also need to create the bounding box for the detected objects, as we'll see on the demo. The process of uploading and tagging images might be a bit tedious, especially for larger projects, but how well you do this task is crucial to the accuracy of the model.
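As a minimal sketch of that coding route, assuming the folder-per-class layout mentioned above, the helper below walks a root folder and maps each subfolder name to the images inside it. In a real project you would then feed these batches to the training API's image-upload calls; here, a throwaway folder structure stands in for your dataset:

```python
import pathlib
import tempfile

def images_by_tag(root):
    """Map each subfolder name (the class) to the image files inside it,
    so the folder layout drives the tagging step."""
    root = pathlib.Path(root)
    return {
        folder.name: sorted(
            p.name for p in folder.iterdir()
            if p.suffix.lower() in {".jpg", ".jpeg", ".png"}
        )
        for folder in root.iterdir() if folder.is_dir()
    }

# Build a tiny throwaway folder structure to demonstrate the idea.
root = pathlib.Path(tempfile.mkdtemp())
for tag, files in {"cat": ["c1.jpg", "c2.png"], "dog": ["d1.jpg"]}.items():
    (root / tag).mkdir()
    for name in files:
        (root / tag / name).touch()

batches = images_by_tag(root)
print(sorted(batches))  # → ['cat', 'dog']
```

Because the tag comes straight from the folder name, the same script can run unattended as part of a DevOps pipeline, which is exactly where the API route pays off.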
Here are some recommendations for developing a great Custom Vision model. You can create these models with as little as 10 images. However, you should ideally upload at least 50 pictures per tag. The more of them, the better. Make sure they are as diverse as possible in terms of background, size, lighting, angle, style, and so on. This will help the model differentiate between what is really the object or class you want to detect, and what is irrelevant information.
It's also important to keep a good balance between tags. For example, if you're uploading 50 images of dogs, you should upload roughly 50 images of cats as well. Ideally, you should aim for a maximum of 1:2 ratio between your lowest and highest tagged classes. Custom Vision might give you a warning if unbalanced data is detected. You might also benefit from adding negative images that do not belong to any label, to improve the results even more, for example, the picture of a bird.
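The guidelines above (at least 50 images per tag, at most a 1:2 ratio between the least- and most-tagged classes) are easy to check mechanically before you train. A small sketch, with the thresholds taken from the text:

```python
def balance_warnings(tag_counts, max_ratio=2.0, min_images=50):
    """Flag tags that break the rough guidelines: at least `min_images`
    images per tag, and at most a 1:`max_ratio` ratio between the
    least- and most-tagged classes."""
    warnings = []
    for tag, count in tag_counts.items():
        if count < min_images:
            warnings.append(f"{tag}: only {count} images (aim for {min_images}+)")
    smallest, largest = min(tag_counts.values()), max(tag_counts.values())
    if largest > max_ratio * smallest:
        warnings.append(
            f"unbalanced data: largest/smallest ratio is {largest / smallest:.1f}"
        )
    return warnings

# 120 dogs vs. 50 cats breaks the 1:2 ratio guideline.
print(balance_warnings({"cat": 50, "dog": 120}))
```

Custom Vision's own portal warning covers the same ground, but running a check like this locally lets you catch imbalance before spending time (and money) on a training run.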
These images are labeled with a special tag called Negative. Note that negative images are only available for classification projects, not object detection projects. Finally, the Custom Vision portal has an interface that allows you to fix incorrect predictions made in the past. And you should periodically do so to increase your model's accuracy. All these techniques are advisable to prevent something called overfitting. When overfitting happens, your model will perform well on the training data, but poorly in production, when real data comes in. If you're experiencing low accuracy on your predictions, make sure you revisit these recommendations for possible improvements.
Finally, now that we have created and configured the project, and uploaded and tagged the images, it's time to train the model. Training the model is actually quite simple: you just click the Train button at the top to kick off the training process. As you can imagine at this point, you can also trigger training through the REST API or the SDKs. There are two options when training a model. The first is Quick Training, which is ideal during development and quick tests, and generally takes a very small amount of time. The other option is Advanced Training.
You see, the general rule for AI models is that, the more images you have, and the longer you spend on training, the more accurate your model will be. Because model training is relatively expensive, the Advanced Training option allows you to set up a maximum budget, in hours, that you want to allow Custom Vision to spend on it. If the operation finishes earlier, you're only going to be charged for the number of hours actually spent. You can optionally request an email notification when the process is finished.
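Triggering training through the API mirrors the portal choices. As a sketch, the helper below assembles the query parameters for a "train project" call; the parameter names mirror the REST training API (training type, budget in hours, notification email) but are illustrative, so verify them against the current reference:

```python
def build_train_request(project_id, budget_hours=None, notify_email=None):
    """Assemble path and query parameters for a 'train project' call.
    Parameter names mirror the REST training API but are illustrative."""
    params = {"trainingType": "Advanced" if budget_hours else "Regular"}
    if budget_hours:
        # Maximum budget in hours; you're billed only for hours
        # actually spent if training finishes early.
        params["reservedBudgetInHours"] = budget_hours
    if notify_email:
        params["notificationEmailAddress"] = notify_email
    return f"/customvision/v3.3/training/projects/{project_id}/train", params

# Hypothetical project ID and email, for illustration only.
path, params = build_train_request(
    "my-project-id", budget_hours=4, notify_email="dev@example.com"
)
print(params["trainingType"])  # → Advanced
```

Omitting the budget falls back to a quick (regular) training run, matching the two options described above.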
Every time that you train a model, you create a new iteration. Iterations are just versions of your trained models, and you can switch between them to track your progress over time. This helps you understand how well your model is performing as you implement new changes. There are a few reasons why you might want to periodically re-train your model.
First, you need to re-train your model every time a new label is added. For example, if now you also want to identify the Taj Mahal, you add and tag new pictures with this new landmark. But it won't be recognized by the model until you create a new iteration. Also, as I have mentioned on the previous slide, once your model goes into production and new pictures are sent by the application for prediction, these images are added to Custom Vision for model improvement. Once you confirm or correct these new predictions, you need to re-train the model so that the new predictions help improve the accuracy.
Let's now see how this whole process works in a quick demo.
Emilio Melo has been involved in IT projects in over 15 countries, with roles ranging across support, consultancy, teaching, project and department management, and sales, mostly focused on Microsoft software. After 15 years of on-premises experience in infrastructure, data, and collaboration, he became fascinated by Cloud technologies and the incredible transformation potential they bring. His passion outside work is to travel and discover the wonderful things this world has to offer.