Building a Recommendation Engine on Azure
Building a Recommendation Engine on Azure is a course designed for teams interested in using artificial intelligence to add product recommendations to their websites.
A product recommendation engine is a valuable feature that helps drive sales on e-commerce sites. In this course, you will learn the essentials of building, deploying, and testing a recommendation engine on Microsoft Azure. You will also build skills to fine-tune a recommendation model and evaluate its effectiveness.
This course is made up of five lectures covering deploying, testing, configuring, evaluating models, and making API requests. This is an intermediate-level course, and prior Azure and API experience is recommended.
Learning Objectives
- Deploy a recommendation engine on Microsoft Azure
- Test and evaluate different recommendation models
- Make API calls to the Microsoft Product Recommendations Solution
Intended Audience
- People who are interested in artificial intelligence services on Microsoft Azure, especially recommendation engines
Prerequisites
- Experience using Microsoft Azure
- Experience using APIs
Related Training Content
The GitHub repository for this course is at https://github.com/cloudacademy/azure-recommendation-engine.
- [Instructor] Remember all those fields we left with the default values when we trained the model? Now we're going to see what they do. Click on Models to get back to the training page. Then click Train New Model again. Remember when we uploaded the demoCatalog.csv file to Azure Storage? We didn't use it for our first training run, but let's have a look at it now. This is the catalog with more details about the items. Each record contains an item ID, name, and category.
These are mandatory fields. Unfortunately, a catalog is pretty much useless if it only contains these fields. To prove it, type demoCatalog.csv for the catalog file relative path. Now train it. When it's done, click the Score button. Then type DQF-00248 again. Then get the recommendations. It came back with the same recommendations as last time and almost identical scores. Since the scores are slightly different from model to model even when you train them with identical parameters, this means that adding the catalog made absolutely no difference. So why would you bother to add a catalog? Well, if you include some additional fields in a catalog, it can be quite useful.
Suppose you're a bookseller, and you've added a new Stephen King novel to your catalog. Initially, the algorithm will never recommend this book because no one has purchased it or even clicked on it yet. So it doesn't know which books are most similar to it. But there are some obvious features of this book that should lead it to being recommended even without any transaction history. For example, people who bought other Stephen King books would likely be interested in this one too.
It might also be a good recommendation for people who like horror books since Stephen King is a popular horror writer. It's possible to add this sort of information to your catalog as features. Here's some example data from Microsoft that includes three extra columns at the end. These give the author, the publisher, and the year the book was published. I would also add the genre, such as horror or science fiction because that would make it much easier to come up with recommendations.
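The idea of catalog features can be sketched in a few lines of Python. This is only an illustration, not the solution's actual implementation: the column layout (item ID, name, category, then author, publisher, and year), the item IDs, the sample rows, and the helper names `load_catalog` and `shared_features` are all assumptions made for the example. It shows how a new item with no transaction history could still be linked to items that do have history, simply by counting matching feature values.

```python
import csv
import io

# Illustrative catalog rows (assumed layout): item ID, name, category,
# followed by three feature columns: author, publisher, year.
CATALOG_CSV = """\
B001,Carrie,Books,Stephen King,Doubleday,1974
B002,The Shining,Books,Stephen King,Doubleday,1977
B003,Dune,Books,Frank Herbert,Chilton,1965
B999,New Novel,Books,Stephen King,Scribner,2024
"""

FEATURE_NAMES = ["author", "publisher", "year"]


def load_catalog(text):
    """Return {item_id: {feature_name: value}} parsed from CSV text."""
    catalog = {}
    for row in csv.reader(io.StringIO(text)):
        item_id, _name, _category, *features = row
        catalog[item_id] = dict(zip(FEATURE_NAMES, features))
    return catalog


def shared_features(catalog, cold_item, warm_items):
    """Rank warm items by how many feature values they share with cold_item."""
    cold = catalog[cold_item]
    scores = {}
    for item in warm_items:
        overlap = sum(1 for k, v in cold.items() if catalog[item].get(k) == v)
        if overlap:
            scores[item] = overlap
    # Highest overlap first; ties broken by item ID for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


catalog = load_catalog(CATALOG_CSV)
# B999 has no transaction history, but it shares an author with B001 and B002.
print(shared_features(catalog, "B999", ["B001", "B002", "B003"]))
# [('B001', 1), ('B002', 1)]
```

A genre column would add one more matching feature between the new book and other horror titles, which is why it tends to be a useful feature to include.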
You can add up to 20 features to a catalog. If you put features in your catalog, then you'll also need to set 'Enable Cold Item Placement' to true. 'Cold Item' means an item that has little or no transaction history. If you leave this option as false, then it will ignore the features in the catalog and won't use them to recommend cold items. Then you have to decide whether to 'Enable Cold to Cold Recommendations'.
If you leave this as false, then it will only find relationships between cold and warm items, and it won't recommend a cold item when a customer is looking at another cold item. The rest of the parameters are different ways of tweaking the model to try to get better recommendations. For example, the support threshold is how many times two items need to occur together in transactions before their relationship is added to the model. The default is six times.
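A rough sketch of what a support threshold does is shown below. The function names and the sample baskets are hypothetical, and treating the threshold as "at least this many co-occurrences" is an assumption about how the comparison works; the real service may count differently.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so each pair has one canonical form.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


def apply_support_threshold(counts, threshold=6):
    """Keep only pairs seen together at least `threshold` times (assumed >=)."""
    return {pair: n for pair, n in counts.items() if n >= threshold}


# Hypothetical data: A and B appear together 6 times, A and C twice, B and C once.
baskets = [["A", "B"]] * 6 + [["A", "C"]] * 2 + [["B", "C"]]
counts = cooccurrence_counts(baskets)

print(apply_support_threshold(counts))      # {('A', 'B'): 6}
print(len(apply_support_threshold(counts, 1)))  # 3
```

With the default of six, only the A and B relationship survives. Lowering the threshold to one lets every pair through, including the pair that was seen together a single time.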
If you set it to, say, one, then the model will still consider making a recommendation even if two items have been purchased together only once, which is probably too aggressive. When 'Co-occurrence Unit' is set to 'User', it means that if a user bought item one yesterday and item two 10 days before that, it will still consider them to have been purchased together. The alternative is 'Timestamp', which means that the items had to be purchased at exactly the same time in order to be considered as together.
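The difference between the two co-occurrence units can be illustrated with a small grouping function. This is a sketch under assumptions: the event tuples, the `group_events` helper, and the choice to key 'Timestamp' groups on the user plus the exact timestamp are all invented for the example.

```python
from collections import defaultdict

# Hypothetical usage events: (user ID, item ID, timestamp).
EVENTS = [
    ("u1", "item1", "2017-06-11T10:00:00"),
    ("u1", "item2", "2017-06-01T09:30:00"),  # same user, 10 days earlier
    ("u2", "item1", "2017-06-05T14:00:00"),
    ("u2", "item3", "2017-06-05T14:00:00"),  # same user, same instant
]


def group_events(events, unit="User"):
    """Return the item groups that count as 'purchased together'."""
    groups = defaultdict(set)
    for user, item, timestamp in events:
        if unit == "User":
            key = user                  # everything a user did, at any time
        elif unit == "Timestamp":
            key = (user, timestamp)     # only identical timestamps (assumed)
        else:
            raise ValueError(f"unknown co-occurrence unit: {unit}")
        groups[key].add(item)
    # A group with a single item produces no co-occurrence.
    return [sorted(items) for items in groups.values() if len(items) > 1]


print(group_events(EVENTS, "User"))       # [['item1', 'item2'], ['item1', 'item3']]
print(group_events(EVENTS, "Timestamp"))  # [['item1', 'item3']]
```

Under 'User', the two purchases that u1 made 10 days apart still form a pair. Under 'Timestamp', only the purchases that u2 made at the same instant do.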
The 'Decay Period in Days' is to tell the model not to count older transactions as much. The default is that any transactions that are older than 30 days will only be counted half as much as more recent ones. The 'User Affinity' setting only applies to personalized recommendations. If it's set to true, then the model will apply weights to different transaction types. For example, it will count a purchase as four times as important as a click.
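One way to picture the decay period and the event weights together is the sketch below. It models the decay as a smooth half-life, so a 30-day-old transaction counts half as much and a 60-day-old one a quarter as much; that exponential form is an assumption, since the lesson only states the 30-day halving. The click weight of 1 and the helper names are also hypothetical; only the four-to-one purchase-to-click ratio comes from the lesson.

```python
def decay_weight(age_days, half_life_days=30.0):
    """Weight of a transaction that is `age_days` old (assumed half-life decay)."""
    return 0.5 ** (age_days / half_life_days)


# Assumed base weights: only the 4:1 purchase-to-click ratio is from the lesson.
EVENT_WEIGHTS = {"Click": 1.0, "Purchase": 4.0}


def affinity(event_type, age_days, half_life_days=30.0):
    """Combine the event-type weight with the time decay."""
    return EVENT_WEIGHTS[event_type] * decay_weight(age_days, half_life_days)


print(decay_weight(0))            # 1.0
print(decay_weight(30))           # 0.5
print(affinity("Purchase", 30))   # 2.0
print(affinity("Click", 0))       # 1.0
```

In this sketch, a purchase from 30 days ago still outweighs a click made today, which is the kind of trade-off the 'User Affinity' weights control.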
The 'User Affinity' setting also tells the model whether or not to reduce the importance of older transactions. 'Similarity Function' is an interesting one. If it's set to Co-occurrence, then it will tend to recommend the most popular items. Why? Well, suppose you have a site that sells movies. Lots of your customers are going to buy the latest Star Wars movie, regardless of what other movies they buy. If your model just looks at the total number of co-occurrences, then it might recommend the Star Wars movie no matter which movie a customer is looking at.
That would be a pretty lousy recommendation engine. If you set this option to 'Lift', then it will dramatically reduce the score for generally popular items. The 'Jaccard' option is a compromise between the two approaches. It reduces the score for popular items, but not as much as 'Lift' does. That's the default setting. On the other hand, if the model doesn't come up with enough recommendations in some cases, then you can 'Enable Backfilling' which will add popular items to the list of recommendations.
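The three similarity options and backfilling can be sketched with the commonly used textbook definitions. Whether the service computes them exactly this way is an assumption, and the counts and helper names below are invented. Here `c_ij` is the number of times items i and j occur together, and `c_ii` and `c_jj` are the number of times each item occurs at all.

```python
def similarity(c_ij, c_ii, c_jj, function="Jaccard"):
    """Score the relationship between items i and j (textbook definitions)."""
    if function == "Co-occurrence":
        return float(c_ij)                   # raw count, favors popular items
    if function == "Lift":
        return c_ij / (c_ii * c_jj)          # strongly penalizes popular items
    if function == "Jaccard":
        return c_ij / (c_ii + c_jj - c_ij)   # shared occurrences over all occurrences
    raise ValueError(f"unknown similarity function: {function}")


def backfill(recommended, popular, size):
    """Pad a short recommendation list with popular items, skipping duplicates."""
    result = list(recommended)[:size]
    for item in popular:
        if len(result) >= size:
            break
        if item not in result:
            result.append(item)
    return result


# Hypothetical counts for a movie bought 50 times:
niche = (40, 50, 50)          # paired 40 times with another 50-sale movie
blockbuster = (45, 50, 1000)  # paired 45 times with a 1000-sale blockbuster

for name in ("Co-occurrence", "Jaccard", "Lift"):
    print(name, similarity(*niche, name) > similarity(*blockbuster, name))
# Co-occurrence False
# Jaccard True
# Lift True

print(backfill(["A"], ["B", "A", "C"], 3))  # ['A', 'B', 'C']
```

With raw co-occurrence the blockbuster wins, while Jaccard and Lift both rank the niche movie higher. The `backfill` helper then tops up a short list with popular items only when there is room left.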
This gives you the best of both worlds because you can suppress popular items for the main recommendations by using Jaccard but add them to the list when necessary. In the next lesson, I'll show you how to predict the true impact of these parameters.
About the Author
Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).