This course covers product recommendation using PAI. First, we'll look at the basic concepts and applications of recommendation systems, focusing on the recommendation of products on an e-commerce platform. Then you'll follow along with an example product recommendation setup based on a collaborative filtering algorithm, and finally we'll carry out the actual operation in a practical demo.
In the experiment, we use the collaborative filtering algorithm to carry out a simple product recommendation task. First of all, let's introduce the basic concept of the collaborative filtering algorithm. It's a commonly used recommendation algorithm based on mining users' historical behavior data. It finds users' shopping preferences and recommends products that users may be interested in. Collaborative filtering comes in two forms: item-based collaborative filtering and user-based collaborative filtering.
The two cases are shown in the figure below. The image on the left is the item-based approach. When a person buys an item and another item is similar to it, that other item is recommended to the user. The figure on the right is the user-based approach. When a person buys a product and another person shares similar shopping preferences, the product is recommended to that other user. The item-based collaborative filtering algorithm is used in our experiment.
We assume that a user has some browsing behaviors and purchased some products in November. We can calculate the correlation between products based on users' behaviors in November and recommend products that the user is likely to buy in December. By comparing the recommendations generated by the model with the actual purchases made by users in December, we can evaluate the accuracy of the recommendations.
For example, if user A buys item A in November and item A is strongly related to item B, then the system will recommend item B to user A in December. We can then judge whether the recommendation is a hit according to the user's real purchase behavior in December. Next, we will introduce the specific methods used to calculate the correlation between items. One of the most commonly used is the Jaccard similarity coefficient. The Jaccard similarity coefficient of two sets, A and B, is the ratio of the number of elements in their intersection to the number of elements in their union. For example, the table shows how users A, B, and C purchase items A, B, C, and D.
Now, we want to figure out the similarity between item A and item B. Item A is purchased only by user A, so the set for item A includes only user A. Item B is purchased by users A and B, so the set for item B consists of two elements, user A and user B. The intersection contains one user and the union contains two, so the similarity is 1/2. Applying the formula in the same way, we can get the similarity of any two items in the table. Now, let's introduce the dataset of the experiment. The dataset is from the Alibaba Cloud big data platform and contains the behavior data of multiple users in November and December.
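The Jaccard calculation for item A and item B can be sketched in a few lines of Python. This is only an illustration of the formula, not the code PAI runs; the function name `jaccard` and the variable names are chosen here for the example, and only the two buyer sets stated in the text are used.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: |a ∩ b| / |a ∪ b| (defined here as 0.0 for two empty sets)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

# Buyers of each item, as described in the text.
buyers_of_item_a = {"user A"}
buyers_of_item_b = {"user A", "user B"}

similarity_ab = jaccard(buyers_of_item_a, buyers_of_item_b)
print(similarity_ab)  # 0.5
```

One shared buyer out of two distinct buyers gives 0.5, which matches the manual calculation.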
The dataset contains four fields. The first two are the ID of the user and the ID of the item. The third is the behavior of the user. There are four kinds of behavior: zero means clicking, one means purchasing, two means collecting, and three means adding to the shopping cart. The fourth is the date of the behavior, covering a number of specific dates in November and December. Here, we illustrate the contents of the dataset. The first line indicates that user 100 collected the item with ID 3763048 on November 25th. And the seventh line indicates that the user purchased the item with ID 1603476 on the same day.
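To make the four fields concrete, here is a minimal Python sketch that reads one record and decodes the behavior code. It assumes a comma-separated line; the field names (`user_id`, `item_id`, `behavior`, `date`) are hypothetical, and the date is left as the placeholder string `DATE` because the exact date format is not shown here.

```python
from dataclasses import dataclass

# Behavior codes as described in the text.
BEHAVIOR_NAMES = {"0": "click", "1": "purchase", "2": "collect", "3": "add to cart"}

@dataclass(frozen=True)
class BehaviorRecord:
    user_id: str   # hypothetical field name
    item_id: str   # hypothetical field name
    behavior: str  # decoded behavior name
    date: str      # kept as a plain string

def parse_record(line: str) -> BehaviorRecord:
    """Split one comma-separated line into its four fields."""
    user_id, item_id, code, date = line.strip().split(",")
    return BehaviorRecord(user_id, item_id, BEHAVIOR_NAMES[code], date)

# "DATE" is a placeholder; the real file holds a date string in this position.
record = parse_record("100,3763048,2,DATE")
print(record.user_id, record.item_id, record.behavior)
```

All fields are kept as strings, so the sketch makes no assumption about how the date is encoded.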
Before the experiment, we need to upload the datasets: enter the PAI Studio working interface, create a table with the structure described above, enter the column names and data types, and upload the data from a local path. Now, we can drag and drop the associated visual components to build the experiment. The connection of the components is shown in the figure on the left. We will divide the whole experiment into three parts and introduce the specific function of each part. The function of the first part is to generate a list of recommendations based on association rules.
The input of this part is the users' shopping behavior data in November, which is first obtained through an SQL script. The collaborative filtering component then calculates, for each item, the item with the highest similarity. The following table is the output of the collaborative filtering component, with the item ID in the first column. The second column holds the ID of the most relevant item to the left of the colon and the similarity coefficient to the right of the colon.
The second part is the real shopping behavior of users in December. The purpose of this part is to provide real purchasing data to compare with the recommended results. The role of the Join-2 component is to merge the recommended results with the real results and send them to the third part. The third part counts recommendations and hits. Full-table statistics-1 shows the recommendation list generated from the purchasing data of November. Full-table statistics-2 shows the hit recommendations, that is, the recommendations generated by the algorithm that match the actual purchases.
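The three parts described above can be sketched end to end in plain Python. This is a simplified illustration under stated assumptions, not the PAI components themselves: the purchase lists `nov` and `dec` are made-up toy data, all function names are hypothetical, and only purchase records are considered.

```python
from collections import defaultdict

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def most_similar_items(purchases):
    """Part 1: for each item, find the other item with the highest Jaccard similarity."""
    buyers = defaultdict(set)
    for user, item in purchases:
        buyers[item].add(user)
    best = {}
    for item in buyers:
        candidates = [(jaccard(buyers[item], buyers[other]), other)
                      for other in buyers if other != item]
        score, other = max(candidates, default=(0.0, None))
        if score > 0:
            best[item] = (other, score)
    return best

def recommend(purchases, best):
    """Part 1, continued: recommend to each user the item most similar to each item bought."""
    return {(user, best[item][0]) for user, item in purchases if item in best}

# Hypothetical toy data: (user, item) purchase pairs.
nov = [("u1", "A"), ("u1", "B"), ("u2", "B"), ("u2", "C"), ("u3", "C")]
dec = [("u2", "A"), ("u3", "B"), ("u3", "D")]  # Part 2: real December purchases

best = most_similar_items(nov)
recs = recommend(nov, best)
hits = recs & set(dec)  # Part 3: recommendations that match real purchases

print(len(recs), len(hits))  # 5 2
```

In this toy run, two of the five recommendations are hits. Note that the sketch does not remove items the user already bought in November; whether to do so is a design choice that the description above leaves open.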
The core of the whole experiment is the collaborative filtering algorithm component. Now, let's take a look at the parameters that can be adjusted. The similarity type is the specific way to calculate the similarity of goods. Here, we use Jaccard similarity; the other options are WbCosine and AsymCosine. Top N is the maximum number of similar items that are retained in the output table, and the default value is one. Calculation method is the method used to calculate the payload when an item of a user appears multiple times.
The valid values are add, multiply, minimum, and maximum. Minimum items and maximum items mean that if the number of items of a user is smaller than the minimum or larger than the maximum, the behavior of that user is ignored. Smoothing factor and weighting coefficient are valid only when the similarity type is set to AsymCosine. Finally, let's look at the results of the experiment. The left side is the output of the Full-table statistics-1 component. The first column is the user's ID. The second column is the items the user purchased in November. And the third column is the items recommended to the user by the collaborative filtering algorithm, which have the highest similarity to the items already purchased.
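The effect of the calculation method, the minimum and maximum item limits, and top N can be illustrated with a small Python sketch. It reflects only the parameter descriptions given above, not PAI's actual implementation; the function names, the operator keys, and the sample rows are all hypothetical.

```python
import operator

# Hypothetical keys for the four calculation methods named in the text.
OPERATORS = {"add": operator.add, "multiply": operator.mul, "min": min, "max": max}

def aggregate_payloads(rows, method="add"):
    """Combine the payloads of a (user, item) pair that appears multiple times."""
    combine = OPERATORS[method]
    totals = {}
    for user, item, payload in rows:
        key = (user, item)
        totals[key] = combine(totals[key], payload) if key in totals else payload
    return totals

def filter_users(totals, min_items, max_items):
    """Ignore users whose number of distinct items is outside [min_items, max_items]."""
    counts = {}
    for user, _ in totals:
        counts[user] = counts.get(user, 0) + 1
    return {k: v for k, v in totals.items() if min_items <= counts[k[0]] <= max_items}

def top_n(similar_items, n=1):
    """Keep at most n (item, similarity) pairs, highest similarity first."""
    return sorted(similar_items, key=lambda pair: pair[1], reverse=True)[:n]

rows = [("u1", "A", 1.0), ("u1", "A", 2.0), ("u1", "B", 1.0), ("u2", "A", 3.0)]
added = aggregate_payloads(rows, "add")
kept = filter_users(added, min_items=2, max_items=10)
print(added[("u1", "A")], sorted(kept))
print(top_n([("B", 0.5), ("C", 0.2), ("D", 0.9)], n=2))
```

Here user u2 is dropped because only one distinct item is recorded for that user, which is below the minimum of two used in the example.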
On the right is the output of Full-table statistics-2, which compares the recommended items with the items that the user actually purchased in December and shows all the hit recommendations. That concludes the introduction to the product recommendation task and its experiment. Now, let's do this experiment hands-on. This is a practical demonstration of product recommendation using the PAI console. First of all, open the PAI console website, then click Model Training and select Studio-Modeling Visualization. On the right side is your project list.
The first time, you should create a project if you don't have one. To create it, click the Create button, fill in the information, select the option without GPU, and click OK. Once you have created a project, click Machine Learning in the row that matches your project. You are now taken to a new page named Machine Learning Platform for AI. As in previous experiments, you first need to create a new experiment. In this case, you clone a preset template provided by PAI and modify some information in it. Click the Home icon in the leftmost column, find the template named Recommended Algorithms Product Recommendation, click the Create button, and type your experiment's name and description in the pop-up dialog. Choose where in your own folder to save the experiment. When everything is ready, click the OK button. And now, we should take a look at the form of the data.
There are two datasets, named DEC and NOV, which indicate the month of the data. As you can see, there are four columns in every dataset. The data in the first and second columns are numbers, and only 0, 1, 2, and 3 appear in the third column. The last column represents the time, and the two datasets are identical in format. Look at this table. In this table, you can easily see the meaning of every column. The table lists the field name, meaning, type, and description, and each field corresponds to one column of the dataset. The first column is the user's ID, and the type of this column is not int but string.
The data in the second column represents the ID of the item, and it must also be a string. The third one shows the behavior of the user, and there are four kinds of behavior, 0, 1, 2, and 3, which represent clicking the item, purchasing the item, collecting the item, and adding the item to the shopping cart respectively. And as we mentioned before, the last column is the date of the behavior, and all the data in it are strings. The next step is to upload these two datasets. Go to the graphical data interface by clicking the data source icon, then click Create Table.
In this interface, you should type the name and the lifecycle of your table. The following part shows how to fill in the schema. To make the next part more convenient, it is best to name the columns according to the table shown earlier. And don't forget that every column is a string. After clicking the Next button, you should set the row delimiter and the column delimiter. The row delimiter is \n and the column delimiter is a comma. After that, upload the dataset with the Select File button, and you can see the content of the dataset that you uploaded.
Finally, click the OK button. If no error message pops up, the dataset has been uploaded successfully. What's next? You need to upload the other dataset. Click the Create Table button, type your table name and lifecycle, and repeat the other steps. And finally, upload the correct file. Now, both datasets are uploaded successfully. After uploading the datasets, let's look at the logic of the experiment. We split the logic into three parts, and we will use the red box to mark the part currently being discussed. The function of the first part is to generate a list of recommendations based on association rules.
So, we use the shopping behavior in November as the data source. The core of this part is the collaborative filtering component. This component calculates the item most similar to each item, so as to analyze and obtain multiple items that each user may purchase together. In the second part, we import the real data from December, against which the model's recommendations can be checked. In the last part, we count recommendations and hits. The left side shows the list of recommendations generated from shopping behaviors in November, and the other side shows the hit recommendations.
Back in PAI, you need to modify some attributes in this experiment. Click the top-left node and type the name of the November dataset that you uploaded. The dataset in the top-right node should be changed to the name of the December dataset. When all is done, click the Run button. When the run has finished, you can check the information in the two statistics nodes by right-clicking each one and selecting View Data. And you can see that, of the 100 predictions, 39 are correct. This concludes the practical demonstration of product recommendation using the PAI console.
Alibaba Cloud, founded in 2009, is a global leader in cloud computing and artificial intelligence, providing services to thousands of enterprises, developers, and government organizations in more than 200 countries and regions. Committed to the success of its customers, Alibaba Cloud provides reliable and secure cloud computing and data processing capabilities as a part of its online solutions.