Alibaba Cloud PAI
This course explores Alibaba Cloud PAI, short for Platform for Artificial Intelligence. First, we'll look at the concepts and architecture of PAI. Second, we'll cover its characteristics and advantages. Third, we'll introduce several of the main services PAI provides. Last, you will follow along with a practical demonstration on the Alibaba Cloud platform that shows how to carry out some basic operations in PAI.
Finally, let's look at the basic operations of PAI, focusing on project creation, data preparation, and some configuration steps for your first use of the platform. Let's start with the basic workflow. First, we need to create a project. Once the project is created, we prepare the data, choosing the upload method that fits the task and the data type.
The next step is algorithm modeling. In PAI-Studio, we build experiments by dragging and dropping components, and a single project can contain multiple experiments. Once the data and parameters are ready, we run the experiment and obtain the results, which we can assess with various evaluation metrics. That is the complete process of experimenting with PAI. Let's start by learning how to create a project.
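To make "evaluation metrics" concrete, here is a minimal sketch of a few metrics commonly used for binary classification. This is plain Python for illustration, assuming 0/1 labels; it is not the code behind PAI's evaluation components.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return accuracy, precision, recall, and F1 for 0/1 labels (1 = positive)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For example, `binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])` gives an accuracy of 0.6 and a precision, recall, and F1 of 2/3 each.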
First, we log in to our Alibaba Cloud account and open the PAI console. In the left sidebar of the console, you can choose from various PAI services. To create a PAI-Studio project, select Studio-Modeling Visualization under Model Training in the left-side navigation pane. If this is your first time using PAI-Studio, you need to purchase the service in your region. There are two billing modes, pay first and pay later; choose whichever fits your actual needs.
After the purchase is completed, the visual modeling page looks like the one shown in the picture. You can see the number of projects created, the number of experiments, and the number of experiments currently running. We click Create Project and fill in the project name, alias, and description; only the project name is required. At the bottom of the page is the Open GPU column, where you choose how to use GPU resources: By Usage or Without GPU.
If you are using PAI-Studio for the first time, you first need to complete real-name verification, then create an access key and activate the MaxCompute big data service. The access key is the credential for your Alibaba Cloud account and is used for authentication when calling Alibaba Cloud APIs. MaxCompute, as we mentioned earlier, is Alibaba Cloud's big data computing service for analyzing and processing massive amounts of data. Once these conditions are met, we can start using PAI-Studio.
First, we create the access key. In the personal information menu at the top-right corner of the Alibaba Cloud homepage, select AccessKey Management, then click Create AccessKey on the page that opens. At this point, we enter the phone verification code we receive, after which the access key is created. The AccessKey ID identifies the user, and the AccessKey secret is used to verify the user's identity, so it must be kept secret. You can click to download a CSV file with your access key information, or simply copy and save it. Next, let's activate MaxCompute.
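Because the AccessKey secret must stay secret, a common practice is to keep it out of source code and read it from the environment. The sketch below assumes the variable names `ALIBABA_CLOUD_ACCESS_KEY_ID` and `ALIBABA_CLOUD_ACCESS_KEY_SECRET`; treat them as an assumption and adjust to whatever your own tooling expects. The helper `mask` is a hypothetical utility for printing the secret safely.

```python
import os
from typing import Mapping, Tuple

# Assumed variable names; change them to match your own setup.
KEY_ID_VAR = "ALIBABA_CLOUD_ACCESS_KEY_ID"
KEY_SECRET_VAR = "ALIBABA_CLOUD_ACCESS_KEY_SECRET"


def load_access_key(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Read the AccessKey ID and secret from the environment, not from source code."""
    key_id = env.get(KEY_ID_VAR)
    secret = env.get(KEY_SECRET_VAR)
    if not key_id or not secret:
        raise RuntimeError(f"set {KEY_ID_VAR} and {KEY_SECRET_VAR} before calling the API")
    return key_id, secret


def mask(secret: str, visible: int = 4) -> str:
    """Hypothetical helper: hide all but the last few characters for logging."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

Passing the mapping as a parameter keeps the function easy to test without touching the real environment.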
First, go to the MaxCompute product homepage, select the billing method, your region, the specification type, and so on, and click Buy Now to purchase the service. Here we choose pay-as-you-go, so the quantity of resources and the price are determined by actual usage. After completing real-name verification, access key creation, and MaxCompute activation, go back to the project creation page and enter the project name, alias, and description.
After clicking OK, we enter the main page of PAI-Studio visual modeling. It offers some ready-made project templates, such as community recommendation, music recommendation, and financial risk management, which you can use directly to create experiments quickly. In the left column, Experiments is the interface for building experiments by dragging and dropping visual components. Notebook is an interactive programming interface where you can upload code files from your personal OSS buckets.
In Data Source, we can upload data for experiments. In Components, we can choose from the large number of data processing and machine learning components provided by PAI. Models stores the models we have built so they are available whenever needed, and Settings contains some general options.
Next, let's look at how to prepare the data. PAI supports two types of data storage: MaxCompute and Object Storage Service (OSS). MaxCompute stores regular structured table data, such as the table on the left, for general algorithm components like traditional machine learning algorithms. Besides structured data, OSS can also store unstructured data such as images, video, and text, which makes it suitable for deep learning algorithm components.
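The rule of thumb above can be written down as a tiny lookup. This is only an illustrative sketch: the category names (`table`, `image`, `video`, `text`) are hypothetical labels, and real projects may keep structured files in OSS as well.

```python
from enum import Enum


class Storage(Enum):
    MAXCOMPUTE = "MaxCompute"  # structured table data for traditional ML components
    OSS = "OSS"                # unstructured files for deep learning components


# Hypothetical category names used only for this illustration.
UNSTRUCTURED_KINDS = {"image", "video", "text"}


def choose_storage(data_kind: str) -> Storage:
    """Pick a storage backend following the rule of thumb described in the course."""
    kind = data_kind.lower()
    if kind == "table":
        return Storage.MAXCOMPUTE
    if kind in UNSTRUCTURED_KINDS:
        return Storage.OSS
    raise ValueError(f"unknown data kind: {data_kind!r}")
```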
Next, we will introduce the data preparation method for each type of storage. First, let's see how to prepare MaxCompute table data. Find Data Source in the left pane of the PAI-Studio interface. Datasets you may need later can be saved in Favorite Tables for easy access. Public Tables provide some common datasets for PAI that are used in the existing templates, such as the EMNIST handwritten digit recognition and breast cancer analysis datasets. By clicking Create Table below, we can create our own dataset.
Let's learn how to create our own dataset from the beginning. First, we enter the name of the table and its lifecycle, which is how long it will be retained. We then click the plus sign to add columns, name each column, and set its data type. Second, we upload the data from a local file. Only TXT or CSV files are accepted, and the file size cannot exceed 20 MB. The row and column delimiters can be set below. In the TXT file we used, rows are separated by newline characters and columns by commas, so we choose \n and comma as the delimiters.
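Before uploading, it can save a round trip to check the file locally against the rules just described: TXT or CSV only, at most 20 MB, newline-separated rows, and comma-separated columns. The sketch below is a hypothetical pre-check, not part of PAI; it assumes 20 MB means 20 × 1024 × 1024 bytes and that every row should have the same number of columns as the table definition.

```python
import csv
import os
from typing import List

MAX_BYTES = 20 * 1024 * 1024          # assumed interpretation of the 20 MB limit
ALLOWED_EXTENSIONS = {".txt", ".csv"}  # only TXT or CSV files are accepted


def check_upload(path: str, expected_columns: int, delimiter: str = ",") -> List[List[str]]:
    """Validate a local file against the upload rules and return its parsed rows."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported extension {ext!r}; use .txt or .csv")
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        raise ValueError(f"file is {size} bytes; the limit is {MAX_BYTES}")
    # csv.reader splits rows on newlines and columns on the chosen delimiter.
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f, delimiter=delimiter))
    for lineno, row in enumerate(rows, start=1):
        if len(row) != expected_columns:
            raise ValueError(f"line {lineno}: expected {expected_columns} columns, got {len(row)}")
    return rows
```

If every row comes back with the expected number of fields, the preview in the console should likewise show each row and column separated correctly.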
The table displayed is a preview of the data. Every row and column has been separated correctly, which shows that the table data is prepared properly. Once the data file is uploaded, we can use it in an experiment. In PAI-Studio, datasets are also packaged as visual components that you can drag and drop. We drag the training dataset we just created onto the canvas. The white dot at the bottom of a component indicates that its output can be connected to other components as their input data source.
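Connecting components through their white dots builds a directed graph, and the components have to run in an order where every input is ready first. The sketch below models a hypothetical experiment (the component names are made up for illustration) and uses Python's standard `graphlib` module (Python 3.9+) to derive a valid run order; it illustrates the idea and is not how PAI-Studio schedules experiments internally.

```python
from graphlib import TopologicalSorter

# Hypothetical experiment: each component maps to the components whose
# outputs (the white dots) are connected to its inputs.
experiment = {
    "split": {"train_table"},                       # the uploaded table feeds a data split
    "logistic_regression": {"split"},               # train on the training part
    "prediction": {"logistic_regression", "split"}, # predict on the held-out part
    "evaluation": {"prediction"},                   # score the predictions
}


def run_order(graph):
    """Return one order in which every component runs after all of its inputs."""
    return list(TopologicalSorter(graph).static_order())


print(run_order(experiment))
# ['train_table', 'split', 'logistic_regression', 'prediction', 'evaluation']
```

This particular graph admits only one valid order; graphs with independent branches can have several.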
Right-click the component and select View Data to view the contents of the table, as shown in the figure on the right. If the data volume is large, only the first 100 rows are displayed. That is how MaxCompute table data is prepared. Now let's look at how to prepare OSS data. Images, text, video, and other unstructured data cannot be uploaded directly like table data; we need to register the dataset first. Select Dataset Manager from the data processing tab of the console to enter the dataset management interface, then click Register Dataset.
In the pop-up window, we enter the name of the dataset, choose whether to create a new dataset or import an existing one, and select the data type: image, text, or video. If you're registering a dataset for the first time and storing it in OSS, you need to authorize OSS first. After OSS authorization is completed, we can add the path of the dataset. If we are adding a path for the first time, we will be prompted that no OSS bucket exists. What is a bucket? The word originally means a container for carrying things. Here, a bucket is a storage space in which users manage their stored objects. Datasets and code files prepared for an experiment can be stored in a bucket and read during the experiment, and the models generated by the experiment are also saved there. Now let's learn how to create a bucket.
Let's go to the Buckets page and click Create Bucket. On the pop-up page, fill in the name, region, and storage class of your bucket. There are also some options below, such as access control list, encryption method, real-time log query, and scheduled backup. You can set them yourself or keep the defaults. Let's focus on the access control list (ACL), which defines the read and write permissions on files in the bucket.
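The bucket name is the one field that is easy to get wrong. As far as we know, OSS bucket names must be 3 to 63 characters long, contain only lowercase letters, digits, and hyphens, and start and end with a lowercase letter or digit. The sketch below encodes those rules as a local check; confirm them against the current OSS documentation before relying on it.

```python
import re

# Assumed OSS naming rules: 3-63 chars; lowercase letters, digits, hyphens;
# must start and end with a lowercase letter or digit.
BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if the name satisfies the naming rules described above."""
    return BUCKET_NAME_RE.fullmatch(name) is not None
```

For example, `pai-demo-data` passes, while `PAI-Data` (uppercase) and `-data` (leading hyphen) do not.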
Private means that only the owner and authorized users of the bucket can read and write files in it; no one else has access. Public read means that anyone can read the files in the bucket, but only the owner and authorized users can write files to it. Public read/write means that anyone can read and write files in the bucket. Note that both public read and public read/write carry a risk of data leakage and unexpected cost increases, so choose them carefully.
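The three ACL options can be summarized as a small permission table. The sketch below encodes exactly the rules just described: owners and authorized users can always read and write, and everyone else depends on the ACL. The string values mirror the OSS ACL names, but the function itself is a hypothetical illustration, not an OSS API.

```python
# acl -> (anyone may read, anyone may write); owners and authorized users can always do both.
ACL_RULES = {
    "private": (False, False),
    "public-read": (True, False),
    "public-read-write": (True, True),
}


def can_access(acl: str, is_owner_or_authorized: bool, action: str) -> bool:
    """Decide whether a requester may 'read' or 'write' files under the given ACL."""
    if acl not in ACL_RULES:
        raise ValueError(f"unknown ACL: {acl!r}")
    if action not in ("read", "write"):
        raise ValueError(f"unknown action: {action!r}")
    if is_owner_or_authorized:
        return True
    anyone_read, anyone_write = ACL_RULES[acl]
    return anyone_read if action == "read" else anyone_write
```

Written this way, it is easy to see why the two public options are risky: they are the only rows where an anonymous requester gets a `True`.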
A newly created bucket is empty, so we need to upload files to it. We can click Create Folder to create our own folder, or click Upload to upload a file directly. Files can go into the current folder or another folder we select. We can also set the access control list for the upload: either inherit the bucket's permissions or choose a different setting. Then we click Upload, or drag and drop local files, to upload them.
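Inheriting the bucket's permissions means the file's own ACL defers to the bucket. In OSS this inherit option is, to our understanding, the object ACL value `default`; the sketch below assumes that naming and shows how the effective permission of an uploaded file is resolved.

```python
BUCKET_ACLS = {"private", "public-read", "public-read-write"}
OBJECT_ACLS = BUCKET_ACLS | {"default"}  # "default" = inherit from the bucket (assumed naming)


def effective_acl(bucket_acl: str, object_acl: str = "default") -> str:
    """Return the ACL that actually applies to an uploaded file."""
    if bucket_acl not in BUCKET_ACLS:
        raise ValueError(f"unknown bucket ACL: {bucket_acl!r}")
    if object_acl not in OBJECT_ACLS:
        raise ValueError(f"unknown object ACL: {object_acl!r}")
    # An object-level setting overrides the bucket; "default" falls back to it.
    return bucket_acl if object_acl == "default" else object_acl
```

So a file uploaded with the inherit option into a private bucket stays private, while an explicit `public-read` on the file makes that one file readable by anyone.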
At this point, we have created a bucket and uploaded files to it, so we now have a bucket path. Back in the dataset registration window, we add the path of the bucket we just created and click Submit. The dataset is registered successfully and appears in the list.
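A bucket path identifies both the bucket and the object key (the file or folder inside it). The sketch below assumes the common `oss://<bucket>/<key>` form. If the host part carries an endpoint suffix (for example `my-bucket.oss-cn-hangzhou.aliyuncs.com`), it keeps only the part before the first dot, relying on the fact that bucket names cannot contain dots. The function name is hypothetical.

```python
from typing import Tuple
from urllib.parse import urlsplit


def split_oss_path(path: str) -> Tuple[str, str]:
    """Split an oss:// path into (bucket name, object key)."""
    parts = urlsplit(path)
    if parts.scheme != "oss" or not parts.netloc:
        raise ValueError(f"not an oss:// path: {path!r}")
    # Bucket names cannot contain dots, so anything after the first dot is an endpoint.
    bucket = parts.netloc.split(".", 1)[0]
    key = parts.path.lstrip("/")
    return bucket, key
```

For example, `split_oss_path("oss://my-bucket/datasets/images/")` returns `("my-bucket", "datasets/images/")`.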
Next, let's see how to use OSS data in an experiment. In the PAI-Studio interface, find the Read File Data component in Components and drag it onto the canvas. This component imports files stored in an OSS bucket. In the tab on the right, we select a data file from the OSS bucket as the input, which completes the preparation of the OSS data. We have now covered both ways of preparing data. Finally, we will introduce how to enable GPU resources.
If model training requires GPU resources, for example deep learning tasks with PAI-TensorFlow, we need to authorize GPU resources first. One way is to select By Usage instead of Without GPU from the Open GPU drop-down list in the PAI-Studio project list. Another way is to turn on deep learning in the Settings section of the PAI-Studio interface. GPU resources used for training are billed by usage, so there is no need to pay before using them.
Let's review the main contents of this section. In the first two parts, we introduced the basic concepts, architecture, and features of the PAI platform and compared developing with PAI against traditional development methods. In the third part, we introduced the four basic components of PAI and compared them with similar products. In the fourth part, we learned some basic operations for getting started with PAI, such as project creation, data preparation, and authorizing some functions. This is the end of section one. Thank you for listening.
Alibaba Cloud, founded in 2009, is a global leader in cloud computing and artificial intelligence, providing services to thousands of enterprises, developers, and government organizations in more than 200 countries and regions. Committed to the success of its customers, Alibaba Cloud provides reliable and secure cloud computing and data processing capabilities as part of its online solutions.