In this section, we introduce sentiment analysis: its basic concepts, application scenarios, and related methods. We then conduct a practical sentiment analysis experiment based on PAI Studio.
In today's society, we can't live without social networks, shopping websites and video platforms, where we often browse information and make comments on something.
When we watch a movie, we tend to rate it and write our own comments on a movie review website. When we buy a product on a shopping website, we often give the seller feedback on our experience with it. After browsing the news on a news website, we sometimes express our opinions on social issues. All in all, when we use these platforms, we generate a huge amount of information every day. One thing all of these platforms have in common is that they hold large amounts of user-generated data.
This data can be categorized from several aspects. First, it can be categorized by data format. The most common format is text, but voice, images, emojis, and so on are also widely used in our daily life. Second, it can be categorized by whether it is subjective or objective. Objective data mainly refers to objective descriptions of people, things, and events, with a weak emotional tendency, while subjective data expresses a viewpoint, opinion, attitude, or position on something and has a strong emotional tendency.
For example, the comments on shopping websites and movie review websites often contain users' own subjective feelings about the product or the movie. This review data is often very valuable to businesses: only when they receive feedback from users can they improve their products and services. However, with the development of social networks, the amount of data generated by users is very large, and analyzing it manually is very inefficient. Therefore, we can use sentiment analysis technology with the corresponding algorithms to analyze this data and then develop some valuable applications.
Although there are many kinds of user-generated data, the most common is text data, so we will focus only on text data analysis here. How do we define text sentiment analysis? It is the process of using natural language processing and text mining technology to analyze, process, and extract information from subjective text with emotional color. Research on text sentiment analysis covers many domains, such as natural language processing, information extraction, information retrieval, text mining, and ontology, and it has attracted the attention of many scholars and research institutions.
In recent years, it has remained one of the hottest topics in the fields of natural language processing and text mining. Now, let's classify text sentiment analysis techniques. First, we can classify them by the granularity of analysis. From coarse-grained to fine-grained, they can be divided into discourse level, sentence level, and word or phrase level. We can choose a granularity according to the task.
For movie reviews, we tend to care about the audience's overall feeling about the movie and their rating of it, so we can use discourse-level sentiment analysis. Takeout reviews are different: a single review often involves multiple aspects, such as food taste, delivery speed, and service attitude, so each sentence in the review must be analyzed to obtain a more fine-grained emotional tendency. Second, we can classify the techniques by the type of text to be processed. Common examples include product reviews and news comments. The purpose of sentiment analysis of product reviews is to improve the quality of products or services through users' feedback, while the purpose of news comment analysis is to grasp public opinion and provide a basis for formulating or improving policy.
Finally, we can also classify by the type of task: sentiment classification, sentiment retrieval, and sentiment extraction. If we only need to judge the emotional orientation of a piece of text, it is a sentiment classification task. Finding opinions on something in large amounts of text is a sentiment retrieval task. And if we want to extract elements such as the opinion holder and the evaluation object from sentences, it is a sentiment extraction task.
Next, we will introduce several methods of sentiment analysis. The first is an approach based on a sentiment dictionary. Sentiment dictionaries are generally built on the basis of existing electronic dictionaries. They contain not only a large number of words with emotional tendencies, but also the intensity of each word. Positive emotions are labeled with positive numbers, while negative emotions are labeled with negative numbers. The absolute value of the label reflects how strongly positive or negative the word is.
The process of analyzing the sentiment of a piece of text with a sentiment dictionary is shown in the figure below. First, we perform word segmentation on the original text, then look up the resulting words in the sentiment dictionary and add up the labels of every emotion word in the text. If the final result is positive, the sentiment of the text is positive; otherwise, it is negative. The advantage of using a sentiment dictionary is that it is very versatile and allows for sentiment analysis of different types of text. It also has some disadvantages. Constructing the dictionary is time-consuming and laborious, and the meaning of emotional words may change in different contexts, such as in an ironic tone. As a result, the accuracy of the sentiment analysis is not very high.
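The dictionary-based process just described can be sketched in a few lines of Python. This is only a minimal illustration: the dictionary entries, their scores, and the function names are hypothetical, and the regular-expression segmentation only suits space-separated English text.

```python
import re

# Hypothetical toy sentiment dictionary: positive words carry positive labels,
# negative words carry negative labels, and the magnitude reflects intensity.
SENTIMENT_DICT = {
    "good": 1.0,
    "great": 2.0,
    "excellent": 3.0,
    "bad": -1.0,
    "slow": -1.0,
    "terrible": -3.0,
}


def segment(text):
    """Naive word segmentation: lowercase the text and keep runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def lexicon_score(text):
    """Add up the labels of every dictionary word that occurs in the text."""
    return sum(SENTIMENT_DICT.get(word, 0.0) for word in segment(text))


def lexicon_sentiment(text):
    """Positive if the summed score is above zero, otherwise negative."""
    return "positive" if lexicon_score(text) > 0 else "negative"


print(lexicon_sentiment("The food was great, but delivery was slow."))
print(lexicon_sentiment("Terrible service and bad food."))
print(lexicon_sentiment("Oh great, another late delivery."))
```

The third call shows the context problem mentioned above: the ironic sentence is scored as positive because only the word "great" is found in the dictionary. Note also that, following the rule as stated, a text with a total score of zero falls into the negative class.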
The second approach is based on traditional machine learning. The figure shows the general process of sentiment analysis using traditional machine learning methods. This is a supervised method. First, we need to prepare some texts labeled with emotional polarity as a training set and use a tool to segment these texts. In traditional machine learning, we have to manually extract features from the input data. We take the segmented words as features and, during feature selection, remove words that are irrelevant for expressing emotion in order to reduce the feature dimension. Since a machine learning model cannot process text directly, after selecting the features we need to vectorize them, for example by using one-hot encoding to represent the text.
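As a rough sketch of the feature selection and vectorization steps, the following Python code removes a few words assumed to be irrelevant and encodes each text as a 0/1 vector over the remaining vocabulary. The stop-word list, the two-text corpus, and all function names are hypothetical and chosen only for illustration.

```python
# Hypothetical list of words assumed to be irrelevant for expressing emotion.
STOP_WORDS = {"the", "a", "is", "was", "and", "this", "it"}


def select_features(tokens):
    """Feature selection: drop the words listed in STOP_WORDS."""
    return [token for token in tokens if token not in STOP_WORDS]


def build_vocabulary(tokenized_texts):
    """Give every remaining word a fixed index, in alphabetical order."""
    words = sorted(
        {token for tokens in tokenized_texts for token in select_features(tokens)}
    )
    return {word: index for index, word in enumerate(words)}


def vectorize(tokens, vocabulary):
    """Encode a text as a 0/1 vector: 1 where a vocabulary word occurs in it."""
    vector = [0] * len(vocabulary)
    for token in select_features(tokens):
        if token in vocabulary:
            vector[vocabulary[token]] = 1
    return vector


# Two already-segmented texts standing in for a training set.
corpus = [
    ["the", "movie", "was", "great"],
    ["the", "plot", "was", "boring"],
]
vocabulary = build_vocabulary(corpus)
vectors = [vectorize(tokens, vocabulary) for tokens in corpus]
print(vocabulary)  # {'boring': 0, 'great': 1, 'movie': 2, 'plot': 3}
print(vectors)     # [[0, 1, 1, 0], [1, 0, 0, 1]]
```

Even in this tiny example, the vector length equals the vocabulary size, which is why the feature dimension grows quickly on real data.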
Next, we set the parameters and train a machine learning classification model on the vectorized training set. Common machine learning classifiers include Logistic Regression, Support Vector Machine, and Naive Bayes. After training is complete, we input the text to be classified into the model and receive its sentiment classification result. However, traditional machine learning methods also face many challenges.
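Before we turn to those challenges, here is a minimal sketch of the training and classification steps, using Naive Bayes as the classifier. It is written in plain Python with add-one smoothing rather than with any particular library, and the four training texts and their labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_texts):
    """Count word occurrences per label from (tokens, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of training texts
    vocabulary = set()
    for tokens, label in labeled_texts:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return word_counts, label_counts, vocabulary


def classify(tokens, model):
    """Return the label with the highest log-probability for the tokens."""
    word_counts, label_counts, vocabulary = model
    total_texts = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_texts)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokens:
            if token not in vocabulary:
                continue  # ignore words never seen during training
            count = word_counts[label][token] + 1  # add-one smoothing
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical hand-labeled training set, already segmented.
training_set = [
    (["great", "movie", "loved", "it"], "positive"),
    (["excellent", "acting", "great", "plot"], "positive"),
    (["boring", "movie", "bad", "acting"], "negative"),
    (["terrible", "plot", "bad", "ending"], "negative"),
]
model = train_naive_bayes(training_set)
print(classify(["great", "acting"], model))  # positive
print(classify(["bad", "movie"], model))     # negative
```

With only four training texts, the result depends entirely on which words happen to appear in each class, which hints at why the quality of features and training data matters so much for this approach.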
First, text is unstructured data, whereas classical machine learning models can only deal with structured data such as vectors, so the result of text vectorization directly affects classification performance. Second, in traditional machine learning, features need to be extracted manually, and a piece of text often involves several different topics, so constructing appropriate features is often laborious and inefficient. Finally, the words in a text are often connected with each other, but traditional text representation methods such as the bag-of-words model and the vector space model often ignore the connections within the context and fail to represent semantic information. So we need a better approach, which is deep learning.
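The last challenge, that a bag-of-words representation ignores the connections between words, can be seen in a very small sketch. The two example sentences and the function name are made up for illustration.

```python
from collections import Counter


def bag_of_words(text):
    """Represent a text only by how often each word occurs in it."""
    return Counter(text.lower().split())


a = bag_of_words("the food was good not bad")
b = bag_of_words("the food was bad not good")
print(a == b)  # True: same representation, opposite meaning
```

Because word order is discarded, any classifier built on this representation sees the two sentences as identical.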
With the development of deep learning, people have found that it is very well suited to processing text. The first reason is text representation. Traditional text representation methods, such as the bag-of-words model, have very high feature dimensions and high sparsity, and their feature expression ability is very weak, so neural networks are not good at processing such data. Deep learning solves the problem of large-scale text classification, and the most important problem it solves is text representation.
The word embedding method can represent each word as a fixed-dimensional, dense, real-valued vector, which avoids the influence of text length on the feature dimension and has strong feature expression ability. At the same time, a deep learning model can extract features automatically, avoiding the trouble of manual feature selection.
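To make the idea of dense, fixed-dimensional word vectors concrete, here is a small sketch. The embedding values are invented by hand (real embeddings are learned from a large corpus), and averaging the word vectors is just one simple way, assumed here, to get a text vector whose length does not depend on the text length.

```python
import math

# Hypothetical 3-dimensional embedding table with hand-picked values.
EMBEDDINGS = {
    "great": [0.9, 0.1, 0.3],
    "good": [0.8, 0.2, 0.3],
    "terrible": [-0.9, 0.1, 0.2],
    "movie": [0.0, 0.7, 0.1],
}
DIMENSION = 3


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: close to 1 means similar."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def embed_text(tokens):
    """Average the known word vectors; the result always has DIMENSION entries."""
    known = [EMBEDDINGS[token] for token in tokens if token in EMBEDDINGS]
    if not known:
        return [0.0] * DIMENSION
    return [sum(values) / len(known) for values in zip(*known)]


short_text = embed_text(["great", "movie"])
long_text = embed_text(["good", "movie", "great", "movie", "good"])
print(len(short_text), len(long_text))  # 3 3

# Words with similar meaning are closer to each other than to their opposites.
print(cosine_similarity(EMBEDDINGS["great"], EMBEDDINGS["good"]) >
      cosine_similarity(EMBEDDINGS["great"], EMBEDDINGS["terrible"]))  # True
```

Compared with a one-hot or bag-of-words vector, the dimension here stays fixed no matter how large the vocabulary or how long the text is.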
Finally, deep learning models can take the connections between words and contextual information into account, which is a big step forward in machine understanding of text. Let's take a look at some deep learning methods that can be applied to text. The first is Word2Vec, which, through training on a large amount of text, maps each word into a vector space of fixed dimension and achieves a semantic representation of words. The second is Doc2Vec, which generates a vector representation of a document on the basis of Word2Vec. The third is the RNN, or Recurrent Neural Network, a kind of neural network that can process sequential data, and text is itself a kind of sequential data.
An RNN can express context information well. LSTM is a special kind of RNN that can learn long-term dependencies when processing long text and alleviates the gradient explosion and gradient vanishing problems in training. These methods were introduced in detail in previous lectures, so we won't focus on them here. Now that we have covered the methods of text sentiment analysis, let's look at some of its applications.
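As a reminder of how a recurrent network reads a sequence, here is a minimal sketch of the forward pass of a simple RNN cell, h_t = tanh(W_x x_t + W_h h_{t-1} + b), in plain Python. The weights and the two-dimensional word vectors are hypothetical and untrained, so the numbers themselves carry no meaning. The point is only that the hidden state depends on the order of the words.

```python
import math


def rnn_step(x, h_prev, w_x, w_h, b):
    """One recurrent step: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    hidden_size = len(h_prev)
    h_new = []
    for i in range(hidden_size):
        total = b[i]
        total += sum(w_x[i][j] * x[j] for j in range(len(x)))
        total += sum(w_h[i][k] * h_prev[k] for k in range(hidden_size))
        h_new.append(math.tanh(total))
    return h_new


def rnn_encode(sequence, w_x, w_h, b):
    """Read the word vectors one by one, carrying the hidden state forward."""
    h = [0.0] * len(b)
    for x in sequence:
        h = rnn_step(x, h, w_x, w_h, b)
    return h


# Hypothetical, untrained parameters for a hidden state of size 2.
W_X = [[0.5, -0.3], [0.2, 0.8]]
W_H = [[0.1, 0.4], [-0.6, 0.3]]
B = [0.0, 0.1]

# Hypothetical 2-dimensional vectors standing in for two words.
word_a = [1.0, 0.0]
word_b = [0.0, 1.0]

h_ab = rnn_encode([word_a, word_b], W_X, W_H, B)
h_ba = rnn_encode([word_b, word_a], W_X, W_H, B)
print(h_ab != h_ba)  # True: the same words in a different order give a different state
```

An LSTM replaces the single tanh update with gated updates, but it is used in the same way: the final hidden state can be fed to a classifier to predict the sentiment.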
Several common applications of text sentiment analysis are shown in the table. The first is product review analysis. After we buy something on an e-commerce platform, we leave some comments about the product. For sellers, these comments are very valuable. Through the analysis of product reviews, sellers can understand customers' satisfaction with the goods and then adjust their business strategies accordingly. The next is reviews of movies or TV series. These reviews reflect the audience's satisfaction with and preference for the show, so the producers can adjust the plot and schedule accordingly.
Finally, there is the analysis of public opinion. By analyzing public opinion on social events and policies, government departments can better respond to citizens' needs. Because these platforms generate a large number of comments, manual analysis is inefficient, so some advanced technology is required.
Alibaba Cloud, founded in 2009, is a global leader in cloud computing and artificial intelligence, providing services to thousands of enterprises, developers, and government organizations in more than 200 countries and regions. Committed to the success of its customers, Alibaba Cloud provides reliable and secure cloud computing and data processing capabilities as a part of its online solutions.