This course covers how to use the text analytics features in Azure to detect language, as well as how to retrieve and process key phrases, entities, and sentiment from a text. We'll provide you with a practical understanding of these features thanks to a real-life demonstration on the Azure platform.
Learning Objectives
- Understand what text analytics is and its use cases
- Create the Azure resources for carrying out text analytics
- Use Azure to retrieve and process phrases, entities, and sentiment from a text
Intended Audience
This course is meant for developers or architects who would like to know more about how to use the text analytics capabilities of Azure Cognitive Service for Language to understand and process text content.
Prerequisites
To get the most out of this course, you should have basic Azure experience, knowledge of Azure Cognitive Services, and some developer experience, including familiarity with terms such as REST API and SDK.
Let’s start with a general overview of what Text Analytics is, and how it can help you understand written language.
So what is Text Analytics? Well, every day, a substantial amount of data is generated in emails, social posts, online reviews, and much more. And, although faster internet connections have increased the popularity of video content, a lot of this information is still generated as text.
This proliferation of data makes it increasingly difficult to process manually. Text Analytics, one of the Microsoft Cognitive Services for Natural Language Processing, allows you to extract meaningful insights from text-based data. For example, you could use Text Analytics to measure how well your company is performing across several social networks.
And the best part is that you don’t need to understand anything about AI or ML. All you need to do is call the service, using either a REST API or an SDK in one of several supported languages, including Python, .NET, and Java.
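To make "calling the service" a bit more concrete, here is a minimal sketch of what a REST request could look like, using only the Python standard library. It assumes the v3.1 REST path (`/text/analytics/v3.1/...`) and key-based authentication through the `Ocp-Apim-Subscription-Key` header. The endpoint, the key, and the helper name `build_request` are placeholders for illustration, not values from the course. The request is built but not sent, so the snippet runs without an Azure subscription.

```python
import json
import urllib.request

# Placeholders (assumptions): replace with the endpoint and key of your own resource.
ENDPOINT = "https://example-resource.cognitiveservices.azure.com"
API_KEY = "your-key-here"


def build_request(operation, documents, endpoint=ENDPOINT, key=API_KEY):
    """Build (but do not send) a POST request for a Text Analytics operation.

    `operation` is the last part of the URL, e.g. "languages", "keyPhrases"
    or "sentiment" (assuming the v3.1 REST path).
    """
    url = f"{endpoint}/text/analytics/v3.1/{operation}"
    body = json.dumps({"documents": documents}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_request("languages", [{"id": "1", "text": "Bonjour tout le monde"}])
print(request.get_method(), request.full_url)

# To actually call the service (requires a real endpoint and key):
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

The same helper could be reused for the other capabilities by changing the operation name and the documents you send. The SDKs wrap these same calls behind a client object, so which route you pick is mostly a matter of preference.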
But how can you work with Text Analytics? Well, the landscape of Cognitive Services is constantly evolving, and Microsoft often creates, deprecates or merges services according to market demands. So there has been a recent development in Azure that will influence your choices here:
- Until very recently, Text Analytics was a dedicated service that you could create on the Azure Portal. If you still have a Text Analytics resource in your Azure subscription, you can use that option for the time being, although eventually you might need to migrate to one of the other options below
- In November 2021, Microsoft unified the Text Analytics, QnA Maker, and LUIS services into one, called Azure Cognitive Service for Language. If you’re trying to create a new dedicated language service in the Azure Portal, this is now your default option
- That being said, some companies prefer to consolidate ALL Cognitive Services – not only Language, but also Vision, Search, and Speech – in a single Azure endpoint. That makes billing and resource management easier, although it’s more difficult to break down your costs. If your company prefers that route, you can create a single Cognitive Services resource in the Portal instead
As a consolidated service for all things related to language, Azure Cognitive Service for Language has been greatly expanded in terms of functionalities – including healthcare-specific analysis, detection of Personally Identifiable Information such as email and phone number, and a series of new features still in preview. However, for this course, we will focus on the classic capabilities of text analytics, which are:
- Language Detection
- Key Phrase Extraction
- Sentiment Analysis
- Named Entity Recognition
- And Entity Linking
Let’s see each one of them in more detail!
The first (and simplest) capability of Text Analytics is Language Detection:
- This allows you to determine in which language the text is written, supporting over 100 languages, variants, and dialects. This is useful, for example, for multilingual chatbots or message-based support centers, where you might need to transfer to a different agent or bot based on the customer language
- The result will be a JSON array with several elements, one for each document you have submitted. Each element will contain the language detected, its corresponding ISO 639-1 code (for example, “en” for English), and the confidence score of the result: the closer to 1, the more confident the service is about the prediction
- There might be cases where the text is a mix of different languages. If that’s the case, Text Analytics will return the language with the largest representation (based on the number of characters), but with a lower confidence score. In these situations, you can help improve the prediction performance by using an optional parameter called “countryHint”
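As a rough illustration of the points above, the sketch below shows a request body that uses the optional `countryHint` field, and a small function that reads the detected language and its confidence score from a response. The response is a hand-written sample in the v3.1 JSON shape, with illustrative values rather than real service output. The function `route_by_language` and the 0.8 threshold are assumptions made for this example.

```python
# Request body: one entry per document; "countryHint" is optional and takes
# a two-letter country code.
request_body = {
    "documents": [
        {"id": "1", "text": "Hello world"},
        {"id": "2", "text": "impossible", "countryHint": "FR"},
    ]
}

# Hand-written sample response (illustrative values, not real service output).
sample_response = {
    "documents": [
        {"id": "1",
         "detectedLanguage": {"name": "English", "iso6391Name": "en", "confidenceScore": 1.0},
         "warnings": []},
        {"id": "2",
         "detectedLanguage": {"name": "French", "iso6391Name": "fr", "confidenceScore": 0.62},
         "warnings": []},
    ],
    "errors": [],
}


def route_by_language(response, min_confidence=0.8):
    """Map each document id to its ISO code, or None when confidence is too low."""
    routes = {}
    for doc in response["documents"]:
        language = doc["detectedLanguage"]
        confident = language["confidenceScore"] >= min_confidence
        routes[doc["id"]] = language["iso6391Name"] if confident else None
    return routes


routes = route_by_language(sample_response)
print(routes)  # {'1': 'en', '2': None}
```

In a multilingual support scenario, a `None` result could be the signal to ask the customer which language they prefer instead of guessing.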
Next, we have Key Phrase Extraction:
- This functionality helps indicate the main points in a text by identifying important words and phrases. This could be useful to generate metadata about documents (to make search easier) or to quickly generate a word cloud
- The JSON response is relatively simple – just an array of key phrases, without the need for any confidence score
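For the word cloud idea, a sketch along these lines could turn the key phrases of several documents into frequency counts. The response below is a hand-written sample in the v3.1 shape, not real service output, and `phrase_frequencies` is a hypothetical helper.

```python
from collections import Counter

# Hand-written sample response (illustrative values, not real service output).
sample_response = {
    "documents": [
        {"id": "1", "keyPhrases": ["battery life", "screen"], "warnings": []},
        {"id": "2", "keyPhrases": ["Battery life", "customer support"], "warnings": []},
    ],
    "errors": [],
}


def phrase_frequencies(response):
    """Count how often each key phrase appears across all documents."""
    counts = Counter()
    for doc in response["documents"]:
        # Lower-case so "Battery life" and "battery life" count as one phrase.
        counts.update(phrase.lower() for phrase in doc["keyPhrases"])
    return counts


frequencies = phrase_frequencies(sample_response)
print(frequencies.most_common(1))  # [('battery life', 2)]
```

Since the service returns no confidence score for key phrases, the number of occurrences across documents is one simple way to decide how large each phrase should appear.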
Another very useful Text Analytics feature is the ability to perform Sentiment Analysis:
- This functionality allows you to detect how positive (or negative) the text is
- This is pure gold for customer service situations, where you can prioritize customers who are clearly unhappy about their experience. But it can also be used for social media posts and comments, video and book reviews, and much more
- The JSON response will contain confidence scores for three possible sentiments: Positive, Negative, and Neutral. As you can see here in the picture, this customer is clearly happy, with a confidence score of 1 for the Positive sentiment
- If a text with several sentences is submitted, Text Analytics will return a sentiment for each sentence, as well as an overall sentiment for the whole text
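The customer service scenario could be sketched as follows: read the overall confidence scores of each document and put the most unhappy customers first. The response is a hand-written sample in the v3.1 shape, with illustrative values only. The ticket ids, the helper `most_unhappy_first`, and the 0.5 threshold are assumptions for this example.

```python
# Hand-written sample response (illustrative values, not real service output).
sample_response = {
    "documents": [
        {"id": "ticket-1", "sentiment": "positive",
         "confidenceScores": {"positive": 1.0, "neutral": 0.0, "negative": 0.0},
         "sentences": [
             {"text": "Great service!", "sentiment": "positive",
              "confidenceScores": {"positive": 1.0, "neutral": 0.0, "negative": 0.0}},
         ]},
        {"id": "ticket-2", "sentiment": "negative",
         "confidenceScores": {"positive": 0.1, "neutral": 0.3, "negative": 0.6},
         "sentences": [
             {"text": "The delivery was late.", "sentiment": "negative",
              "confidenceScores": {"positive": 0.1, "neutral": 0.3, "negative": 0.6}},
         ]},
        {"id": "ticket-3", "sentiment": "negative",
         "confidenceScores": {"positive": 0.02, "neutral": 0.08, "negative": 0.9},
         "sentences": [
             {"text": "Nothing works.", "sentiment": "negative",
              "confidenceScores": {"positive": 0.02, "neutral": 0.08, "negative": 0.9}},
         ]},
    ],
    "errors": [],
}


def most_unhappy_first(response, threshold=0.5):
    """Return ids of documents whose negative score reaches the threshold,
    ordered from most to least negative."""
    flagged = [
        doc for doc in response["documents"]
        if doc["confidenceScores"]["negative"] >= threshold
    ]
    flagged.sort(key=lambda doc: doc["confidenceScores"]["negative"], reverse=True)
    return [doc["id"] for doc in flagged]


priority_queue = most_unhappy_first(sample_response)
print(priority_queue)  # ['ticket-3', 'ticket-2']
```

The per-sentence entries under `sentences` carry the same kind of scores, so the same approach could be used to highlight which sentence in a long message caused the negative rating.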
Finally, there are two options in Text Analytics related to understanding Entities:
- The first one is Named Entity Recognition, which allows you to recognize certain pre-defined classes within a text, such as people, locations, time periods, and organizations. Some of these entities can even have a hierarchy of categories and subcategories: for example, TimeRange is a subcategory of DateTime
- The other option is Entity Linking, which removes uncertainties about identified entities by providing reference links to an external knowledge base (currently, Wikipedia). While Named Entity Recognition is about identification, Entity Linking is about context. For example, let’s suppose that Named Entity Recognition identified Mars as an entity. Would this be Mars, the planet; Mars, the chocolate bar; or Mars, the god of war?
Both services can help provide more context to your text-based analytics and make your applications more capable of understanding what your customer wants.
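To show how the two entity responses differ, here is a small sketch that groups recognized entities by category and subcategory, and then maps linked entities to their reference URL. Both responses are hand-written samples in the v3.1 shape, with illustrative values rather than real service output. The helper names are hypothetical.

```python
from collections import defaultdict

# Hand-written sample responses (illustrative values, not real service output).
ner_response = {
    "documents": [
        {"id": "1", "entities": [
            {"text": "Microsoft", "category": "Organization", "confidenceScore": 0.98},
            {"text": "Seattle", "category": "Location", "confidenceScore": 0.95},
            {"text": "last week", "category": "DateTime", "subcategory": "DateRange",
             "confidenceScore": 0.8},
        ]},
    ],
    "errors": [],
}

linking_response = {
    "documents": [
        {"id": "1", "entities": [
            {"name": "Mars", "url": "https://en.wikipedia.org/wiki/Mars",
             "dataSource": "Wikipedia",
             "matches": [{"text": "Mars", "confidenceScore": 0.9}]},
        ]},
    ],
    "errors": [],
}


def entities_by_category(response):
    """Group entity texts under 'Category' or 'Category/Subcategory'."""
    grouped = defaultdict(list)
    for doc in response["documents"]:
        for entity in doc["entities"]:
            label = entity["category"]
            if entity.get("subcategory"):
                label += "/" + entity["subcategory"]
            grouped[label].append(entity["text"])
    return dict(grouped)


def linked_urls(response):
    """Map each linked entity name to its knowledge-base URL."""
    return {
        entity["name"]: entity["url"]
        for doc in response["documents"]
        for entity in doc["entities"]
    }


grouped = entities_by_category(ner_response)
links = linked_urls(linking_response)
print(grouped)
print(links)
```

Named Entity Recognition tells you what kind of thing was mentioned, while the URL from Entity Linking tells you which specific thing it is, which is what settles the "planet or chocolate bar" question.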
Let’s now see these capabilities in practice, in a demo!
Emilio Melo has been involved in IT projects in over 15 countries, with roles ranging across support, consultancy, teaching, project and department management, and sales, mostly focused on Microsoft software. After 15 years of on-premises experience in infrastructure, data, and collaboration, he became fascinated by Cloud technologies and the incredible transformation potential they bring. His passion outside work is to travel and discover the wonderful things this world has to offer.