Creating a Cognitive Search Service
Enhance Cognitive Search Solution
This course focuses on the skills necessary to implement a knowledge-mining solution with a focus on the Cognitive Search solution. The course will walk through how to create a Cognitive Search solution and how to set up the process for importing data. Once the data sources have been set up properly, the course will teach you how to create a search index and then how to configure it to provide the best results possible.
- Create a Cognitive Search solution
- Import from data sources
- Create, configure, and test indexes
- Configure AutoComplete and AutoSuggest
- Improve results based on relevance
- Implement synonyms
- Developers who want to include full-text search in their applications
- Data engineers focused on providing better accessibility to organizational data
- AI engineers that provide AI combined with search functionality in their solutions
To get the most out of this course, you should:
- Have a strong understanding of data sources and how data will be needed by users consuming a Cognitive Search solution
- Be able to use REST-based APIs and SDKs to build knowledge-mining solutions on Azure
Hi there, in this video and the next two in the series, we're gonna be talking about the base component for a cognitive search solution, the search service. And as such, we're gonna be talking about how to create a search service, how to import data into that search service so that you can then eventually create and test an actual search index.
Once we've done that, in the next lecture group, we'll be taking a look at how to add additional features to the search service to provide that cognitive search capability or artificial intelligence based capabilities. Let's first talk about the components though.
There is a search engine for full text search, hence the search service. And in Azure, the full text search engine in the search service is built on the Apache Lucene open source library. You then, of course, are gonna have some sort of persistent storage of user-owned content that will be indexed. The search service then will provide a set of APIs for indexing, as well as querying that content, meaning that you have the ability to both query the content as well as make manual updates to that content should things change. And then you can add additional features or AI-based enrichments to create searchable content out of images, out of raw unstructured data, or application files.
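As a rough illustration of the querying API mentioned here, the sketch below only builds the URL and headers for a simple full-text query and sends nothing. The service name, index name, key placeholder, and the `2020-06-30` API version are assumptions for illustration, so check the current REST reference before relying on them.

```python
from urllib.parse import urlencode

# Assumed API version for illustration; verify against the current REST reference.
API_VERSION = "2020-06-30"


def build_query_request(service_name, index_name, search_text, api_key):
    """Return (url, headers) for a simple full-text query. Nothing is sent."""
    params = urlencode({"api-version": API_VERSION, "search": search_text})
    url = (
        f"https://{service_name}.search.windows.net"
        f"/indexes/{index_name}/docs?{params}"
    )
    headers = {"api-key": api_key, "Accept": "application/json"}
    return url, headers


# Hypothetical service and index names.
url, headers = build_query_request(
    "cognitive-search-eastus2", "hotels", "ocean view", "<query-key>"
)
print(url)
```

The returned pair could be handed to any HTTP client; keeping the request construction separate makes it easy to inspect before any call is made.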
You can also provide integration with other services for data, machine learning, AI, monitoring, and security. And then one of the newest features that's part of the search service or cognitive search solution is the ability to add semantic search, which right now is still in preview, but is something that we will at least touch on in this overall course. So let's go ahead and get started by jumping into the base component of the solution, which is the search service, and how to actually create it.
Okay, here we are in the portal. And to get things started, we need to create the basis of our cognitive search solution, the search service. Now, you'll notice there's Search Services already up here at the top, and that's because it's a shortcut for me. But to show you, should you have never created a search service before, you can go into All Services. And then under the Web category, you will find Search Services right here. And that's where we're gonna start. We're gonna create a brand new one. I already have a resource group chosen, so make sure that you choose one as well. And then we're just gonna give this a name. We're just gonna call it cognitive-search, actually, it needs to be all lowercase. And I'm gonna add my location to the end of it, that's just my own personal naming scheme. I'm based on the East Coast of the United States, so we're gonna choose East US 2. And then we have our pricing tier.
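The lowercase requirement for the service name can be checked before opening the portal. The sketch below is a minimal validator under assumed rules (2 to 60 characters, lowercase letters, digits, and dashes only, no dash in the first two or the last position, no consecutive dashes). The video only confirms the lowercase part, so treat the remaining rules as assumptions to verify against the Azure naming documentation.

```python
import re

# Assumed rule: 2-60 characters drawn from lowercase letters, digits, and dashes.
_ALLOWED = re.compile(r"[a-z0-9-]{2,60}")


def is_valid_service_name(name: str) -> bool:
    """Check a candidate search service name against the assumed naming rules."""
    if not _ALLOWED.fullmatch(name):
        return False
    # Assumed rule: no dash in the first two characters or the last one.
    if name[0] == "-" or name[1] == "-" or name[-1] == "-":
        return False
    # Assumed rule: no consecutive dashes.
    return "--" not in name


print(is_valid_service_name("cognitive-search-eastus2"))
print(is_valid_service_name("Cognitive-Search"))
```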
Now, there are eight different pricing tiers here. There is, of course, a free tier, most services inside of Azure provide that. There is a basic, there are three standards, including a high density version of standard, and then there are two large storage optimized. Each one of these tiers provides you information that will help you choose which one best fits your needs.
Starting from the left is the total number of indexes that you will be supporting inside of this search service. This is going to be completely defined by your needs, whether or not you want to provide multiple indexes for multiple applications in the same search service or maybe you need to separate them out because of privacy reasons, security reasons. How many indexers will you be able to support, meaning how many processes to keep the data indexed and up to date will you be needing? How much data storage will you be needing for those indexes that you're going to be searching? Now, this is gonna very much depend on the data sources that you're choosing, how much data is in the tables or the databases in a Cosmos database, and so on. That's something that you're gonna need to know before you can make a determination about your storage needs.
How many search units do you need? This is going to depend upon how much querying you are going to be doing against the search index. If you are going to have a high-volume search-based application such as an e-commerce system, then more search units are going to be to your benefit, because they're gonna provide better performance.
Next is replicas. If you need high availability for your search indexes, if you need to provide replicas for disaster recovery scenarios or things like that, that's going to help you determine that particular answer. And then do you need to partition that data? And this is gonna be something that you're only going to learn over time depending upon the data that you're ingesting. And then at the very end is, of course, how much is this search service going to cost per unit per month? And of course, you can see that that number goes up over the tiering sizes.
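To make the sizing questions concrete: in Azure's billing model, the number of search units is the number of replicas multiplied by the number of partitions, and the per-unit price is charged for each one. The sketch below applies that arithmetic. The price used is a made-up placeholder, not a real tier price.

```python
def search_units(replicas: int, partitions: int) -> int:
    """Search units billed for a service: replicas multiplied by partitions."""
    if replicas < 1 or partitions < 1:
        raise ValueError("replicas and partitions must both be at least 1")
    return replicas * partitions


def monthly_cost(replicas: int, partitions: int, price_per_unit: float) -> float:
    """Estimated monthly cost, given a per-unit monthly price for the tier."""
    return search_units(replicas, partitions) * price_per_unit


# Hypothetical price of 100.0 per unit per month, for illustration only.
print(search_units(3, 2))
print(monthly_cost(3, 2, 100.0))
```

Doubling either replicas or partitions doubles the bill, which is why it pays to settle the availability and storage questions before picking a tier.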
So I am going to choose basic for our demonstration purposes 'cause it gives me all of the available functionality without taking away too much and doesn't cost me too much either. That being said, if you're an individual developer, I do highly recommend the free tier. It makes the development very, very simple and you do get everything that you need. You're just not gonna be able to do any performance testing on it. So I'm gonna set mine to basic. Then we're gonna go on to the next area of the wizard, which is the scale.
The scaling allows you to determine the high availability or the percentage of availability that your search service can provide based on the number of replicas that are supported for your search service. Now, in development, leave it at one because there's no reason for you to have a replica. But if we move our replica slider up here and we go to two, you can see that immediately jumps me to a 99.9% availability for read operations. I jump it to three. And now I have both 99.9% availability for read and write operations with respect to this search service.
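The replica thresholds described here can be captured in a small helper. This is a sketch of the rule as stated in the video (two replicas for 99.9% on reads, three for 99.9% on reads and writes). The function name and return shape are made up for illustration.

```python
def operations_covered_at_999(replicas: int) -> list[str]:
    """Operations that get the 99.9% availability figure for a replica count."""
    if replicas < 1:
        raise ValueError("a search service has at least one replica")
    if replicas >= 3:
        return ["read", "write"]
    if replicas == 2:
        return ["read"]
    # A single replica: no availability figure is shown in the wizard.
    return []


for count in (1, 2, 3):
    print(count, operations_covered_at_999(count))
```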
So your specific application requirements will help you determine what your availability should be and therefore how many replicas you should be configuring. And then lastly is the number of partitions. And this just allows your search query engine to perform more efficiently by splitting up the index, for example when you know you have more data in A through M than you do in N through Z, just as a very, very simple example. Next, we move on to the Networking tab.
Now, in the Networking tab, you're gonna define whether your search service is publicly available or privately available. What I mean by that is by default, a search service, the API for querying and indexing, is a public endpoint. And if your application is also publicly accessible or you're gonna be making your search service available through both your application as well as through additional means, then the public endpoint is probably gonna be the right choice for you. However, if your search service is going to be part of an internal organizational application or there are specific security and privacy requirements around the data that's gonna be in that index, then maybe the index's API needs to be private.
By clicking on this private link, we then have the ability to create a private endpoint for the search service so that it can be attached to a virtual network, thereby giving it a private endpoint inside of that virtual network. And assuming that your application is also private inside of a virtual network, it will keep all of the traffic inside of the virtual network and only allow traffic to the search service and the application from your internal organizational users.
Just as an example, there are numerous architectures where this makes tremendous amounts of sense. However, during your initial installation and configuration of your search service, I highly recommend that you keep it on public because it will provide all of the full functionality available to you to set up all of the pieces of the search service during initial creation. After that, once you've got it configured, then you can move it to a private endpoint from that point forward.
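The choices made across the Basics, Scale, and Networking tabs can also be written down as a resource definition, which is handy for scripted deployments. The sketch below builds such a definition as a plain dictionary and sends nothing. The field names (`replicaCount`, `partitionCount`, `publicNetworkAccess`) follow the search service resource schema as best recalled, so treat them as assumptions and confirm them against the management API reference.

```python
import json


def build_service_definition(location, sku, replicas=1, partitions=1,
                             public_access=True, tags=None):
    """Assemble a search service definition mirroring the wizard tabs."""
    return {
        "location": location,                    # Basics tab
        "sku": {"name": sku},                    # pricing tier
        "properties": {
            "replicaCount": replicas,            # Scale tab
            "partitionCount": partitions,
            # Networking tab: start public, switch to private once configured.
            "publicNetworkAccess": "enabled" if public_access else "disabled",
        },
        "tags": tags or {},                      # Tags tab
    }


definition = build_service_definition("East US 2", "basic")
print(json.dumps(definition, indent=2))
```

Flipping `public_access` to `False` later reflects the advice above: finish the initial setup on the public endpoint first, then lock the service down.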
And then lastly is Tags. Nothing special about tags, they're attached to every single resource. So we'll then review and create. And then this will be the search service that we use for demonstrations throughout the rest of the videos of this particular course. So hopefully, I'll see you in the next course or in the next video when we're talking about the importing and creating of data sources.
Brian has been working in the Cloud space for more than a decade as both a Cloud Architect and Cloud Engineer. He has experience building Application Development, Infrastructure, and AI-based architectures using many different OSS and Non-OSS based technologies. In addition to his work at Cloud Academy, he is always trying to educate customers about how to get started in the cloud with his many blogs and videos. He is currently working as a Lead Azure Engineer in the Public Sector space.