Indexing in Azure Cognitive Search
This course focuses on the skills required to manage and maintain the indexing process for an Azure Cognitive Search solution. As the data in an underlying data source changes, rebuilding an index or setting up an indexing schedule becomes very important. Understanding all of the functions related to the indexing process matters whenever you know there will be periodic updates to the underlying data source, and this course will teach you the skills to perform all of those functions. In this course, you will learn how to:
- Manage re-indexing
- Rebuild indexes
- Schedule and monitor indexing
- Implement incremental indexing
- Manage concurrency
- Push data to an index
- Troubleshoot indexing for a pipeline
This course is intended for:
- Developers who will be including full-text search in their applications
- Data Engineers focused on providing better accessibility to organizational data
- AI Engineers who will be providing AI combined with search functionality in their solutions
Candidates for this course should have a strong understanding of data sources and the operational requirements for handling changes to those data sources. Candidates should also be able to use REST-based APIs and SDKs to build knowledge mining solutions on Azure.
Hi there. In this last video, we're going to cover how to monitor and troubleshoot your indexes and the indexing process. Let's first talk about what kinds of data points you're actually going to want to look at. With respect to the search service as a whole, you'll want to understand the health of that service and any changes to its configuration, and how you can gather that information from Azure. You'll want to keep an eye on your storage consumption so that your index never hits a tipping point where no new data can be added. And you'll want to watch the object limits on your indexes, indexers, and other objects, which are driven by the particular service tier you happen to be using.
That means tracking the count of each object type relative to the maximum allowed for your service tier. And, of course, for the developers consuming your search service, you'll want to understand query activity: the volume, the latency, and whether you're seeing throttled or dropped queries, perhaps because you hit an object limit or because you're having performance problems. Finally, you'll want to understand the indexing activity itself: making sure the indexer isn't running into issues when loading data from your data sources, such as a problem connecting to a data source, and things along those lines.
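The object counts and storage consumption described above are exposed programmatically through the search service's REST API via the Get Service Statistics operation. Here's a minimal sketch using only Python's standard library; the service name and admin key are placeholders you'd substitute for your own:

```python
import json
from urllib import request

API_VERSION = "2020-06-30"  # a GA api-version for Azure Cognitive Search

def service_stats_request(service_name: str, admin_key: str) -> request.Request:
    """Build the GET request for the Get Service Statistics operation.

    The response's "counters" section reports usage vs. quota for objects
    such as indexesCount, indexersCount, dataSourcesCount, documentCount,
    and storageSize -- exactly the numbers you want to watch per tier.
    """
    url = (f"https://{service_name}.search.windows.net/"
           f"servicestats?api-version={API_VERSION}")
    return request.Request(url, headers={"api-key": admin_key})

# Usage against a live service (placeholder names, requires an admin API key):
# req = service_stats_request("my-search-service", "<admin-api-key>")
# stats = json.loads(request.urlopen(req).read())
# print(stats["counters"]["storageSize"])  # {"usage": ..., "quota": ...}
```

Polling this endpoint on a schedule is one simple way to alert before you hit the storage "tipping point" mentioned above.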
Now, for all of those different pieces of data you want to collect, how can you gather that information? Well, the great thing is that Azure provides a set of features and services out of the box that every customer has access to. Those features and services are tied to every major function inside of Azure, including Azure Cognitive Search, and they all fall under Azure Monitor.
Now, I'm not going to go into tremendous depth on Azure Monitor; I would expect that you already have some knowledge of it, and if you don't, Cloud Academy, I'm sure, has one or more courses on Azure Monitor's capabilities. To start with, the most common feature that every single service ties into is diagnostics and metrics, where you gather log data and performance data about how, in this case, your Azure search service and your indexing activities are performing. You then need to decide what you want to do with that data and where you're going to work with it. You could push it into Azure Blob Storage and then connect it to a third-party service of your own. You could push it into Azure Log Analytics, which is Microsoft's out-of-the-box log aggregation engine that allows for querying, reporting, alerting, and all kinds of other things. Or you could push each piece of data individually to an event stream such as Azure Event Hub, and then connect your Event Hub to another third-party service, such as Splunk, for example.
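As a sketch of what that routing looks like under the hood, the following Python builds the request body for an Azure Monitor diagnostic setting that sends the search service's logs and metrics to any combination of the three destinations just mentioned. The `OperationLogs` and `AllMetrics` category names are the ones Azure Cognitive Search emits; the resource IDs are placeholders you'd fill in:

```python
# Sketch of the JSON body for an Azure Monitor diagnostic setting, sent via:
# PUT {searchServiceResourceId}/providers/Microsoft.Insights/
#     diagnosticSettings/{name}?api-version=2021-05-01-preview
def diagnostic_setting_body(workspace_id=None, storage_id=None,
                            event_hub_rule_id=None) -> dict:
    """Route the search service's OperationLogs and all platform metrics to
    any combination of Log Analytics, Blob Storage, and Event Hub."""
    body = {
        "properties": {
            "logs": [{"category": "OperationLogs", "enabled": True}],
            "metrics": [{"category": "AllMetrics", "enabled": True}],
        }
    }
    props = body["properties"]
    if workspace_id:
        props["workspaceId"] = workspace_id                       # Log Analytics
    if storage_id:
        props["storageAccountId"] = storage_id                    # Blob Storage
    if event_hub_rule_id:
        props["eventHubAuthorizationRuleId"] = event_hub_rule_id  # Event Hub
    return body
```

You'd normally just configure this in the portal, but seeing the body makes it clear that all three destinations can be enabled on a single setting.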
I already talked about Log Analytics as a storage location for your diagnostics and metrics. Once all of that data is in there, and assuming you're also feeding in other Azure data sources such as the Azure Activity Log, which lets you determine who did what, and when, to your search service, your indexers, and so on, you can start to put together a more holistic picture: using log queries to bring together data from all of those disparate sources, creating reports, and creating alerts around the data tied to your search solution. Your developers, however, will want to understand what users are doing when they interact with your search service. That means gathering client-side analytics, which can only be done using Application Insights, an SDK your developers will need to use in order to collect that data and, again, push it into Log Analytics so that you can build that holistic picture.
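To make the "log queries" idea concrete, here's a sketch of the kind of Kusto (KQL) query you might run in Log Analytics once diagnostics are flowing, wrapped in a Python string so you can feed it to a query client or paste it into the portal. `AzureDiagnostics` is the table where Cognitive Search diagnostic logs land; verify the exact column names against the data in your own workspace:

```python
# A sketch of a Log Analytics (KQL) query summarizing search operations per
# hour. Check column names against your workspace before relying on them.
SEARCH_OPS_KQL = """
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.SEARCH"
| summarize OperationCount = count() by OperationName, bin(TimeGenerated, 1h)
| order by TimeGenerated desc
"""

def hourly_ops_query(hours_back: int = 24) -> str:
    """Prepend a time filter so the query only scans the window we need."""
    time_filter = f"| where TimeGenerated > ago({hours_back}h)"
    lines = SEARCH_OPS_KQL.strip().splitlines()
    # Insert the time filter right after the table name on the first line.
    return "\n".join([lines[0], time_filter] + lines[1:])
```

The same pattern extends to joining in Activity Log or Application Insights tables, which is what gives you the holistic picture described above.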
And then, lastly, for your executives, your financial staff, or even just your IT director or software engineering director, you can take those reports and put them into a reporting tool, such as Power BI, that they can consume without having to understand all of the underlying pieces, or even necessarily have access to Azure. Now, one recommendation I'm certainly going to make, and I know Microsoft makes it as well, is to send all of your data to Azure Log Analytics. Azure Log Analytics has a very low entry price point, and it allows you to create queries across all of the disparate data sources feeding it, along with graphs, charts, and reports that you can push into an Azure dashboard or, as I mentioned, feed into Power BI. You also have the ability to create dynamic alerts off of those queries, so that when something bad happens, a query can notify you, send you a text message, automatically run a remediation script, things like that.
Now, let's take a quick look at some of the things you'll see in your Azure search service. This is a view of the overview page for the service. Under the Monitoring tab, at number one, you'll see that you're already getting some performance-related data tied to your search service: search latency, search queries per second, and the throttled search query percentage. So you're already starting to gather that information, but maybe you need more control over the data, or you want to look at it over a longer time span than the default 30-day maximum that Microsoft provides, things like that.
You might also want to tie into the activity log, at point number two, which, as I mentioned, is an audit trail of who has done what within the scope of your search service and which operations they performed. And down at item number three, you can see all of the connections to Azure Monitor, where you can create alerts, view individual metrics, and configure diagnostics to send your data to a logs location such as Azure Log Analytics.
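As an illustration of the alerting piece, here's a sketch of the request body for an Azure Monitor metric alert that fires when the throttled-query percentage gets too high. `ThrottledSearchQueriesPercentage` is one of the platform metrics for `Microsoft.Search/searchServices`; the resource ID, alert name, threshold, and timing values are placeholder choices for illustration:

```python
# Sketch of a metric alert body, sent via:
# PUT /subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.Insights/
#     metricAlerts/{name}?api-version=2018-03-01
def throttling_alert_body(search_service_id: str, threshold: float = 10.0) -> dict:
    """Alert when the average throttled-query percentage exceeds `threshold`."""
    return {
        "location": "global",
        "properties": {
            "severity": 2,
            "enabled": True,
            "scopes": [search_service_id],     # the search service resource ID
            "evaluationFrequency": "PT5M",     # check every 5 minutes
            "windowSize": "PT15M",             # over a 15-minute window
            "criteria": {
                "odata.type": "Microsoft.Azure.Monitor."
                              "SingleResourceMultipleMetricCriteria",
                "allOf": [{
                    "name": "throttled-queries",
                    "metricName": "ThrottledSearchQueriesPercentage",
                    "operator": "GreaterThan",
                    "threshold": threshold,
                    "timeAggregation": "Average",
                }],
            },
        },
    }
```

In practice you'd also attach an action group to the alert so it actually sends the text message or runs the script described earlier.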
Now, as another viewpoint, if we look at the Usage tab, again on the overview page of your Azure search service, you can gather information about how you're using the service: where your data storage sits within the scope of your quota, how many indexes you're using versus the maximum for the tier you chose, and the same for indexers and connected data sources. This gives you the view you need to determine, "Hey, should I go from Standard tier one to Standard tier two? Or maybe I should even drop down a tier, because I'm not using my search service as heavily as I originally thought."
Hopefully, this gives you a good understanding of some of the out-of-the-box monitoring and troubleshooting capabilities available in Azure. As I said, if you need more information on the specific features and functions of Azure Monitor, I'm sure Cloud Academy has one or more courses that will provide that level of detail.
Brian has been working in the Cloud space for more than a decade as both a Cloud Architect and Cloud Engineer. He has experience building Application Development, Infrastructure, and AI-based architectures using many different OSS and Non-OSS based technologies. In addition to his work at Cloud Academy, he is always trying to educate customers about how to get started in the cloud with his many blogs and videos. He is currently working as a Lead Azure Engineer in the Public Sector space.