This course will focus on the skills required to manage and maintain the indexing process for an Azure Cognitive Search solution. As data changes within a given data source, the requirement to rebuild an index or set up the schedule for an index becomes very important. Understanding all of the functions related to the indexing process is important when you know that there are going to be periodic updates to the underlying data source, and this course will teach you the skills to perform all of those functions.
Learning Objectives
- Manage re-indexing
- Rebuild indexes
- Schedule and monitor indexing
- Implement incremental indexing
- Manage concurrency
- Push data to an index
- Troubleshoot indexing for a pipeline
Intended Audience
- Developers who will be including full-text search in their applications
- Data Engineers focused on providing better accessibility to organizational data
- AI Engineers who will be providing AI combined with search functionality in their solutions
Prerequisites
Candidates for this course should have a strong understanding of data sources and the operational requirements for those data source changes. Candidates should also be able to use REST-based APIs and SDKs to build knowledge mining solutions on Azure.
Hi there. In this video, we're going to be talking about the reindexing and rebuilding processes related to your indexes. Now in the simplest of terms, reindexing versus rebuilding can be thought of as "index stays the same" versus "index changes." Those are the big differences. Reindexing doesn't change the index's structure; it only adds or modifies the data related to the documents stored within the index structure. You just run the same indexer against the same data source. And based on the way that all of that has been configured, you will get additional documents that may have been added to your data source. You will potentially get updates to fields of documents within your data source, and within your index, but you're never going to get complete index structure changes. That can only be done via rebuilding.
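As a minimal sketch of "run the same indexer against the same data source," the snippet below builds (but does not send) a Run Indexer call against the search service's REST endpoint. The service name, indexer name, key placeholder, and the API version shown are assumptions for illustration, not values from this course.

```python
import urllib.request

# Assumed API version for illustration; use the version your service supports.
API_VERSION = "2020-06-30"


def build_run_indexer_request(service_name, indexer_name, api_key):
    """Build a POST request that asks an existing indexer to run again.

    Reindexing does not touch the index definition: the same indexer
    simply re-reads the same data source.
    """
    url = (
        f"https://{service_name}.search.windows.net"
        f"/indexers/{indexer_name}/run?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        data=b"",  # Run Indexer takes no request body
        method="POST",
        headers={"api-key": api_key},
    )


if __name__ == "__main__":
    # Hypothetical names; nothing is sent over the network here.
    req = build_run_indexer_request("my-search-service", "hotels-indexer", "<admin-key>")
    print(req.get_method(), req.full_url)
    # To actually trigger the run you would call: urllib.request.urlopen(req)
```

The request is only constructed here, so the sketch can be inspected without credentials; sending it is a single `urlopen` call once real values are supplied.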
Rebuilding denotes a significant change to the index structure, such as the changing or deleting of fields, or changing your Azure Cognitive Search tier, which obviously affects different limits, configurations, and capabilities. So if you are updating or reindexing rather than rebuilding, here are some of the conditions: Maybe you wanna add a new field to all of the documents that currently exist in your index. You wanna set the retrievable attribute on an existing field in your index, so that the data which is already there can actually be viewed by the users performing their request. You wanna add a new search analyzer on top of an existing field. You wanna add a new search analyzer definition.
Now I am making some assumptions that you already understand what search analyzers are, and what their capabilities are going to be with respect to how your index is being used. Maybe you wanna add, update, or delete scoring profiles on top of different fields, so that when a user actually performs a request, it provides them with documents that more tightly match those different scoring profiles. You wanna add, update, or delete synonym maps on top of existing text fields. Because all of these conditions are additive to the index, they are not a rebuild; they are just additional features and functions that are being applied to your existing dataset and your existing index structure.
Now one thing to keep in mind. This is a rebuild recommendation, and this comes directly from Microsoft. I fully, 100% agree with it. For applications already in production, we recommend creating a new index that runs side by side an existing index to avoid query downtime. Then change your application code to redirect to the new index once you are 100% sure that the new index structure is what you want for your application. This also helps provide a better development and QA testing cycle for your index structure as well.
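One way to picture the side-by-side recommendation is to keep the active index name in a single place in the application, so that redirecting to the new index is a one-line change. The sketch below is an assumption about how an application might do that; `IndexRouter`, the index names, and the API version are hypothetical and not part of any SDK.

```python
from dataclasses import dataclass


@dataclass
class IndexRouter:
    """Holds the name of the index the application currently queries."""

    service_name: str
    active_index: str
    api_version: str = "2020-06-30"  # assumed version for illustration

    def search_url(self) -> str:
        # URL of the Search Documents endpoint for the active index.
        return (
            f"https://{self.service_name}.search.windows.net"
            f"/indexes/{self.active_index}/docs/search"
            f"?api-version={self.api_version}"
        )

    def switch_to(self, new_index: str, verified: bool) -> None:
        # Only redirect once the new index has been checked (QA, test queries).
        if not verified:
            raise ValueError("new index has not been verified yet")
        self.active_index = new_index


if __name__ == "__main__":
    router = IndexRouter("my-search-service", "hotels-v1")
    print(router.search_url())   # queries still go to hotels-v1
    router.switch_to("hotels-v2", verified=True)
    print(router.search_url())   # queries now go to hotels-v2
```

Because the old index is left in place until the switch, rolling back is just a matter of pointing the router at the previous name again.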
Now when it comes to rebuilding an index, what are some of the conditions that would force a rebuild? You're gonna change a field definition. Maybe you have an existing field that is a string, and you need to change it to an integer. That would require a rebuild. You wanna assign an analyzer to a field. Now you'll remember I mentioned search analyzers in the reindexing set of conditions, but that is where an analyzer already existed, and you were just changing the capabilities and configuration of that analyzer. This is adding a new analyzer to a field that did not already have one. You wanna update or delete an analyzer definition that is attached to an existing field. You wanna add a field to a suggester.
Now remember, a suggester is something that is stored in the index. It is not a process that occurs after the request for the indexed documents. You want to completely delete a field. The only way that can be done is with a rebuild. And then, as I talked about at the beginning, switching tiers of your Azure Cognitive Search service will require you to do a rebuild of any index that is inside of that search service. Now when actually performing a rebuild, there is no one simple programmatic way to do it, because a rebuild is destroying everything that you have and creating it new.
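The update and rebuild conditions covered so far can be condensed into a small check. This is only a sketch under simplifying assumptions: it compares two lists of field definitions written as plain dictionaries (with `name`, `type`, and optional `analyzer` keys), and it covers just the field-level rules mentioned above, not suggesters, analyzer definitions, or tier changes. The function name is hypothetical.

```python
REBUILD = "rebuild"
UPDATE = "update"


def classify_index_change(old_fields, new_fields):
    """Return (decision, reasons) for a proposed change to an index's fields.

    Rules taken from the discussion above:
      - deleting a field                              -> rebuild
      - changing an existing field's data type        -> rebuild
      - assigning or changing an analyzer on a field  -> rebuild
      - adding new fields, toggling retrievable, etc. -> update in place
    """
    old = {f["name"]: f for f in old_fields}
    new = {f["name"]: f for f in new_fields}
    reasons = []

    for name, old_field in old.items():
        new_field = new.get(name)
        if new_field is None:
            reasons.append(f"field '{name}' deleted")
            continue
        if old_field.get("type") != new_field.get("type"):
            reasons.append(f"field '{name}' type changed")
        if old_field.get("analyzer") != new_field.get("analyzer"):
            reasons.append(f"field '{name}' analyzer changed")

    # Fields that exist only in new_fields are additive and need no rebuild.
    return (REBUILD if reasons else UPDATE), reasons


if __name__ == "__main__":
    current = [
        {"name": "id", "type": "Edm.String"},
        {"name": "rating", "type": "Edm.String"},
    ]
    proposed = [
        {"name": "id", "type": "Edm.String"},
        {"name": "rating", "type": "Edm.Int32"},  # string -> integer
        {"name": "tags", "type": "Edm.String"},   # new field, additive
    ]
    print(classify_index_change(current, proposed))
```

Attributes that are not compared here, such as `retrievable`, are deliberately ignored so that changing them is treated as an in-place update.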
So, you're gonna want to create a backup of your existing index structure in case you should ever need to roll back for any reason. You are then going to delete or drop your existing index. You're gonna create a brand-new revised index, and then you're going to load it. Now obviously this means that there would potentially be downtime during this process, especially during the loading step. This is why Microsoft and I are recommending that you create your rebuilt index side by side, in parallel with the existing one, to make the transfer process in your application much smoother for your users.
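The backup, drop, create, and load steps can be sketched as an ordered list of REST calls. This is a plan only, under stated assumptions: the names and API version are placeholders, the load step assumes an indexer-based (pull) pipeline, and the reset call is included on the assumption that the indexer should start over rather than skip documents it believes it has already processed. Nothing is sent; the function just returns the method and URL for each step.

```python
# Assumed API version for illustration; use the version your service supports.
API_VERSION = "2020-06-30"


def rebuild_plan(service_name, index_name, indexer_name):
    """Return the ordered (method, url) pairs for a drop-and-rebuild."""
    base = f"https://{service_name}.search.windows.net"
    query = f"?api-version={API_VERSION}"
    index_url = f"{base}/indexes/{index_name}{query}"
    indexer_url = f"{base}/indexers/{indexer_name}"

    return [
        ("GET", index_url),                      # 1. back up: fetch and save the current definition
        ("DELETE", index_url),                   # 2. drop the existing index
        ("PUT", index_url),                      # 3. create the revised index (body: new definition JSON)
        ("POST", f"{indexer_url}/reset{query}"),  # 4a. clear the indexer's change-tracking state
        ("POST", f"{indexer_url}/run{query}"),    # 4b. load: run the indexer against the new index
    ]


if __name__ == "__main__":
    # Hypothetical names, printed for inspection only.
    for method, url in rebuild_plan("my-search-service", "hotels", "hotels-indexer"):
        print(f"{method:6} {url}")
```

Queries against the index fail between steps 2 and 4b, which is the downtime window the side-by-side approach is meant to avoid.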
Now like I have done in previous courses, because this is all primarily done via code, whether it's the .NET SDK or the REST APIs (some of this can also be done using the Azure CLI), I will provide some code samples in a GitHub repository, and that repository's link will be at the end of the slide deck. Hopefully this gives you a good understanding of the differences between reindexing and rebuilding, and when you would want to do each one.
Brian has been working in the Cloud space for more than a decade as both a Cloud Architect and Cloud Engineer. He has experience building Application Development, Infrastructure, and AI-based architectures using many different OSS and Non-OSS based technologies. In addition to his work at Cloud Academy, he is always trying to educate customers about how to get started in the cloud with his many blogs and videos. He is currently working as a Lead Azure Engineer in the Public Sector space.