This course focuses on the skills required to manage and maintain the indexing process for an Azure Cognitive Search solution. As the data in a data source changes, knowing when to rebuild an index and how to schedule the indexer that populates it becomes essential. If you know the underlying data source will be updated periodically, you need to understand every function related to the indexing process, and this course teaches you how to perform each of them.
Learning Objectives
- Manage re-indexing
- Rebuild indexes
- Schedule and monitor indexing
- Implement incremental indexing
- Manage concurrency
- Push data to an index
- Troubleshoot indexing for a pipeline
Intended Audience
- Developers who will include full-text search in their applications
- Data engineers focused on making organizational data more accessible
- AI engineers who will combine AI with search functionality in their solutions
Prerequisites
Candidates for this course should have a strong understanding of data sources and of the operational requirements that come with changes to those data sources. Candidates should also be able to use REST-based APIs and SDKs to build knowledge mining solutions on Azure.
In this final video, I want to review all of the topics we covered throughout the course. We talked about how to push data to your index, and how to push that data incrementally, whether because of the size of your data source, because multiple data sources feed your index, or for some other reason.
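As a rough illustration of incremental pushing, the sketch below only builds the JSON request bodies that the push API (`POST /indexes/{index}/docs/index`) expects; it does not send them. It assumes the REST API's `@search.action` field, uses `mergeOrUpload` so that changed rows update existing documents and new rows are inserted, and splits the work into batches of at most 1,000 documents, the service's documented per-request limit. The `id` key field and the sample rows are hypothetical.

```python
import json
from typing import Any, Iterable, Iterator

# Documented push API limit: at most 1,000 documents per indexing request.
MAX_BATCH_SIZE = 1000
# Actions the push API accepts for each document.
VALID_ACTIONS = {"upload", "merge", "mergeOrUpload", "delete"}


def build_push_batches(
    docs: Iterable[dict[str, Any]],
    action: str = "mergeOrUpload",
    batch_size: int = MAX_BATCH_SIZE,
) -> Iterator[dict[str, Any]]:
    """Yield request bodies of the form {"value": [...]} for the push API.

    "mergeOrUpload" updates a document whose key already exists in the index
    and uploads it otherwise, which suits pushing only the rows that changed.
    """
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError(f"batch_size must be between 1 and {MAX_BATCH_SIZE}")
    batch: list[dict[str, Any]] = []
    for doc in docs:
        batch.append({"@search.action": action, **doc})
        if len(batch) == batch_size:
            yield {"value": batch}
            batch = []
    if batch:  # flush the final, partially filled batch
        yield {"value": batch}


if __name__ == "__main__":
    # Hypothetical changed rows; "id" stands in for the index's key field.
    changed = [{"id": str(i), "title": f"Doc {i}"} for i in range(2500)]
    bodies = list(build_push_batches(changed))
    print([len(b["value"]) for b in bodies])  # [1000, 1000, 500]
    print(json.dumps(bodies[0]["value"][0]))
```

Yielding batches lazily keeps memory flat when the change set is large; each yielded body would then be posted with your admin key, and the per-document status in the response tells you which items need a retry.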
We talked about how to set up re-indexing and rebuilding, the differences between the two, and the kinds of changes that would lead you to choose one or the other. We talked about how to schedule indexing, either by using the indexer's built-in scheduling capability or by triggering indexing from your own code. We talked about how to manage concurrency for all of the resources that make up your search solution, whether that's your data source, your indexer, your synonym map, or something else. And lastly, we talked about how to monitor and troubleshoot indexing so that you can confirm that performance is acceptable and that the indexing process is working correctly.
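To tie scheduling and concurrency together, here is a minimal sketch, assuming the Azure Cognitive Search REST API, that prepares (but does not send) a PUT request adding a schedule to an existing indexer. The schedule `interval` is an ISO 8601 duration, which the service accepts between 5 minutes and 24 hours. The indexer's current ETag goes in an `If-Match` header, so if another client changed the indexer after you read it, the service rejects the update with HTTP 412 (Precondition Failed) instead of overwriting that change. The service name, admin key, indexer, data source and index names, ETag value, and the `2020-06-30` API version are all placeholders or assumptions.

```python
import json
from dataclasses import dataclass
from datetime import timedelta
from typing import Any

API_VERSION = "2020-06-30"  # assumed REST API version; use the one your service targets


@dataclass
class PreparedRequest:
    method: str
    url: str
    headers: dict[str, str]
    body: str


def to_iso8601_duration(interval: timedelta) -> str:
    """Convert a whole-minute interval (5 minutes to 24 hours) to ISO 8601."""
    seconds = interval.total_seconds()
    if seconds % 60 != 0:
        raise ValueError("interval must be a whole number of minutes")
    minutes = int(seconds // 60)
    if not 5 <= minutes <= 1440:
        raise ValueError("indexer schedules allow 5 minutes to 24 hours")
    if minutes == 1440:
        return "P1D"
    hours, mins = divmod(minutes, 60)
    return "PT" + (f"{hours}H" if hours else "") + (f"{mins}M" if mins else "")


def prepare_scheduled_indexer_update(
    service: str,
    api_key: str,
    indexer: dict[str, Any],
    interval: timedelta,
    etag: str,
) -> PreparedRequest:
    """Build a conditional PUT that adds a schedule to an existing indexer.

    If-Match carries the ETag read earlier; if the indexer was modified since,
    the service answers 412 instead of silently overwriting the other change.
    """
    definition = dict(indexer)  # shallow copy so the caller's dict is untouched
    definition["schedule"] = {"interval": to_iso8601_duration(interval)}
    return PreparedRequest(
        method="PUT",
        url=f"https://{service}.search.windows.net/indexers/{definition['name']}"
        f"?api-version={API_VERSION}",
        headers={
            "Content-Type": "application/json",
            "api-key": api_key,
            "If-Match": etag,
        },
        body=json.dumps(definition),
    )


if __name__ == "__main__":
    # Hypothetical names and ETag, for illustration only.
    req = prepare_scheduled_indexer_update(
        service="my-search-service",
        api_key="<admin-key>",
        indexer={
            "name": "hotels-indexer",
            "dataSourceName": "hotels-ds",
            "targetIndexName": "hotels-idx",
        },
        interval=timedelta(hours=2),
        etag='"0x8D8A1B2C3D4E5F6"',
    )
    print(req.method, req.url)
    print(req.body)
```

On a 412 response, the usual pattern is to re-read the indexer, reapply your change to the fresh definition, and retry with the new ETag.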
I also mentioned some additional resources. You'll find links at the end of this slide deck that take you to the Azure documentation for the topics we covered, if you want more detailed information. Finally, I said I would provide code files for the different topics, because many of the tasks we covered can only be performed programmatically, using the REST API or one of the SDKs that Microsoft provides for the Azure Cognitive Search service.
Here are the GitHub repositories: one showing how to push data programmatically, one showing how to handle concurrency, specifically ETags, programmatically, and one showing how to set up scheduling for indexers and the indexing process. All of them are Microsoft's own samples, and this slide has the link to each of them. I hope you've enjoyed the content of this course, and I hope to see you again very soon.
Brian has been working in the cloud space for more than a decade as both a cloud architect and a cloud engineer. He has experience building application, infrastructure, and AI-based architectures using many different OSS and non-OSS technologies. In addition to his work at Cloud Academy, he regularly helps customers get started in the cloud through his many blogs and videos. He is currently working as a Lead Azure Engineer in the public sector.