Importing Data into Search Service



This course focuses on the skills necessary to implement a knowledge-mining solution built on Azure Cognitive Search. The course will walk through how to create a Cognitive Search solution and how to set up the process for importing data. Once the data sources have been set up properly, the course will teach you how to create a search index and then how to configure it to provide the best results possible.

Learning Objectives

  • Create a Cognitive Search solution
  • Import from data sources
  • Create, configure, and test indexes
  • Configure AutoComplete and AutoSuggest
  • Improve results based on relevance
  • Implement synonyms

Intended Audience

  • Developers who want to include full-text search in their applications
  • Data engineers focused on providing better accessibility to organizational data
  • AI engineers who combine AI with search functionality in their solutions


Prerequisites

To get the most out of this course, you should:

  • Have a strong understanding of data sources and how data will be needed by users consuming a Cognitive Search solution
  • Be able to use REST-based APIs and SDKs to build knowledge-mining solutions on Azure



In this particular video, we're going to take a close look at the data that you can make available in your search service, data that gets indexed and is then available for querying through a set of APIs that the search service provides. Let's start by talking about the supported data sources for your search index. As you can see, all of these are primarily Azure-based, and some of them are fairly straightforward. Take the bottom three, for example: Azure SQL Database, SQL Managed Instance, and SQL Server on Azure Virtual Machines. These are standard relational database options whose data you can make available in your search index.

The next one above those three is Azure Cosmos DB, the NoSQL database engine that Microsoft provides through Azure. That one also makes a tremendous amount of sense, because it stores document-based (JSON) data, which is very easily indexable. The top three, on the other hand, are not as straightforward. Azure Table Storage is tabular but not relational; it is essentially one gigantic table, and that too can be indexed easily. The top two, Blob Storage and Data Lake Storage, depend on the types of blobs you make available. If you have JSON-based log data, CSV files, or other content that is essentially textual, those are going to be very easy to index as well.

However, if you'll remember the previous video, where we talked about the features and components of a Cognitive Search solution, one of them was adding AI enrichments on top of, say, photos or other unstructured data. If you send that data through a Cognitive Service or AI service, it will typically return a JSON document, which means you can take those AI enrichments and index them much like you would any JSON log file stored directly in Blob Storage. So that's something to keep in mind.

In addition to the data storage and the data that gets ingested into the search index, you will also create an indexer. An indexer is a component that crawls the searchable text and metadata and populates a search index using field-to-field mappings between the source data and your index. Basically, you are defining what those mappings are.
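As a sketch of what that looks like, an indexer definition sent to the Cognitive Search REST API is a JSON document that names the data source, names the target index, and lists the field mappings. All of the names below are illustrative, not taken from the course demo:

```python
import json

# Hypothetical indexer definition, shaped like the JSON body you would PUT to
# the Cognitive Search REST API under /indexers/<name>.
indexer = {
    "name": "products-indexer",
    "dataSourceName": "ds-products",      # which data source to crawl
    "targetIndexName": "products-index",  # which index to populate
    # Field-to-field mappings: only needed where the source column name and
    # the index field name differ.
    "fieldMappings": [
        {"sourceFieldName": "ProductID", "targetFieldName": "id"},
        {"sourceFieldName": "Name", "targetFieldName": "productName"},
    ],
}

body = json.dumps(indexer)
```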

The indexers also drive the enrichment process, the inclusion of those AI services I mentioned a little while ago, by integrating external processing of content en route to the index. With indexers, there are distinct use cases you can implement within the scope of your search service. The first is fairly straightforward: a single data source with a single indexer. You choose a table, a database, or a storage account inside of Azure, say "index that location," and then set up the mappings for that single location. The indexer then takes care of the automated processing of that data.

The second is multiple data sources with a single indexer, where the indexer needs to include data from, say, two different tables in a SQL database and combine them into one index. Maybe you want a search index that includes both customer data and product data together; that is also something an indexer can perform. Or maybe you need multiple indexers. If you are using multiple data sources but need that data kept separate, or you need to vary runtime parameters, giving indexers different schedules or different field mappings, then you might need multiple indexers. Sometimes you even need multiple indexers on the same table, where you want different indexes to cover different data properties.
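The "vary runtime parameters" case above might look something like this: two indexer definitions over the same data source, differing only in schedule and target index. The names and intervals are illustrative assumptions, not from the demo:

```python
# Two hypothetical indexers crawling the same data source on different
# schedules and feeding different indexes.
base = {"dataSourceName": "ds-adventure"}

hourly_indexer = {
    **base,
    "name": "products-hourly",
    "targetIndexName": "products-index",
    "schedule": {"interval": "PT1H"},  # ISO 8601 duration: run every hour
}

nightly_indexer = {
    **base,
    "name": "products-nightly",
    "targetIndexName": "products-archive",
    # Run once a day, starting at 02:00 UTC.
    "schedule": {"interval": "P1D", "startTime": "2024-01-01T02:00:00Z"},
}
```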

And then lastly there is content transformation. This is where the Cognitive Services, or AI services, come into play, providing additional enrichment on top of the data you are ingesting. For example, say you are ingesting a set of text and you want to pull out specific keywords and store them in a separate field in your search index, where those keywords are not clearly defined in the data itself. Maybe you are ingesting a large Word document and you are looking for a specific set of keywords that map to a type of data. That can also be done and added to the search index alongside the data itself. So with that, let's jump back into the Azure portal and take a look at how you actually set up data sources in your search service.
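That kind of keyword extraction is attached to the indexer through a skillset. As a sketch, the `@odata.type` below is the built-in Cognitive Search key-phrase skill, while the skillset name and field paths are illustrative:

```python
# Hypothetical skillset performing the keyword extraction described above.
skillset = {
    "name": "keyword-skillset",
    "skills": [
        {
            "@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
            "context": "/document",
            # Read the ingested text...
            "inputs": [{"name": "text", "source": "/document/content"}],
            # ...and emit key phrases under a new enrichment name, which an
            # indexer output field mapping can then route into an index field.
            "outputs": [{"name": "keyPhrases", "targetName": "keywords"}],
        }
    ],
}
```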

OK, here we are back in the portal, and you'll notice I have started on the overview page for the search service we created in the last video. I want to point out two specific things with respect to importing data, or creating data sources. The first is that there is a wizard called Import data that will help you go through the process of creating your data source. However, that's not the only thing the wizard does: it also walks through creating your initial index, creating the indexer, and adding additional Cognitive Services, all in one big wizard. If you are not quite ready for all of that, there is another option: go down to the Data sources tab in the overview area and create a new data source from there. Let me quickly show you both so that you're at least prepared; both handle the same information in the same way.

Clicking on the Import data wizard, you can see the steps: connect to your data, add cognitive skills, customize the target index, and then create the indexer. As I mentioned, all of the major pieces are in one location. Under "Connect to your data," we choose a data source. The search service provides a set of samples, and that's what I'll show here, because I'm not going to finish the wizard. You then choose which sample you want to connect to; I'll choose the hotels sample, which comes from a Cosmos DB. Then we move on to the next area, which is adding cognitive skills.

That was a very, very simplified example. Needless to say, if I connect to a database that has not been predefined, whether SQL Server, Cosmos DB, or what have you, the data is going to be much more complicated. So let's do that, but through the main Data sources tab instead.

So I'm going to exit the wizard, go back to the Data sources tab, and click on New Data Source. This opens a wizard for just the creation of the data source. Now, I already have an Azure SQL database server created, with a database on it called AdventureWorks, so we are going to look for that. We choose Azure SQL Database, and then we need to name the data source. I'll say ds-adventure, because the database name is AdventureWorks. We can then choose an existing connection: rather than having to memorize the connection string for your database, as long as it is located in Azure and is an Azure service, it should show up in the database dropdown.

In this particular case, it was called search demo. You do, of course, have to put in a user ID and password. It automatically chose brianadmin because that is the default admin for the database server and is currently the only user in that database, so I just need to enter my password. Then we can test the connection. Connection validated, excellent. Next, we need to choose a specific table. I am going to choose the Product table, which makes perfect sense as a common use case for a search index. And if you wanted to create a filter for your data, you could put in a SQL query for the Product table and thereby pull only a specific set of products, maybe IDs zero through 1,000, things like that.
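For reference, the portal steps above correspond roughly to the JSON body you would send to the REST API to create the data source. This is a sketch: the connection string is a placeholder, the `azuresql` type is the value for Azure SQL Database sources, and the bare table name is an assumption about the demo database:

```python
# Hypothetical REST body for creating the data source shown in the portal.
data_source = {
    "name": "ds-adventure",
    "type": "azuresql",  # data source type for Azure SQL Database
    "credentials": {
        # Placeholder connection string; never hard-code real credentials.
        "connectionString": (
            "Server=tcp:<server>.database.windows.net,1433;"
            "Database=AdventureWorks;User ID=<user>;Password=<password>;"
        )
    },
    # The container scopes the crawl to one table (or view) in the database.
    "container": {"name": "Product"},
}
```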

Now, if your database tables have been set up for change tracking and deletion tracking, configuration options you can turn on within your database tables whether they are in a SQL Database or a Cosmos DB, then you can absolutely check Track deletions and Track changes. For deletion tracking, you need to specify which column marks a soft delete and what its value should be. In this particular case, my table doesn't have deletion tracking turned on, so I am going to remove that. For change tracking, you need to specify a change detection policy. The purpose of a data change detection policy is to efficiently identify the changed data items.

Supported policies vary based on the data source type. In this particular case, an Azure SQL database, it's either integrated change tracking, meaning the SQL Server database handles it, or a high watermark column. Again, this particular table in this database does not have change tracking turned on, but it is something you can specify. And that's all you need to create your data source. I am going to go ahead and click Save.
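The two change detection options and the soft-delete option each appear as a policy fragment inside the data source definition. The `@odata.type` values below follow the Cognitive Search REST schema; the column names and marker value are illustrative assumptions:

```python
# Hypothetical policy fragments that would be embedded in the data source body.

# Option 1: let SQL Server's integrated change tracking report what changed.
integrated_change_policy = {
    "@odata.type": "#Microsoft.Azure.Search.SqlIntegratedChangeTrackingPolicy"
}

# Option 2: re-crawl rows whose watermark column exceeds the last-seen value.
high_watermark_policy = {
    "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
    "highWaterMarkColumnName": "ModifiedDate",  # column that only ever increases
}

# Deletion tracking: which column and value mark a row as soft-deleted.
soft_delete_policy = {
    "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
    "softDeleteColumnName": "IsDeleted",
    "softDeleteMarkerValue": "true",
}
```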

Now, before I can actually start to index this table, one quick note: what you see here is a UI bug. You did see that I saved the data source, and if we look at the notifications, there it is. So this is just a UI bug, don't worry about it. You'll see that the data source was created successfully, and it shows up in my Data sources tab.

Now, I actually cannot create my indexer yet. You would think I could, but that's not the case: I have to create my index first, before I can create the indexer, and really they should be done one right after the other. The indexer is not very complicated, but the index is, and that's what we're going to focus on in the next video. We'll take a look at how to create the index for your data and how to test it once the data has been indexed.

About the Author

Brian has been working in the Cloud space for more than a decade as both a Cloud Architect and Cloud Engineer. He has experience building Application Development, Infrastructure, and AI-based architectures using many different OSS and Non-OSS based technologies. In addition to his work at Cloud Academy, he is always trying to educate customers about how to get started in the cloud with his many blogs and videos. He is currently working as a Lead Azure Engineer in the Public Sector space.