How to index multiple blobs under a main record in Azure Search?

I followed the steps described in this tutorial. My case is a little different:

  • Instead of indexing Hotels and Rooms, I am indexing Candidates and Resumes.
  • Instead of using CosmosDB, I am using an Azure SQL Database.

Following the tutorial, I am able to create the index, the two indexers (one for the SQL DB and one for the Blob storage), and the two data sources.

The SQL DB contains all my candidates, and the storage contains all their resumes (PDF/DOC/DOCX files). Each blob has a metadata property "ResumeCandidateId" that contains the same value as the candidate's "CandidateId".

I have the following fields for my index:

    [SerializePropertyNamesAsCamelCase]
    public partial class Candidate
    {
        [Key]
        [IsFilterable, IsRetrievable(true), IsSearchable]
        public string CandidateId { get; set; }

        [IsFilterable, IsRetrievable(true), IsSearchable, IsSortable]
        public string LastName { get; set; }

        [IsFilterable, IsRetrievable(true), IsSearchable, IsSortable]
        public string FirstName { get; set; }

        [IsFilterable, IsRetrievable(true), IsSearchable, IsSortable]
        public string Notes { get; set; }

        public ResumeBlob[] ResumeBlobs { get; set; }
    }

    [SerializePropertyNamesAsCamelCase]
    public class ResumeBlob
    {
        [IsRetrievable(true), IsSearchable]
        [Analyzer(AnalyzerName.AsString.StandardLucene)]
        public string content { get; set; }

        [IsRetrievable(true)]
        public string metadata_storage_content_type { get; set; }

        public long metadata_storage_size { get; set; }

        public DateTime metadata_storage_last_modified { get; set; }

        public string metadata_storage_name { get; set; }

        [Key]
        [IsRetrievable(true)]
        public string metadata_storage_path { get; set; }

        [IsRetrievable(true)]
        public string metadata_content_type { get; set; }

        public string metadata_author { get; set; }

        public DateTime metadata_creation_date { get; set; }

        public DateTime metadata_last_modified { get; set; }

        public string ResumeCandidateId { get; set; }
    }

As you can see, one Candidate can have multiple Resumes. The challenge is to populate the ResumeBlobs property.

The data from the SQL DB is indexed and mapped correctly by its indexer. When I run the Blobs indexer, it loads the documents, but it does not map them and they never show up in search results (ResumeBlobs is always empty). Here is the code used to create the Blobs indexer:

var blobDataSource = DataSource.AzureBlobStorage(
    name: "azure-blob-test02",
    storageConnectionString: "DefaultEndpointsProtocol=https;AccountName=yyy;AccountKey=xxx;EndpointSuffix=core.windows.net",
    containerName: "2019");

await searchService.DataSources.CreateOrUpdateAsync(blobDataSource);

List<FieldMapping> map = new List<FieldMapping> {
    new FieldMapping("ResumeCandidateId", "CandidateId")
};

Indexer blobIndexer = new Indexer(
    name: "hotel-rooms-blobs-indexer",
    dataSourceName: blobDataSource.Name,
    targetIndexName: indexName,
    fieldMappings: map,
    //parameters: new IndexingParameters().SetBlobExtractionMode(BlobExtractionMode.ContentAndMetadata).IndexFileNameExtensions(".DOC", ".DOCX", ".PDF", ".HTML", ".HTM"),
    schedule: new IndexingSchedule(TimeSpan.FromDays(1)));

bool exists = await searchService.Indexers.ExistsAsync(blobIndexer.Name);
if (exists)
{
    await searchService.Indexers.ResetAsync(blobIndexer.Name);
}
await searchService.Indexers.CreateOrUpdateAsync(blobIndexer);

try
{
    await searchService.Indexers.RunAsync(blobIndexer.Name);
}
catch (CloudException e) when (e.Response.StatusCode == (HttpStatusCode)429)
{
    Console.WriteLine("Failed to run indexer: {0}", e.Response.Content);
}

I commented out the parameters for the blobIndexer, but I get the same results even when they are not commented out.

When I run a search, here is an example of what I get:

{
    "@odata.context": "https://yyy.search.windows.net/indexes('index-test01')/$metadata#docs(*)",
    "value": [
        {
            "@search.score": 1.2127206,
            "candidateId": "363933d1-7e81-4ed2-b82e-d7496d98db50",
            "lastName": "LAMLAST",
            "firstName": "ZFIRST",
            "notes": "MGA ; SQL ; T-SQL",
            "resumeBlobs": []
        }
    ]
}

"resumeBlobs" is empty. Any idea how to do such a mapping?

AFAIK, Azure Search doesn't support a collection merge feature, which seems to be necessary to implement your scenario.

An alternative approach is to create a separate index for resumes and point the resume indexer at that index. That means some of your search scenarios will have to hit two indexes, but it's a path forward.
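
As a rough sketch of what hitting two indexes could look like on the client side: query the candidates index first, build a filter for the resumes index from the returned IDs, then group the resume hits under their candidates. The CandidateHit and ResumeHit classes, the TwoIndexSearch helper, and the filterable "resumeCandidateId" field in the resumes index are all assumptions made for illustration; only the OData search.in function is part of Azure Search itself. No SDK calls are shown, so the two queries still have to be issued with whatever client you use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes for hits returned by the two indexes (not part of any SDK).
public sealed class CandidateHit
{
    public string CandidateId { get; set; }
    public string LastName { get; set; }
    public string FirstName { get; set; }
}

public sealed class ResumeHit
{
    public string ResumeCandidateId { get; set; }
    public string MetadataStorageName { get; set; }
}

public static class TwoIndexSearch
{
    // Builds an OData filter that restricts a resumes-index query to the given candidates.
    // Assumes the resumes index has a filterable field named "resumeCandidateId".
    // Candidate IDs are GUIDs here, so they cannot contain the ',' delimiter.
    // Callers should skip the second query entirely when there are no candidate IDs.
    public static string BuildResumeFilter(IEnumerable<string> candidateIds)
    {
        string joined = string.Join(",", candidateIds.Distinct());
        return "search.in(resumeCandidateId, '" + joined + "', ',')";
    }

    // Groups resume hits under their candidate, mimicking the nested resumeBlobs collection.
    // Candidates without any resume get an empty list.
    public static Dictionary<string, List<ResumeHit>> AttachResumes(
        IEnumerable<CandidateHit> candidates,
        IEnumerable<ResumeHit> resumes)
    {
        ILookup<string, ResumeHit> byCandidate = resumes.ToLookup(r => r.ResumeCandidateId);
        return candidates.ToDictionary(
            c => c.CandidateId,
            c => byCandidate[c.CandidateId].ToList());
    }
}
```

The string returned by BuildResumeFilter would go into the filter of the second query, and AttachResumes then gives you, per candidate ID, the list that the question hoped to find in resumeBlobs. The cost is one extra round trip per search.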
