
Create an indexer that uses a field of the JSON documents inside an index as its data source

I have an index in Azure Search Service that contains documents in JSON format.

Index schema:

{
    "name": "product-api",
    "defaultScoringProfile": null,
    "fields": [
        {
            "name": "upcid",
            "type": "Edm.String",
            "searchable": true,
            "filterable": false,
            "retrievable": true,
            "sortable": true,
            "facetable": false,
            "key": true,
            "indexAnalyzer": null,
            "searchAnalyzer": null,
            "analyzer": null,
            "synonymMaps": []
        },
        {
            "name": "productName",
            "type": "Edm.String",
            "searchable": true,
            "filterable": false,
            "retrievable": true,
            "sortable": false,
            "facetable": false,
            "key": false,
            "indexAnalyzer": null,
            "searchAnalyzer": null,
            "analyzer": null,
            "synonymMaps": []
        },
        {
            "name": "imageUrl",
            "type": "Edm.String",
            "searchable": false,
            "filterable": false,
            "retrievable": true,
            "sortable": false,
            "facetable": false,
            "key": false,
            "indexAnalyzer": null,
            "searchAnalyzer": null,
            "analyzer": null,
            "synonymMaps": []
        },
        {
            "name": "ocrText",
            "type": "Edm.String",
            "searchable": false,
            "filterable": false,
            "retrievable": true,
            "sortable": false,
            "facetable": false,
            "key": false,
            "indexAnalyzer": null,
            "searchAnalyzer": null,
            "analyzer": null,
            "synonymMaps": []
        }
    ],
    "scoringProfiles": [],
    "corsOptions": {
        "allowedOrigins": [
            "*"
        ],
        "maxAgeInSeconds": null
    },
    "suggesters": [],
    "analyzers": [],
    "tokenizers": [],
    "tokenFilters": [],
    "charFilters": [],
    "encryptionKey": null,
    "similarity": {
        "@odata.type": "#Microsoft.Azure.Search.ClassicSimilarity"
    }
}
  • My requirement

Create an indexer that uses the imageUrl field (the images are not stored in an Azure Storage service) as its data source, runs Microsoft.Skills.Vision.OcrSkill as a skill, and maps the output to the ocrText field.

  • Problem

From what I have read in the docs, the data source (in my case, the images) must be in Azure Blob Storage before an indexer can be created.

Has anyone done something similar? Or does anyone know a direct or indirect way to meet this requirement?

Any leads would be appreciated; I could not find anything related to this on the Internet.

How did you populate the imageUrl data in the search index in the first place?

I'm asking because there's no way to configure an indexer to ingest data from a search index as its data source. If you can put those image URLs somewhere else (e.g. blob storage), you could point an indexer at that. If another indexer is populating the source index in the first place, you can add a knowledge store to that primary indexer to project the imageUrl data into blob/table storage as well as into the search index. Or just process the URL in the primary indexer's skillset and skip this secondary pass entirely!
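If the URLs only live in the index today, one way to get them "somewhere else" is a one-off export into blob storage. The sketch below only shows the reshaping step: it turns index documents into one small JSON blob per product, keyed by upcid, which a blob indexer configured with parsingMode "json" could then read back as source records. The blob naming scheme (`<upcid>.json`) and the function name are hypothetical.

```python
import json


def to_blob_payloads(docs, key_field="upcid", url_field="imageUrl"):
    """Turn search-index documents into one small JSON blob per product.

    Each blob carries only the document key and the image URL, so a blob
    indexer (parsingMode "json") could read them back as source records.
    Documents without an image URL are skipped.
    """
    payloads = {}
    for doc in docs:
        url = doc.get(url_field)
        if not url:
            continue  # nothing for the OCR pipeline to process
        key = doc[key_field]
        body = {key_field: key, url_field: url}
        # Hypothetical naming scheme: one blob per document key.
        payloads[f"{key}.json"] = json.dumps(body).encode("utf-8")
    return payloads
```

In practice the input documents could come from the azure-search-documents `SearchClient.search(search_text="*", select=["upcid", "imageUrl"])`, and each payload could be written with the azure-storage-blob `ContainerClient.upload_blob(name, data, overwrite=True)`; both are left out here so the sketch stays self-contained.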

The next issue is that indexers won't crawl arbitrary URLs that you give them. An indexer only ingests data from its data source, or data returned to it by a skill. It is possible to write a custom Web API skill that takes the URL as input, downloads the image from that URL, and responds to the indexer with the binary image data. This functionality is not well documented, but there is an example power skill that does something along those lines, which you could more or less copy.
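A minimal sketch of the request handling such a custom Web API skill might do, assuming the standard custom-skill request/response shape (a `values` array of records, each with a `recordId` and a `data` object) and assuming the downloaded image is returned as a file reference `{"$type": "file", "data": <base64>}`, which is how I understand images are passed between custom skills; check both against the current Azure AI Search docs. The input name `imageUrl` and output name `image` are hypothetical, and hosting (Azure Function, any HTTP endpoint) is left out.

```python
import base64
import urllib.request


def download(url, timeout=30):
    """Fetch raw bytes from a URL (no auth, no retries; a sketch only)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def handle_skill_request(body, fetch=download):
    """Process one custom Web API skill request body.

    Expects: {"values": [{"recordId": "...", "data": {"imageUrl": "..."}}]}
    Returns one result per record; the downloaded image is wrapped as a
    file reference under the (hypothetical) output name "image".
    """
    results = []
    for record in body.get("values", []):
        out = {"recordId": record["recordId"], "data": {}, "errors": [], "warnings": []}
        url = record.get("data", {}).get("imageUrl")
        if not url:
            out["warnings"].append({"message": "imageUrl is missing"})
        else:
            try:
                raw = fetch(url)
                out["data"]["image"] = {
                    "$type": "file",
                    "data": base64.b64encode(raw).decode("ascii"),
                }
            except Exception as exc:  # report per record instead of failing the whole batch
                out["errors"].append({"message": f"Could not download {url}: {exc}"})
        results.append(out)
    return {"values": results}
```

Passing `fetch` as a parameter keeps the record-handling logic testable without network access; the default `download` uses only the standard library.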

The rest of this secondary indexer's pipeline should be fairly straightforward: add an OCR skill, plus output field mappings to merge the data back into the same index. The indexer won't overwrite existing values with nulls, so just make sure to map only the one new field back to the index; the rest of the index's data will remain unchanged.
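The secondary pipeline could be expressed as two REST payloads, built here as Python dicts so they can be serialized and sent with any HTTP client (e.g. PUT to the service's `/skillsets/{name}` and `/indexers/{name}` endpoints). This is a sketch under several assumptions: all resource names are hypothetical, the custom skill is assumed to return the image in an output named `image`, the data source is assumed to be a blob container of JSON records, and the OCR skill is assumed to accept the image produced by the custom skill. Only `ocrText` is mapped back, matching the advice above.

```python
import json

# Hypothetical resource names; replace with your own.
SKILLSET_NAME = "ocr-from-url"
INDEXER_NAME = "product-ocr-indexer"
DATASOURCE_NAME = "product-image-urls"  # assumed: blob container of JSON records
INDEX_NAME = "product-api"


def build_skillset(skill_uri):
    """Skillset: custom download skill followed by the built-in OCR skill."""
    return {
        "name": SKILLSET_NAME,
        "skills": [
            {
                # Custom skill that downloads the image behind imageUrl.
                "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
                "uri": skill_uri,
                "httpMethod": "POST",
                "timeout": "PT30S",
                "batchSize": 1,
                "context": "/document",
                "inputs": [{"name": "imageUrl", "source": "/document/imageUrl"}],
                "outputs": [{"name": "image", "targetName": "downloadedImage"}],
            },
            {
                # Built-in OCR over the image returned by the custom skill.
                "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
                "context": "/document",
                "inputs": [{"name": "image", "source": "/document/downloadedImage"}],
                "outputs": [{"name": "text", "targetName": "ocrText"}],
            },
        ],
    }


def build_indexer():
    """Indexer that writes only the OCR text back into the existing index."""
    return {
        "name": INDEXER_NAME,
        "dataSourceName": DATASOURCE_NAME,
        "targetIndexName": INDEX_NAME,
        "skillsetName": SKILLSET_NAME,
        "parameters": {"configuration": {"parsingMode": "json"}},
        # Map only the new enriched field; other index fields are left untouched.
        "outputFieldMappings": [
            {"sourceFieldName": "/document/ocrText", "targetFieldName": "ocrText"}
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_skillset("https://example.invalid/api/fetch-image"), indent=2))
    print(json.dumps(build_indexer(), indent=2))
```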
