How to set up cognitive search capabilities (with OCR) in Azure Search programmatically through Java?
I want to provide full-text search in my application, so I am trying to configure Azure Search with cognitive search capabilities so that I can index both image and non-image documents stored in Azure Blob Storage. However, when I configure Azure Search from Java code through the Azure Search REST APIs, the OCR capability is not applied and the image documents are not indexed. I am missing some configuration details in the Java (REST API) setup.
Case 1: From the Azure portal, I am able to
Case 2: From Java code using the Azure REST APIs, I am able to
I am using the following Azure Search REST API endpoints from Java code:
1. https://%s.search.windows.net/datasources?api-version=%s
2. https://%s.search.windows.net/skillsets/cog-search-demo-ss?api-version=%s
3. https://%s.search.windows.net/indexes/%s?api-version=%s
4. https://%s.search.windows.net/indexers?api-version=%s
Configuration JSONs:
1. datasource.json
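For reference, the URL templates above can be filled in and turned into authenticated requests roughly as follows. This is only a sketch, not the asker's actual code: the class and method names are hypothetical, it assumes Java 11+ (`java.net.http`), and it assumes the admin key is sent in the `api-key` header with a JSON body. The service name, index name, and API version are placeholders you would supply yourself.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper: builds the skillset and index URLs from the templates
// in the question and wraps a JSON body in a PUT request.
final class SearchEndpoints {

    // Fills the skillset URL template; service and apiVersion are caller-supplied.
    static String skillsetUrl(String service, String apiVersion) {
        return String.format(
            "https://%s.search.windows.net/skillsets/cog-search-demo-ss?api-version=%s",
            service, apiVersion);
    }

    // Fills the index URL template.
    static String indexUrl(String service, String index, String apiVersion) {
        return String.format(
            "https://%s.search.windows.net/indexes/%s?api-version=%s",
            service, index, apiVersion);
    }

    // Builds (but does not send) a PUT request carrying the JSON definition.
    // Assumption: the service authenticates via the "api-key" header.
    static HttpRequest putJson(String url, String adminKey, String json) {
        return HttpRequest.newBuilder(URI.create(url))
            .header("Content-Type", "application/json")
            .header("api-key", adminKey)
            .PUT(HttpRequest.BodyPublishers.ofString(json))
            .build();
    }
}
```

The request would then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`; inspecting the response body on a non-2xx status is usually the quickest way to see which part of a definition the service rejected.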
{
"name" : "csstoragetest",
"type" : "azureblob",
"credentials" : { "connectionString" : "connectionString" },
"container" : { "name" : "csblob" }
}
2. Skillset definition (cog-search-demo-ss)
{
"description": "Extract text from images and merge with content text to produce merged_text",
"skills":
[
{
"description": "Extract text (plain and structured) from image.",
"@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
"context": "/document/normalized_images/*",
"defaultLanguageCode": "null",
"detectOrientation": true,
"inputs": [
{
"name": "image",
"source": "/document/normalized_images/*"
}
],
"outputs": [
{
"name": "text",
"targetName": "myText"
},
{
"name": "layoutText",
"targetName": "myLayoutText"
}
]
},
{
"@odata.type": "#Microsoft.Skills.Text.MergeSkill",
"description": "Create merged_text, which includes all the textual representation of each image inserted at the right location in the content field.",
"context": "/document",
"insertPreTag": " ",
"insertPostTag": " ",
"inputs": [
{
"name":"text", "source": "/document/content"
},
{
"name": "itemsToInsert", "source": "/document/normalized_images/*/text"
},
{
"name":"offsets", "source": "/document/normalized_images/*/contentOffset"
}
],
"outputs": [
{
"name": "mergedText", "targetName" : "merged_text"
}
]
}
]
}
3. Index definition
{
"name": "azureblob-indexing",
"fields": [
{ "name": "id", "type": "Edm.String", "key": true, "searchable": false },
{ "name": "content", "type": "Edm.String", "searchable": true, "filterable": false, "sortable": false, "facetable": false }
]
}
4. Indexer definition
{
"name" : "azureblob-indexing1",
"dataSourceName" : "csstoragetest",
"targetIndexName" : "azureblob-indexing",
"schedule" : { "interval" : "PT2H" },
"skillsetName" : "cog-search-demo-ss",
"parameters":
{
"maxFailedItems":-1,
"maxFailedItemsPerBatch":-1,
"configuration":
{
"dataToExtract": "contentAndMetadata",
"imageAction":"generateNormalizedImages",
"parsingMode": "default",
"firstLineContainsHeaders": false,
"delimitedTextDelimiter": ","
}
}
}
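After configuring Azure Search through Java code, the image documents should be indexed in Azure Search, and I should be able to search on the text they contain.
Try setting defaultLanguageCode to null, without quotes, in the skillset JSON: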
"defaultLanguageCode": null
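If the skillset JSON is assembled from Java strings, the quoted `"null"` can creep in when a null value is formatted like any other string. The sketch below illustrates the distinction with a hypothetical helper (`JsonField.field` is not part of any Azure SDK); in practice a JSON library would handle this for you.

```java
// Hypothetical helper: renders one JSON field, emitting the bare literal
// null for a Java null and a quoted, escaped string otherwise.
final class JsonField {

    static String field(String name, String value) {
        String rendered;
        if (value == null) {
            // JSON null literal: no surrounding quotes
            rendered = "null";
        } else {
            // Escape backslashes and double quotes, then wrap in quotes
            rendered = "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        }
        return "\"" + name + "\": " + rendered;
    }
}
```

Calling `JsonField.field("defaultLanguageCode", null)` yields the unquoted form shown above, whereas passing the Java string `"null"` would yield the quoted form used in the question's skillset.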
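I figured out the configuration I needed. As described above (in the question), all the parameters had to match between Case 1 and Case 2; I then updated the configuration JSON accordingly.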
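One way to carry out that comparison is to collect the settings from each setup into flat key/value maps and list the keys whose values differ. The sketch below is only an illustration: `ConfigDiff` is a hypothetical helper, and the answer does not say which portal values actually differed, so any sample values passed to it are assumptions.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper: reports every setting whose value differs between
// the portal-created configuration and the one sent from Java code.
final class ConfigDiff {

    static Map<String, String> diff(Map<String, String> portal, Map<String, String> code) {
        // Union of keys from both sides, sorted for stable output
        Set<String> keys = new TreeSet<>(portal.keySet());
        keys.addAll(code.keySet());

        Map<String, String> differences = new TreeMap<>();
        for (String key : keys) {
            String p = portal.get(key); // null when the key is absent
            String c = code.get(key);
            if (!Objects.equals(p, c)) {
                differences.put(key, "portal=" + p + ", code=" + c);
            }
        }
        return differences;
    }
}
```

Keys present on only one side show up with `null` on the other, which makes missing parameters as visible as mismatched ones.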