
How to set up cognitive search capabilities (with OCR) in Azure Search programmatically through Java?

I want to provide full-text search in my application, so I am trying to configure Azure Search with cognitive search capabilities to index both image and non-image documents stored in Azure Blob Storage. However, when I configure Azure Search from Java code through its REST APIs, the OCR capability does not take effect and image documents are not indexed. I appear to be missing some configuration detail in the REST-based setup.

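Case 1: From the Azure portal, I am able to:

  1. Configure Azure Search with cognitive capabilities (including the OCR skillset), an index, an indexer, and Azure Blob Storage.
  2. Index image and non-image documents such as pdf, png, jpg, xls, etc.
  3. Search the indexed documents.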

Case 2: From Java code using the Azure Search REST APIs, I am able to:

  1. Configure Azure Search with cognitive capabilities, an index, an indexer, and Azure Blob Storage.
  2. Index non-image documents such as pdf and xls.
  3. Search the indexed documents.

However, in case 2 the OCR capability is not applied and image documents are not indexed, so some configuration detail must be missing from the JSON definitions sent through the REST APIs.

I am using the following Azure Search REST API endpoints from Java code:

  1. https://%s.search.windows.net/datasources?api-version=%s
  2. https://%s.search.windows.net/skillsets/cog-search-demo-ss?api-version=%s
  3. https://%s.search.windows.net/indexes/%s?api-version=%s
  4. https://%s.search.windows.net/indexers?api-version=%s
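
For reference, the four calls above could be issued along the following lines. This is only a minimal sketch, not the asker's actual code: it assumes Java 11+ (`java.net.http`), sends each definition with PUT to a named resource URL (create-or-update) rather than POST to the collection URL, and reads the JSON from local files whose names (`datasource.json`, `skillset.json`, `index.json`, `indexer.json`) are assumed. The class and method names are hypothetical, and the service name, admin key and api-version are passed in as arguments.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

class AzureSearchSetup {
    // Builds e.g. https://<service>.search.windows.net/skillsets/<name>?api-version=<version>
    static String resourceUrl(String service, String collection, String name, String apiVersion) {
        return String.format("https://%s.search.windows.net/%s/%s?api-version=%s",
                service, collection, name, apiVersion);
    }

    // Builds a PUT (create-or-update) request carrying a JSON definition.
    static HttpRequest putRequest(String url, String adminKey, String json) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/json")
                .header("api-key", adminKey)
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            System.err.println("usage: AzureSearchSetup <service> <adminKey> <apiVersion>");
            return;
        }
        String service = args[0], adminKey = args[1], apiVersion = args[2];
        // { collection, resource name, local JSON file }, in dependency order:
        // the indexer refers to the data source, skillset and index by name.
        String[][] resources = {
            {"datasources", "csstoragetest", "datasource.json"},
            {"skillsets", "cog-search-demo-ss", "skillset.json"},
            {"indexes", "azureblob-indexing", "index.json"},
            {"indexers", "azureblob-indexing1", "indexer.json"},
        };
        HttpClient client = HttpClient.newHttpClient();
        for (String[] r : resources) {
            String json = Files.readString(Path.of(r[2]));
            HttpResponse<String> response = client.send(
                    putRequest(resourceUrl(service, r[0], r[1], apiVersion), adminKey, json),
                    HttpResponse.BodyHandlers.ofString());
            System.out.println(r[0] + "/" + r[1] + " -> HTTP " + response.statusCode());
            // On failure the response body usually explains which property was rejected.
            if (response.statusCode() >= 400) {
                System.out.println(response.body());
            }
        }
    }
}
```

Printing the response body on a 4xx status is worth keeping in any variant of this code, because a rejected skillset or indexer definition is otherwise easy to miss.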

Configuration JSONs:

  1. datasource.json

{
    "name" : "csstoragetest",
    "type" : "azureblob",
    "credentials" : { "connectionString" : "connectionString" },
    "container" : { "name" : "csblob" }
}
  2. skillset.json
{
  "description": "Extract text from images and merge with content text to produce merged_text",
  "skills":
  [
    {
      "description": "Extract text (plain and structured) from image.",
      "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
      "context": "/document/normalized_images/*",
      "defaultLanguageCode": "null",
      "detectOrientation": true,
      "inputs": [
        {
          "name": "image",
          "source": "/document/normalized_images/*"
        }
      ],
      "outputs": [
        {
          "name": "text",
          "targetName": "myText"
        },
        {
          "name": "layoutText",
          "targetName": "myLayoutText"
        }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.MergeSkill",
      "description": "Create merged_text, which includes all the textual representation of each image inserted at the right location in the content field.",
      "context": "/document",
      "insertPreTag": " ",
      "insertPostTag": " ",
      "inputs": [
        {
          "name":"text", "source": "/document/content"
        },
        {
          "name": "itemsToInsert", "source": "/document/normalized_images/*/text"
        },
        {
          "name":"offsets", "source": "/document/normalized_images/*/contentOffset"
        }
      ],
      "outputs": [
        {
          "name": "mergedText", "targetName" : "merged_text"
        }
      ]
    }
  ]
}
  3. index.json
{
  "name": "azureblob-indexing",
  "fields": [
    { "name": "id", "type": "Edm.String", "key": true, "searchable": false },
    { "name": "content", "type": "Edm.String", "searchable": true, "filterable": false, "sortable": false, "facetable": false }
  ]
}
  4. indexer.json
{
  "name" : "azureblob-indexing1",
  "dataSourceName" : "csstoragetest",
  "targetIndexName" : "azureblob-indexing",
  "schedule" : { "interval" : "PT2H" },
  "skillsetName" : "cog-search-demo-ss",
  "parameters":
  {
    "maxFailedItems":-1,
    "maxFailedItemsPerBatch":-1,
    "configuration":
    {
      "dataToExtract": "contentAndMetadata",
      "imageAction":"generateNormalizedImages",
      "parsingMode": "default",
      "firstLineContainsHeaders": false,
      "delimitedTextDelimiter": ","
    }
  }
}

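After configuring Azure Search through Java code, I expect the image documents to be indexed in Azure Search, and I should be able to search them by the text they contain.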

Try setting defaultLanguageCode to null, without the quotes, in skillset.json:

"defaultLanguageCode": null

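If the skillset JSON is assembled or loaded as a string in Java, the suggested change can be applied before the request is sent. The following is a rough sketch only: the class and method names are hypothetical, and the regex replacement is a shortcut that assumes the property appears exactly as in the question. A JSON library would be the more robust choice.

```java
import java.util.regex.Pattern;

class SkillsetJsonFix {
    // Matches "defaultLanguageCode": "null" (the string) and captures the key part.
    private static final Pattern QUOTED_NULL =
            Pattern.compile("(\"defaultLanguageCode\"\\s*:\\s*)\"null\"");

    // Replaces the quoted string "null" with the JSON literal null.
    static String unquoteNullLanguageCode(String skillsetJson) {
        return QUOTED_NULL.matcher(skillsetJson).replaceAll("$1null");
    }

    public static void main(String[] args) {
        String before = "{ \"defaultLanguageCode\": \"null\", \"detectOrientation\": true }";
        // prints: { "defaultLanguageCode": null, "detectOrientation": true }
        System.out.println(unquoteNullLanguageCode(before));
    }
}
```

The distinction matters because `"null"` is a four-character string that the service would read as a language code, while `null` is the JSON null literal.
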
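I figured out the configuration I needed. As described in the question, it came down to matching all the parameters between case 1 and case 2 and then updating the configuration JSONs accordingly.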
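
One way to do that comparison is to fetch the definitions that the portal created and diff them against the hand-written JSON. The sketch below is an illustration under assumptions, not the author's code: it assumes Java 11+, the class and method names are hypothetical, and the portal-created resource names are not given in the thread, so they are passed in as arguments.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class PortalConfigDump {
    // Builds a GET request for one definition, e.g. collection = "indexers", "skillsets" or "indexes".
    static HttpRequest getRequest(String service, String collection, String name,
                                  String apiVersion, String adminKey) {
        String url = String.format("https://%s.search.windows.net/%s/%s?api-version=%s",
                service, collection, name, apiVersion);
        return HttpRequest.newBuilder(URI.create(url))
                .header("api-key", adminKey)
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 5) {
            System.err.println(
                "usage: PortalConfigDump <service> <adminKey> <apiVersion> <collection> <name>");
            return;
        }
        HttpRequest request = getRequest(args[0], args[3], args[4], args[2], args[1]);
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // The body is the JSON definition as stored by the service; save it and diff it
        // against the corresponding local file (skillset.json, index.json, indexer.json).
        System.out.println("HTTP " + response.statusCode());
        System.out.println(response.body());
    }
}
```

One difference worth looking for, offered as a guess rather than as the confirmed fix, is an `outputFieldMappings` entry in the portal-created indexer that routes `/document/merged_text` into an index field. The indexer.json in the question has none, and its index defines only `id` and `content`.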
