
How to set up cognitive search capabilities (with OCR) in Azure Search programmatically through Java?


I want to provide full-text search in my application, so I am trying to configure Azure Search with cognitive search capabilities so that I can index both image and non-image documents stored in Azure Blob Storage. However, when I configure Azure Search from Java code through the Azure Search REST APIs, I cannot get the OCR capability to work, and the image documents are not indexed. I seem to be missing some configuration detail when setting up Azure Search from Java code.

Case 1: From the Azure portal, I am able to:

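  1. Configure Azure Search with cognitive capabilities (including the OCR skillset), an index, an indexer, and Azure Blob Storage.
  2. Index image and non-image documents such as pdf, png, jpg, xls, etc.
  3. Search the indexed documents.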

Case 2: From Java code, using the Azure REST APIs, I am able to:

  1. Configure Azure Search with cognitive capabilities, an index, an indexer, and Azure Blob Storage.
  2. Index non-image documents such as pdf and xls.
  3. Search the indexed documents.

However, in case 2 the OCR capability is not applied, so the image documents are not indexed. I am missing some configuration detail in the Java (REST API) setup.

I am calling the following Azure Search REST APIs from Java code:

1. https://%s.search.windows.net/datasources?api-version=%s
2. https://%s.search.windows.net/skillsets/cog-search-demo-ss?api-version=%s
3. https://%s.search.windows.net/indexes/%s?api-version=%s
4. https://%s.search.windows.net/indexers?api-version=%s
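For reference, here is a minimal sketch of how these four calls could be issued from Java. It assumes Java 11+ (java.net.http), an admin key sent in the "api-key" header, and each configuration JSON already loaded into a String. The class and method names are hypothetical, not from the original post:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch only: builds and sends the REST calls listed above.
// Assumptions: Java 11+ HttpClient, admin key in the "api-key" header,
// JSON bodies already read from datasource.json, skillset.json, etc.
final class AzureSearchRest {
    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    // Fills the same "%s" placeholders used in the URLs above.
    static URI endpoint(String serviceName, String path, String apiVersion) {
        return URI.create(String.format(
                "https://%s.search.windows.net/%s?api-version=%s",
                serviceName, path, apiVersion));
    }

    // POST to a collection path ("datasources", "indexers") creates a resource;
    // PUT to a named path ("skillsets/cog-search-demo-ss") creates or updates it.
    static HttpRequest request(String method, URI uri, String adminKey, String json) {
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .header("api-key", adminKey)
                .method(method, HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    // Sends the request and throws on a non-2xx status, so a rejected
    // skillset or indexer definition is not silently ignored.
    static String send(HttpRequest request) throws Exception {
        HttpResponse<String> response =
                CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() / 100 != 2) {
            throw new IllegalStateException(
                    response.statusCode() + " from " + request.uri() + ": " + response.body());
        }
        return response.body();
    }
}
```

For example, endpoint(serviceName, "skillsets/cog-search-demo-ss", apiVersion) combined with method "PUT" targets the second URL above. Checking the status code is worthwhile here: if the service rejects a definition, the response body should carry its error message, rather than the indexer quietly running without OCR.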

Configuration JSONs:

1. datasource.json

{
    "name" : "csstoragetest",
    "type" : "azureblob",
    "credentials" : { "connectionString" : "connectionString" },
    "container" : { "name" : "csblob" }
}
2. skillset.json
{
  "description": "Extract text from images and merge with content text to produce merged_text",
  "skills":
  [
    {
      "description": "Extract text (plain and structured) from image.",
      "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
      "context": "/document/normalized_images/*",
      "defaultLanguageCode": "null",
      "detectOrientation": true,
      "inputs": [
        {
          "name": "image",
          "source": "/document/normalized_images/*"
        }
      ],
      "outputs": [
        {
          "name": "text",
          "targetName": "myText"
        },
        {
          "name": "layoutText",
          "targetName": "myLayoutText"
        }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.MergeSkill",
      "description": "Create merged_text, which includes all the textual representation of each image inserted at the right location in the content field.",
      "context": "/document",
      "insertPreTag": " ",
      "insertPostTag": " ",
      "inputs": [
        {
          "name":"text", "source": "/document/content"
        },
        {
          "name": "itemsToInsert", "source": "/document/normalized_images/*/text"
        },
        {
          "name":"offsets", "source": "/document/normalized_images/*/contentOffset"
        }
      ],
      "outputs": [
        {
          "name": "mergedText", "targetName" : "merged_text"
        }
      ]
    }
  ]
}
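One detail worth checking in the skillset above: the OCR skill runs in the context "/document/normalized_images/*" and renames its "text" output to "myText", while the MergeSkill reads "itemsToInsert" from "/document/normalized_images/*/text". As I understand the enrichment tree, a skill output is written at its context path plus its targetName, so these two paths may not line up. The tiny helper below (hypothetical, not part of any Azure SDK) just spells out that rule under that assumption:

```java
// Hypothetical helper: computes where a skill output lands in the
// enrichment tree, assuming the rule "<context>/<targetName>".
final class EnrichmentPaths {
    static String outputPath(String context, String targetName) {
        // Avoid a double slash if the context already ends with "/".
        return context.endsWith("/") ? context + targetName : context + "/" + targetName;
    }

    // True when a downstream input "source" points at the path an upstream output writes to.
    static boolean matches(String context, String targetName, String source) {
        return outputPath(context, targetName).equals(source);
    }
}
```

With the values above, outputPath("/document/normalized_images/*", "myText") gives "/document/normalized_images/*/myText", which is not the "/document/normalized_images/*/text" the MergeSkill consumes. Either changing the OCR targetName to "text" or pointing "itemsToInsert" at ".../myText" would presumably make them agree.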
3. index.json
{
  "name": "azureblob-indexing",
  "fields": [
    { "name": "id", "type": "Edm.String", "key": true, "searchable": false },
    { "name": "content", "type": "Edm.String", "searchable": true, "filterable": false, "sortable": false, "facetable": false }
  ]
}
4. indexer.json
{
  "name" : "azureblob-indexing1",
  "dataSourceName" : "csstoragetest",
  "targetIndexName" : "azureblob-indexing",
  "schedule" : { "interval" : "PT2H" },
  "skillsetName" : "cog-search-demo-ss",
  "parameters":
  {
    "maxFailedItems":-1,
    "maxFailedItemsPerBatch":-1,
    "configuration":
    {
      "dataToExtract": "contentAndMetadata",
      "imageAction":"generateNormalizedImages",
      "parsingMode": "default",
      "firstLineContainsHeaders": false,
      "delimitedTextDelimiter": ","
    }
  }
}
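Another place where the Java setup could differ from the portal setup (case 1): the MergeSkill writes its result to "/document/merged_text", but neither index.json nor indexer.json above mentions merged_text, so the OCR text may never reach the index. A sketch of the two additions, assuming a new searchable index field named "merged_text" (a hypothetical name) and the indexer's "outputFieldMappings" property, which maps enriched paths to index fields. Fetching the portal-created index and indexer definitions (GET on the same endpoints) would confirm what the portal actually generated:

```java
// Sketch only: JSON fragments to add to index.json and indexer.json.
// Assumptions: a new index field "merged_text" (hypothetical name), filled
// from the enrichment output via the indexer's outputFieldMappings.
final class OcrMappingFragments {
    // Extra entry for the "fields" array in index.json.
    static final String INDEX_FIELD =
            "{ \"name\": \"merged_text\", \"type\": \"Edm.String\", \"searchable\": true }";

    // Top-level property for indexer.json, alongside "parameters".
    static final String OUTPUT_FIELD_MAPPINGS =
            "\"outputFieldMappings\": [\n"
          + "  { \"sourceFieldName\": \"/document/merged_text\", \"targetFieldName\": \"merged_text\" }\n"
          + "]";
}
```

Keeping the merged text in its own field, rather than overwriting "content", leaves the original extracted text intact and makes it easy to check whether OCR output is arriving at all.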

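Expected result: after configuring Azure Search through Java code, the image documents should be indexed in Azure Search, and I should be able to search them by the text they contain.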

Answer: Try setting defaultLanguageCode to null, without quotes, in skillset.json:

"defaultLanguageCode": null
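The quotes matter because "null" in quotes is a JSON string, not a JSON null, so it is not equivalent to leaving the property unset. If skillset.json is kept as a file and loaded from Java before upload, a minimal sketch of a hypothetical helper (plain java.util.regex, no Azure API involved) that rewrites the quoted form:

```java
import java.util.regex.Pattern;

// Hypothetical helper: replaces "defaultLanguageCode": "null" (a JSON string)
// with "defaultLanguageCode": null (a JSON null) before the skillset is sent.
final class SkillsetFixes {
    private static final Pattern QUOTED_NULL =
            Pattern.compile("(\"defaultLanguageCode\"\\s*:\\s*)\"null\"");

    static String unquoteDefaultLanguageCode(String skillsetJson) {
        // "$1" keeps the key, the colon and the original spacing; only the value changes.
        return QUOTED_NULL.matcher(skillsetJson).replaceAll("$1null");
    }
}
```

Editing skillset.json directly is simpler, of course; the helper only helps if the JSON is generated or shared elsewhere.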

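Answer: I figured out the configuration I needed. As described in the question, all of the parameters had to match between case 1 and case 2; after that, I updated the configuration JSONs accordingly.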
