Extracting keywords (multi-word) from text using Elasticsearch

Question

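I have an index full of keywords, and I want to use those keywords to extract matching keywords from an input text.

Below is a sample of the keyword index. Note that a keyword can consist of several words; essentially, each keyword is a unique tag.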

{
  "hits": {
    "total": 2000,
    "hits": [
      {
        "id": 1,
        "keyword": "thousand eyes"
      },
      {
        "id": 2,
        "keyword": "facebook"
      },
      {
        "id": 3,
        "keyword": "superdoc"
      },
      {
        "id": 4,
        "keyword": "quora"
      },
      {
        "id": 5,
        "keyword": "your story"
      },
      {
        "id": 6,
        "keyword": "Surgery"
      },
      {
        "id": 7,
        "keyword": "lending club"
      },
      {
        "id": 8,
        "keyword": "ad roll"
      },
      {
        "id": 9,
        "keyword": "the honest company"
      },
      {
        "id": 10,
        "keyword": "Draft kings"
      }
    ]
  }
}

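Now, if I input the text "I saw the news of lending club on facebook, your story and quora", the output of the search should be ["lending club", "facebook", "your story", "quora"]. The search should also be case insensitive.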

Answer 1

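There is really just one way to do this. You have to index your data as keywords and search it with an analyzer that produces shingles.

See this reproduction:

First, we create two custom analyzers, one based on the keyword tokenizer and one that adds shingles: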

PUT test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer_keyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "asciifolding",
            "lowercase"
          ]
        },
        "my_analyzer_shingle": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "asciifolding",
            "lowercase",
            "shingle"
          ]
        }
      }
    }
  },
  "mappings": {
    "your_type": {
      "properties": {
        "keyword": {
          "type": "string",
          "index_analyzer": "my_analyzer_keyword",
          "search_analyzer": "my_analyzer_shingle"
        }
      }
    }
  }
}

Now let's create some sample data using what you provided:

POST /test/your_type/1
{
  "id": 1,
  "keyword": "thousand eyes"
}
POST /test/your_type/2
{
  "id": 2,
  "keyword": "facebook"
}
POST /test/your_type/3
{
  "id": 3,
  "keyword": "superdoc"
}
POST /test/your_type/4
{
  "id": 4,
  "keyword": "quora"
}
POST /test/your_type/5
{
  "id": 5,
  "keyword": "your story"
}
POST /test/your_type/6
{
  "id": 6,
  "keyword": "Surgery"
}
POST /test/your_type/7
{
  "id": 7,
  "keyword": "lending club"
}
POST /test/your_type/8
{
  "id": 8,
  "keyword": "ad roll"
}
POST /test/your_type/9
{
  "id": 9,
  "keyword": "the honest company"
}
POST /test/your_type/10
{
  "id": 10,
  "keyword": "Draft kings"
}

Finally, the query that runs the search:

POST /test/your_type/_search
{
  "query": {
    "match": {
      "keyword": "I saw the news of lending club on facebook, your story and quora"
    }
  }
}

And this is the result:

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 0.009332742,
    "hits": [
      {
        "_index": "test",
        "_type": "your_type",
        "_id": "2",
        "_score": 0.009332742,
        "_source": {
          "id": 2,
          "keyword": "facebook"
        }
      },
      {
        "_index": "test",
        "_type": "your_type",
        "_id": "7",
        "_score": 0.009332742,
        "_source": {
          "id": 7,
          "keyword": "lending club"
        }
      },
      {
        "_index": "test",
        "_type": "your_type",
        "_id": "4",
        "_score": 0.009207102,
        "_source": {
          "id": 4,
          "keyword": "quora"
        }
      },
      {
        "_index": "test",
        "_type": "your_type",
        "_id": "5",
        "_score": 0.0014755741,
        "_source": {
          "id": 5,
          "keyword": "your story"
        }
      }
    ]
  }
}
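
To get from this response to the plain list the question asks for, you only need the keyword field of each hit. Here is a minimal Python sketch, assuming the response has already been parsed into a dict. The function name keywords_from_response is my own, and the sample dict is trimmed down from the result above.

```python
def keywords_from_response(response):
    """Return the matched keywords in the order the hits were returned."""
    return [hit["_source"]["keyword"] for hit in response["hits"]["hits"]]


# Trimmed copy of the search result shown above (only the fields used here).
response = {
    "hits": {
        "total": 4,
        "hits": [
            {"_id": "2", "_source": {"id": 2, "keyword": "facebook"}},
            {"_id": "7", "_source": {"id": 7, "keyword": "lending club"}},
            {"_id": "4", "_source": {"id": 4, "keyword": "quora"}},
            {"_id": "5", "_source": {"id": 5, "keyword": "your story"}},
        ],
    }
}

print(keywords_from_response(response))
# ['facebook', 'lending club', 'quora', 'your story']
```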

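What does it do behind the scenes?

It indexes your documents as whole keywords (the keyword tokenizer emits the whole string as a single token). I've also added the asciifolding filter, which normalizes letters (e.g. é becomes e), and the lowercase filter, which makes the search case insensitive. So, for instance, Draft kings is indexed as draft kings.
The search analyzer uses the same logic, except that its tokenizer emits word tokens and the shingle filter then builds shingles (combinations of adjacent tokens) on top of them. Those shingles match the keywords indexed in the first step.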
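
If it helps to see the idea outside Elasticsearch, here is a rough Python imitation of the two analyzers. It is only a sketch under assumptions: fold approximates asciifolding by stripping accents, a \w+ regex stands in for the standard tokenizer, and all function names are made up for illustration. None of this is Elasticsearch's actual implementation.

```python
import re
import unicodedata


def fold(text):
    """Approximate asciifolding + lowercase: strip accents, then lowercase."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def keyword_analyzer(text):
    """Index side: the whole string becomes one token."""
    return [fold(text)]


def shingle_analyzer(text, max_shingle_size=2):
    """Search side: single words plus shingles of 2..max_shingle_size words."""
    words = re.findall(r"\w+", fold(text))  # stand-in for the standard tokenizer
    tokens = list(words)  # unigrams are kept as well
    for size in range(2, max_shingle_size + 1):
        for start in range(len(words) - size + 1):
            tokens.append(" ".join(words[start:start + size]))
    return tokens


def extract_keywords(text, keywords, max_shingle_size=2):
    """Return the stored keywords whose index token equals some search token."""
    index = {keyword_analyzer(k)[0]: k for k in keywords}
    found = []
    for token in shingle_analyzer(text, max_shingle_size):
        if token in index and index[token] not in found:
            found.append(index[token])
    return found


KEYWORDS = [
    "thousand eyes", "facebook", "superdoc", "quora", "your story",
    "Surgery", "lending club", "ad roll", "the honest company", "Draft kings",
]
TEXT = "I saw the news of lending club on facebook, your story and quora"

print(extract_keywords(TEXT, KEYWORDS))
```

One thing the sketch makes visible: with two-word shingles, a three-word keyword such as "the honest company" is never produced as a search token. As far as I know, the shingle filter defaults to min_shingle_size 2 and max_shingle_size 2 with unigrams included, so for longer keywords you would probably need a custom shingle filter with a larger max_shingle_size (in the sketch, pass max_shingle_size=3).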

Solution 1
7 Accepted 2015-11-07 11:36:10
