Special characters in the Elasticsearch completion type

Question

I am new to Elasticsearch, and I have this mapping:

curl -X PUT localhost:9200/vee_trade -d '
{
 "mappings": {
  "sDocument" : {
   "properties" : {
    "id" : { "type" : "long" },
    "docId" : { "type" : "string" },
    "documentType" : { "type" : "string" },
    "rating"  : { "type" : "float" },
    "suggestion" : { "type" :     "completion"}
    }
   }
  }
}'

A sample document looks like this:

 _index: "test"
 _type: "sDocument"
 _id: "CATEGORY7"
 _score: 1
 _source{}
 docId: "CATEGORY7"
 documentType: "CATEGORY"
 id: 7
 suggestion[]
 "Kids's wear"
 rating: null

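Basically, my goal is to enable auto-suggest. That works with the query, but the auto-suggest entries contain only the term and the score, and I also want the other field values. So I fire a second query, a match query on the suggestion field, using the term returned by auto-suggest: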

{
  "query" : {
   "match" : {
    "suggestion" : "Men's"  
    }
   }
}

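But I get no data back. It looks like Elasticsearch strips the special characters from the term (I don't know how it stores and indexes it), so please tell me:

How can I retrieve the other field values along with the search term in auto-suggest? Or how can I make the match query work?

Thanks in advance.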

Answer 1

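Warning: long answer. It's a little hard to tell exactly what problem you are asking about, so I'll give you a few options that may help you solve it.

There are a few different ways to approach your goal. I've written about two different autocomplete methods on the Qbox blog: one post on using the completion suggester, and another on a more sophisticated setup involving ngrams and multiple fields.

I've found the completion suggester a bit clunky in practice (since you have to tell it explicitly what it should do), so I tend to rely more on custom analysis. One way to use analyzers is to set up several sub-fields (formerly called multi-fields) on a property. I'll show a few examples below.

I'll set up a field with a couple of sub-fields that analyze the text in different ways, then run a match query against each one to show how it behaves.

Take a look at this: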

PUT /test_index
{
   "settings": {
      "number_of_shards": 1,
      "analysis": {
         "filter": {
            "nGram_filter": {
               "type": "nGram",
               "min_gram": 2,
               "max_gram": 20,
               "token_chars": [
                  "letter",
                  "digit",
                  "punctuation",
                  "symbol"
               ]
            }
         },
         "analyzer": {
            "nGram_analyzer": {
               "type": "custom",
               "tokenizer": "whitespace",
               "filter": [
                  "lowercase",
                  "asciifolding",
                  "nGram_filter"
               ]
            },
            "whitespace_analyzer": {
               "type": "custom",
               "tokenizer": "whitespace",
               "filter": [
                  "lowercase",
                  "asciifolding"
               ]
            }
         }
      }
   },
   "mappings": {
      "doc": {
         "properties": {
            "text_field": {
               "type": "string",
               "index_analyzer": "standard",
               "search_analyzer": "standard",
               "fields": {
                  "raw": {
                     "type": "string",
                     "index": "not_analyzed"
                  },
                  "ngram": {
                     "type": "string",
                     "index_analyzer": "nGram_analyzer",
                     "search_analyzer": "whitespace_analyzer"
                  }
               }
            }
         }
      }
   }
}

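There is a lot going on here, and I'd encourage you to read up on analysis and ngrams. I also took part of the code from the partial-word autocomplete post, so you may find it helpful to read that for a more thorough explanation.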

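Basically, I have a field "text_field" that is analyzed with the "standard" analyzer both for indexing (i.e., the terms generated for a given document and field when the inverted index is built) and for searching (the way the search phrase is broken into terms that are matched against the terms in the inverted index). That field then has two sub-fields. One is not analyzed at all, so the only term we can match is the original text of the document field. The second sub-field is analyzed with "nGram_analyzer" for indexing and with "whitespace_analyzer" for searching, both of which are defined in the "settings" of the index.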
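
To make the difference between the sub-fields concrete, here is a rough Python sketch of the terms each one would hold for a single value. It is only an approximation written for illustration: the function names are made up, asciifolding is skipped because the sample text is plain ASCII, and the order of the n-grams is not meant to mirror what Lucene emits.

```python
def ngrams(token, min_gram=2, max_gram=20):
    """Return every substring of token whose length is between min_gram and max_gram."""
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size > len(token):
                break
            grams.append(token[start:start + size])
    return grams


def raw_terms(text):
    # "raw" sub-field (not_analyzed): the whole value is kept as one term
    return [text]


def ngram_index_terms(text):
    # "ngram" sub-field at index time: split on whitespace, lowercase,
    # then expand each token into its n-grams (approximation of nGram_analyzer)
    terms = []
    for token in text.split():
        terms.extend(ngrams(token.lower()))
    return terms


def whitespace_search_terms(text):
    # "ngram" sub-field at search time: split on whitespace and lowercase only
    # (approximation of whitespace_analyzer)
    return [token.lower() for token in text.split()]


print(raw_terms("Men's wear"))
print(ngram_index_terms("Men's wear"))
print(whitespace_search_terms("men's we"))
```

The point of using a different analyzer at search time is that the query text is not expanded into n-grams itself, so each query word only has to equal one of the n-grams stored at index time.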

So now if we index a couple of documents:

PUT /test_index/doc/1
{
    "text_field": "Kid's wear"
}

PUT /test_index/doc/2
{
    "text_field": "Men's wear"
}

we can search for them in various ways.

Querying "text_field.raw" requires the complete, exact text to get a match:

POST /test_index/doc/_search
{
   "query": {
      "match": {
         "text_field.raw": "Men's wear"
      }
   }
}
...
{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "2",
            "_score": 1,
            "_source": {
               "text_field": "Men's wear"
            }
         }
      ]
   }
}

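A standard "match" query against "text_field" works as expected, since the term "Men's" is analyzed the same way at both index and search time (the standard analyzer lowercases it to "men's"):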

POST /test_index/doc/_search
{
   "query": {
      "match": {
         "text_field": "Men's"
      }
   }
}
...
{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.625,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "2",
            "_score": 0.625,
            "_source": {
               "text_field": "Men's wear"
            }
         }
      ]
   }
}

But if we add a second term, we get results that may not be what we want:

POST /test_index/doc/_search
{
   "query": {
      "match": {
         "text_field": "Men's wear"
      }
   }
}
...
{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0.72711754,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "2",
            "_score": 0.72711754,
            "_source": {
               "text_field": "Men's wear"
            }
         },
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "1",
            "_score": 0.09494676,
            "_source": {
               "text_field": "Kid's wear"
            }
         }
      ]
   }
}

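This is because of the way the terms are generated, and because the default operator of the match query is "or". We can restrict the results by setting the operator of the match query to "and":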

POST /test_index/doc/_search
{
   "query": {
      "match": {
         "text_field": {
             "query":  "Men's wear",
             "operator": "and"
         }
      }
   }
}
...
{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.72711754,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "2",
            "_score": 0.72711754,
            "_source": {
               "text_field": "Men's wear"
            }
         }
      ]
   }
}
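
The effect of the operator can be sketched in a few lines of Python. This is a toy model, not how Elasticsearch is implemented: `simple_terms` is a hypothetical stand-in that just lowercases and splits on whitespace, which is an assumption made for the example rather than the real standard analyzer, and scoring is ignored entirely.

```python
# The two documents indexed above, keyed by their ids
docs = {"1": "Kid's wear", "2": "Men's wear"}


def simple_terms(text):
    # Assumed stand-in for analysis: lowercase, then split on whitespace
    return set(text.lower().split())


def match(query, operator="or"):
    """Return the ids of documents matching the query under the given operator."""
    query_terms = simple_terms(query)
    hits = []
    for doc_id, text in sorted(docs.items()):
        doc_terms = simple_terms(text)
        if operator == "and":
            # every query term must be present in the document
            matched = query_terms <= doc_terms
        else:
            # a single shared term is enough
            matched = bool(query_terms & doc_terms)
        if matched:
            hits.append(doc_id)
    return hits


print(match("Men's wear"))         # ['1', '2']
print(match("Men's wear", "and"))  # ['2']
```

With "or", the shared term "wear" is enough to pull in the "Kid's wear" document, which is why it showed up with a low score in the earlier response.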

We can use the "text_field.ngram" field to match partial words (including symbols and punctuation, since that is specified in the "nGram_filter" definition in our index settings):

POST /test_index/doc/_search
{
   "query": {
      "match": {
         "text_field.ngram": {
             "query":  "men's we",
             "operator": "and"
         }
      }
   }
}
...
{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.72711754,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "2",
            "_score": 0.72711754,
            "_source": {
               "text_field": "Men's wear"
            }
         }
      ]
   }
}
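
As a rough illustration of why "men's we" finds only the second document, the sketch below models the "ngram" sub-field in plain Python. The helper names are hypothetical, asciifolding is left out, and the matching is a simplification of what the "and" operator does; it is meant to show the idea, not to reproduce Elasticsearch.

```python
# The two documents indexed above, keyed by their ids
docs = {"1": "Kid's wear", "2": "Men's wear"}


def index_terms(text, min_gram=2, max_gram=20):
    # Index time: whitespace split, lowercase, then all n-grams of each token
    terms = set()
    for token in text.lower().split():
        for start in range(len(token)):
            last_end = min(start + max_gram, len(token))
            for end in range(start + min_gram, last_end + 1):
                terms.add(token[start:end])
    return terms


def search_terms(text):
    # Search time: whitespace split and lowercase only, no n-grams
    return text.lower().split()


def match_all(query):
    """Ids of documents whose n-gram terms contain every query term ("and")."""
    wanted = search_terms(query)
    return [doc_id for doc_id, text in sorted(docs.items())
            if all(term in index_terms(text) for term in wanted)]


print(match_all("men's we"))  # ['2']
print(match_all("wea"))       # ['1', '2']
print(match_all("m"))         # []
```

The last call returns nothing because a one-character query term is shorter than "min_gram", so no indexed n-gram can equal it.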

Hopefully this gives you some ideas about how to proceed.
