
Multi field text and keyword fields in elasticsearch

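I am looking at switching from solr to elasticsearch and have indexed a bunch of documents into it without supplying a schema/mapping. Many of the fields that I previously had set up as indexed strings in solr have been set up as text/keyword fields using multi-fields.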

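Is there any benefit to having a keyword field as a multi-field of a text field? In my case most of the values in these fields are single words, so I would think it doesn't matter whether they are sent through an analyzer, but the ES documentation seems to imply that keyword fields are not considered when searching, or are at least treated differently?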

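Just to expand further: if I search for the term "ipad", will a document score higher if it has "ipad" in a keyword field as well as in other text fields, compared with the same document without the keyword field? And if "ipad" is only in the keyword field, will the document still match?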
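As a rough illustration of the difference the documentation is pointing at, here is a toy Python model (not Elasticsearch code). It assumes a `text` value is split into lowercase word terms, which is roughly what the standard analyzer does for plain English words, and that a `keyword` value is indexed as one unmodified term. The function names are made up for this sketch.

```python
import re

def text_terms(value):
    """Toy stand-in for analysis of a text field: lowercase word terms."""
    return set(re.findall(r"\w+", value.lower()))

def keyword_terms(value):
    """Toy stand-in for a keyword field: the whole value is a single term."""
    return {value}

def matches(term, terms):
    """A document matches when the searched term is among its indexed terms."""
    return term in terms

# Compare a single-word value with a multi-word value under both models.
for value in ["ipad", "a green ipad"]:
    print(repr(value),
          "text:", matches("ipad", text_terms(value)),
          "keyword:", matches("ipad", keyword_terms(value)))
```

Under this model a single-word lowercase value matches either way, which fits my case where most values are single words. Only multi-word or mixed-case values would behave differently between the two field types.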

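To answer my own question, I created a quick test. Keyword and text fields turned out to be pretty much equivalent when searching, and the multi-field seems to get the same score as its primary type, so I guess the second field has no impact on search scoring.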

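Strangely, multi-word values in the keyword and text fields got the same score. I would have expected the keyword field to score lower or not match at all, but for my purposes this is fine, so I am not going to investigate further.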

Index creation

PUT test_index
{
    "settings" : {
        "number_of_shards" : 1
    },
    "mappings" : {
        "test_type" : {
            "properties" : {
                "multifield": {
                  "type": "text",
                  "fields": {
                     "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                     }
                  }
                },

                "keywordfield": {
                  "type": "keyword"
                },

                "textfield": {
                  "type": "text"
                }

            }
        }
    }
}
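
The `ignore_above` setting on the `keyword` sub-field is easy to overlook. As documented for keyword fields, strings longer than the limit are not indexed in that field. A toy Python sketch of that rule follows; the function name is made up and this is not Elasticsearch code.

```python
def keyword_subfield_term(value, ignore_above=256):
    """Return the term a keyword sub-field would index, or None if the value is skipped."""
    # Values longer than ignore_above characters are left out of the keyword index.
    if len(value) > ignore_above:
        return None
    return value

print(keyword_subfield_term("ipad"))      # short value: indexed as-is
print(keyword_subfield_term("x" * 300))   # longer than 256 characters: skipped
```

A skipped value is still analyzed into the parent `text` field (`multifield`); only the `multifield.keyword` sub-field omits it.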

Data insert

POST /_bulk
{ "update": { "_index": "test_index", "_type": "test_type", "_id": 1 } }
{ "doc" : { "multifield" : "ipad"  }, "doc_as_upsert" : true }
{ "update": { "_index": "test_index", "_type": "test_type", "_id": 2 } }
{ "doc" : { "keywordfield" : "ipad"  }, "doc_as_upsert" : true }
{ "update": { "_index": "test_index", "_type": "test_type", "_id": 3 } }
{ "doc" : { "keywordfield" : "a green ipad"  }, "doc_as_upsert" : true }
{ "update": { "_index": "test_index", "_type": "test_type", "_id": 4 } }
{ "doc" : { "textfield" : "a yellow ipad"  }, "doc_as_upsert" : true }
{ "update": { "_index": "test_index", "_type": "test_type", "_id": 5 } }
{ "doc" : { "keywordfield" : "ipad", "textfield" : "ipad"  }, "doc_as_upsert" : true }
{ "update": { "_index": "test_index", "_type": "test_type", "_id": 6 } }
{ "doc" : { "keywordfield" : "unrelated", "textfield" : "hopefully this wont show up"  }, "doc_as_upsert" : true }
{ "update": { "_index": "test_index", "_type": "test_type", "_id": 7 } }
{ "doc" : { "textfield" : "ipad"  }, "doc_as_upsert" : true }

Results

GET /test_index/_search?q=ipad
{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 6,
      "max_score": 0.28122374,
      "hits": [
         {
            "_index": "test_index",
            "_type": "test_type",
            "_id": "5",
            "_score": 0.28122374,
            "_source": {
               "keywordfield": "ipad",
               "textfield": "ipad"
            }
         },
         {
            "_index": "test_index",
            "_type": "test_type",
            "_id": "1",
            "_score": 0.2734406,
            "_source": {
               "multifield": "ipad"
            }
         },
         {
            "_index": "test_index",
            "_type": "test_type",
            "_id": "2",
            "_score": 0.2734406,
            "_source": {
               "keywordfield": "ipad"
            }
         },
         {
            "_index": "test_index",
            "_type": "test_type",
            "_id": "7",
            "_score": 0.2734406,
            "_source": {
               "textfield": "ipad"
            }
         },
         {
            "_index": "test_index",
            "_type": "test_type",
            "_id": "3",
            "_score": 0.16417998,
            "_source": {
               "keywordfield": "a green ipad"
            }
         },
         {
            "_index": "test_index",
            "_type": "test_type",
            "_id": "4",
            "_score": 0.16417998,
            "_source": {
               "textfield": "a yellow ipad"
            }
         }
      ]
   }
}
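
One possible explanation for document 3 matching, offered as an assumption rather than something I verified: `q=ipad` names no field, and on Elasticsearch 5.x such a query runs against the `_all` field, which by default collects the values of every field and analyzes them as text. The version is not stated here, so treat this as a guess. The toy Python model below (not Elasticsearch code, names are made up) treats `_all` as the lowercase word terms of all field values and checks which of the seven documents would match.

```python
import re

# The seven documents from the bulk request above, keyed by _id.
DOCS = {
    1: {"multifield": "ipad"},
    2: {"keywordfield": "ipad"},
    3: {"keywordfield": "a green ipad"},
    4: {"textfield": "a yellow ipad"},
    5: {"keywordfield": "ipad", "textfield": "ipad"},
    6: {"keywordfield": "unrelated", "textfield": "hopefully this wont show up"},
    7: {"textfield": "ipad"},
}

def all_terms(doc):
    """Toy `_all`: word terms from every field value, whatever the field type."""
    return re.findall(r"\w+", " ".join(doc.values()).lower())

def search_all(term):
    """Return the ids of documents whose toy `_all` contains the term."""
    return [doc_id for doc_id, doc in DOCS.items() if term in all_terms(doc)]

print(search_all("ipad"))  # prints [1, 2, 3, 4, 5, 7]
```

That is the same set of six documents returned above. In this model document 5 contributes `ipad` twice to `_all`, which would be consistent with its slightly higher score, and documents 3 and 4 have longer `_all` values, which would be consistent with their lower scores. To test one field at a time, the field can be named in the query string, e.g. `GET /test_index/_search?q=keywordfield:ipad`.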

