Elastic search cross fields, edge ngram analyzer
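I have 999 documents that I use for experimenting with Elasticsearch.
My type mapping has a field f4, which is analyzed, with the following analyzer settings: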
"myNGramAnalyzer" => [
"type" => "custom",
"char_filter" => ["html_strip"],
"tokenizer" => "standard",
"filter" => ["lowercase","standard","asciifolding","stop","snowball","ngram_filter"]
]
My filter is defined as follows:
"filter" => [
"ngram_filter" => [
"type" => "edgeNGram",
"min_gram" => "2",
"max_gram" => "20"
]
]
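The values of field f4 are "Proj1", "Proj2", "Proj3", and so on.
Now, when I search for the string "proj1" with a cross_fields query, I expect the document containing "Proj1" to come back at the top of the response with the highest score. That is not what happens. The content of all the other fields is almost identical across documents.
I also don't understand why it matches all 999 documents.
Here is my search: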
{
"index": "myindex",
"type": "mytype",
"body": {
"query": {
"multi_match": {
"query": "proj1",
"type": "cross_fields",
"operator": "and",
"fields": "f*"
}
},
"filter": {
"term": {
"deleted": "0"
}
}
}
}
My search result is:
{
"took": 12,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 999,
"max_score": 1,
"hits": [{
"_index": "myindex",
"_type": "mytype",
"_id": "42",
"_score": 1,
"_source": {
"f1": "396","f2": "125650","f3": "BH.1511AI.001",
"f4": "Proj42",
"f5": "BH.1511AI.001","f6": "","f7": "","f8": "","f9": "","f10": "","f11": "","f12": "","f13": "","f14": "","f15": "","f16": "09/05/16 | 01:02PM | User","deleted": "0"
}
}, {
"_index": "myindex",
"_type": "mytype",
"_id": "47",
"_score": 1,
"_source": {
"f1": "396","f2": "137946","f3": "BH.152096.001",
"f4": "Proj47",
"f5": "BH.1511AI.001","f6": "","f7": "","f8": "","f9": "","f10": "","f11": "","f12": "","f13": "","f14": "","f15": "","f16": "09/05/16 | 01:02PM | User","deleted": "0"
}
},
//.......
//.......
//MANY RECORDS IN BETWEEN HERE
//.......
//.......
{
"_index": "myindex",
"_type": "mytype",
"_id": "1",
"_score": 1,
"_source": {
"f1": "396","f2": "142095","f3": "BH.705215.001",
"f4": "Proj1",
"f5": "BH.1511AI.001","f6": "","f7": "","f8": "","f9": "","f10": "","f11": "","f12": "","f13": "","f14": "","f15": "","f16": "09/05/16 | 01:02PM | User","deleted": "0"
}
//.......
//.......
//MANY RECORDS IN BETWEEN HERE
//.......
//.......
}]
}
}
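Am I doing something wrong, or am I missing something? (Apologies for the lengthy question, but I thought I should provide all the relevant information and leave out the unnecessary code.)
Edit:
Term vectors response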
{
"_index": "myindex",
"_type": "mytype",
"_id": "10",
"_version": 1,
"found": true,
"took": 9,
"term_vectors": {
"f4": {
"field_statistics": {
"sum_doc_freq": 5886,
"doc_count": 999,
"sum_ttf": 5886
},
"terms": {
"pr": {
"doc_freq": 999,
"ttf": 999,
"term_freq": 1,
"tokens": [{
"position": 0,
"start_offset": 0,
"end_offset": 6
}]
},
"pro": {
"doc_freq": 999,
"ttf": 999,
"term_freq": 1,
"tokens": [{
"position": 0,
"start_offset": 0,
"end_offset": 6
}]
},
"proj": {
"doc_freq": 999,
"ttf": 999,
"term_freq": 1,
"tokens": [{
"position": 0,
"start_offset": 0,
"end_offset": 6
}]
},
"proj1": {
"doc_freq": 111,
"ttf": 111,
"term_freq": 1,
"tokens": [{
"position": 0,
"start_offset": 0,
"end_offset": 6
}]
},
"proj10": {
"doc_freq": 11,
"ttf": 11,
"term_freq": 1,
"tokens": [{
"position": 0,
"start_offset": 0,
"end_offset": 6
}]
}
}
}
}
}
Edit 2
Mapping for field f4
"f4" : {
"type" : "string",
"index_analyzer" : "myNGramAnalyzer",
"search_analyzer" : "standard"
}
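I have updated the mapping to use the standard analyzer at query time. This improved the results, but they still fall short of what I expect.
Instead of 999 (all documents), the search now returns 111 documents, such as "Proj1", "Proj11", "Proj111", ... "Proj1", "Proj181", and so on.
"Proj1" still appears somewhere in the middle of the results rather than at the top.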
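The counts above can be reproduced with a small simulation. This is only a sketch: it assumes f4 holds "Proj1" through "Proj999" (one value per document) and reduces the index-time analyzer to lowercasing plus edge n-grams, ignoring the stop and snowball filters. The helper name edge_ngrams is made up for illustration and is not an Elasticsearch API.

```python
from collections import Counter


def edge_ngrams(token, min_gram=2, max_gram=20):
    """Return the leading-edge n-grams of a token, shortest first."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


# Assumption: the 999 documents carry f4 = "Proj1" .. "Proj999".
values = ["Proj%d" % i for i in range(1, 1000)]

# Simplified index-time analysis: lowercase, then edge n-grams.
doc_freq = Counter()
for value in values:
    for term in set(edge_ngrams(value.lower())):
        doc_freq[term] += 1

# Number of documents containing each term, and the sum over all terms.
print(doc_freq["pr"], doc_freq["proj1"], doc_freq["proj10"])
print(sum(doc_freq.values()))
```

Under these assumptions the term "pr" occurs in every document, while "proj1" occurs in 111 of them ("Proj1", "Proj10" to "Proj19", and "Proj100" to "Proj199"), which agrees with the doc_freq values in the term vectors response. One plausible reading is that a query analyzed with the same n-gram analyzer also produces "pr" and therefore matches everything, whereas a query analyzed with the standard analyzer produces only "proj1".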
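There is no index_analyzer (at least not as of Elasticsearch version 1.7). For mapping parameters, you can use analyzer and search_analyzer. Try the following steps to make it work.
Create myindex with the analyzer settings: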
PUT /myindex
{
"settings": {
"analysis": {
"filter": {
"ngram_filter": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 20
}
},
"analyzer": {
"myNGramAnalyzer": {
"type": "custom",
"tokenizer": "standard",
"char_filter": "html_strip",
"filter": [
"lowercase",
"standard",
"asciifolding",
"stop",
"snowball",
"ngram_filter"
]
}
}
}
}
}
Add the mapping to mytype (for brevity, I have mapped only the relevant fields):
PUT /myindex/_mapping/mytype
{
"properties": {
"f1": {
"type": "string"
},
"f4": {
"type": "string",
"analyzer": "myNGramAnalyzer",
"search_analyzer": "standard"
},
"deleted": {
"type": "string"
}
}
}
Index some data:
PUT myindex/mytype/1
{
"f1":"396",
"f4":"Proj12" ,
"deleted": "0"
}
PUT myindex/mytype/2
{
"f1":"42",
"f4":"Proj22" ,
"deleted": "1"
}
Now try the query:
GET myindex/mytype/_search
{
"query": {
"multi_match": {
"query": "proj1",
"type": "cross_fields",
"operator": "and",
"fields": "f*"
}
},
"filter": {
"term": {
"deleted": "0"
}
}
}
It should return document #1. It works for me in Sense. I am using Elasticsearch 2.X.
I hope this helps :)
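Why only document #1 comes back can be illustrated with a rough model of the two analyzers. This is a sketch under simplifying assumptions: only f4 is considered, the index-time analyzer is reduced to lowercasing plus edge n-grams, and the standard search analyzer is reduced to lowercasing a single-word query. The function names are hypothetical and do not correspond to any Elasticsearch API.

```python
def edge_ngrams(token, min_gram=2, max_gram=20):
    """Return the leading-edge n-grams of a token, shortest first."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_terms(value):
    """Simplified index-time analysis: lowercase + edge n-grams."""
    return set(edge_ngrams(value.lower()))


def search_terms(text):
    """Simplified search-time analysis for a one-word query: lowercase only."""
    return {text.lower()}


# The two sample documents indexed above (f1 omitted for brevity).
docs = {
    1: {"f4": "Proj12", "deleted": "0"},
    2: {"f4": "Proj22", "deleted": "1"},
}


def search(text):
    """Return ids of non-deleted documents whose f4 terms contain every query term."""
    wanted = search_terms(text)
    return [
        doc_id
        for doc_id, doc in sorted(docs.items())
        if doc["deleted"] == "0" and wanted <= index_terms(doc["f4"])
    ]


print(search("proj1"))
```

In this model "Proj12" is indexed as pr, pro, proj, proj1, and proj12, so the single query term proj1 finds it, while "Proj22" produces no proj1 term and is also excluded by the deleted filter.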
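After spending hours looking for a solution, I finally got it working.
I kept everything the same as described in the question, including the n-gram analyzer used when indexing the data. The only thing I had to change was to use the all field in my search query, combined with the existing multi-match query inside a bool query.
Now a search for the text Proj1 returns results in an order such as Proj1, Proj121, Proj11, and so on.
Although this is not the exact order Proj1, Proj11, Proj121, and so on, it is still very close to the result I wanted.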
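The answer does not show the final query, so the following is only a guess at its shape. It assumes that "the all field" means Elasticsearch's _all meta-field and that the two queries are combined as should clauses of a bool query; both are assumptions, and build_query is a hypothetical helper.

```python
import json


def build_query(text):
    """Build a search body combining a match on _all with the original multi_match.

    Assumption: both clauses go into bool/should, so a document matching
    either clause is returned and matching both raises its score.
    """
    return {
        "query": {
            "bool": {
                "should": [
                    # Assumed: "the all field" is the _all meta-field.
                    {"match": {"_all": text}},
                    # The multi_match query from the question, unchanged.
                    {
                        "multi_match": {
                            "query": text,
                            "type": "cross_fields",
                            "operator": "and",
                            "fields": "f*",
                        }
                    },
                ]
            }
        }
    }


print(json.dumps(build_query("proj1"), indent=2))
```

The printed JSON could be sent as the body of a search request against myindex/mytype. Whether it reproduces the ordering reported above depends on how _all is analyzed in the actual index.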