Elasticsearch fuzzy query - max edits doesn't work as expected
I recently added the fuzzy operator and fuzzy query settings to our search query string to cover user typos (e.g. "zamestnanost" vs. "zamestnani"):
POST /my_index/_search
{
"query": {
"query_string": {
"query": "+(content:zamestnanost~)",
"fuzzy_prefix_length": 3,
"fuzzy_min_sim": 0.5,
"fuzzy_max_expansions": 50
}
}
}
As I understand the fuzzy query settings, fuzzy_min_sim = 0.5 should allow length(query) * 0.5 edits of the original query (in this case 6 edits).
However, it doesn't match even "closer" words (tokens) like
I have a strange feeling that it still matches only words from the index that are at most 2 edits from the original query string (which is the default edit count in a fuzzy query).
I have also run an explain on my query, and I think the results support this hypothesis. The _explanation looks like this:
"_explanation": {
"value": 0.057083897,
"description": "sum of:",
"details": [
{
"value": 0.023866946,
"description": "weight(content:zamestnano^0.8 in 0) [PerFieldSimilarity], result of:",
"details": [
{
"value": 0.023866946,
"description": "score(doc=0,freq=4.0), product of:",
"details": [
{
"value": 0.66062796,
"description": "queryWeight, product of:",
"details": [
{
"value": 0.8,
"description": "boost"
},
{
"value": 4.624341,
"description": "idf(docFreq=1, maxDocs=75)"
},
{
"value": 0.17857353,
"description": "queryNorm"
}
]
},
{
"value": 0.036127664,
"description": "fieldWeight in 0, product of:",
"details": [
{
"value": 2,
"description": "tf(freq=4.0), with freq of:",
"details": [
{
"value": 4,
"description": "termFreq=4.0"
}
]
},
{
"value": 4.624341,
"description": "idf(docFreq=1, maxDocs=75)"
},
{
"value": 0.00390625,
"description": "fieldNorm(doc=0)"
}
]
}
]
}
]
},
{
"value": 0.03321695,
"description": "weight(content:zamestnanos^0.9090909 in 0) [PerFieldSimilarity], result of:",
"details": [
{
"value": 0.03321695,
"description": "score(doc=0,freq=6.0), product of:",
"details": [
{
"value": 0.7507135,
"description": "queryWeight, product of:",
"details": [
{
"value": 0.9090909,
"description": "boost"
},
{
"value": 4.624341,
"description": "idf(docFreq=1, maxDocs=75)"
},
{
"value": 0.17857353,
"description": "queryNorm"
}
]
},
{
"value": 0.044247173,
"description": "fieldWeight in 0, product of:",
"details": [
{
"value": 2.4494898,
"description": "tf(freq=6.0), with freq of:",
"details": [
{
"value": 6,
"description": "termFreq=6.0"
}
]
},
{
"value": 4.624341,
"description": "idf(docFreq=1, maxDocs=75)"
},
{
"value": 0.00390625,
"description": "fieldNorm(doc=0)"
}
]
}
]
}
]
}
]
}
Only the queries "zamestnano" and "zamestnanos" are created by the fuzzy query edits.
Do I understand the fuzzy query settings correctly? Could you please point out my mistake?
Thanks a lot for any ideas!
From the documentation:
0.0..1.0
[1.7.0] Deprecated in 1.7.0. Support for similarity will be removed in Elasticsearch 2.0. converted into an edit distance using the formula: length(term) * (1.0 - fuzziness), e.g. a fuzziness of 0.6 with a term of length 10 would result in an edit distance of 4. Note: in all APIs except for the Fuzzy Like This Query, the maximum allowed edit distance is 2.
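The arithmetic in that passage can be sketched in a few lines of Python. This is only an illustration of the quoted formula and the cap of 2; the function names are hypothetical, and truncating the product to an integer is an assumption about the rounding, not something the quoted documentation states.

```python
MAX_SUPPORTED_EDITS = 2  # cap taken from the documentation note quoted above


def requested_edits(term: str, min_sim: float) -> int:
    """Edit distance implied by the formula length(term) * (1.0 - fuzziness).

    Truncation to an integer is an assumption made for this sketch.
    """
    return int(len(term) * (1.0 - min_sim))


def effective_edits(term: str, min_sim: float) -> int:
    """Edit distance after applying the documented maximum of 2."""
    return min(requested_edits(term, min_sim), MAX_SUPPORTED_EDITS)


# The question's term and setting: 12 characters with fuzzy_min_sim = 0.5.
asked_for = requested_edits("zamestnanost", 0.5)
actually_used = effective_edits("zamestnanost", 0.5)
print(asked_for, actually_used)
```

So the setting does ask for 6 edits, as the question assumes, but the cap reduces it to 2 before the query runs.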
The easiest way to double-check this is to use the validate API:
GET _validate/query?explain&index=my_index
{
"query": {
"query_string": {
"query": "+(content:zamestnanost~)",
"fuzzy_prefix_length": 3,
"fuzzy_min_sim": 0.5,
"fuzzy_max_expansions": 50
}
}
}
Which gives this result:
"explanations": [
{
"index": "test",
"valid": true,
"explanation": "+content:zamestnanost~2"
}
]
which shows the actual edit distance ES will use in the query: zamestnanost~2.
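To see why a cap of 2 produces the behaviour described in the question, here is a minimal sketch that computes the plain Levenshtein distance between the query term and the candidate tokens mentioned in the thread. It is a textbook dynamic-programming implementation written for illustration, not the code Elasticsearch or Lucene runs; as far as I know Lucene also counts a transposition of adjacent characters as a single edit by default, which makes no difference for these particular words.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb, or keep it
            ))
        prev = curr
    return prev[-1]


query = "zamestnanost"
candidates = ("zamestnanos", "zamestnano", "zamestnani")
distances = {term: levenshtein(query, term) for term in candidates}

for term, dist in distances.items():
    verdict = "within" if dist <= 2 else "beyond"
    print(f"{term}: {dist} edit(s), {verdict} the cap of 2")
```

The two terms that appear in the explain output are 1 and 2 edits away, while "zamestnani" is 3 edits away, so it cannot match under zamestnanost~2.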