
Elasticsearch autocomplete using nGram, results order

I implemented autocomplete with the nGram filter, and everything works fine.

My problem is that the suggestions returned seem to be in arbitrary order.

For example, I have a field called "id" whose values look like numbers, such as "1000" or "45100231", but are stored as strings. When I type "10", I hope to see "1000" first, then maybe "102000", and so on.

So the ideal suggestion order is: matches at the prefix come first, then matches in the middle, then matches at the suffix, e.g. "1000" > "2101" > "1110". If the matches are all at the beginning, just sort by the following digits, e.g. "1000" > "1011" > "10200".
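The desired ordering can be written down as a sort key, which makes the rules precise. This is a minimal plain-Python sketch, not Elasticsearch code; the function name `match_rank` is hypothetical, and ordering inside the middle and suffix groups (plain string order) is an assumption, since the question only specifies tie-breaking for prefix matches.

```python
def match_rank(value: str, query: str) -> tuple:
    """Sort key for the ordering described in the question (smaller sorts first).

    Group 0: value starts with query (prefix match)
    Group 1: query occurs strictly inside value (middle match)
    Group 2: value ends with query (suffix match)
    Group 3: no match at all
    """
    if value.startswith(query):
        # Prefix matches: tie-break on the characters after the typed text.
        return (0, value[len(query):])
    pos = value.find(query)
    if pos == -1:
        return (3, value)
    if pos + len(query) < len(value):
        return (1, value)  # assumption: plain string order within this group
    return (2, value)      # assumption: plain string order within this group


ids = ["1110", "10200", "2101", "1011", "1000", "45100231", "999"]
print(sorted(ids, key=lambda v: match_rank(v, "10")))
# → ['1000', '1011', '10200', '2101', '45100231', '1110', '999']
```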

I've read lots of posts about Elasticsearch sorting but found no strategy that really works. Does anyone have an idea? Thanks!

One way I see is to keep the autocomplete tokens in 3 fields:

  • 1st field keeps prefixes (using edgeNGram)
  • 2nd field keeps only the middle n-gram parts (but I think this requires a custom filter)
  • 3rd field keeps only suffixes

So for the value 12345, it generates the following set of tokens:

  • prefixes: 12, 123, 1234, 12345
  • middle: 23, 34, 234
  • suffixes: 2345, 345, 45
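To make the three token groups concrete, here is a plain-Python sketch that splits every n-gram of a value by whether it touches the start, the end, or neither. It is not an Elasticsearch token filter; the function name `split_ngrams` is hypothetical, and `min_gram=2` is an assumption chosen to reproduce the example above.

```python
def split_ngrams(value: str, min_gram: int = 2) -> tuple:
    """Split all n-grams of `value` (length >= min_gram) into three groups."""
    prefixes, middles, suffixes = [], [], []
    n = len(value)
    for size in range(min_gram, n + 1):
        for start in range(0, n - size + 1):
            gram = value[start:start + size]
            if start == 0:
                prefixes.append(gram)   # anchored at the start (edge n-gram)
            elif start + size == n:
                suffixes.append(gram)   # anchored at the end
            else:
                middles.append(gram)    # touches neither end
    return prefixes, middles, suffixes


print(split_ngrams("12345"))
# → (['12', '123', '1234', '12345'], ['23', '34', '234'], ['45', '345', '2345'])
```

The full value 12345 is both a prefix and a suffix; like the list above, the sketch files it under prefixes.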

With such an index, you could use a bool query (not a bool filter, since filters do not contribute to the score) matching against these 3 fields but with different boost factors; for example, boost prefixes ^10, middle ^1, and suffixes ^0.1.
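A sketch of the request body for this boosted bool query, built as a Python dict. The field names `id_prefix`, `id_middle` and `id_suffix` are hypothetical, and it assumes those fields are searched with a non-n-gram analyzer so the typed text stays a single token. The large boost ratios make prefix matches dominate, although term statistics still influence the final scores.

```python
import json


def build_autocomplete_query(text: str) -> dict:
    """Bool query whose should-clauses favor prefix > middle > suffix matches.

    Field names are hypothetical placeholders for the three n-gram fields.
    """
    boosts = {"id_prefix": 10, "id_middle": 1, "id_suffix": 0.1}
    should = [
        {"match": {field: {"query": text, "boost": boost}}}
        for field, boost in boosts.items()
    ]
    # With only "should" clauses, a document must match at least one of them,
    # and every matching clause adds its (boosted) score to _score.
    return {"query": {"bool": {"should": should}}}


print(json.dumps(build_autocomplete_query("10"), indent=2))
```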

I believe the results should be acceptable.

UPDATE

For your case, with only numbers, I think it's better to use script_score from the function_score query (http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/query-dsl-function-score-query.html) and check manually in MVEL or JavaScript whether the match is a prefix, middle, or suffix. For that, you should keep just the raw id in a separate field (raw_id).
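The check the script would perform on raw_id can be sketched as follows. The real script would be written in a script language your Elasticsearch version supports (MVEL or JavaScript per the linked 0.90 docs); this Python version only illustrates the decision logic, and the function name `position_score` and the score constants are hypothetical.

```python
def position_score(raw_id: str, typed: str) -> float:
    """Score raw_id by where `typed` occurs: prefix > middle > suffix > none."""
    pos = raw_id.find(typed)
    if pos == -1:
        return 0.0  # no match
    if pos == 0:
        return 3.0  # prefix match
    if pos + len(typed) < len(raw_id):
        return 2.0  # middle match
    return 1.0      # suffix match


print([position_score(v, "10") for v in ["1000", "2101", "1110", "999"]])
# → [3.0, 2.0, 1.0, 0.0]
```

Ids that land in the same group (e.g. "1000" and "1011") get equal scores, so ties could be broken by additionally sorting on raw_id.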
