
Elasticsearch: avoid repetitive scoring when using an ngram analyzer

Suppose I search for "hello" and the index contains one document with "hello" and another with "hello hello". I want the "hello" document to score higher.

I am using an ngram analyzer at both index time and search time (I really need it for other scenarios), so "hello hello" gets matched twice and shows up as the top result. Is there any way to avoid this? I have already tried term, match phrase, and multi match queries; all of them score "hello hello" higher.
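To make the "matched twice" effect concrete, here is a minimal sketch in plain C# with no Elasticsearch involved. It assumes trigrams generated per whitespace-separated word, which may differ from the actual analyzer configuration, and the names `NgramDemo`, `Ngrams`, and `TotalTermFrequency` are made up for illustration. It only counts how often the query's grams occur in each document; real relevance scoring also applies length normalization and other factors.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Every gram of "hello" occurs once in "hello" and twice in "hello hello".
Console.WriteLine(NgramDemo.TotalTermFrequency("hello", "hello", 3));       // prints 3
Console.WriteLine(NgramDemo.TotalTermFrequency("hello", "hello hello", 3)); // prints 6

static class NgramDemo
{
    // Emit every character n-gram of length n from each whitespace-separated word.
    // Assumption: grams never span word boundaries.
    public static IEnumerable<string> Ngrams(string text, int n) =>
        text.Split(' ', StringSplitOptions.RemoveEmptyEntries)
            .SelectMany(word => Enumerable
                .Range(0, Math.Max(0, word.Length - n + 1))
                .Select(i => word.Substring(i, n)));

    // For each distinct gram of the query, add up how many times it occurs in the document.
    public static int TotalTermFrequency(string query, string doc, int n)
    {
        var docCounts = Ngrams(doc, n)
            .GroupBy(g => g)
            .ToDictionary(g => g.Key, g => g.Count());

        return Ngrams(query, n)
            .Distinct()
            .Sum(g => docCounts.TryGetValue(g, out var count) ? count : 0);
    }
}
```

The doubled gram count for "hello hello" is the term-frequency signal that pushes that document above the exact one.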

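I solved this by adding a duplicate, unanalyzed (keyword) field to the document and using a bool query that combines a term query on that field with the ngram match query. Only a document whose value is exactly "hello" matches the term clause, so it picks up the extra score and ranks above "hello hello".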

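// A document whose value is exactly "hello" satisfies both should clauses,
// so its score is the sum of the two; "hello hello" satisfies only the match.
var res = client.Search<MyClass>(s => s
    .Query(q => q
        .Bool(b => b
            .Should(
                // Exact match against the unanalyzed (keyword) copy of the field.
                // Boost(1) is the default; raise it to favor exact matches more strongly.
                sh => sh.Term(t => t
                    .Field(f => f._DUPLICATE_COLUMN)
                    .Value("hello")
                    .Boost(1)
                ),
                // Ngram match against the analyzed field.
                sh => sh.Match(m => m
                    .Field(f => f.MY_COLUMN)
                    .Query("hello")
                    .Analyzer("myNgramSearchAnalyzer")
                )
            )
            .MinimumShouldMatch(1)
        )
    )
);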
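For the term clause to work, the duplicate field has to be mapped as a keyword and filled with the same value as the analyzed field. The following is a rough sketch of how that could look with NEST 7.x, not necessarily how the original setup was done. The cluster URL, the index name "my-index", and the shape of `MyClass` are assumptions, and the index is assumed to exist already with its ngram analyzers configured.

```csharp
using System;
using Nest;

// Assumptions: a cluster at localhost:9200 and an existing index named "my-index".
var settings = new ConnectionSettings(new Uri("http://localhost:9200"))
    .DefaultIndex("my-index");
var client = new ElasticClient(settings);

// Add the unanalyzed copy as a keyword field to the existing mapping.
var mapResponse = client.Map<MyClass>(m => m
    .Index("my-index")
    .Properties(p => p
        .Keyword(k => k.Name(n => n._DUPLICATE_COLUMN))
    )
);
Console.WriteLine($"Mapping updated: {mapResponse.IsValid}");

// Write the same value to both fields when indexing a document.
var indexResponse = client.IndexDocument(new MyClass
{
    MY_COLUMN = "hello",
    _DUPLICATE_COLUMN = "hello"
});
Console.WriteLine($"Document indexed: {indexResponse.IsValid}");

// Hypothetical document type; only the two property names come from the query above.
public class MyClass
{
    public string MY_COLUMN { get; set; }
    public string _DUPLICATE_COLUMN { get; set; }
}
```

Because a keyword field is not analyzed, the term query matches only when the stored value equals the search text exactly, including case. If mixed-case input is expected, one option is to lowercase both the stored copy and the query value in application code.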

