Elasticsearch: prefer prefix match over term match

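I have a field in an Elasticsearch index that I am searching on, and I want documents whose value for that field starts with the search term to score higher than documents where the term sits in the middle of a longer sentence. For example, when searching for "lorem",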

{
  "title": "Lorem"
}

should score higher than

{
  "title": "The time I said Lorem"
}

or

{
  "title": "The Lorem"
}

or even

{
  "title": "Lorem impsum"
}

However, this is usually not the case with a simple match, match_phrase_prefix, or query_string query.

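So far I have tried combining a prefix query with a match query while boosting the prefix, but the boost does not seem to work the way I expected, i.e. the results are the same, only boosted by 10: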

...
{
    "should": [
        {
            "prefix": {
                "title": {
                    "value": query,
                    "boost": 10
                }
            }
        },
        {
            "match": {
                "title": {
                    "query":     query,
                    "boost":     3,
                    "fuzziness": "AUTO"
                }
            }
        }
    ]
}
...

Also, I am not sure whether this is relevant, but the title field is actually nested, i.e. it is alternative_names.title.

Is there an elegant solution for this in Elasticsearch?

You can use a combination of bool/should clauses to achieve the desired result.

Adding a working example.

Index mapping:

{
  "mappings": {
    "properties": {
      "alternative_names": {
        "type": "nested",
        "properties": {
          "title": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword"
              }
            }
          }
        }
      }
    }
  }
}

Index data:

{
  "alternative_names": {
    "title": "Lorem"
  }
}
{
  "alternative_names": {
    "title": "The time I said Lorem"
  }
}
{
  "alternative_names": {
    "title": "The Lorem"
  }
}
{
  "alternative_names": {
    "title": "Lorem impsum"
  }
}

Search query:

{
  "query": {
    "nested": {
      "path": "alternative_names",
      "query": {
        "bool": {
          "should": [
            {
              "term": {
                "alternative_names.title.keyword": "Lorem"
              }
            },
            {
              "match": {
                "alternative_names.title": "Lorem"
              }
            }
          ]
        }
      }
    }
  }
}
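
The query above hard-codes "Lorem", and a term query on a keyword sub-field is an exact, case-sensitive comparison, so a user typing "lorem" would not hit that clause. As a rough sketch only, the same bool/should idea could be built from a variable in Python and extended with a prefix clause on the keyword sub-field, so that titles beginning with the search text rank between exact titles and mid-sentence matches. The function name `build_title_query` and the boost values are hypothetical, and the `case_insensitive` option is assumed to be available (it was added to term and prefix queries in Elasticsearch 7.10).

```python
import json


def build_title_query(search_text, exact_boost=10.0, prefix_boost=5.0):
    """Build a nested bool/should request body that favours exact and leading matches.

    The boost values are illustrative placeholders, not tuned numbers.
    """
    keyword_field = "alternative_names.title.keyword"
    return {
        "query": {
            "nested": {
                "path": "alternative_names",
                "query": {
                    "bool": {
                        "should": [
                            # Whole title equals the search text.
                            {"term": {keyword_field: {
                                "value": search_text,
                                "case_insensitive": True,  # assumes ES 7.10+
                                "boost": exact_boost,
                            }}},
                            # Whole title starts with the search text.
                            {"prefix": {keyword_field: {
                                "value": search_text,
                                "case_insensitive": True,  # assumes ES 7.10+
                                "boost": prefix_boost,
                            }}},
                            # Search text appears anywhere in the analyzed title.
                            {"match": {"alternative_names.title": {
                                "query": search_text,
                                "fuzziness": "AUTO",
                            }}},
                        ]
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the body so it can be pasted into a _search request.
    print(json.dumps(build_title_query("lorem"), indent=2))
```

Because the prefix clause targets the unanalyzed keyword value rather than the text field, it only fires when the whole title starts with the search text, which is the distinction the question asks for.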

Search result:

"hits": [
      {
        "_index": "66500753",
        "_type": "_doc",
        "_id": "1", 
        "_score": 1.3436072,
        "_source": {
          "alternative_names": {          // note this
            "title": "Lorem"
          }
        }
      },
      {
        "_index": "66500753",
        "_type": "_doc",
        "_id": "4",
        "_score": 0.11474907,
        "_source": {
          "alternative_names": {
            "title": "Lorem impsum"
          }
        }
      },
      {
        "_index": "66500753",
        "_type": "_doc",
        "_id": "3",
        "_score": 0.11474907,
        "_source": {
          "alternative_names": {
            "title": "The Lorem"
          }
        }
      },
      {
        "_index": "66500753",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.07477197,
        "_source": {
          "alternative_names": {
            "title": "The time I said Lorem"
          }
        }
      }
    ]
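
One way to read these scores: document 1 satisfies both should clauses and their scores are added, while the other three satisfy only the match clause, where shorter titles score higher. The following toy model in Python illustrates which clause matches which title. It is a simplification under stated assumptions (the text field is treated as lowercased, whitespace-split tokens, roughly what the default analyzer does for these titles), not Elasticsearch's actual analysis or scoring, and `matching_clauses` is a hypothetical helper.

```python
def matching_clauses(title, search_text):
    """Return which of the two should clauses would match in this toy model."""
    clauses = []
    # term on the keyword sub-field: the whole, unanalyzed title must be identical.
    if title == search_text:
        clauses.append("term")
    # match on the text field: compare lowercased, whitespace-split tokens.
    if search_text.lower() in title.lower().split():
        clauses.append("match")
    return clauses


titles = ["Lorem", "The time I said Lorem", "The Lorem", "Lorem impsum"]
for title in titles:
    print(title, "->", matching_clauses(title, "Lorem"))
```

In this model, searching for lowercase "lorem" leaves only the match clause for every title, which is why the exact-title bonus depends on the casing of the search text unless the keyword comparison is made case-insensitive.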
