Elasticsearch: manipulating the score by a numeric field value

Question

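I use Elasticsearch to search PDFs. One of the fields besides the PDF content is doridat, a date stored as an integer. The newest documents should get a higher score (a higher rank). In other words, the higher the value in the doridat field, the higher the score should be. Only the match in attachment.content and the value of doridat should influence the score.

How can I force the scoring to incorporate the value of the integer field (doridat)?

My query: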

    GET /attachments/_search
    {
      "size": 2,
      "from": 0,
      "query": {
        "wildcard": {
          "attachment.content": {
            "value": "*berg*",
            "rewrite": "scoring_boolean"
          }
        }
      },
      "highlight": {
        "fields": {
          "attachment.content": {}
        }
      },
      "_source": {
        "excludes": "attachment.content"
      }
    }

My mapping:

 { "attachments": { "mappings": { "properties": { "attachment": { "properties": { "author": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "content": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "content_length": { "type": "long" }, "content_type": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "creator_tool": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "date": { "type": "date" }, "description": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "detect_language": { "type": "boolean" }, "format": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "indexed_chars": { "type": "long" }, "keywords": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "language": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "metadata_date": { "type": "date" }, "modified": { "type": "date" }, "name": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "title": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } } } }, "content": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "daname": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "do__nr": { "type": "integer" }, "do_typ": { "type": "integer" }, "doext": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "doname": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "donr": { "type": "integer" }, "doridat": { "type": "integer" }, "dowww": { "type": "integer" }, "id": { "type": "integer" }, "path": { "type": "text", "analyzer": "windows_path_hierarchy_analyzer" } } } } }

Answer 1

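As far as I know, a wildcard query with the default rewrite returns a constant score of 1.0 for every match (even if the pattern matches more than once).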

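The rank feature looks like a good fit for your use case. You need to copy the doridat field and index the copy with the rank_feature field type. You can then use that field in a rank_feature query. Which Elasticsearch version are you using?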
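The rank feature approach could look roughly like the sketch below. It only builds the request bodies as Python dicts and prints them as JSON, so it runs without a cluster. The field name `doridat_rank` is a hypothetical name for the copy of doridat, and the copy is assumed to be filled in at index time (rank_feature values must be positive numbers). The wildcard clause is taken from the question; combining it with the rank_feature clause in a `bool` query is one possible arrangement, not the only one.

```python
import json

# Hypothetical name for the rank_feature copy of doridat (not in the original mapping).
RANK_FIELD = "doridat_rank"


def rank_feature_mapping():
    """Body for PUT /attachments/_mapping that adds the rank_feature field."""
    return {"properties": {RANK_FIELD: {"type": "rank_feature"}}}


def recency_boosted_query(pattern, boost=1.0):
    """Body for GET /attachments/_search.

    The wildcard clause in `must` decides which documents match; the
    rank_feature clause in `should` adds a score contribution that grows
    with the value stored in RANK_FIELD.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "wildcard": {
                            "attachment.content": {
                                "value": pattern,
                                "rewrite": "scoring_boolean",
                            }
                        }
                    }
                ],
                "should": [
                    {"rank_feature": {"field": RANK_FIELD, "boost": boost}}
                ],
            }
        }
    }


print(json.dumps(rank_feature_mapping(), indent=2))
print(json.dumps(recency_boosted_query("*berg*"), indent=2))
```

The `boost` value controls how strongly recency weighs against the text match and would need tuning on real data.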

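Another option is to use a script_score query. You can basically return doridat from the script, since the wildcard query returns 1.0 as the score anyway. You could also use an N-gram tokenizer on attachment.content to get wildcard-like matching. Matches are scored better when you use match instead of wildcard.

The rank feature documentation states that it performs better (documents can be skipped at search time).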
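A minimal sketch of the script_score and N-gram ideas, again as plain Python dicts printed as JSON. The Painless script returns the doridat value as the score and falls back to 0 when the field is missing. The names `trigram_tokenizer`, `trigram_analyzer` and the sub-field `attachment.content.ngram` are hypothetical, the gram size of 3 is an arbitrary choice, and the index would have to be created with these settings and reindexed before the match query could work.

```python
import json


def script_score_query(pattern):
    """Wildcard query wrapped in script_score; the score becomes doridat."""
    return {
        "query": {
            "script_score": {
                "query": {"wildcard": {"attachment.content": {"value": pattern}}},
                "script": {
                    # Scores must not be negative; use 0 if doridat is missing.
                    "source": "doc['doridat'].size() == 0 ? 0 : doc['doridat'].value"
                },
            }
        }
    }


def ngram_index_settings():
    """Index creation body with a trigram analyzer on a hypothetical sub-field."""
    return {
        "settings": {
            "analysis": {
                "tokenizer": {
                    "trigram_tokenizer": {
                        "type": "ngram",
                        "min_gram": 3,
                        "max_gram": 3,
                        "token_chars": ["letter", "digit"],
                    }
                },
                "analyzer": {
                    "trigram_analyzer": {
                        "type": "custom",
                        "tokenizer": "trigram_tokenizer",
                        "filter": ["lowercase"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "attachment": {
                    "properties": {
                        "content": {
                            "type": "text",
                            "fields": {
                                "ngram": {"type": "text", "analyzer": "trigram_analyzer"}
                            },
                        }
                    }
                }
            }
        },
    }


def ngram_match_query(text):
    """match query on the trigram sub-field; all trigrams of `text` must occur."""
    return {
        "query": {
            "match": {"attachment.content.ngram": {"query": text, "operator": "and"}}
        }
    }


print(json.dumps(script_score_query("*berg*"), indent=2))
print(json.dumps(ngram_index_settings(), indent=2))
print(json.dumps(ngram_match_query("berg"), indent=2))
```

The match query can itself be placed inside `script_score` in place of the wildcard clause; the script could then also multiply `_score` by a function of doridat instead of discarding the text relevance.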

Solution 1
0 2022-09-02 16:47:41
