Elasticsearch - search wildcards (contains in strings) and tf-idf scores
How can I run wildcard searches and still get tf-idf scores? For example, when I search like this:
GET /test_es/_search?explain=true // returns idf / tf scores
{
  "explain": true,
  "query": {
    "query_string": {
      "query": "bar^5",
      "fields": ["field"]
    }
  }
}
it returns the idf and tf scores, but when I search with a wildcard (contains) it does not:
GET /test_es/_search?explain=true // does NOT return idf / tf scores
{
  "explain": true,
  "query": {
    "query_string": {
      "query": "b*",
      "fields": ["field"]
    }
  }
}
How can I search with a wildcard (contains within a string) and still get the idf and tf scores?
For example, I have 3 documents, "foo", "foo bar" and "foo baz", and I search like this:
GET /foo2/_search?explain=true
{
  "explain": true,
  "query": {
    "query_string": {
      "query": "fo *",
      "fields": ["field"]
    }
  }
}
Elasticsearch result:
"hits" : [
  {
    "_shard" : "[foo2][0]",
    "_node" : "z8bjI0T1T8Oq6Z2OwFyIKw",
    "_index" : "foo2",
    "_type" : "_doc",
    "_id" : "3",
    "_score" : 1.0,
    "_source" : {
      "field" : "foo bar"
    },
    "_explanation" : {
      "value" : 1.0,
      "description" : "sum of:",
      "details" : [
        {
          "value" : 1.0,
          "description" : "*:*",
          "details" : [ ]
        }
      ]
    }
  },
  {
    "_shard" : "[foo2][0]",
    "_node" : "z8bjI0T1T8Oq6Z2OwFyIKw",
    "_index" : "foo2",
    "_type" : "_doc",
    "_id" : "2",
    "_score" : 1.0,
    "_source" : {
      "field" : "foo"
    },
    "_explanation" : {
      "value" : 1.0,
      "description" : "sum of:",
      "details" : [
        {
          "value" : 1.0,
          "description" : "*:*",
          "details" : [ ]
        }
      ]
    }
  },
  {
    "_shard" : "[foo2][0]",
    "_node" : "z8bjI0T1T8Oq6Z2OwFyIKw",
    "_index" : "foo2",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 1.0,
    "_source" : {
      "field" : "foo baz"
    },
    "_explanation" : {
      "value" : 1.0,
      "description" : "sum of:",
      "details" : [
        {
          "value" : 1.0,
          "description" : "*:*",
          "details" : [ ]
        }
      ]
    }
  }
]
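But I expected "foo" to be the first result with the highest score, because it matches 100%. Am I wrong?
Since you did not mention the data you are working with, I indexed the following data:
Index data:
{
  "message": "A fox is a wild animal."
}
{
  "message": "That fox must have killed the hen."
}
{
  "message": "the quick brown fox jumps over the lazy dog"
}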
Search query (more fields can be added to the "fields" array):
GET /{{index-name}}/_search?explain=true
{
  "query": {
    "query_string": {
      "fields": [
        "message"
      ],
      "query": "quick^2 fox*"
    }
  }
}
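The query above matches every document that contains fox, but because a boost is applied to quick, the document that contains quick fox scores higher than the others.
This query returns tf-idf scores. The boost operator makes one term more relevant than another.
To learn more, see the "Boosting" section of the official query string (dsl-query-string) documentation.
To learn more about the tf-idf algorithm, you can refer to this blog.
If you search across multiple fields, you can also boost the score of a particular field.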
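As a rough illustration of a per-field boost, the Python sketch below builds a request body in which a field name carries a ^ boost inside "fields". The second field, title, is hypothetical (the sample documents only have message), and the boost value 3 is arbitrary.

```python
import json

# Hypothetical second field "title"; the sample documents only contain "message".
body = {
    "query": {
        "query_string": {
            # "title^3" multiplies the score contribution of matches in title by 3.
            "fields": ["title^3", "message"],
            "query": "quick^2 fox*",
        }
    }
}

# Print the JSON so it can be pasted into a _search request.
print(json.dumps(body, indent=2))
```

The printed JSON can be sent to the same _search endpoint as the query above.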
Update 1:
Index data:
{
  "title": "foo bar"
}
{
  "title": "foo baz"
}
{
  "title": "foo"
}
Search query (just add a space between foo and *):
{
  "query": {
    "query_string": {
      "query": "foo *"
    }
  }
}
Search result (foo matches exactly, so it gets the highest score):
"hits": [
  {
    "_index": "foo2",
    "_type": "_doc",
    "_id": "1",
    "_score": 1.9808292,
    "_source": {
      "title": "foo"
    }
  },
  {
    "_index": "foo2",
    "_type": "_doc",
    "_id": "2",
    "_score": 1.1234324,
    "_source": {
      "title": "foo bar"
    }
  },
  {
    "_index": "foo2",
    "_type": "_doc",
    "_id": "3",
    "_score": 1.1234324,
    "_source": {
      "title": "foo baz"
    }
  }
]
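Update 2:
Wildcard queries are term-level queries, and by default they use the constant_score_boolean method to match terms, so every matching document gets the same constant score rather than a tf-idf score.
By changing the value of the rewrite parameter, you can affect both search performance and relevance. It offers several scoring options, and you can choose whichever fits your needs.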
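As a minimal sketch of what changing rewrite could look like, the Python snippet below builds a query_string body with "rewrite": "scoring_boolean", one of the documented rewrite values that computes a relevance score per matching term. The helper name wildcard_request is made up for this example, the field name is taken from the question, and whether the resulting ranking suits your data is something you would need to check with explain.

```python
import json

def wildcard_request(pattern, field, rewrite="scoring_boolean"):
    """Build a query_string body that asks for scored wildcard matches."""
    return {
        "explain": True,
        "query": {
            "query_string": {
                "query": pattern,    # e.g. "fo*" (no space before the *)
                "fields": [field],
                "rewrite": rewrite,  # replaces the default constant-score rewrite
            }
        },
    }

body = wildcard_request("fo*", "field")
print(json.dumps(body, indent=2))
```

Note that scoring_boolean expands the wildcard into one clause per matching term, so a very broad pattern can run into the maximum clause count limit.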
Depending on your use case, you could also use an edge_ngram tokenizer. Edge n-grams are useful for search-as-you-type queries. To learn more about this and the mapping used below, see the official documentation.
Index mapping:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "autocomplete",
          "filter": [
            "lowercase"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "lowercase"
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10,
          "token_chars": [
            "letter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "autocomplete",
        "search_analyzer": "autocomplete_search"
      }
    }
  }
}
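To get a feel for which tokens this mapping puts into the index, here is a rough Python emulation of the autocomplete analyzer (edge_ngram tokenizer on letters, min_gram 2, max_gram 10, then lowercase). It is a sketch for intuition only, not Lucene's implementation, and the function name edge_ngrams is made up for this example.

```python
import re

def edge_ngrams(text, min_gram=2, max_gram=10):
    """Split on non-letters, emit each word's prefixes of length
    min_gram..max_gram, and lowercase them."""
    tokens = []
    for word in re.findall(r"[^\W\d_]+", text):  # runs of Unicode letters
        for size in range(min_gram, min(len(word), max_gram) + 1):
            tokens.append(word[:size].lower())
    return tokens

for title in ["foo", "foo bar", "foo baz"]:
    print(title, "->", edge_ngrams(title))
```

Because fo is indexed as a token of its own, a plain match query for fo can find all three titles without any wildcard.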
Index sample data:
{ "title":"foo" }
{ "title":"foo bar" }
{ "title":"foo baz" }
Search query:
{
  "query": {
    "match": {
      "title": {
        "query": "fo"
      }
    }
  }
}
Search result (foo gets the maximum score):
"hits": [
  {
    "_index": "foo6",
    "_type": "_doc",
    "_id": "1",
    "_score": 0.15965709,
    "_source": {
      "title": "foo"
    }
  },
  {
    "_index": "foo6",
    "_type": "_doc",
    "_id": "2",
    "_score": 0.12343237,
    "_source": {
      "title": "foo bar"
    }
  },
  {
    "_index": "foo6",
    "_type": "_doc",
    "_id": "3",
    "_score": 0.12343237,
    "_source": {
      "title": "foo baz"
    }
  }
]
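These scores can be reproduced by hand. The sketch below assumes the default BM25 similarity (k1 = 1.2, b = 0.75), a single shard holding only these three documents, and the boost * idf * tf form that Elasticsearch 7.x prints in its explain output. The token counts are what the edge_ngram mapping should produce (fo, foo for "foo"; four tokens for the other two); the function name bm25 is just for this example.

```python
import math

def bm25(freq, doc_len, avg_len, doc_count, docs_with_term, k1=1.2, b=0.75):
    """BM25 in the form shown by Elasticsearch 7.x explain: boost * idf * tf."""
    boost = k1 + 1  # 2.2 with the default k1
    idf = math.log(1 + (doc_count - docs_with_term + 0.5) / (docs_with_term + 0.5))
    tf = freq / (freq + k1 * (1 - b + b * doc_len / avg_len))
    return boost * idf * tf

# Assumed field lengths in tokens under the edge_ngram mapping.
# All three documents contain the term "fo" exactly once.
lengths = {"foo": 2, "foo bar": 4, "foo baz": 4}
avg_len = sum(lengths.values()) / len(lengths)

scores = {title: bm25(1, n, avg_len, 3, 3) for title, n in lengths.items()}
for title, score in scores.items():
    print(f"{title!r}: {score:.8f}")
```

This gives roughly 0.1597 for foo and 0.1234 for the other two, in line with the result above. Term frequency and idf are identical for all three documents, so foo ranks first only because its field is shorter.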
To learn more about the basics of using n-grams in Elasticsearch, you can refer here.