Elasticsearch - search wildcards (contains in strings) and tf-idf scores
How can I run a wildcard search and still get tf-idf scores? For example, when I search like this:
GET /test_es/_search?explain=true // returns idf / tf scores
{
"explain":true,
"query": {
"query_string": {
"query": "bar^5",
"fields" : ["field"]
}
}
}
it returns idf and tf scores, but not when I search with a wildcard (contains):
GET /test_es/_search?explain=true // does NOT return idf / tf scores
{
"explain":true,
"query": {
"query_string": {
"query": "b*",
"fields" : ["field"]
}
}
}
How can I search with a wildcard (a "contains" match on strings) and still get idf and tf scores?
For example, I have 3 documents, "foo", "foo bar", and "foo baz", and I search like this:
GET /foo2/_search?explain=true
{
"explain":true,
"query": {
"query_string": {
"query": "fo *",
"fields" : ["field"]
}
}
}
Elasticsearch result:
"hits" : [
{
"_shard" : "[foo2][0]",
"_node" : "z8bjI0T1T8Oq6Z2OwFyIKw",
"_index" : "foo2",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"field" : "foo bar"
},
"_explanation" : {
"value" : 1.0,
"description" : "sum of:",
"details" : [
{
"value" : 1.0,
"description" : "*:*",
"details" : [ ]
}
]
}
},
{
"_shard" : "[foo2][0]",
"_node" : "z8bjI0T1T8Oq6Z2OwFyIKw",
"_index" : "foo2",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"field" : "foo"
},
"_explanation" : {
"value" : 1.0,
"description" : "sum of:",
"details" : [
{
"value" : 1.0,
"description" : "*:*",
"details" : [ ]
}
]
}
},
{
"_shard" : "[foo2][0]",
"_node" : "z8bjI0T1T8Oq6Z2OwFyIKw",
"_index" : "foo2",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"field" : "foo baz"
},
"_explanation" : {
"value" : 1.0,
"description" : "sum of:",
"details" : [
{
"value" : 1.0,
"description" : "*:*",
"details" : [ ]
}
]
}
}
]
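But I expected "foo" to be the first result with the highest score, because it matches 100%. Am I wrong?
Since you did not mention the data you indexed, I indexed the following data:
Index data: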
{
"message": "A fox is a wild animal."
}
{
"message": "That fox must have killed the hen."
}
{
"message": "the quick brown fox jumps over the lazy dog"
}
Search query:
GET /{{index-name}}/_search?explain=true
{
"query": {
"query_string": {
"fields": [
"message" ---> You can add more fields here
],
"query": "quick^2 fox*"
}
}
}
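The query above matches every document that contains fox, but because the boost is applied to quick, the document that contains both quick and fox scores higher than the others.
This query returns tf-IDF scores. The boost operator makes one term more relevant than another.
To learn more about this, see the "Boosting" section of the official dsl-query-string documentation.
To learn more about the tf-IDF algorithm, you can refer to this blog.
If you search across multiple fields, you can also boost the score of a particular field.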
Update 1:
Index data:
{
"title": "foo bar"
}
{
"title": "foo baz"
}
{
"title": "foo"
}
Search query:
{
"query": {
"query_string": {
"query": "foo *" --> You can just add a space between foo and *
}
}
}
Search result:
"hits": [
{
"_index": "foo2",
"_type": "_doc",
"_id": "1",
"_score": 1.9808292, --> foo matches exactly, so the score is maximum
"_source": {
"title": "foo"
}
},
{
"_index": "foo2",
"_type": "_doc",
"_id": "2",
"_score": 1.1234324,
"_source": {
"title": "foo bar"
}
},
{
"_index": "foo2",
"_type": "_doc",
"_id": "3",
"_score": 1.1234324,
"_source": {
"title": "foo baz"
}
}
]
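Update 2:
Wildcard queries are term-level queries, and by default they use the constant_score_boolean method to match terms, which gives each matching document a constant score rather than a tf-IDF score.
By changing the value of the rewrite parameter, you can affect both search performance and relevance. It offers several scoring options, and you can pick whichever fits your needs.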
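As a rough illustration of the rewrite parameter, the sketch below builds the request body for the wildcard query from the question with rewrite set to scoring_boolean, one of the documented rewrite values that computes a relevance score per matching term. The helper name wildcard_query_body is hypothetical, and the index and field names are taken from the question; the body would still have to be sent to the _search endpoint with a client of your choice.

```python
import json


def wildcard_query_body(pattern, field, rewrite="scoring_boolean"):
    """Build a query_string request body with an explicit rewrite method.

    Hypothetical helper for illustration; it only builds the JSON body.
    """
    return {
        "explain": True,
        "query": {
            "query_string": {
                "query": pattern,
                "fields": [field],
                # Ask for per-term scoring instead of a constant score.
                "rewrite": rewrite,
            }
        },
    }


# Same wildcard pattern and field as in the question.
body = wildcard_query_body("b*", "field")
print(json.dumps(body, indent=2))
```

Scoring rewrites expand the wildcard into one clause per matching term, so on fields with many distinct terms they can be slow or hit the boolean clause limit; a top_terms_N variant may be a safer choice there.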
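Depending on your use case, you can also use an edge_ngram tokenizer instead. Edge N-Grams are useful for search-as-you-type queries. To learn more about this and about the mapping used below, see the official documentation.
Index mapping: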
{
"settings": {
"analysis": {
"analyzer": {
"autocomplete": {
"tokenizer": "autocomplete",
"filter": [
"lowercase"
]
},
"autocomplete_search": {
"tokenizer": "lowercase"
}
},
"tokenizer": {
"autocomplete": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 10,
"token_chars": [
"letter"
]
}
}
}
},
"mappings": {
"properties": {
"title": {
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "autocomplete_search"
}
}
}
}
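To see why a plain match query for "fo" can hit these documents, it helps to look at the tokens the autocomplete analyzer would put in the index. The following is only a Python approximation of the edge_ngram tokenizer configured above (min_gram 2, max_gram 10, token_chars letter) followed by the lowercase filter; the function name edge_ngrams is made up for this sketch, and the real analyzer runs inside Elasticsearch.

```python
import re


def edge_ngrams(text, min_gram=2, max_gram=10):
    """Approximate edge_ngram tokenization with token_chars=["letter"].

    Splits the text on non-letter characters, lowercases each word, and
    emits every prefix whose length is between min_gram and max_gram.
    """
    tokens = []
    for word in re.findall(r"[^\W\d_]+", text):
        word = word.lower()
        for size in range(min_gram, min(len(word), max_gram) + 1):
            tokens.append(word[:size])
    return tokens


for title in ["foo", "foo bar", "foo baz"]:
    print(title, "->", edge_ngrams(title))
```

Because "fo" is stored as a token for every title, the search side can use a simple analyzer and an ordinary match query, which is scored like any other term match.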
Index sample data:
{ "title":"foo" }
{ "title":"foo bar" }
{ "title":"foo baz" }
Search query:
{
"query": {
"match": {
"title": {
"query": "fo"
}
}
}
}
Search result:
"hits": [
{
"_index": "foo6",
"_type": "_doc",
"_id": "1",
"_score": 0.15965709, --> Maximum score
"_source": {
"title": "foo"
}
},
{
"_index": "foo6",
"_type": "_doc",
"_id": "2",
"_score": 0.12343237,
"_source": {
"title": "foo bar"
}
},
{
"_index": "foo6",
"_type": "_doc",
"_id": "3",
"_score": 0.12343237,
"_source": {
"title": "foo baz"
}
}
]
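The scores above appear consistent with the BM25 formula that Elasticsearch prints in its explain output. The sketch below recomputes them under several assumptions that are not stated in the answer: default parameters k1 = 1.2 and b = 0.75, an index holding only these three documents, field lengths counted in edge n-gram tokens (2 for "foo", 4 for the other two titles), and the extra (k1 + 1) factor that explain usually reports as a boost of 2.2.

```python
import math


def bm25_score(freq, doc_len, avg_doc_len, doc_count, docs_with_term,
               k1=1.2, b=0.75):
    """BM25 score of one term in one document (assumed default parameters)."""
    # idf: rarer terms get a larger weight.
    idf = math.log(1 + (doc_count - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # tf: saturating term frequency, normalized by field length.
    tf = freq / (freq + k1 * (1 - b + b * doc_len / avg_doc_len))
    # (k1 + 1) is the constant factor shown as "boost" 2.2 in explain output.
    return (k1 + 1) * idf * tf


# Assumed token counts per title after edge n-gram analysis.
doc_lengths = {"foo": 2, "foo bar": 4, "foo baz": 4}
avg_len = sum(doc_lengths.values()) / len(doc_lengths)

scores = {
    title: bm25_score(freq=1, doc_len=length, avg_doc_len=avg_len,
                      doc_count=3, docs_with_term=3)
    for title, length in doc_lengths.items()
}
for title, score in scores.items():
    print(f"{title!r}: {score:.6f}")
```

Under these assumptions the token "fo" occurs once in every document, so idf and term frequency are identical and only the length normalization differs: the shorter field "foo" ends up with the higher score.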
To learn more about the basics of using Ngrams in Elasticsearch, you can refer to this.