简体   繁体   中英

How is a nested document relevance score (TF/IDF) calculated in Elasticsearch?

When running a match query on nested fields, are the relevance scores for each nested document calculated based on all nested documents across all root documents, or just the nested documents under a single root document? Basically when TF/IDF is calculated, what is the scope of the collection being used for IDF?

Here is a the nested document:

PUT /channels_index
{
  "mappings": {
    "channel": {
      "properties": {
        "username": { "type": "string" },
        "posts": {
          "type": "nested", 
          "properties": {
            "link":    { "type": "string" },
            "caption": { "type": "string" },
          }
        }
      }
    }
  }
}

And here is the query:

GET channels/_search
{
  "query": {
    "nested": {
      "path": "posts",
      "query": {
        "match": {
          "posts.caption": "adidas"
        }
      },
      "inner_hits": {}
    }
  }
}

However, in my results, even though the second document has a higher max score for inner hits, the first document's root score is somehow higher.

{
  "hits": {
    "total": 2,
    "max_score": 4.3327584,
    "hits": [
      {
        "_index": "channels",
        "_type": "channel",
        "_id": "1",
        "_score": 4.3327584,
        "_source": {
          "username": "user1",
          "posts": [...]
        },
        "inner_hits": {
          "posts": {
            "hits": {
              "total": 2,
              "max_score": 5.5447335,
              "hits": [...]
            }
          }
        }
      },
      {
        "_index": "channels",
        "_type": "channel",
        "_id": "2",
        "_score": 4.2954993,
        "_source": {
          "username": "user2",
          "posts": [...]
        },
        "inner_hits": {
          "posts": {
            "hits": {
              "total": 13,
              "max_score": 11.446381,
              "hits": [...]
            }
          }
        }
      }
    ]
  }
}

After running explain on my query I can see that the TF/IDF score for inner hits is indeed using an IDF calculated from nested documents across all root documents.

As to the root document scoring, the default score mode for nested documents is to average the score. If I want to use the max score of my nested documents I can set it by defining a score_mode. Query below shows how to run explain on a document as well as set a different score mode.

GET channels/channel/1/_explain
{
  "query": {
    "nested": {
      "path": "posts",
      "score_mode": "max", 
      "query": {
        "match": {
          "posts.caption": "adidas"
        }
      },
      "inner_hits": {}
    }
  }
}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM