
Multiple documents having equal search score in MongoDB Atlas Search

Is there a way to boost score for exact match in Atlas search?

I'm having issues getting the right/best translation for 'hi' from English to French. After some debugging, I discovered that the first three documents returned from my aggregation all have the same score, 2.362138271331787.

I expected 'hi' to have a higher score since it exactly matches the search query, but 'it's his' and 'his' have the same score as 'hi'.

Here's my search query:

const searchOption= [
  {
    $search: {
      text: {
        query: 'hi',
        path: 'english',
      },
    },
  },
  { $project: {  _id: 0, french: 1, english: 1, score: { $meta: "searchScore" } } },
  { $limit: 5 },
];

const result = await Greetings.aggregate(searchOption, { cursor: { batchSize: 5 } }).toArray();

Here are the documents returned. The list is ordered by search score:

[
  {
    "english": "it’s his",
    "french": "c'est le sien",
    "score": 2.362138271331787
  },
  {
    "english": "hi",
    "french": "salut",
    "score": 2.362138271331787
  },
  {
    "english": "his",
    "french": "le sien",
    "score": 2.362138271331787
  },
  {
    "english": "it’s his failure to arrange his",
    "french": "c'est son incapacité à organiser son",
    "score": 2.2482824325561523
  },
  {
    "english": "it’s his failure to arrange his time",
    "french": "c'est son incapacité à organiser son temps",
    "score": 2.0995540618896484
  }
]

The score is a "relevance score" computed internally by Mongo. It surprises me that field length is not part of the score even with the "text" operator; I would personally expect it to be added in some form in the near future.

For now you could use a workaround to construct the score you want. For example, you could use a compound query with two should (OR) clauses: a phrase operator with a boost score option, plus the original text operator, like so:

const searchOption= [
    {
        $search: {
            "compound": {
                "should" : [
                    {
                        "phrase":{
                            "query": "hi",
                            "path": "english",
                            "score": {"boost":{"value":5}} 
                        }
                    },
                    {
                        text: {
                            query: 'hi',
                            path: 'english',
                        },
                    },
                ]
            }
        }
    },
    { $project: {  _id: 0, french: 1, english: 1, score: { $meta: "searchScore" } } },
    { $limit: 5 },
];

const result = await Greetings.aggregate(searchOption, { cursor: { batchSize: 5 } }).toArray();

Otherwise, you could sort by the score and use the length of english as a tie-breaker (this relies on the assumption that the scores are tied). Obviously this is not a real relevance ranking, since it assumes the top-scoring results are the ones you actually expect to get.

const searchOption= [
    {
        $search: {
            text: {
                query: 'hi',
                path: 'english',
            },
        },
    },
    { $project: {  _id: 0, french: 1, english: 1, score: { $meta: "searchScore" }, len: {$strLenCP: "$english"} } },
    { $sort : { score: -1, len: 1 } }, // ascending len, so the shortest string wins a score tie
    { $limit: 5 },
];

const result = await Greetings.aggregate(searchOption, { cursor: { batchSize: 5 } }).toArray();
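The tie-break idea can also be checked without a database. The following is a minimal sketch that re-ranks the documents listed in the question on the client side; `rankWithLengthTieBreak` is a hypothetical helper name, and counting code points with the spread operator is only meant to approximate what $strLenCP does.

```javascript
// Sample results copied from the question; the first three scores are tied.
const results = [
  { english: 'it’s his', french: "c'est le sien", score: 2.362138271331787 },
  { english: 'hi', french: 'salut', score: 2.362138271331787 },
  { english: 'his', french: 'le sien', score: 2.362138271331787 },
  { english: 'it’s his failure to arrange his', french: "c'est son incapacité à organiser son", score: 2.2482824325561523 },
  { english: 'it’s his failure to arrange his time', french: "c'est son incapacité à organiser son temps", score: 2.0995540618896484 },
];

// Hypothetical helper: highest score first, then shortest string first on ties.
function rankWithLengthTieBreak(docs) {
  const codePoints = (s) => [...s].length; // code point count, similar to $strLenCP
  return [...docs].sort(
    (a, b) => b.score - a.score || codePoints(a.english) - codePoints(b.english)
  );
}

const ranked = rankWithLengthTieBreak(results);
console.log(ranked.map((d) => d.english));
// → [ 'hi', 'his', 'it’s his', 'it’s his failure to arrange his', 'it’s his failure to arrange his time' ]
```

This only reorders documents that the search already returned, so it shares the caveat above: if the exact match is not among the returned documents, the tie-break cannot bring it back.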

This is a known limitation of Atlas Search, and the solution is described here: https://www.mongodb.com/docs/atlas/atlas-search/autocomplete/#limitations

The lucene.keyword analyzer on a string type is helpful for exact matches in scenarios where score fidelity is essential.

Basically, the path english should be defined in the index definition as both autocomplete and string, like:

[
  {"type": "string"},
  {"type": "autocomplete"}
]

The above assumes that you are not using a language analyzer for either autocomplete or string; a language analyzer is probably not ideal for string anyway.
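For context, here is a sketch of how that field mapping might sit inside a complete index definition. It assumes the field is named english as in the question and uses lucene.keyword for the string mapping, as suggested above. The `dynamic: false` setting and the variable name `indexDefinition` are illustrative assumptions, not something the question specifies.

```javascript
// Sketch of an Atlas Search index definition (assumed layout, not the asker's real index).
const indexDefinition = {
  mappings: {
    dynamic: false, // assumption: only index the fields listed below
    fields: {
      english: [
        // Whole-value token, so a text query only matches the exact string.
        { type: 'string', analyzer: 'lucene.keyword' },
        // Partial-match tokens for as-you-type style matching.
        { type: 'autocomplete' },
      ],
    },
  },
};

console.log(JSON.stringify(indexDefinition, null, 2));
```

Note that lucene.keyword treats the whole field value as a single term, so matching against the string mapping is case-sensitive and whitespace-sensitive.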

Then, on the query side, you want a compound query where both options are should clauses. You should boost the text clause and not the autocomplete clause.
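A minimal sketch of such a query follows, assuming the english field is indexed as both string (with lucene.keyword) and autocomplete. The helper name `buildExactFirstPipeline` and the boost value of 5 are hypothetical choices for illustration; the $project and $limit stages are copied from the question.

```javascript
// Hypothetical helper: builds a pipeline whose exact (text) clause is boosted
// above the partial (autocomplete) clause.
function buildExactFirstPipeline(query, path = 'english', exactBoost = 5) {
  return [
    {
      $search: {
        compound: {
          should: [
            // Exact match against the string mapping, boosted.
            { text: { query, path, score: { boost: { value: exactBoost } } } },
            // Partial match against the autocomplete mapping, not boosted.
            { autocomplete: { query, path } },
          ],
        },
      },
    },
    { $project: { _id: 0, french: 1, english: 1, score: { $meta: 'searchScore' } } },
    { $limit: 5 },
  ];
}

const pipeline = buildExactFirstPipeline('hi');
console.log(JSON.stringify(pipeline[0], null, 2));
```

With the model from the question, the pipeline could then be run as `await Greetings.aggregate(buildExactFirstPipeline('hi'))`. A document equal to 'hi' matches both clauses, while 'his' matches only the autocomplete clause, so the exact match should end up with the higher score.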
