
Solr - sort results by maximum matches for OR search on multi-valued field

Let me try to explain my problem. Assume I have a multi-valued field called "enrolment" in each document that contains the names of students.

Now, when searching Solr, say I search for the names of three students: Manish, Amit, Navin. Solr returns all documents containing any one of these names (which is what I want in my case). Some documents may contain all three names, some two, and some only one. I want the results sorted so that the documents with the most matches are at the top, followed by those with fewer matches.

I tried adding sort: score desc for this, but it doesn't work as desired because the score is "1" for all matching documents.

How can I sort the results by the number of matches on my multi-valued field, highest first?

Given a multi-valued integer field where you want to rank the documents based on the number of matches, apply a boost query for each match. For example, if you have a series of monitors that come in different sizes, you can apply a boost for each size that is valid (I hacked this together and tested it with the example docs from the tech core, so that's my example and I'm sticking with it). I have two relevant documents: one named VA902B, with sizes given as a multi-valued field with the values 23, 28, and 32, and one named 3007WFP, with the values 23, 29, and 36 in the same field.

Here I'm asking for any document, but with those that have both size 28 and size 23 at the top, then those that have either size 28 or size 23, and then any other document:

?bq=sizes:28&bq=sizes:23&defType=edismax&q=*:*
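If you build this request from client code, a minimal Python sketch might look like the following. It only assembles the query string; the helper name `build_boost_params` is made up for illustration, and the field name `sizes` comes from my example documents. Note that `urlencode` percent-encodes `*` and `:`, which is equivalent to the hand-written URL above.

```python
from urllib.parse import parse_qs, urlencode


def build_boost_params(field, values, main_query="*:*"):
    """Return a URL-encoded query string with one bq parameter per value.

    build_boost_params is a hypothetical helper, not part of any Solr client.
    """
    # A list of tuples (rather than a dict) lets the bq key repeat.
    params = [("q", main_query), ("defType", "edismax")]
    params.extend(("bq", f"{field}:{value}") for value in values)
    return urlencode(params)


query_string = build_boost_params("sizes", [28, 23])
print("?" + query_string)

# Decoding the string again shows that both bq parameters survive.
decoded = parse_qs(query_string)
print(decoded["bq"])
```

Append the resulting string to the select handler URL of whatever core you are querying.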

If I want to limit the set of documents to only those that match either of the sizes, I can use that as my main query:

?defType=edismax&q=sizes:(23%2028)
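As a rough sketch, the same main query can be generated from a list of values in Python. The helper names `build_or_query` and `build_main_query_params` are invented for this example, and the sketch assumes the default operator is OR, so that a space between the values means "either one".

```python
from urllib.parse import parse_qs, quote, urlencode


def build_or_query(field, values):
    """Return field:(v1 v2 ...), which matches any value when the operator is OR."""
    joined = " ".join(str(value) for value in values)
    return f"{field}:({joined})"


def build_main_query_params(field, values):
    """Hypothetical helper: URL-encode the OR query as the main q parameter."""
    params = [("defType", "edismax"), ("q", build_or_query(field, values))]
    # quote_via=quote encodes the space as %20 instead of '+'.
    return urlencode(params, quote_via=quote)


or_query = build_or_query("sizes", [23, 28])
query_string = build_main_query_params("sizes", [23, 28])
print(or_query)
print("?" + query_string)
```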

.. and this is where I discover that your presumption that the score is the same regardless of the number of matches is false. Adding &debugQuery=true to the URL gives us detailed scoring information for each document:

"explain": {
  "VA902B": "\n2.0 = sum of:\n  1.0 = sizes:[23 TO 23]\n  1.0 = sizes:[28 TO 28]\n",
  "3007WFP": "\n1.0 = sum of:\n  1.0 = sizes:[23 TO 23]\n"
},

.. which means that there is no need to apply a boost - the behaviour you want is the standard behaviour for Solr. This was my initial thought, but in that case the queries you gave in the comments should already have given you the correct order.
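To make the arithmetic in that explain output concrete, here is a toy Python model of it. This is not Solr's scorer; it simply assumes, as the output above shows for this integer field, that every matching clause contributes 1.0 and that the contributions are summed. The function names are made up for the example.

```python
# The two example documents and their multi-valued sizes field.
docs = {
    "VA902B": [23, 28, 32],
    "3007WFP": [23, 29, 36],
}


def toy_score(doc_values, wanted):
    """Assumed model: 1.0 per requested value that the document contains."""
    return float(sum(1 for value in wanted if value in doc_values))


def rank(documents, wanted):
    """Keep documents with at least one match, highest toy score first."""
    scored = {name: toy_score(values, wanted) for name, values in documents.items()}
    matching = {name: score for name, score in scored.items() if score > 0}
    return sorted(matching.items(), key=lambda item: item[1], reverse=True)


ranking = rank(docs, [23, 28])
print(ranking)  # [('VA902B', 2.0), ('3007WFP', 1.0)]
```

Fields that are scored by relevance rather than with a constant per clause will not produce such round numbers, so treat this only as an illustration of why more matches gave a higher score here.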

But I'll show you how my strategy of applying boosts would have worked as well:

?bq=sizes:28&bq=sizes:23&defType=edismax&q=sizes:(23%2028)&debugQuery=true

.. which now tells us that the score for each document has effectively doubled, since each match is scored 1.0 (from the query) + 1.0 (from the boost).

"explain": {
  "VA902B": "\n4.0 = sum of:\n  2.0 = sum of:\n    1.0 = sizes:[23 TO 23]\n    1.0 = sizes:[28 TO 28]\n  1.0 = sizes:[28 TO 28]\n  1.0 = sizes:[23 TO 23]\n",
  "3007WFP": "\n2.0 = sum of:\n  1.0 = sum of:\n    1.0 = sizes:[23 TO 23]\n  1.0 = sizes:[23 TO 23]\n"
},
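The doubling can be reproduced with a small Python sketch under the same simplifying assumption as the explain output: each matching clause, whether it comes from q or from a bq, adds 1.0. Again, this is only a toy model with invented function names, not how Solr computes scores internally.

```python
# The two example documents and their multi-valued sizes field.
docs = {
    "VA902B": [23, 28, 32],
    "3007WFP": [23, 29, 36],
}


def count_matches(doc_values, wanted):
    """Number of requested values that appear in the document."""
    return sum(1 for value in wanted if value in doc_values)


def boosted_score(doc_values, query_values, boost_values):
    """Assumed model: 1.0 per matching q clause plus 1.0 per matching bq."""
    query_part = 1.0 * count_matches(doc_values, query_values)
    boost_part = 1.0 * count_matches(doc_values, boost_values)
    return query_part + boost_part


scores = {
    name: boosted_score(values, query_values=[23, 28], boost_values=[28, 23])
    for name, values in docs.items()
}
print(scores)  # {'VA902B': 4.0, '3007WFP': 2.0}
```

The relative order is the same as without the boosts, which is why the boost queries are redundant when the main query already lists the same values.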

I also tested the q=sizes:(23 28) query with the standard lucene query parser (rather than dismax/edismax, which support bq), and the behaviour was the same.

