
Sort the search results in ascending order of a multivalued field in Solr

I'm using Solr version 6.6.0. I have a schema with the fields title (text_general), description (text_general), and id (integer). When I search for a keyword and try to list the results in ascending order of title, my code returns the error "can not sort on multivalued field: title".

I have tried to set the sort using the following three methods:

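SolrQuery query = new SolrQuery();
// 1. Replace any existing sort with a single clause
query.setSort("title", SolrQuery.ORDER.asc);
// 2. Append a sort clause
query.addSort("title", SolrQuery.ORDER.asc);
// 3. Build the clause explicitly, then append it
SolrQuery.SortClause ab = new SolrQuery.SortClause("title", SolrQuery.ORDER.asc);
query.addSort(ab);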

but all of these return the same error.

I found a possible solution by referring to this answer.

It says to use the min/max functions:

query.setSort(field("pageTitle",min), ORDER.asc);

This is what I'm trying to set as the query, but I don't understand what the arguments used here are.

This is the Maven dependency that I'm using:

<dependency>
    <groupId>org.apache.solr</groupId>
    <artifactId>solr-solrj</artifactId>
    <version>6.5.1</version>
</dependency>

Unless title actually is multiValued (can your post have multiple titles?), you should define it as multiValued="false" in your schema. However, there's a second issue: a field of the default type text_general isn't suited for sorting, as it'll generate multiple tokens, one for each word in the title. This is useful for searching, but will give weird and non-intuitive results when sorting.

So instead, define a title_sort field and use a field type with a KeywordTokenizer and LowerCaseFilter attached (if you want a case-insensitive sort), or, if you want a case-sensitive sort, use the already defined string field type for the title_sort field.
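As a rough sketch of what that could look like in the schema (not a drop-in definition): the field type name string_ci is hypothetical, and the copyField rule is an assumption about how title_sort gets populated, since the answer does not say. The tokenizer and filter factory classes are the standard Solr ones.

```xml
<!-- Hypothetical field type: keeps the whole title as one token and lowercases it -->
<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Searchable title, declared single-valued -->
<field name="title" type="text_general" indexed="true" stored="true" multiValued="false"/>

<!-- Sort-only copy of the title; use type="string" instead for a case-sensitive sort -->
<field name="title_sort" type="string_ci" indexed="true" stored="false" multiValued="false"/>

<!-- Assumption: fill title_sort from title at index time -->
<copyField source="title" dest="title_sort"/>
```

With something like this in place, the query would sort on title_sort rather than title, and existing documents would need to be reindexed for the new field to be populated.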

The first thing to check is whether you really need the title field to be multivalued, that is, whether your documents really have multiple titles. If not, you just need to fix the field definition by setting multiValued="false".

That said, sorting on a multivalued field doesn't make sense unless you determine which one of the multiple values should be used for sorting, or how to combine them into one.

Let's say we need to sort a given result set by title (alphabetically), first using a single-valued title field:

# Unsorted
"docs": [
  { "id": "1", "title": "One" },
  { "id": "2", "title": "Two" },
  { "id": "3", "title": "Three" }
]

# Sorted
"docs": [
  { "id": "1", "title": "One" },
  { "id": "3", "title": "Three" },
  { "id": "2", "title": "Two" }
]

# -> ok no problem here

Applying the same logic to a multi-valued field is not possible as is; you would first need to determine which title to use in each document in order to sort them properly:

# Unsorted
"docs": [
  { "id": "1", "title": ["One", "z-One", "a-One"] },
  { "id": "2", "title": ["Two", "z-Two", "a-Two"] },
  { "id": "3", "title": ["Three", "z-Three", "a-Three"] }
]
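To make the "pick one value per document" idea concrete, here is a minimal plain-Java sketch that does not use Solr at all. It assumes the representative value is the smallest title under case-insensitive ordering; that choice, and the names MultiValuedSortSketch, minTitle and sortedIds, are illustrative assumptions rather than anything Solr itself does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiValuedSortSketch {

    // Assumption: the representative title is the smallest one,
    // compared case-insensitively.
    static String minTitle(List<String> titles) {
        return Collections.min(titles, String.CASE_INSENSITIVE_ORDER);
    }

    // Returns the document ids ordered by each document's representative title.
    static List<String> sortedIds(Map<String, List<String>> docs) {
        List<String> ids = new ArrayList<>(docs.keySet());
        ids.sort(Comparator.comparing(
                (String id) -> minTitle(docs.get(id)),
                String.CASE_INSENSITIVE_ORDER));
        return ids;
    }

    public static void main(String[] args) {
        // Same documents as in the unsorted listing above.
        Map<String, List<String>> docs = new LinkedHashMap<>();
        docs.put("1", Arrays.asList("One", "z-One", "a-One"));
        docs.put("2", Arrays.asList("Two", "z-Two", "a-Two"));
        docs.put("3", Arrays.asList("Three", "z-Three", "a-Three"));

        // Representative titles are a-One, a-Two, a-Three.
        System.out.println(sortedIds(docs)); // prints [1, 3, 2]
    }
}
```

Choosing the largest value instead (Collections.max) would be an equally valid rule; the point is only that some rule has to reduce each list of titles to a single sort key.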

Fortunately, Solr allows sorting results by the output of a function, meaning you can use any of Solr's function queries to "get" a single value per title field. The answer you referred to is a good example even though it may not work for you (title would need docValues enabled, which depends on the field definition, and the max/min functions should be used only with numeric values); it is here just to give the idea:

# here the 2nd argument (min or max) selects which single value of the multivalued title field is used for sorting
sort=field(title,max) asc
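Since the question asks what the arguments mean on the Java side, the following plain-Java sketch shows one way the sort expression above could be assembled as a string. The class FunctionSortSketch and its fieldSort helper are hypothetical and not part of SolrJ; they only build the text of the sort parameter.

```java
import java.util.Arrays;
import java.util.List;

public class FunctionSortSketch {

    private static final List<String> SELECTORS = Arrays.asList("min", "max");
    private static final List<String> DIRECTIONS = Arrays.asList("asc", "desc");

    // Builds a sort expression such as "field(title,max) asc".
    // field:     name of the multivalued field
    // selector:  "min" or "max", i.e. which value of the field to sort by
    // direction: "asc" or "desc"
    static String fieldSort(String field, String selector, String direction) {
        if (!SELECTORS.contains(selector)) {
            throw new IllegalArgumentException("selector must be min or max: " + selector);
        }
        if (!DIRECTIONS.contains(direction)) {
            throw new IllegalArgumentException("direction must be asc or desc: " + direction);
        }
        return "field(" + field + "," + selector + ") " + direction;
    }

    public static void main(String[] args) {
        System.out.println(fieldSort("title", "max", "asc")); // prints field(title,max) asc
    }
}
```

With SolrJ, the function part would presumably be passed as the sort item, for example query.setSort("field(title,max)", SolrQuery.ORDER.asc), since setSort takes the item as a plain string; this has not been checked against Solr 6.6.0 here, and the docValues and numeric-type caveats mentioned above still apply.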
