
How to filter values returned on a multivalued field in Solr

Original post: 2015-04-09 09:50:43 · Tags: solr, solr4

I have documents with a field called uuids. This field is a list (multivalued) and can hold up to 100k values per document.

I want to search for documents whose uuids start with, for instance, "5ff6115e". I can already do that successfully with q=uuids:5ff6115e*:

http://localhost:8983/solr/test1/select?q=uuids%3A5ff6115e*&rows=1&fl=uuids&wt=json&indent=true
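Building the same request programmatically avoids stray spaces or encoding slips in the URL. A minimal Python sketch using only the standard library; the host, port and core name test1 come from the URL above, and build_select_url is a hypothetical helper name.

```python
from urllib.parse import urlencode


def build_select_url(base: str, params: dict) -> str:
    """Return a Solr /select URL with every parameter percent-encoded."""
    return f"{base}/select?{urlencode(params)}"


url = build_select_url(
    "http://localhost:8983/solr/test1",
    {
        "q": "uuids:5ff6115e*",  # prefix (wildcard) query on the multivalued field
        "rows": 1,
        "fl": "uuids",           # this is what returns all 100k values
        "wt": "json",
        "indent": "true",
    },
)
print(url)
```

urlencode sends ":" as %3A and "*" as %2A; Solr decodes both back before parsing the query.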

However, the returned document includes all 100k values of this field.

What I want is not only to filter the documents whose uuids field has a value starting with this prefix, but also to filter the returned field values, so that the response only contains the matching values.

How can I do that?

3 answers

Use highlighting. @Jokin mentioned it first, and I feel this is the best answer that doesn't require hacking on Solr. Try either the PostingsHighlighter or the FastVectorHighlighter, not the default/standard highlighter. Unfortunately, both of them internally execute the wildcard query against all the UUIDs in this field. The FVH could, in principle, be smarter about that internally, but it isn't implemented that way.
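A minimal sketch of the request parameters for the highlighting approach, assuming Solr 4.x parameter names. hl.useFastVectorHighlighter=true switches to the FastVectorHighlighter, which in turn needs the uuids field indexed with termVectors, termPositions and termOffsets enabled; the PostingsHighlighter is selected in solrconfig.xml instead, so it is not shown. highlight_params and the snippet limit of 100 are hypothetical choices for illustration.

```python
from urllib.parse import urlencode


def highlight_params(prefix: str, max_snippets: int = 100,
                     use_fvh: bool = True) -> dict:
    """Build /select parameters that return matching uuids via highlighting.

    Hypothetical helper; parameter names assume Solr 4.x. The FVH also
    requires termVectors/termPositions/termOffsets on the uuids field.
    """
    params = {
        "q": f"uuids:{prefix}*",          # prefix query that selects documents
        "fl": "id",                       # avoid returning the full value list
        "wt": "json",
        "hl": "true",
        "hl.fl": "uuids",                 # highlight only the uuids field
        "hl.snippets": max_snippets,      # default is 1 snippet per field
        "hl.highlightMultiTerm": "true",  # highlight wildcard/prefix terms
    }
    if use_fvh:
        params["hl.useFastVectorHighlighter"] = "true"
    return params


print(urlencode(highlight_params("5ff6115e")))
```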

Note: if writing a little Java to add to Solr is within scope, the ideal answer would be to enable term vectors on the field (just the terms data in the term vector, no offsets/positions) and then write a DocTransformer that grabs the term-vector terms, seeks to the prefix, and then iterates over the terms that have that prefix. Pretty darned fast.
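The DocTransformer itself would be Java code against Lucene's term-vector API, which is not reproduced here. Its core trick is that a document's term-vector terms come back in sorted order, so you can seek to the first term that is greater than or equal to the prefix (roughly what Lucene's TermsEnum.seekCeil does) and stop at the first term that no longer starts with it. A minimal Python sketch of only that seek-and-iterate step, using bisect on a sorted list as a stand-in for the term enumeration; terms_with_prefix and the shortened placeholder values are hypothetical.

```python
import bisect
from typing import Iterator, Sequence


def terms_with_prefix(sorted_terms: Sequence[str], prefix: str) -> Iterator[str]:
    """Yield every term in sorted_terms that starts with prefix.

    sorted_terms must be in ascending order, as term-vector terms are;
    cost is O(log n) for the seek plus the number of matches.
    """
    # Seek: index of the first term >= prefix.
    i = bisect.bisect_left(sorted_terms, prefix)
    # Iterate: matching terms are contiguous, so stop at the first miss.
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        yield sorted_terms[i]
        i += 1


# Placeholder values standing in for one document's uuids.
uuids = sorted(["5ff6115e-0001", "1a2b3c4d-0002", "5ff6115e-0003", "9e8d7c6b-0004"])
print(list(terms_with_prefix(uuids, "5ff6115e")))
```

Because the scan stops at the first non-matching term, a document with 100k values only pays for the matches, not for a full pass over the field.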

This is not currently possible; see this bug and this previous question.

I don't know how big your index is, but having documents with 100k values in a multivalued field doesn't seem like the right approach to me. In cases like this, instead of asking for a new feature in Solr, it's better to refactor your index and store the information another way, for example by creating another core whose documents each hold the unique ID of one of your original documents plus a field with a single GUID. You can then use field collapsing or other Solr features to get the information you need.
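A minimal sketch of that refactoring, assuming a hypothetical second core named uuid_links whose documents have the fields id, parent_id (the original document's unique key) and uuid. It flattens one original document into one small document per UUID and builds a prefix query that uses result grouping (group=true, group.field, group.limit; Solr's wiki of that era also called this feature field collapsing) to bring the matching UUIDs back together per parent. The core, field and helper names are illustrative only.

```python
from urllib.parse import urlencode


def flatten(doc_id: str, uuids: list[str]) -> list[dict]:
    """Turn one document with a huge multivalued field into one doc per UUID."""
    return [
        {"id": f"{doc_id}_{u}", "parent_id": doc_id, "uuid": u}
        for u in uuids
    ]


def grouped_prefix_query(prefix: str, per_parent: int = 100) -> str:
    """Query string that finds matching UUIDs and groups them by parent."""
    params = {
        "q": f"uuid:{prefix}*",
        "fl": "parent_id,uuid",
        "wt": "json",
        "group": "true",             # result grouping
        "group.field": "parent_id",  # one group per original document
        "group.limit": per_parent,   # max matching UUIDs returned per group
    }
    return urlencode(params)


docs = flatten("doc1", ["5ff6115e-aaaa", "0c1d2e3f-bbbb"])
print(docs[0])
print("http://localhost:8983/solr/uuid_links/select?" + grouped_prefix_query("5ff6115e"))
```

Each hit is now a tiny document, so only matching UUIDs are ever returned, and group.limit caps how many come back per original document.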

For example, a simple case in Solr is indexing books: instead of indexing each book as a whole, it is better to index each page as a separate document. If you can tell us a bit more about your use case, we can think about how the index could be improved.

Anyway, for cases that don't have so many values, you can achieve the same result with the highlighting component. For best performance, exclude the field from the returned field list (fl) and use the highlighter to return the matched terms. You can tune the highlighter to control the maximum number of snippets, the size of each one, and so on:

http://localhost:8983/solr/test1/select?q=uuids%3A5ff6115e*&rows=1&fl=id&wt=json&indent=true&hl=on&hl.fragsize=1&hl.fl=uuids
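On the client side, the matched values then come out of the highlighting section of the JSON response, keyed by document id and then by field name, with each match wrapped in the highlighter's pre/post tags (<em> and </em> by default, set via hl.simple.pre and hl.simple.post). A minimal Python sketch that extracts them; the sample response is a hand-written illustration of that shape, not real Solr output, and matched_values is a hypothetical helper.

```python
import json
import re

# Default hl.simple.pre / hl.simple.post tags; adjust if you override them.
EM_TAGS = re.compile(r"</?em>")


def matched_values(response_text: str, field: str = "uuids") -> dict:
    """Map each document id to the highlighted values of field, tags removed."""
    highlighting = json.loads(response_text).get("highlighting", {})
    return {
        doc_id: [EM_TAGS.sub("", snippet) for snippet in fields.get(field, [])]
        for doc_id, fields in highlighting.items()
    }


# Hand-written sample in the shape of a wt=json highlighting response.
sample = """{
  "response": {"numFound": 1, "start": 0, "docs": [{"id": "doc1"}]},
  "highlighting": {
    "doc1": {"uuids": ["<em>5ff6115e-aaaa</em>", "<em>5ff6115e-bbbb</em>"]}
  }
}"""
print(matched_values(sample))
```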