About stored fields vs docValues in Solr

Please help me understand the following regarding Solr:

1) Where are stored fields and docValues fields saved in Solr?
2) If we enable docValues for some fields, will normal query performance (search only, with no faceting or sorting applied) be better than with stored fields?
3) Is it advisable to replace all stored fields with docValues?

DocValues are a way of recording field values internally that is more efficient for some purposes, such as sorting and faceting, than traditional indexing.

DocValues fields are column-oriented fields with a document-to-value mapping built at index time. This approach relieves some of the memory requirements of the fieldCache and makes lookups for faceting, sorting, and grouping much faster.

Stored fields store all field values for one document together in a row-stride fashion. When a document is retrieved, all of its field values are returned at once, so loading the relevant information about a document is very fast.

However, if you need to scan a single field across many documents (for faceting/sorting/grouping/highlighting), stored fields make this a slow process: you have to iterate through all the documents and load each document's fields on every iteration, which results in disk seeks.
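The difference between the two layouts can be sketched with a toy model in Python. This is only an illustration of row-oriented versus column-oriented access, not how Lucene actually stores data on disk; the documents, field names, and helper functions are all made up for the example.

```python
# Toy model only: real stored fields and docValues are compressed on-disk
# structures. All names and data below are hypothetical.

# Row-oriented ("stored") layout: each document keeps all of its fields together.
stored_rows = [
    {"id": "a", "title": "Solr in Action", "price": 30},
    {"id": "b", "title": "Lucene Basics", "price": 10},
    {"id": "c", "title": "Search Patterns", "price": 20},
]

# Column-oriented ("docValues") layout: one array per field,
# indexed by internal document number.
doc_values = {
    "price": [30, 10, 20],
}


def sort_with_stored(rows, field):
    # Every whole row is loaded just to read a single field from it.
    return sorted(range(len(rows)), key=lambda doc: rows[doc][field])


def sort_with_doc_values(columns, field):
    # Only the one column that is needed is read.
    column = columns[field]
    return sorted(range(len(column)), key=lambda doc: column[doc])


def fetch_document(rows, doc):
    # Returning a full document is a single lookup in the row layout.
    return rows[doc]


order_stored = sort_with_stored(stored_rows, "price")
order_dv = sort_with_doc_values(doc_values, "price")
print(order_stored, order_dv)
```

Both functions produce the same ordering of document numbers; the point is that the column layout reaches it without touching the `title` or `id` values at all, while the row layout is the convenient one for `fetch_document`.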

Field values retrieved during search queries are typically returned from stored values. However, non-stored docValues fields will also be returned along with the stored fields when all fields (or pattern-matching globs) are requested (e.g. "fl=*"), depending on the effective value of the useDocValuesAsStored parameter for each field. For schema versions >= 1.6, the implicit default is useDocValuesAsStored="true".
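As a rough illustration, the two ways of asking for fields can be written as request URLs. The host, port, collection name (`mycollection`), and field names below are assumptions for the example, and `select_url` is a hypothetical helper; only the `q` and `fl` parameters come from Solr itself.

```python
from urllib.parse import urlencode

# Hypothetical base URL and collection name, for illustration only.
SOLR_SELECT = "http://localhost:8983/solr/mycollection/select"


def select_url(query, fields):
    # Build a /select request URL with the given query and field list (fl).
    params = {"q": query, "fl": ",".join(fields)}
    return SOLR_SELECT + "?" + urlencode(params)


# Glob request: non-stored docValues fields are included only when
# useDocValuesAsStored is effectively true for them.
all_fields_url = select_url("*:*", ["*"])

# Explicit request: the fields are named one by one in fl.
explicit_url = select_url("*:*", ["id", "price"])

print(all_fields_url)
print(explicit_url)
```

As far as I know, a non-stored docValues field can still be requested explicitly by name in `fl` even when useDocValuesAsStored="false"; that setting only affects whether the field matches globs such as `*`.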

When retrieving fields from their docValues form (using the /export handler, streaming expressions, or when the field is requested in the fl parameter), two important differences between regular stored fields and docValues fields must be understood:

  1. Order is not preserved. When simply retrieving stored fields, the insertion order is the return order. For docValues, it is the sorted order.

  2. Multiple identical entries are collapsed into a single value. Thus if the values 4, 5, 2, 4, 1 are inserted, the values returned will be 1, 2, 4, 5.
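The two differences above can be mimicked in a few lines of Python. This is a sketch of the observable behaviour described in the list, not of Solr's internals, and both function names are made up for the example.

```python
def as_returned_from_stored(inserted):
    # Stored fields: values come back in insertion order, duplicates kept.
    return list(inserted)


def as_returned_from_doc_values(inserted):
    # docValues as described above: sorted order, identical entries collapsed.
    return sorted(set(inserted))


values = [4, 5, 2, 4, 1]
print(as_returned_from_stored(values))      # [4, 5, 2, 4, 1]
print(as_returned_from_doc_values(values))  # [1, 2, 4, 5]
```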

In cases where the query returns only docValues fields, performance may improve, since returning stored fields requires disk reads and decompression, whereas returning docValues fields in the fl list only requires memory access.

In a low-memory environment, or when you don't need to index a field, docValues are a good fit for faceting/grouping/filtering/sorting/function queries.

For more details, please refer to the DocValues documentation.
