
SOLR schema - storing of document Id field. Maximum number of documents in Lucene index

Posted 2012-03-14 09:03:51. Tags: solr, lucene

I have a couple of questions about the Lucene/SOLR index schema.

Here's my document Id field (uniqueKey) as defined in the SOLR schema:
<field name="Id" type="long" indexed="true" stored="true" required="true" />

I will never search by the Id field, so does it need to be indexed="true"? And BTW, does it need to be stored="true"? (I assume it will be stored anyway, so it doesn't matter.)

And 2: what is the maximum number of documents that can be stored in a single SOLR index? Or, to be more precise: can it hold 5 billion small documents?

Third question: I need to search on a combination of 2 fields: one of type long and one of type integer. What is the most efficient way of storing and indexing such fields: store and index them separately, or pre-compute a hash value based on both of them and search by the hash only? Since I want to have a few billion such documents, I need to minimize storage needs while keeping the search efficient.

Thanks, RG

1 answer

http://wiki.apache.org/solr/SchemaXml#The_Unique_Key_Field

It is not mandatory for a schema to have a uniqueKey field.
A single Solr index (core) can hold a maximum of ~2.14 billion documents, because Lucene numbers the documents in an index with 32-bit integers. Handling and search response times will depend on the available memory. However, if your index grows beyond what is maintainable, or you need more documents than one index allows (such as 5 billion), you can use Distributed Search to split it across shards.
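To put a number on the 5 billion case, here is a rough back-of-the-envelope sketch in Python. It assumes the hard ceiling per Lucene index is Java's Integer.MAX_VALUE (document numbers are ints); the function name `min_shards` and the 500 million "comfortable" shard size are made up for illustration and are not Solr settings.

```python
# Lucene document numbers are Java ints, so one index (one Solr core/shard)
# cannot address more than Integer.MAX_VALUE documents.
LUCENE_MAX_DOCS_PER_INDEX = 2**31 - 1  # 2,147,483,647


def min_shards(total_docs: int, per_shard_cap: int = LUCENE_MAX_DOCS_PER_INDEX) -> int:
    """Smallest number of shards that keeps every shard at or under the cap."""
    if total_docs < 0 or per_shard_cap <= 0:
        raise ValueError("total_docs must be >= 0 and per_shard_cap must be > 0")
    # Integer ceiling division, avoids floating-point rounding on large counts.
    return -(-total_docs // per_shard_cap)


if __name__ == "__main__":
    # Hard lower bound for 5 billion documents.
    print(min_shards(5_000_000_000))
    # Hypothetical, more conservative shard size of 500 million documents.
    print(min_shards(5_000_000_000, 500_000_000))
```

The first figure is only the theoretical minimum; in practice memory and query latency will likely push the shard count higher well before the int limit is reached.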
You can combine the fields into a single hash field and not mark it as stored, to reduce the index size. This would speed up the initial searches. Caching should take care of repeated similar searches.
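One possible way to build such a combined field on the client before indexing is sketched below in Python. The helper names (`composite_key`, `hashed_key`) and the choice of BLAKE2b are assumptions for illustration, not something Solr prescribes. The sketch shows two variants: a lossless string key, and a 64-bit hash that fits a Solr long field.

```python
import hashlib
import struct


def composite_key(long_val: int, int_val: int) -> str:
    """Lossless key: the two values joined with a separator (no collisions)."""
    return f"{long_val}_{int_val}"


def hashed_key(long_val: int, int_val: int) -> int:
    """Signed 64-bit hash of the (long, int) pair, usable as a long field value."""
    # Pack as big-endian 8-byte long followed by 4-byte int: 12 bytes total.
    packed = struct.pack(">qi", long_val, int_val)
    # BLAKE2b truncated to 8 bytes gives a well-mixed 64-bit digest.
    digest = hashlib.blake2b(packed, digest_size=8).digest()
    return struct.unpack(">q", digest)[0]


if __name__ == "__main__":
    print(composite_key(1234567890123, 42))
    print(hashed_key(1234567890123, 42))
```

The same function has to be applied at query time so that a search for a given pair becomes a single term lookup on the hash field. Note that with a few billion documents a 64-bit hash has a small but not negligible chance of collisions, so either tolerate occasional false matches, verify the original pair after retrieval, or use the lossless key at the cost of a larger term.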