
Does Elasticsearch/Lucene impose memory overhead for missing values in the fieldcache?

This question is for Elasticsearch primarily, but I believe the answer will be based on underlying Lucene semantics.

I'm contemplating using multiple types in the same index. Many fields will be sortable, and many fields will only be used by one particular type. I.e., fields will be sparse, with say 10% coverage on average.

Since sorting keeps values for all docs in memory (regardless of type), I'd like to know if there's any memory overhead with regard to missing field values (the ~90% in my case).
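For concreteness, here is a minimal sketch of the scenario using the pre-5.x multi-type mapping API; the index name, type names, field names, and the localhost endpoint are all hypothetical:

```python
import requests

# Hypothetical setup: one index, two types, each with its own sortable
# field -- so each field is populated for only a fraction of the docs.
index_body = {
    "mappings": {
        "product": {
            "properties": {
                "price": {"type": "long"}      # only ever set on products
            }
        },
        "review": {
            "properties": {
                "rating": {"type": "double"}   # only ever set on reviews
            }
        }
    }
}

# Pre-5.x Elasticsearch allowed multiple mapping types per index.
resp = requests.put("http://localhost:9200/my_index", json=index_body)
print(resp.json())

# The question: sorting reviews by "rating" builds per-document,
# index-wide structures -- do the product docs (which never have a
# rating) still cost memory there?
```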

In a recent blog post on the official Elasticsearch blog titled "Index vs Type", the author tackles a common problem: choosing whether to model one's data using several indices or several types.

One key fact is that Lucene indices don't handle sparsity well. As a result, the author says that

Fields that exist in one type will also consume resources for documents of types where this field does not exist. [...] And the issue is even worse with doc values: for speed reasons, doc values often reserve a fixed amount of disk space for every document, so that values can be addressed efficiently.
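To see why fixed-width storage makes lookups fast but penalizes sparse fields, here is an illustrative sketch of a dense, fixed-width numeric column; it is not Lucene's actual doc values format, just the addressing idea:

```python
import struct

class DenseNumericColumn:
    """Illustrative fixed-width column: one 8-byte slot per document,
    whether or not the document has a value (not Lucene's real format)."""

    WIDTH = 8  # bytes per value

    def __init__(self, num_docs):
        # Space is reserved for every doc up front: docs with no value
        # for this field still occupy a full slot.
        self.data = bytearray(num_docs * self.WIDTH)

    def set(self, doc_id, value):
        struct.pack_into("<q", self.data, doc_id * self.WIDTH, value)

    def get(self, doc_id):
        # O(1) addressing: a value's offset is simply doc_id * WIDTH,
        # which is exactly why fixed-width layouts are fast to read.
        return struct.unpack_from("<q", self.data, doc_id * self.WIDTH)[0]

# 1,000,000 docs where only 10% ever set a value still reserve all 8 MB:
col = DenseNumericColumn(1_000_000)
print(len(col.data))  # 8000000 bytes, regardless of actual coverage
```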

There is a Lucene issue aimed at improving this situation; it has been fixed in Lucene 5.4 and will be available in Elasticsearch v2.2. Even then, the author advises modeling your data in a way that limits sparsity as much as possible.
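One way to limit sparsity, sketched below with the same hypothetical names as above, is to give each document kind its own index so that every mapped field is dense within its index:

```python
import requests

# Hypothetical alternative: one index per document kind, so each
# mapped field is populated for (nearly) all docs in its index.
requests.put("http://localhost:9200/products", json={
    "mappings": {"product": {"properties": {"price": {"type": "long"}}}}
})
requests.put("http://localhost:9200/reviews", json={
    "mappings": {"review": {"properties": {"rating": {"type": "double"}}}}
})

# A sort on "rating" now only touches the reviews index, where the
# field has ~100% coverage instead of ~10%.
resp = requests.get("http://localhost:9200/reviews/_search", json={
    "sort": [{"rating": "desc"}]
})
print(resp.json())
```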
