
How much space and processing will be optimized in Lucene index by storing a field as Byte instead of String for billions of documents

Original 2018-04-11 00:54:32 1 1 algorithm/ lucene/ nlp/ information-retrieval

I understand the concept of an inverted index and how dictionary storage optimization can help load the entire dictionary into main memory for faster queries.

I am trying to understand how a Lucene index works.

Suppose I have a String field that has only four distinct values across the 200 billion documents indexed in Lucene. This field is a stored field.

Now suppose I change the field to a Byte or Int type to represent the four distinct values, then re-index and store all 200 billion documents.

What storage and query optimizations, if any, would this data type change bring?

Please suggest a test I could run on my laptop to get a sense of the difference.

1 answer

As far as I know, a document in Lucene consists of a simple list of field-value pairs. A field must have at least one value, but any field can contain multiple values. Similarly, a single string value may be converted into multiple values by the analysis process.
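To make the field-value model concrete, here is a minimal sketch in plain Python. It is not Lucene code: the `analyze` function is a hypothetical stand-in for an analyzer (lowercase, split on whitespace), and the field names are made up for illustration.

```python
from typing import List, Tuple

# A document modeled as a plain list of (field, value) pairs.
Document = List[Tuple[str, str]]


def analyze(text: str) -> List[str]:
    """Hypothetical analyzer: lowercase the text and split on whitespace."""
    return text.lower().split()


def values_of(document: Document, field: str) -> List[str]:
    """Return every value the document holds for the given field."""
    return [value for name, value in document if name == field]


def indexed_terms(document: Document, field: str) -> List[str]:
    """Run each value of the field through the analyzer and collect the terms."""
    terms: List[str] = []
    for value in values_of(document, field):
        terms.extend(analyze(value))
    return terms


# The "tag" field appears twice, so it is multi-valued.
doc: Document = [
    ("id", "42"),
    ("tag", "lucene"),
    ("tag", "nlp"),
    ("title", "Storing a Field as Byte"),
]

print(values_of(doc, "tag"))
print(indexed_terms(doc, "title"))
```

The same field name can repeat within one document, and a single string value can expand into several terms once it is analyzed.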

Lucene doesn't care if the values are strings or numbers or dates. All values are just treated as opaque bytes.
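The "opaque bytes" point can be illustrated with a toy inverted index. This is only a sketch in plain Python, not Lucene's actual on-disk format, and the four labels are hypothetical. It indexes the same data twice, once with the string encoded as UTF-8 and once with a one-byte code, and compares the term dictionary and the postings.

```python
from collections import defaultdict


def build_index(values, encode):
    """Map each encoded term (bytes) to the list of doc ids that contain it."""
    index = defaultdict(list)
    for doc_id, value in enumerate(values):
        index[encode(value)].append(doc_id)
    return index


# Hypothetical field with only four distinct values.
labels = ["PENDING", "ACTIVE", "CLOSED", "DELETED"]
docs = [labels[i % 4] for i in range(1000)]

# Variant 1: the term is the UTF-8 encoding of the string.
as_string = build_index(docs, lambda v: v.encode("utf-8"))

# Variant 2: the term is a single byte holding a small integer code.
code = {label: i for i, label in enumerate(labels)}
as_byte = build_index(docs, lambda v: bytes([code[v]]))

# Total bytes needed for the distinct terms in each variant.
dict_bytes_string = sum(len(term) for term in as_string)
dict_bytes_byte = sum(len(term) for term in as_byte)

# Total number of postings (doc id entries) in each variant.
postings_string = sum(len(p) for p in as_string.values())
postings_byte = sum(len(p) for p in as_byte.values())

print(len(as_string), dict_bytes_string, postings_string)
print(len(as_byte), dict_bytes_byte, postings_byte)
```

In this toy model both variants have four dictionary entries and the same number of postings. Only the few bytes spent on the distinct terms differ, which suggests the indexed side changes little when the number of distinct values is this small. Whether that holds for a real Lucene index would need to be measured.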

For more information, please see this document.
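For the stored-field side of the question, a rough laptop experiment can be sketched as follows. Lucene compresses stored fields in blocks, so raw size alone overstates the difference. The sketch below uses Python's `zlib` purely as a stand-in for that compression, not as Lucene's actual codec. The labels, the sample size, and the uniform random distribution of values are all assumptions, so the numbers are only indicative. A real measurement would index a sample with Lucene itself and compare the index directory sizes.

```python
import random
import zlib

# Hypothetical labels for the four distinct values.
LABELS = [b"PENDING", b"ACTIVE", b"CLOSED", b"DELETED"]
N_DOCS = 100_000                # sample size that fits easily on a laptop
TOTAL_DOCS = 200_000_000_000    # document count from the question

# Assumption: values are uniformly random across documents.
rng = random.Random(0)
codes = [rng.randrange(4) for _ in range(N_DOCS)]

# Stored values as newline-separated strings vs. one byte per document.
raw_string = b"\n".join(LABELS[c] for c in codes)
raw_byte = bytes(codes)

# zlib stands in for the block compression applied to stored fields.
compressed_string = zlib.compress(raw_string)
compressed_byte = zlib.compress(raw_byte)


def projected_gb(sample_bytes: int) -> float:
    """Scale a sample size linearly to TOTAL_DOCS and express it in GB (1e9 bytes)."""
    return sample_bytes / N_DOCS * TOTAL_DOCS / 1e9


for name, raw, packed in [
    ("string", raw_string, compressed_string),
    ("byte", raw_byte, compressed_byte),
]:
    print(
        f"{name}: raw {projected_gb(len(raw)):.0f} GB, "
        f"compressed {projected_gb(len(packed)):.0f} GB (projected)"
    )
```

The linear projection ignores per-document and per-block overhead, so treat the output as an order-of-magnitude estimate at best.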


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.


solution1 1 2018-04-25 08:02:16