Client-Side Pre-Computed Hashes for Elasticsearch Cardinality Aggregation

In the Elasticsearch documentation for the Cardinality Aggregation, under the heading "Pre-computed hashes", I see the following:

On string fields that have a high cardinality, it might be faster to store the hash of your field values in your index and then run the cardinality aggregation on this field. This can either be done by providing hash values from client-side or by letting Elasticsearch compute hash values for you by using the mapper-murmur3 plugin.

Pre-computing hashes is usually only useful on very large and/or high-cardinality fields as it saves CPU and memory. However, on numeric fields, hashing is very fast and storing the original values requires as much or less memory than storing the hashes. This is also true on low-cardinality string fields, especially given that those have an optimization in order to make sure that hashes are computed at most once per unique value per segment.

I'm curious about the part where it says, "[this can be done] by providing hash values from client-side," because it doesn't elaborate at all on that point, but goes on to discuss numeric fields.

If I wanted to pre-compute hashes on the client, would using something like xxhash and putting the result in an appropriate number field be sufficient? (And, of course, having cardinality target that field.) Or would I need to use another type of field for the hash value?

Pre-computing hashes for high-cardinality string fields will speed up the cardinality aggregation, because the hashes no longer have to be computed at query time. No need to do it on numeric fields, though!

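For string fields, the documentation advises using the mapper-murmur3 plugin, which adds a murmur3 field type that computes and stores the hash at index time. If you compute the hashes on the client instead and they are alphanumeric (for example, hex strings), store them in a keyword field rather than a numeric field type, and then target that field in your cardinality aggregation.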
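As a minimal sketch of the plugin route, assuming the mapper-murmur3 plugin is installed on every node: the mapping adds a murmur3 sub-field to the string field, and the cardinality aggregation targets that sub-field. The index field name `my_field`, the sub-field name `hash`, and the aggregation name are placeholders chosen for illustration; the request bodies are built as plain Python dicts so they can be sent with whatever client you already use.

```python
import json

# Hypothetical field names; only the "murmur3" type comes from the plugin.
mapping_body = {
    "mappings": {
        "properties": {
            "my_field": {
                "type": "keyword",
                # Sub-field whose hash is computed and stored at index time.
                "fields": {"hash": {"type": "murmur3"}},
            }
        }
    }
}

# Cardinality aggregation that reads the pre-computed hash sub-field.
search_body = {
    "size": 0,
    "aggs": {
        "my_field_cardinality": {
            "cardinality": {"field": "my_field.hash"}
        }
    },
}

if __name__ == "__main__":
    print(json.dumps(mapping_body, indent=2))
    print(json.dumps(search_body, indent=2))
```

The mapping body would go in the index-creation request and the search body in a normal search request.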

I've personally seen 10x+ improvements when computing the cardinality of high-cardinality string fields with pre-computed hashes. Worth a try!
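For the client-side route asked about in the question, here is a rough sketch of hashing each value before indexing and aggregating on the stored hash. It is an illustration under assumptions, not a tested recipe: `hashlib.blake2b` with an 8-byte digest stands in for xxhash so the example needs only the standard library, and the field names `user_agent` and `user_agent_hash` are hypothetical. The hash is emitted as a hex string, which matches the keyword-field advice above.

```python
import hashlib


def hash_value(value: str) -> str:
    """Return a 64-bit hash of `value` as a 16-character hex string.

    blake2b is a stand-in here; any fast hash with a good distribution
    (xxhash, murmur3, ...) could be swapped in.
    """
    digest = hashlib.blake2b(value.encode("utf-8"), digest_size=8)
    return digest.hexdigest()


def build_doc(user_agent: str) -> dict:
    """Build the document to index, carrying the pre-computed hash."""
    return {
        "user_agent": user_agent,                    # original value
        "user_agent_hash": hash_value(user_agent),   # map as keyword
    }


# Cardinality aggregation that targets the hash field, not the raw string.
search_body = {
    "size": 0,
    "aggs": {
        "distinct_user_agents": {
            "cardinality": {"field": "user_agent_hash"}
        }
    },
}

if __name__ == "__main__":
    print(build_doc("Mozilla/5.0 (X11; Linux x86_64)"))
```

Whether this beats aggregating on the original field depends on how long the original strings are compared with the 16-character hash, so it is worth measuring on your own data.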