
If HyperLogLog in Redis does not store the actual members but only count, how does PFMERGE work?

Does HyperLogLog store the actual members or only the count of members it is storing?

If it is not storing the actual members, how does PFMERGE know to count an element only once when it is repeated across multiple HyperLogLogs?

PFADD mobileusers user1 user2 user3
PFADD websiteusers user2 user3 user4
PFMERGE totalusers mobileusers websiteusers

PFCOUNT totalusers
4

How does the merge command know that user2 and user3 are repeated in both HyperLogLogs?

This sort of involves going deep into the weeds of how the hyperloglog data structure works.

Basically, you initialize the hyperloglog with 2^p registers (p is a constant, typically between 4 and 16; Redis uses p = 14, which gives 16384 registers of 6 bits each, about 12 KB in total). When you get a value that you want to insert into the hyperloglog, you hash it and look at the first p bits of the hash (most significant -> least significant); that number is the index of the register you want to set. You then set that register to the maximum of either the register's current value or the position of the right-most 1 in the remaining bits. (The paper uses the left-most 1, and Redis takes the index from the low bits instead; any fixed convention works as long as every hyperloglog being merged uses the same one.)
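The insert step described above can be sketched roughly as follows. This is only an illustration, not the Redis implementation: `hash64`, `index_and_rank` and `add` are made-up names, SHA-1 stands in for the MurmurHash64A that Redis uses, and the registers are a plain Python list rather than packed 6-bit fields.

```python
import hashlib

P = 14              # index bits; 14 matches Redis, giving 16384 registers
M = 1 << P          # number of registers

def hash64(value):
    # Assumption: any well-mixed 64-bit hash works for the illustration.
    return int.from_bytes(hashlib.sha1(value.encode()).digest()[:8], "big")

def index_and_rank(h):
    index = h >> (64 - P)                 # first P bits select the register
    rest = h & ((1 << (64 - P)) - 1)      # remaining 50 bits
    if rest == 0:
        rank = 64 - P + 1                 # no 1 bit at all: largest possible rank
    else:
        rank = (rest & -rest).bit_length()  # 1-based position of right-most 1
    return index, rank

def add(registers, value):
    index, rank = index_and_rank(hash64(value))
    # Keep only the maximum rank ever seen for this register.
    registers[index] = max(registers[index], rank)

registers = [0] * M
add(registers, "user1")
add(registers, "user1")   # same hash, same register, same rank: nothing changes
```

Note that the member itself is never stored; only one small number per register survives, which is why adding the same value twice has no effect.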

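Because of that last action (setting the register to the maximum value), it's actually relatively easy to go through both hyperloglogs being merged and combine them, simply by setting each register to the maximum of the two corresponding registers. The merge never needs to know which members are repeated: the same element always hashes to the same register with the same value, so user2 and user3 leave identical marks in both hyperloglogs and taking the maximum counts them only once.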
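As a rough sketch of why the register-wise maximum is enough, the snippet below builds two sketches from the example in the question and checks that merging them gives exactly the same registers as inserting all four users into one sketch. The helper names (`add`, `sketch`, `merge`) are hypothetical, SHA-1 is an assumed stand-in for the hash Redis uses, and the cardinality estimate itself is left out.

```python
import hashlib

P = 14              # index bits, as in Redis
M = 1 << P          # 16384 registers

def add(registers, value):
    # Assumption: SHA-1 truncated to 64 bits stands in for Redis's hash.
    h = int.from_bytes(hashlib.sha1(value.encode()).digest()[:8], "big")
    index = h >> (64 - P)                           # first P bits
    rest = h & ((1 << (64 - P)) - 1)                # remaining bits
    rank = (rest & -rest).bit_length() if rest else 64 - P + 1
    registers[index] = max(registers[index], rank)

def sketch(values):
    registers = [0] * M
    for v in values:
        add(registers, v)
    return registers

def merge(a, b):
    # What a PFMERGE-style merge boils down to: register-wise maximum.
    return [max(x, y) for x, y in zip(a, b)]

mobile = sketch(["user1", "user2", "user3"])
website = sketch(["user2", "user3", "user4"])
total = merge(mobile, website)

# Merging is indistinguishable from having added every user to one sketch.
assert total == sketch(["user1", "user2", "user3", "user4"])
```

Because max is idempotent, merging a sketch with itself also changes nothing, which is the same property that makes repeated PFADD calls harmless.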

If you'd like to learn exactly how the HLL algorithm works, you can look at the paper by Flajolet et al. in which HLL was first introduced.
