简体   繁体   中英

Has a single HyperLogLog the same accuracy than merging several ones?

If I create a HyperLogLog per day to count unique visitors, and then the 1st of January I merge the last 365 ones will I get the same value than if I keep a single HyperLogLog for the whole 365 days?

I guess not. But how different would those values be?

Short answer: Yes, both solutions have the same accuracy.

The basis of the HyperLogLog algorithm is the observation that the cardinality of a multiset of uniformly distributed random numbers can be estimated by calculating the maximum number of leading zeros in the binary representation of each number in the set . If the maximum number of leading zeros observed is n, an estimate for the number of distinct elements in the set is 2**n.

Referred from wiki

No matter it's a single hyperloglog or 365 hyperloglogs, the numbers of leading zeros are the same. So they have the same accuracy, and that's why hyperloglog can be merged.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM