
Redis - Count-distinct problem (without HyperLogLog)

I need to solve a count-distinct problem in Redis without using HyperLogLog (because of its known standard error of 0.81%).

I receive requests, each carrying a list of objects [O1, O2, ... On] for a specific key A. For each list received, Redis should store the objects that have not been saved yet and return the number of newly saved objects.

For Example:

  • Request 1: Key: A - Objects: [O1, O2, O3] -> Response 1: Number of new objects: 3
  • Request 2: Key: A - Objects: [O1, O2, O4] -> Response 2: Number of new objects: 1
  • Request 3: Key: A - Objects: [O1, O2, O4] -> Response 3: Number of new objects: 0
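The required behaviour can be pinned down with a small in-memory model. This is a plain-Python sketch, not Redis code: each key maps to an exact set, and the response is how much that set grew. In Redis itself, SADD has the same return value (the number of members actually added), which is why Sets solve the problem exactly, at the cost of storing every object.

```python
def make_counter():
    """Return a function that records objects per key and reports how many were new."""
    seen: dict[str, set[str]] = {}

    def add(key: str, objects: list[str]) -> int:
        bucket = seen.setdefault(key, set())
        before = len(bucket)
        bucket.update(objects)  # duplicates (old or within the request) are ignored
        return len(bucket) - before

    return add


add = make_counter()
print(add("A", ["O1", "O2", "O3"]))  # 3
print(add("A", ["O1", "O2", "O4"]))  # 1
print(add("A", ["O1", "O2", "O4"]))  # 0
```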

I have tried to solve this problem with HyperLogLog, and it works well, but as the dataset of objects grows, the reported number of new objects becomes less accurate. With Sets and Hashes, memory usage grows too much.
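A rough worked figure for why accuracy degrades with size, assuming the standard HyperLogLog error formula and Redis's 16384 registers (which is where the 0.81% comes from): the error is relative, so the absolute uncertainty of the running total grows with the number of distinct objects already stored.

```latex
\sigma = \frac{1.04}{\sqrt{m}}, \qquad m = 2^{14} = 16384
\;\Rightarrow\; \sigma = \frac{1.04}{128} \approx 0.0081 = 0.81\%

n = 10^{6} \;\Rightarrow\; \sigma n \approx 0.0081 \times 10^{6} \approx 8100
```

A request that adds three objects changes the true count by 3, far below an uncertainty of several thousand, which is consistent with the per-request counts becoming unreliable as the set grows.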

I have read some material about Bitmaps, but it is not very clear to me. Do you know of any projects that have already dealt with this problem?
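Bitmaps give an exact answer only if every object can be mapped to a small, dense integer ID; that mapping is an assumption here, not something the question provides. With it, SETBIT key id 1 returns the bit's previous value, so the number of new objects is the number of calls that returned 0, and memory is one bit per possible ID (max_id / 8 bytes). The sketch below is an in-memory Python model of that behaviour; the Bitmap class and add_ids helper are hypothetical names, and against a real server the same logic would be a pipeline of SETBIT calls.

```python
class Bitmap:
    """In-memory model of a Redis bitmap; set_bit mimics SETBIT's return value."""

    def __init__(self) -> None:
        self.bits = bytearray()

    def set_bit(self, offset: int) -> int:
        """Set the bit at offset to 1 and return its previous value (0 or 1)."""
        byte_index, bit_index = divmod(offset, 8)
        if byte_index >= len(self.bits):
            # Grow with zero bytes, as Redis does when SETBIT goes past the end
            self.bits.extend(b"\x00" * (byte_index + 1 - len(self.bits)))
        mask = 0x80 >> bit_index  # Redis numbers bits from the most significant bit
        previous = 1 if self.bits[byte_index] & mask else 0
        self.bits[byte_index] |= mask
        return previous


def add_ids(bitmap: Bitmap, object_ids: list[int]) -> int:
    """Count how many IDs were not already present (set_bit returned 0)."""
    return sum(1 for oid in object_ids if bitmap.set_bit(oid) == 0)


bm = Bitmap()
print(add_ids(bm, [1, 2, 3]))  # 3
print(add_ids(bm, [1, 2, 4]))  # 1
print(add_ids(bm, [1, 2, 4]))  # 0
print(len(bm.bits))  # 1 (one byte covers IDs 0-7)
```

The trade-off is that memory depends on the largest ID rather than on how many objects were seen, so sparse or string identifiers make bitmaps a poor fit.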

Thanks in advance

You might want to consider using a Bloom filter. Redis provides one through the RedisBloom module: https://redis.com/redis-best-practices/bloom-filter-pattern/

Bloom filters allow fast membership tests with no false negatives and a very low false-positive rate, provided you know in advance the maximum number of elements they will hold. You would need to write code along these lines:
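To see why there are no false negatives, and why the maximum number of elements must be known up front, here is a minimal pure-Python Bloom filter. It is an illustrative sketch, not RedisBloom's implementation (which uses its own hash functions and can scale): the standard sizing formulas turn a capacity n and a target false-positive rate p into m bits and k hash functions, and add() reports whether any of the item's bits was still unset. The SHA-256 double-hashing scheme is a choice made for this example.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter sketch sized for `capacity` items at `error_rate`."""

    def __init__(self, capacity: int, error_rate: float) -> None:
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hash functions
        self.m = math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2)
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str) -> list[int]:
        # Double hashing: derive k bit positions from two 64-bit slices of SHA-256
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> bool:
        """Insert item; return True if it was (probably) not present before."""
        new = False
        for pos in self._positions(item):
            byte, mask = pos // 8, 1 << (pos % 8)
            if not self.bits[byte] & mask:
                new = True
                self.bits[byte] |= mask
        return new


bf = BloomFilter(capacity=1_000_000, error_rate=0.001)
for request in (["O1", "O2", "O3"], ["O1", "O2", "O4"], ["O1", "O2", "O4"]):
    print(sum(bf.add(o) for o in request))  # new objects in this request
```

With n = 1,000,000 and p = 0.001 this works out to roughly 14.4 million bits (about 1.8 MB) and k = 10, regardless of how long the object identifiers are. Bits are never cleared, so an object seen before is never reported as new; a false positive can only make a genuinely new object look old, so counts may come out slightly low but never high. Inserting more than n items pushes the false-positive rate above p, which is why the capacity matters.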

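import redis

r = redis.Redis()
key, key_count, item = "A", "A:count", "O1"  # example names
# BF.ADD replies 1 if the item was newly added, 0 if it may already be present
if r.execute_command("BF.ADD", key, item):
    r.incr(key_count)  # plain Redis counter of distinct items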
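Since each request in the question carries a list, a batch variant can send the whole list in one BF.MADD call and sum the replies. This is a sketch assuming the RedisBloom module is loaded and using redis-py's generic execute_command; count_new_objects is a hypothetical helper, and InMemoryBloomStandIn is a hypothetical test double so the example runs without a server. To fix capacity and error rate explicitly, the filter can be created beforehand with BF.RESERVE key error_rate capacity; otherwise RedisBloom creates it with default parameters on the first add.

```python
class InMemoryBloomStandIn:
    """Hypothetical test double: answers BF.MADD like an exact set (no false positives)."""

    def __init__(self) -> None:
        self.store: dict[str, set[str]] = {}

    def execute_command(self, command: str, key: str, *items: str) -> list[int]:
        if command != "BF.MADD":
            raise NotImplementedError("only BF.MADD is modelled here")
        seen = self.store.setdefault(key, set())
        replies = []
        for item in items:
            replies.append(0 if item in seen else 1)
            seen.add(item)
        return replies


def count_new_objects(client, key: str, objects: list[str]) -> int:
    """Add one request's objects to the filter at key; return how many were new.

    BF.MADD replies with one value per item: 1 if it was added, 0 if it may
    already have been present. int() also accepts booleans, in case the
    client library converts the replies.
    """
    if not objects:
        return 0  # BF.MADD needs at least one item
    replies = client.execute_command("BF.MADD", key, *objects)
    return sum(int(reply) for reply in replies)


client = InMemoryBloomStandIn()  # with a real server: redis.Redis()
for request in (["O1", "O2", "O3"], ["O1", "O2", "O4"], ["O1", "O2", "O4"]):
    print(count_new_objects(client, "A", request))  # 3, then 1, then 0
```

Batching keeps one round trip per request and, unlike a separate existence check followed by an add, leaves no window in which two concurrent requests both count the same object as new.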
