
Count unique users in last 60 mins per page with Redis HyperLogLog

I'm designing an algorithm to count unique users on a set of pages over a 60-minute sliding window.

It needs to find the unique IPs (or tokens) that have hit a particular page and total them up for the last 60 minutes.

I need this to be very fast at scale (mainly for writes; fast reads are a bonus). We could have tens of thousands of users per page across thousands of pages.

My research is pointing me towards Redis with HyperLogLog.

I'm new to Redis, coming from a Memcache background. Could anyone give me any pointers?

Thanks

One way of doing this would be to keep an HLL key for each page (or set of pages) at one-minute resolution. For example, if we're tracking 'index.html' and the current timestamp is 0, a visitor with the ID 'abc' can be tracked by:

PFADD index.html:0 abc

Once the minute has passed (timestamp 1, for simplicity), a visitor such as 'def' is added to the next key:

PFADD index.html:1 def

And so forth. To count the unique visitors from the last 60 minutes, assuming the current timestamp is 100, call PFCOUNT with the names of all 60 keys; given several keys, it returns the approximate cardinality of their union, e.g.:

PFCOUNT index.html:100 index.html:99 ... index.html:41

Note: if you want "old" counts to be evicted, call EXPIRE on the key after each call to PFADD, with a TTL slightly longer than the 60-minute window.
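The per-minute scheme above could be sketched in Python roughly as follows. This is only an illustration: it assumes `client` is a redis-py style object exposing `pfadd`, `expire` and `pfcount`, and the helper names, the `page:minute` key layout and the 61-minute TTL are my own choices rather than anything Redis prescribes.

```python
import time

WINDOW_MINUTES = 60  # size of the sliding window, in one-minute buckets


def bucket_key(page, minute):
    """Key for one page and one minute bucket, e.g. 'index.html:100'."""
    return f"{page}:{minute}"


def window_keys(page, current_minute, window=WINDOW_MINUTES):
    """Keys for the last `window` minute buckets, newest first."""
    return [bucket_key(page, current_minute - i) for i in range(window)]


def track_visit(client, page, visitor_id, now=None):
    """PFADD the visitor into the current minute's HLL and set a TTL on it."""
    now = time.time() if now is None else now
    key = bucket_key(page, int(now // 60))
    client.pfadd(key, visitor_id)
    # Assumed TTL: the window plus one spare minute, so a bucket
    # is still present for the last query that needs it.
    client.expire(key, (WINDOW_MINUTES + 1) * 60)


def count_unique_last_hour(client, page, now=None):
    """PFCOUNT over the 60 bucket keys (approximate size of their union)."""
    now = time.time() if now is None else now
    return client.pfcount(*window_keys(page, int(now // 60)))
```

Writes touch a single small key per visit, which suits the write-heavy requirement; the read side pays for merging 60 HLLs on every PFCOUNT, so the result may be worth caching briefly if it is read often.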

A single HyperLogLog key cannot answer time-interval queries.

A sorted set could be an option:

  • Add your users to the sorted set with ZADD, using their entrance time as the score and their user ID as the member.
  • Use ZCOUNT to get the total number of unique users in a time interval. The example below uses small numbers as timestamps.
127.0.0.1:6379> ZADD activeusers:page:1 1 a1
(integer) 1
127.0.0.1:6379> ZADD activeusers:page:1 1 a2
(integer) 1
127.0.0.1:6379> ZADD activeusers:page:1 3 a5
(integer) 1
127.0.0.1:6379> ZADD activeusers:page:1 116 a7
(integer) 1
127.0.0.1:6379> ZCOUNT activeusers:page:1 60 inf
(integer) 1
127.0.0.1:6379> ZRANGEBYSCORE activeusers:page:1 60 inf
1) "a7"

When using ZCOUNT, set MIN to (current time - 60*60) and MAX to inf, so that it counts the members whose score falls between (now - 3600 seconds) and now.
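As a rough Python sketch of that window calculation, assuming `client` is a redis-py style object with `zadd(name, mapping)` and `zcount(name, min, max)`, and with hypothetical helper names:

```python
import time

WINDOW_SECONDS = 60 * 60  # one hour


def record_visit(client, key, user_id, now=None):
    """Store the user with the visit time as score."""
    now = time.time() if now is None else now
    # ZADD on an existing member only updates its score,
    # so each user appears once, at the time of the latest visit.
    client.zadd(key, {user_id: now})


def count_active(client, key, now=None):
    """ZCOUNT members scored from (now - 3600) upwards."""
    now = time.time() if now is None else now
    return client.zcount(key, now - WINDOW_SECONDS, "+inf")
```

Because the score is overwritten on every visit, a returning user moves forward in the window instead of being counted twice.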

One drawback of this approach is that you need to remove old data from these sets manually with ZREMRANGEBYSCORE:

127.0.0.1:6379> ZREMRANGEBYSCORE activeusers:page:1 -inf 59
(integer) 3
127.0.0.1:6379> ZRANGEBYSCORE activeusers:page:1 -inf inf
1) "a7"
127.0.0.1:6379> ZRANGEBYSCORE activeusers:page:1 -inf inf WITHSCORES
1) "a7"
2) "116"
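The cleanup step could be wrapped along these lines. This is a sketch under the assumption that `client` is a redis-py style object with `zremrangebyscore(name, min, max)`; the helper name is hypothetical. It uses Redis's `(` prefix to make the upper bound exclusive, so a member scored exactly at the cutoff is kept, which matches counting from the cutoff upwards.

```python
import time

WINDOW_SECONDS = 60 * 60  # one hour


def prune_old(client, key, now=None):
    """Remove members whose score is strictly older than the window.

    Returns whatever the client returns, which for Redis is the
    number of members removed.
    """
    now = time.time() if now is None else now
    cutoff = now - WINDOW_SECONDS
    # "(" marks an exclusive bound in Redis score ranges.
    return client.zremrangebyscore(key, "-inf", f"({cutoff}")
```

It can be called on each write, on each read, or from a periodic job; which is cheapest depends on traffic, so treat the placement as a tuning decision.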


 