How to count distinct users on a 28-day sliding window - SQL Hive

I'm trying to calculate how many distinct users use an app within a 28-day sliding window. For example, for Feb 28 I want to know how many different users logged in to the application during the 28 days ending on that date. The trick is that each user should be counted only once per window: if user "22" logged in 28 times, they should count as one.

Also, a user can appear at most once per date.

select b.date, count(DISTINCT a.id)
from table a,
 (SELECT distinct(date), date_sub(date,27) dt_start
  from table) b
where a.date >= b.dt_start and a.date <= b.date
group by b.date

But it's not working.

Example of what I want, with a 2-day sliding window:

Input
Day  Id
1    A
1    B
2    C
2    A
3    B
3    D
4    D

Result:
Day   Count(distinct Id)
1     2
2     3
3     4
4     2

Thank you! :)

Consider a correlated subquery:

select distinct t.date, 
       (select count(distinct sub.id)
        from mytable sub
        where sub.date >= date_sub(t.date, 27) 
          and sub.date <= t.date) as distinct_users_28_days
from mytable t

Alternatively, aggregate over a self-join restricted to the window period:

select t1.date, 
       count(distinct t2.id) as distinct_users_28_days
from mytable t1
cross join mytable t2 
where t2.date >= date_sub(t1.date, 27) 
  and t2.date <= t1.date
group by t1.date
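
To sanity-check both query shapes against the 2-day example from the question, here is a minimal sketch that runs them in an in-memory SQLite database from Python. It rests on several assumptions and is not Hive itself: SQLite stands in for Hive, integer day numbers replace real dates, `t.day - (:w - 1)` plays the role of `date_sub(t.date, 27)`, and the names `day`, `run`, `ROWS` and `WINDOW` are made up for illustration.

```python
import sqlite3

# Sample data from the question as (day, id). Integer days stand in for dates.
ROWS = [(1, "A"), (1, "B"), (2, "C"), (2, "A"), (3, "B"), (3, "D"), (4, "D")]
WINDOW = 2  # window length in days, including the current day

# Correlated-subquery form; ":w - 1" mirrors date_sub(date, 27) for a 28-day window.
CORRELATED = """
select distinct t.day,
       (select count(distinct sub.id)
        from mytable sub
        where sub.day >= t.day - (:w - 1)
          and sub.day <= t.day) as distinct_users
from mytable t
order by t.day
"""

# Self-join form: pair each day with every row that falls inside its window.
SELF_JOIN = """
select t1.day, count(distinct t2.id) as distinct_users
from mytable t1
cross join mytable t2
where t2.day >= t1.day - (:w - 1)
  and t2.day <= t1.day
group by t1.day
order by t1.day
"""


def run(query, rows=ROWS, window=WINDOW):
    """Load rows into a throwaway SQLite table and return the query result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table mytable (day integer, id text)")
        conn.executemany("insert into mytable values (?, ?)", rows)
        return conn.execute(query, {"w": window}).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run(CORRELATED))
    print(run(SELF_JOIN))
```

Both calls should return the same rows as the expected result table in the question. Setting `window=28` and using real dates with `date_sub` would correspond to the original 28-day case, though performance on a large Hive table is a separate matter that this sketch does not address.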

