How to get 'COUNT DISTINCT' over moving window

Question

I'm working on a query to count the distinct users of particular features of an app within a moving window. So, if the range is 15-20 October, I want the query to cover 8-15 Oct, 9-16 Oct, etc., and return the count of distinct users per feature for each window. For each date, the result should have x rows, where x is the number of features.

I have the following query so far:

WITH V1(edate, code, total) AS
   (
     SELECT date, featurecode, 
    DENSE_RANK() OVER ( PARTITION BY (featurecode ORDER BY accountid ASC) + DENSE_RANK() OVER ( PARTITION BY featurecode ORDER By accountid DESC) - 1 

FROM....
 GROUP BY edate, featurecode, appcode, accountid
 HAVING appcode='sample' AND eventdate BETWEEN '15-10-2018' And '20-10-2018'
) 

Select distinct date, code, total
from V1
WHERE date between '2018-10-15' AND '2018-10-20'

This returns the same set of values for all the dates. Is there any way to do this efficiently? It's a DB2 database, by the way, but I'm looking for insight from PostgreSQL users too.

Present result (all the totals are repeated):

date        code                 total
10/15/2018   appname-feature1       123
10/15/2018   appname-feature2       234
10/15/2018   appname-feature3       321
10/16/2018   appname-feature1       123
10/16/2018   appname-feature2       234
10/16/2018   appname-feature3       321

Desired result:

date        code                 total
10/15/2018   appname-feature1       123
10/15/2018   appname-feature2       234
10/15/2018   appname-feature3       321
10/16/2018   appname-feature1       212
10/16/2018   appname-feature2       577
10/16/2018   appname-feature3       2345

Answer 1

This is not easy to do efficiently. DISTINCT counts aren't incrementally maintainable (unless you go down the route of inexact DISTINCT counts such as HyperLogLog).

It is easy to code in SQL, though, and you can try the usual indexing etc. to help.

It is (possibly) not possible, however, to code this with OLAP functions, not least because you can only use RANGE BETWEEN with SUM(), COUNT(), MAX() etc., but not with RANK() or DENSE_RANK(). So just use a traditional correlated sub-select.

First, some data:

CREATE TABLE T(D DATE,F CHAR(1),A CHAR(1));
INSERT INTO T (VALUES
    ('2018-10-10','X','A')
,   ('2018-10-11','X','A')
,   ('2018-10-15','X','A')
,   ('2018-10-15','X','A')
,   ('2018-10-15','X','B')
,   ('2018-10-15','Y','A')
,   ('2018-10-16','X','C')
,   ('2018-10-18','X','A')
,   ('2018-10-21','X','B')
) 
;

Now a simple select:

WITH B AS (
    SELECT DISTINCT D, F FROM T
)
SELECT D,F
,    (SELECT COUNT(DISTINCT A)
      FROM T
      WHERE T.F = B.F 
      AND T.D BETWEEN B.D - 3 DAYS AND B.D + 4 DAYS
      ) AS DISTINCT_A_MOVING_WEEK
FROM
    B
ORDER BY F,D
;

giving, e.g.

 D          F DISTINCT_A_MOVING_WEEK
 ---------- - ----------------------
 2018-10-10 X                      1
 2018-10-11 X                      2
 2018-10-15 X                      3
 2018-10-16 X                      3
 2018-10-18 X                      3
 2018-10-21 X                      2
 2018-10-15 Y                      1
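
Note that the query above uses a window running from 3 days before each date to 4 days after it, whereas the question describes a trailing window (8-15 Oct for 15 Oct). A minimal sketch of the trailing variant follows, assuming the same sample table T; the window length of 7 days back is read off the question, the column alias DISTINCT_A_TRAILING_WEEK and the index name T_F_D_A are made up for illustration, and whether that index actually helps will depend on the real data and the optimizer.

```sql
-- Hypothetical covering index for the correlated lookup:
-- equality on F, range on D, and A available without touching the table.
CREATE INDEX T_F_D_A ON T (F, D, A);

-- Trailing window: the 7 days before D, plus D itself (DB2 date arithmetic).
WITH B AS (
    SELECT DISTINCT D, F FROM T
)
SELECT B.D, B.F
,    (SELECT COUNT(DISTINCT T.A)
      FROM T
      WHERE T.F = B.F
      AND T.D BETWEEN B.D - 7 DAYS AND B.D
      -- In PostgreSQL the same bound can be written as:
      --   AND T.D BETWEEN B.D - 7 AND B.D
      ) AS DISTINCT_A_TRAILING_WEEK
FROM
    B
ORDER BY B.F, B.D
;
```

To limit the output to the reporting range from the question, a filter such as WHERE B.D BETWEEN '2018-10-15' AND '2018-10-20' can be added to the outer SELECT; the sub-select still looks back past the start of that range because it reads T directly.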
