I'm working on a query to compute the distinct users of particular features of an app within a moving window. For example, for the range 15-20 October, I want the query to evaluate the windows 8-15 Oct, 9-16 Oct, and so on, and get the count of distinct users per feature for each window. So each date should have x rows, where x is the number of features.
I have the following query so far:
WITH V1(edate, code, total) AS
(
SELECT date, featurecode,
DENSE_RANK() OVER (PARTITION BY featurecode ORDER BY accountid ASC) + DENSE_RANK() OVER (PARTITION BY featurecode ORDER BY accountid DESC) - 1
FROM....
GROUP BY edate, featurecode, appcode, accountid
HAVING appcode='sample' AND eventdate BETWEEN '15-10-2018' And '20-10-2018'
)
Select distinct date, code, total
from V1
WHERE date between '2018-10-15' AND '2018-10-20'
This returns the same set of values for all the dates. Is there any way to do this efficiently? It's a DB2 database, by the way, but I'm looking for insight from PostgreSQL users too.
Present result (all the totals are repeated):
date        code              total
10/15/2018  appname-feature1    123
10/15/2018  appname-feature2    234
10/15/2018  appname-feature3    321
10/16/2018  appname-feature1    123
10/16/2018  appname-feature2    234
10/16/2018  appname-feature3    321
Desired result:
date        code              total
10/15/2018  appname-feature1    123
10/15/2018  appname-feature2    234
10/15/2018  appname-feature3    321
10/16/2018  appname-feature1    212
10/16/2018  appname-feature2    577
10/16/2018  appname-feature3   2345
This is not easy to do efficiently. DISTINCT counts aren't incrementally maintainable (unless you go down the route of inexact DISTINCT counts such as HyperLogLog).
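To illustrate why a distinct count cannot simply be rolled forward, here is a minimal Python sketch. It is not part of the original answer: the `rolling_distinct` helper and the sample `days` data are made up for illustration. When a day drops out of the window, you can only tell whether a user is still present by tracking on how many in-window days each user appears, so the state to maintain is a per-user counter rather than a single number.

```python
from collections import Counter, deque

def rolling_distinct(daily_users, window):
    """Yield (day_index, distinct_users) for a trailing window of `window` days.

    daily_users: list of sets, one set of user ids per consecutive day.
    """
    seen = Counter()     # user -> number of in-window days on which they appear
    in_window = deque()  # the per-day user sets currently inside the window
    for i, users in enumerate(daily_users):
        in_window.append(users)
        seen.update(users)
        if len(in_window) > window:
            # The oldest day leaves the window: decrement its users and
            # forget those who no longer appear on any remaining day.
            for user in in_window.popleft():
                seen[user] -= 1
                if seen[user] == 0:
                    del seen[user]
        yield i, len(seen)

# Hypothetical data: four consecutive days of user ids for one feature.
days = [{"A"}, {"A", "B"}, {"C"}, {"A"}]
result = list(rolling_distinct(days, window=2))
print(result)
```

The memory needed grows with the number of distinct users in the window, which is the cost that approximate structures such as HyperLogLog trade accuracy to avoid.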
It is easy to code in SQL, though, and you can try the usual indexing etc. to help.
It is (possibly) not possible, however, to code this with OLAP functions, not least because you can only use RANGE BETWEEN for SUM(), COUNT(), MAX() etc., but not RANK() or DENSE_RANK(). So just use a traditional correlated sub-select.
First, some data:
CREATE TABLE T(D DATE,F CHAR(1),A CHAR(1));
INSERT INTO T (VALUES
('2018-10-10','X','A')
, ('2018-10-11','X','A')
, ('2018-10-15','X','A')
, ('2018-10-15','X','A')
, ('2018-10-15','X','B')
, ('2018-10-15','Y','A')
, ('2018-10-16','X','C')
, ('2018-10-18','X','A')
, ('2018-10-21','X','B')
)
;
Now a simple select:
WITH B AS (
    SELECT DISTINCT D, F FROM T
)
SELECT D, F
,      (SELECT COUNT(DISTINCT A)
        FROM   T
        WHERE  T.F = B.F
        -- window runs from 3 days before D to 4 days after D, inclusive
        AND    T.D BETWEEN B.D - 3 DAYS AND B.D + 4 DAYS
       ) AS DISTINCT_A_MOVING_WEEK
FROM
    B
ORDER BY F, D
;
giving, e.g.:
D          F DISTINCT_A_MOVING_WEEK
---------- - ----------------------
2018-10-10 X                      1
2018-10-11 X                      2
2018-10-15 X                      3
2018-10-16 X                      3
2018-10-18 X                      3
2018-10-21 X                      2
2018-10-15 Y                      1
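For readers without a Db2 instance at hand, the following is a rough sketch that runs the same correlated sub-select against an in-memory SQLite database from Python. It is an adaptation, not the original answer's code: SQLite has no `- 3 DAYS` syntax, so the sketch assumes dates are stored as ISO text and uses SQLite's `date()` modifiers instead, and the `moving_distinct` helper is a made-up name. It also runs a trailing variant (7 days back through the row's own date), which is closer to the 8-15 Oct windows described in the question.

```python
import sqlite3

# Same sample rows as the INSERT above.
rows = [
    ("2018-10-10", "X", "A"), ("2018-10-11", "X", "A"),
    ("2018-10-15", "X", "A"), ("2018-10-15", "X", "A"),
    ("2018-10-15", "X", "B"), ("2018-10-15", "Y", "A"),
    ("2018-10-16", "X", "C"), ("2018-10-18", "X", "A"),
    ("2018-10-21", "X", "B"),
]

# Correlated sub-select; the two ? placeholders are SQLite date modifiers.
QUERY = """
WITH B AS (SELECT DISTINCT D, F FROM T)
SELECT D, F,
       (SELECT COUNT(DISTINCT A)
          FROM T
         WHERE T.F = B.F
           AND T.D BETWEEN date(B.D, ?) AND date(B.D, ?)) AS DISTINCT_A
  FROM B
 ORDER BY F, D
"""

def moving_distinct(conn, back, ahead):
    """Run the query with window bounds such as '-3 days' and '+4 days'."""
    return conn.execute(QUERY, (back, ahead)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (D TEXT, F TEXT, A TEXT)")
conn.executemany("INSERT INTO T VALUES (?, ?, ?)", rows)

centered = moving_distinct(conn, "-3 days", "+4 days")  # window used above
trailing = moving_distinct(conn, "-7 days", "+0 days")  # D-7 through D

for row in centered:
    print(row)
```

In the Db2 query the trailing window would presumably be written as `BETWEEN B.D - 7 DAYS AND B.D`. Note that in either form only dates that actually occur in `T` produce a row, since `B` is derived from `T`.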