I have records for a user base, and I am trying to identify a kind of user who has at least 100 records per day and then determine that user's life span by finding the user's max and min time stamp. I have not been able to do that in a single query. Here's how I identify users who meet the threshold:
SELECT COUNT(*) count, userid, recorddate::date
FROM data
WHERE datatype = 0
GROUP BY userid, recorddate::date
HAVING COUNT(userid) > 100
However, this only returns data for days where the count was > 100. I am interested in the max and min date for a user who had at least one day with the count > 100. Is there a way to modify this query above to get what I want or must I use a second query?
Join the result back to the original table to get the lifespan of those users who had more than 100 entries on at least one day.
select d.userid
,max(d.recorddate::date) - min(d.recorddate::date) as user_lifespan_in_days
from data d
join (SELECT COUNT(*) count, userid, recorddate::date
FROM data
WHERE datatype = 0
GROUP BY userid, recorddate::date
HAVING COUNT(*) > 100) t
on t.userid = d.userid
group by d.userid
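One thing to be aware of with the join above: a user who crosses the threshold on several days appears several times in the derived table, so each of that user's rows is joined more than once. MAX and MIN are unaffected by the duplication, but a COUNT(*) in the outer query would be inflated. Here is a minimal sketch that filters with IN instead of joining. It assumes the data(userid, recorddate, datatype) table from the question; the output column names are only illustrative.

```sql
-- Sketch only: assumes the question's table data(userid, recorddate, datatype).
SELECT d.userid,
       MIN(d.recorddate::date) AS min_recorddate,
       MAX(d.recorddate::date) AS max_recorddate,
       -- date - date yields an integer number of days in PostgreSQL
       MAX(d.recorddate::date) - MIN(d.recorddate::date) AS user_lifespan_in_days
FROM data d
WHERE d.userid IN (
    -- users with at least one day of more than 100 datatype = 0 records
    SELECT userid
    FROM data
    WHERE datatype = 0
    GROUP BY userid, recorddate::date
    HAVING COUNT(*) > 100
)
GROUP BY d.userid;
```

Because IN only tests membership, it does not matter how many qualifying days a user has: every row of data is read once by the outer query.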
Note: this is a comparison of two of the answers. The first section is written for SQL Server; I also tried the window functions in Postgres, and that code is below as well. The bottom line is that getting the result the question asks for takes two steps. Step 1: find the UserIds that meet the criteria. Step 2: join that back to the table and get the max and min from the entire dataset.
I truly wish it could be done in one step, but the results are clear: when window functions are combined with GROUP BY, they are calculated over the result set of the GROUP BY (after HAVING has filtered it), NOT over the entire table.
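To see this evaluation order in isolation, here is a small self-contained PostgreSQL sketch. The sample_data CTE and its four rows are made up for illustration and are not part of the test data in this answer.

```sql
-- Hypothetical sample: one user with records on three different days,
-- only one of which (2020-01-05) has more than one record.
WITH sample_data(userid, recorddate) AS (
    VALUES (1, DATE '2020-01-01'),
           (1, DATE '2020-01-05'),
           (1, DATE '2020-01-05'),
           (1, DATE '2020-01-09')
)
SELECT userid,
       recorddate,
       COUNT(*) AS cnt,
       -- window functions run after GROUP BY and HAVING,
       -- so they only see the groups that survived the filter
       MIN(recorddate) OVER (PARTITION BY userid) AS min_recorddate,
       MAX(recorddate) OVER (PARTITION BY userid) AS max_recorddate
FROM sample_data
GROUP BY userid, recorddate
HAVING COUNT(*) > 1;
```

The only group that passes the HAVING filter is 2020-01-05, so both min_recorddate and max_recorddate come back as 2020-01-05, not 2020-01-01 and 2020-01-09.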
Here is some test data so that we can see the actual results:
DECLARE @Data AS TABLE (UserId INT, RecordDate DATETIME)
INSERT INTO @Data (UserId, RecordDate)
VALUES (2,DATEADD(YEAR,-3,GETDATE())), (2,DATEADD(YEAR,3,GETDATE())), (4,DATEADD(YEAR,-6,GETDATE())), (4,DATEADD(YEAR,6,GETDATE()))
DECLARE @U INT = 1
WHILE @U < 5
BEGIN
DECLARE @I INT = 1
WHILE @I < 12
BEGIN
IF (@U IN (1,3) AND @I > 6)
BEGIN
BREAK
END
INSERT INTO @Data (UserId, RecordDate) VALUES (@U, DATEADD(MINUTE,-1,GETDATE()))
SET @I += 1
END
SET @U += 1
END
Here is @Gordon Linoff's suggestion
SELECT
UserId, RecordDate, COUNT(*) AS [count]
,MIN(RecordDate) OVER (PARTITION BY UserId) AS min_recorddate
,MAX(RecordDate) OVER (PARTITION BY UserId) AS max_recorddate
FROM
@Data
GROUP BY
UserId, RecordDate
HAVING
COUNT(UserId) > 9
And here is @vkp's suggestion
SELECT
t.UserId
,COUNT(*) AS [count]
,MIN(d.RecordDate) as min_recorddate
,MAX(d.RecordDate) as max_recorddate
FROM
@Data d
INNER JOIN
(
SELECT
UserId
,RecordDate
,[count] = COUNT(*)
FROM
@Data
GROUP BY
UserId
,RecordDate
HAVING
COUNT(*) > 9
) t
ON d.UserId = t.UserId
GROUP BY
t.UserId
Note @Gordon's results:
@vkp's results:
Image of UserId 2 from the test data I generated
Adding a Postgres test case with @Gordon's suggestion:
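-- ON COMMIT DELETE ROWS empties the table at every commit, so run this whole script inside one transaction.
CREATE TEMPORARY TABLE DATA (USERID INT, RECORDDATE TIMESTAMP)
ON COMMIT DELETE ROWS;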
INSERT INTO DATA (USERID, RECORDDATE) VALUES (2,NOW() + INTERVAL '3 YEAR');
INSERT INTO DATA (USERID, RECORDDATE) VALUES (2,NOW() + INTERVAL '-3 YEAR');
INSERT INTO DATA (USERID, RECORDDATE) VALUES (4,NOW() + INTERVAL '6 YEAR');
INSERT INTO DATA (USERID, RECORDDATE) VALUES (4,NOW() + INTERVAL '-6 YEAR');
DO $$
DECLARE
i integer;
u integer;
BEGIN
u := 1;
WHILE (u < 5) LOOP
i := 1;
WHILE (i < 11) LOOP
IF (u IN (1,3) AND i > 6) THEN
EXIT;
END IF;
INSERT INTO DATA (USERID, RECORDDATE) VALUES (u,NOW() + INTERVAL '-1 MINUTE');
i = i + 1;
END LOOP;
u = u + 1;
END LOOP;
RAISE NOTICE 'value of i: %, and u: %', i, u;
END $$ ;
SELECT userid, recorddate::date, COUNT(*) as count,
MIN(recorddate::date) OVER (PARTITION BY userid) as min_recorddate,
MAX(recorddate::date) OVER (PARTITION BY userid) as max_recorddate
FROM data
GROUP BY userid, recorddate::date
HAVING COUNT(*) > 9;
Results
You mean that on a given day, the user has at least 100 records. Here is one method:
SELECT userid, recorddate::date, COUNT(*) as count,
MIN(recorddate::date) OVER (PARTITION BY userid) as min_recorddate,
MAX(recorddate::date) OVER (PARTITION BY userid) as max_recorddate
FROM data
WHERE datatype = 0
GROUP BY userid, recorddate::date
HAVING COUNT(*) > 100;
Now, this will produce multiple records for the same user if a user meets the criteria on multiple dates. One solution is to use a subquery to filter down to the user level. Another is to use DISTINCT ON:
SELECT DISTINCT ON (userid)
userid, recorddate::date, COUNT(*) as count,
MIN(recorddate::date) OVER (PARTITION BY userid) as min_recorddate,
MAX(recorddate::date) OVER (PARTITION BY userid) as max_recorddate
FROM data
WHERE datatype = 0
GROUP BY userid, recorddate::date
HAVING COUNT(userid) > 100
ORDER BY userid, COUNT(*) DESC;
Now that I think about it . . . I haven't used window functions with DISTINCT ON. So I think this will work. A subquery or CTE definitely would work.
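A minimal sketch of the CTE route, using two levels of aggregation in a single statement. It assumes PostgreSQL 9.4 or later (for the FILTER clause) and the data(userid, recorddate, datatype) table from the question. It also assumes the lifespan should span all of a user's records while only datatype = 0 records count toward the threshold; if the lifespan should cover datatype = 0 records only, move that condition into a WHERE clause inside the CTE. The names per_day, record_day and daily_count are illustrative.

```sql
-- Level 1: one row per user per day, counting only datatype = 0 records.
WITH per_day AS (
    SELECT userid,
           recorddate::date AS record_day,
           COUNT(*) FILTER (WHERE datatype = 0) AS daily_count
    FROM data
    GROUP BY userid, recorddate::date
)
-- Level 2: one row per user, kept only if some day exceeded the threshold.
SELECT userid,
       MIN(record_day) AS min_recorddate,
       MAX(record_day) AS max_recorddate
FROM per_day
GROUP BY userid
HAVING MAX(daily_count) > 100;
```

Since the HAVING clause here is applied per user rather than per day, the MIN and MAX are taken over every day the user has records, not only the days that crossed the threshold.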