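SQL query to find min and max value for a user who has at least one day with a row count of > threshold
I have a set of user records, and I am trying to identify the users who have at least 100 records on a single day, then determine each such user's lifespan by finding that user's maximum and minimum timestamps. I have not been able to do this in a single query. Here is how I identify the users who meet the threshold: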
SELECT COUNT(*) count, userid, recorddate::date
FROM data
WHERE datatype = 0
GROUP BY userid, recorddate::date
HAVING COUNT(userid) > 100
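However, this only returns data for the dates where the count is > 100. I am interested in the max and min dates for users who have at least one day with a count > 100. Is it possible to modify this query to get what I want, or do I have to use a second query?
join the result back to the original table to get the lifespan of the users who have more than 100 entries in a day at least once.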
select d.userid
,max(d.recorddate::date) - min(d.recorddate::date) as user_lifespan_in_days
from data d
join (SELECT COUNT(*) count, userid, recorddate::date
FROM data
WHERE datatype = 0
GROUP BY userid, recorddate::date
HAVING COUNT(*) > 100) t
on t.userid = d.userid
group by d.userid
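Note that this is a comparison of two answers. Although the first part of this post was written for sql-server, I also tried the window functions in Postgres, and the code below behaves the same way. The bottom line is that the desired result takes a two-step query. Step 1 finds the UserIds that meet the condition. Step 2 joins them back to the table and gets the max and min from the entire data set.
I did hope it could be done in one step, but the results make it clear that when a window function is combined with GROUP BY, it computes its result over the grouped result set (after HAVING has been applied) rather than over the whole table.
Here is some test data so we can see actual results: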
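One way to see the effect described above is to move the threshold filter out of HAVING and into an outer query, so that the window functions run over every daily group before any rows are discarded. This is a minimal sketch in Postgres syntax, not one of the answers being compared; it assumes the `data(userid, recorddate)` table from the Postgres test case and the test threshold of 9, and the names `per_day`, `day` and `cnt` are made up for illustration.

```sql
-- Sketch: compute the per-user min/max across ALL daily groups first,
-- then keep only the days whose count exceeds the threshold.
SELECT userid, day, cnt, min_recorddate, max_recorddate
FROM (
    SELECT userid,
           recorddate::date AS day,
           COUNT(*) AS cnt,
           -- No HAVING here, so the window sees every day for the user.
           MIN(recorddate::date) OVER (PARTITION BY userid) AS min_recorddate,
           MAX(recorddate::date) OVER (PARTITION BY userid) AS max_recorddate
    FROM data
    GROUP BY userid, recorddate::date
) per_day
WHERE cnt > 9;  -- threshold taken from the test case; an assumption for real data
```

The query still returns one row per qualifying day, so a user who crosses the threshold on several days appears several times.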
DECLARE @Data AS TABLE (UserId INT, RecordDate DATETIME)
INSERT INTO @Data (UserId, RecordDate)
VALUES (2,DATEADD(YEAR,-3,GETDATE())), (2,DATEADD(YEAR,3,GETDATE())), (4,DATEADD(YEAR,-6,GETDATE())), (4,DATEADD(YEAR,6,GETDATE()))
DECLARE @U INT = 1
WHILE @U < 5
BEGIN
DECLARE @I INT = 1
WHILE @I < 12
BEGIN
IF (@U IN (1,3) AND @I > 6)
BEGIN
BREAK
END
INSERT INTO @Data (UserId, RecordDate) VALUES (@U, DATEADD(MINUTE,-1,GETDATE()))
SET @I += 1
END
SET @U += 1
END
Here is @Gordon Linoff's suggestion:
SELECT
UserId, RecordDate, COUNT(*) AS [count]
,MIN(RecordDate) OVER (PARTITION BY UserId) AS min_recorddate
,MAX(RecordDate) OVER (PARTITION BY UserId) AS max_recorddate
FROM
@Data
GROUP BY
UserId, RecordDate
HAVING
COUNT(UserId) > 9
Here is @vkp's suggestion:
SELECT
t.UserId
,COUNT(*) AS [count]
,MIN(d.RecordDate) as min_recorddate
,MAX(d.RecordDate) as max_recorddate
FROM
@Data d
INNER JOIN
(
SELECT
UserId
,RecordDate
,[count] = COUNT(*)
FROM
@Data
GROUP BY
UserId
,RecordDate
HAVING
COUNT(*) > 9
) t
ON d.UserId = t.UserId
GROUP BY
t.UserId
Note @Gordon's results:
@vkp's results:
Image of the test data I generated for UserId 2
Adding a Postgres test case using @Gordon's suggestion:
CREATE TEMPORARY TABLE DATA (USERID INT, RECORDDATE TIMESTAMP)
ON COMMIT DELETE ROWS;
INSERT INTO DATA (USERID, RECORDDATE) VALUES (2,NOW() + INTERVAL '3 YEAR');
INSERT INTO DATA (USERID, RECORDDATE) VALUES (2,NOW() + INTERVAL '-3 YEAR');
INSERT INTO DATA (USERID, RECORDDATE) VALUES (4,NOW() + INTERVAL '6 YEAR');
INSERT INTO DATA (USERID, RECORDDATE) VALUES (4,NOW() + INTERVAL '-6 YEAR');
DO $$
DECLARE
i integer;
u integer;
BEGIN
u := 1;
WHILE (u < 5) LOOP
i := 1;
WHILE (i < 11) LOOP
IF (u IN (1,3) AND i > 6) THEN
EXIT;
END IF;
INSERT INTO DATA (USERID, RECORDDATE) VALUES (u,NOW() + INTERVAL '-1 MINUTE');
i = i + 1;
END LOOP;
u = u + 1;
END LOOP;
RAISE NOTICE 'value of i: %, and u: %', i, u;
END $$ ;
SELECT userid, recorddate::date, COUNT(*) as count,
MIN(recorddate::date) OVER (PARTITION BY userid) as min_recorddate,
MAX(recorddate::date) OVER (PARTITION BY userid) as max_recorddate
FROM data
GROUP BY userid, recorddate::date
HAVING COUNT(*) > 9;
Results
You seem to mean that a user has at least 100 records on a given day. Here is one approach:
SELECT userid, recorddate::date, COUNT(*) as count,
MIN(recorddate::date) OVER (PARTITION BY userid) as min_recorddate,
MAX(recorddate::date) OVER (PARTITION BY userid) as max_recorddate
FROM data
WHERE datatype = 0
GROUP BY userid, recorddate::date
HAVING COUNT(*) > 100;
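Now, if a user meets the condition on multiple dates, this will produce multiple rows for the same user. One solution is to use a subquery to filter down to the user level. Another is to use DISTINCT ON: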
SELECT DISTINCT ON (userid)
userid, recorddate::date, COUNT(*) as count,
MIN(recorddate::date) OVER (PARTITION BY userid) as min_recorddate,
MAX(recorddate::date) OVER (PARTITION BY userid) as max_recorddate
FROM data
WHERE datatype = 0
GROUP BY userid, recorddate::date
HAVING COUNT(userid) > 100
ORDER BY userid, COUNT(*) DESC;
Now that I think about it . . . I haven't used DISTINCT ON with window functions before. I think this will work, though. A subquery or CTE will definitely work.
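For reference, the CTE route could look roughly like the sketch below. It is not taken from the original answer: it assumes the `data(userid, recorddate, datatype)` table from the question, the CTE name `per_day` and the aliases `day` and `cnt` are hypothetical, and it assumes the lifespan should be measured over `datatype = 0` rows only (drop or move that filter if the lifespan should span all rows).

```sql
-- Sketch: aggregate to one row per user per day, then aggregate again
-- to one row per user, keeping users whose busiest day exceeds the threshold.
WITH per_day AS (
    SELECT userid,
           recorddate::date AS day,
           COUNT(*) AS cnt
    FROM data
    WHERE datatype = 0
    GROUP BY userid, recorddate::date
)
SELECT userid,
       MIN(day) AS min_recorddate,
       MAX(day) AS max_recorddate
FROM per_day
GROUP BY userid
HAVING MAX(cnt) > 100;  -- at least one day with more than 100 rows
```

Because the outer query groups by `userid`, each qualifying user appears exactly once, and the min and max are taken over all of that user's days rather than only the days above the threshold.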