
SQL query to find the min and max value for a user who has at least one day with a row count > threshold

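I have a table of user records. I am trying to identify users who have at least 100 records on a single day, and then determine each such user's lifespan by finding that user's maximum and minimum timestamps. I haven't been able to do this in a single query. This is how I identify the users that meet the threshold: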

SELECT COUNT(*) count, userid, recorddate::date 
FROM data 
WHERE datatype = 0 
GROUP BY userid, recorddate::date 
HAVING COUNT(userid) > 100

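However, this only returns data for the days where the count is > 100. I'm interested in the max and min dates of users who have a count > 100 on at least one day. Can the query above be modified to get what I want, or do I have to use a second query?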
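The problem can be reproduced with a minimal sketch. It assumes SQLite through Python's `sqlite3` module rather than Postgres, so `date(recorddate)` stands in for `recorddate::date`; the sample rows and the threshold of 2 (in place of 100) are hypothetical.

```python
import sqlite3

# Hypothetical sample data: (userid, datatype, recorddate).
ROWS = [
    (1, 0, "2019-12-01 08:00:00"),
    (1, 0, "2020-01-01 09:00:00"),
    (1, 0, "2020-01-01 10:00:00"),
    (1, 0, "2020-01-01 11:00:00"),
    (1, 0, "2020-03-01 12:00:00"),
    (2, 0, "2020-01-01 09:00:00"),
    (2, 0, "2020-01-01 10:00:00"),
    (2, 0, "2020-02-01 10:00:00"),
]
THRESHOLD = 2  # scaled-down stand-in for the question's 100

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (userid INTEGER, datatype INTEGER, recorddate TEXT)")
conn.executemany("INSERT INTO data VALUES (?, ?, ?)", ROWS)

# The question's query, with date(recorddate) replacing Postgres' recorddate::date.
qualifying_days = conn.execute(
    """
    SELECT COUNT(*) AS cnt, userid, date(recorddate) AS day
    FROM data
    WHERE datatype = 0
    GROUP BY userid, date(recorddate)
    HAVING COUNT(*) > ?
    """,
    (THRESHOLD,),
).fetchall()

# Only user 1's busy day survives HAVING; the 2019-12-01 and 2020-03-01 rows
# that define user 1's lifespan are no longer visible to this query.
print(qualifying_days)
```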

Join the result back to the original table to get the lifespan of the users who have more than 100 entries on at least one day.

SELECT d.userid,
       MAX(d.recorddate::date) - MIN(d.recorddate::date) AS user_lifespan_in_days
FROM data d
JOIN (SELECT DISTINCT userid  -- one row per user with at least one day > 100
      FROM data
      WHERE datatype = 0
      GROUP BY userid, recorddate::date
      HAVING COUNT(*) > 100) t
  ON t.userid = d.userid
GROUP BY d.userid;
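A runnable sketch of the join approach, assuming SQLite through Python's `sqlite3` module instead of Postgres: `date(recorddate)` stands in for `recorddate::date`, a `julianday()` difference stands in for Postgres date subtraction, and the sample rows and the threshold of 2 are hypothetical. The inner query returns each qualifying user once, and the outer query then takes MIN/MAX over all of that user's rows.

```python
import sqlite3

# Hypothetical sample data: (userid, datatype, recorddate).
ROWS = [
    (1, 0, "2019-12-01 08:00:00"),
    (1, 0, "2020-01-01 09:00:00"),
    (1, 0, "2020-01-01 10:00:00"),
    (1, 0, "2020-01-01 11:00:00"),
    (1, 0, "2020-03-01 12:00:00"),
    (2, 0, "2020-01-01 09:00:00"),
    (2, 0, "2020-01-01 10:00:00"),
    (2, 0, "2020-02-01 10:00:00"),
]
THRESHOLD = 2  # scaled-down stand-in for 100

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (userid INTEGER, datatype INTEGER, recorddate TEXT)")
conn.executemany("INSERT INTO data VALUES (?, ?, ?)", ROWS)

# Step 1 (subquery t): users with at least one day above the threshold.
# Step 2 (outer query): min/max over every row of those users.
lifespans = conn.execute(
    """
    SELECT d.userid,
           MIN(date(d.recorddate)) AS first_day,
           MAX(date(d.recorddate)) AS last_day,
           julianday(MAX(date(d.recorddate)))
             - julianday(MIN(date(d.recorddate))) AS lifespan_days
    FROM data d
    JOIN (SELECT DISTINCT userid
          FROM data
          WHERE datatype = 0
          GROUP BY userid, date(recorddate)
          HAVING COUNT(*) > ?) t
      ON t.userid = d.userid
    GROUP BY d.userid
    """,
    (THRESHOLD,),
).fetchall()

print(lifespans)
```

User 2 never exceeds the threshold on any day, so only user 1 is returned, with a lifespan spanning the rows that the question's original query could not see.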

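Note: this is a comparison of the two answers. The first part of this post was written for SQL Server, but I also tried the windowed function in Postgres; that code is below as well. The bottom line is that getting the desired result takes a two-step query. Step 1 finds the UserIds that meet the required condition. Step 2 joins them back to the table and takes the max and min over the entire dataset.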

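I did hope it could be done in one step, but the results make it clear that when window functions are combined with GROUP BY, they compute their results over the GROUP BY result set (after HAVING has filtered it), not over the whole table.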
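This behaviour can be checked with a small sketch. It assumes SQLite through Python's `sqlite3` module (SQLite 3.25 or later is needed for window functions), with hypothetical sample rows and a threshold of 2. Because HAVING is applied before the window functions run, `MIN(...) OVER` and `MAX(...) OVER` only see the days that survived HAVING.

```python
import sqlite3

# Hypothetical sample data: (userid, datatype, recorddate).
ROWS = [
    (1, 0, "2019-12-01 08:00:00"),
    (1, 0, "2020-01-01 09:00:00"),
    (1, 0, "2020-01-01 10:00:00"),
    (1, 0, "2020-01-01 11:00:00"),
    (1, 0, "2020-03-01 12:00:00"),
    (2, 0, "2020-01-01 09:00:00"),
    (2, 0, "2020-01-01 10:00:00"),
    (2, 0, "2020-02-01 10:00:00"),
]
THRESHOLD = 2  # scaled-down stand-in for 100

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (userid INTEGER, datatype INTEGER, recorddate TEXT)")
conn.executemany("INSERT INTO data VALUES (?, ?, ?)", ROWS)

# Window functions on top of GROUP BY ... HAVING: the partition for user 1
# contains only the grouped rows that passed HAVING, i.e. a single day.
windowed = conn.execute(
    """
    SELECT userid, date(recorddate) AS day, COUNT(*) AS cnt,
           MIN(date(recorddate)) OVER (PARTITION BY userid) AS min_day,
           MAX(date(recorddate)) OVER (PARTITION BY userid) AS max_day
    FROM data
    WHERE datatype = 0
    GROUP BY userid, date(recorddate)
    HAVING COUNT(*) > ?
    """,
    (THRESHOLD,),
).fetchall()

# min_day and max_day both collapse to the busy day, not 2019-12-01 / 2020-03-01.
print(windowed)
```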

Here is some test data so we can see actual results:

DECLARE @Data AS TABLE (UserId INT, RecordDate DATETIME)

INSERT INTO @Data (UserId, RecordDate)
VALUES (2,DATEADD(YEAR,-3,GETDATE())), (2,DATEADD(YEAR,3,GETDATE())), (4,DATEADD(YEAR,-6,GETDATE())), (4,DATEADD(YEAR,6,GETDATE()))

DECLARE @U INT = 1

WHILE @U < 5
BEGIN
    DECLARE @I INT = 1

    WHILE @I < 12
    BEGIN
       IF (@U IN (1,3) AND @I > 6)
       BEGIN
          BREAK
       END

       INSERT INTO @Data (UserId, RecordDate) VALUES (@U, DATEADD(MINUTE,-1,GETDATE()))

       SET @I += 1
    END

    SET @U += 1
END

This is @Gordon Linoff's suggestion:

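-- Group by the calendar day, as the question does with recorddate::date
SELECT
    UserId, CAST(RecordDate AS DATE) AS RecordDay, COUNT(*) AS [count]
    ,MIN(CAST(RecordDate AS DATE)) OVER (PARTITION BY UserId) AS min_recorddate
    ,MAX(CAST(RecordDate AS DATE)) OVER (PARTITION BY UserId) AS max_recorddate
FROM
    @Data
GROUP BY
    UserId, CAST(RecordDate AS DATE)
HAVING
    COUNT(*) > 9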

This is @vkp's suggestion:

SELECT
    t.UserId
    ,COUNT(*) AS [count]
    ,MIN(d.RecordDate) AS min_recorddate
    ,MAX(d.RecordDate) AS max_recorddate
FROM
    @Data d
    INNER JOIN
    (
       -- One row per user with at least one day above the threshold
       SELECT DISTINCT
          UserId
       FROM
          @Data
       GROUP BY
          UserId
          ,CAST(RecordDate AS DATE)
       HAVING
          COUNT(*) > 9
    ) t
    ON d.UserId = t.UserId
GROUP BY
    t.UserId

[Image: @Gordon's results]

[Image: @vkp's results]

[Image: the generated test data for UserId 2]

Adding a Postgres test case using @Gordon's suggestion:

-- Default ON COMMIT PRESERVE ROWS keeps the rows across autocommitted statements
CREATE TEMPORARY TABLE DATA (USERID INT, RECORDDATE TIMESTAMP);

INSERT INTO DATA (USERID, RECORDDATE) VALUES (2,NOW() + INTERVAL '3 YEAR');
INSERT INTO DATA (USERID, RECORDDATE) VALUES (2,NOW() + INTERVAL '-3 YEAR');
INSERT INTO DATA (USERID, RECORDDATE) VALUES (4,NOW() + INTERVAL '6 YEAR');
INSERT INTO DATA (USERID, RECORDDATE) VALUES (4,NOW() + INTERVAL '-6 YEAR');

DO $$
    DECLARE
        i integer;
        u integer;
    BEGIN
        u := 1;
        WHILE (u < 5) LOOP
            i := 1;

            WHILE (i < 11) LOOP
                -- Users 1 and 3 get only 6 rows, so they stay under the threshold
                IF (u IN (1,3) AND i > 6) THEN
                    EXIT;
                END IF;

                INSERT INTO DATA (USERID, RECORDDATE) VALUES (u,NOW() + INTERVAL '-1 MINUTE');

                i := i + 1;
            END LOOP;

            u := u + 1;
        END LOOP;

        RAISE NOTICE 'value of i: %, and u: %', i, u;
    END $$;


SELECT userid, recorddate::date, COUNT(*) as count,
       MIN(recorddate::date) OVER (PARTITION BY userid) as min_recorddate,
       MAX(recorddate::date) OVER (PARTITION BY userid) as max_recorddate
FROM data 
GROUP BY userid, recorddate::date 
HAVING COUNT(*) > 9;

[Image: results of the Postgres test case]

You mean that the user has at least 100 records on a given day. Here is one method:

SELECT userid, recorddate::date, COUNT(*) as count,
       MIN(recorddate::date) OVER (PARTITION BY userid) as min_recorddate,
       MAX(recorddate::date) OVER (PARTITION BY userid) as max_recorddate
FROM data 
WHERE datatype = 0 
GROUP BY userid, recorddate::date 
HAVING COUNT(*) > 100;

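Now, if a user qualifies on more than one date, this will produce multiple records for the same user. One solution is to use a subquery to filter down to the user level. Another is to use DISTINCT ON: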

SELECT DISTINCT ON (userid)
       userid, recorddate::date, COUNT(*) as count,
       MIN(recorddate::date) OVER (PARTITION BY userid) as min_recorddate,
       MAX(recorddate::date) OVER (PARTITION BY userid) as max_recorddate
FROM data 
WHERE datatype = 0 
GROUP BY userid, recorddate::date 
HAVING COUNT(*) > 100
ORDER BY userid, COUNT(*) DESC;

Come to think of it, I haven't used DISTINCT ON with window functions before. I think this will work; a subquery or CTE will definitely work.
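One possible CTE version, sketched under assumptions: it uses SQLite through Python's `sqlite3` module instead of Postgres, hypothetical sample rows, and a threshold of 2. It aggregates per user and day first, then aggregates those daily rows per user and moves the threshold test to `HAVING MAX(cnt) > threshold`, so the min and max cover every day rather than only the qualifying ones. Unlike the join answer, the min and max here are taken over `datatype = 0` rows only.

```python
import sqlite3

# Hypothetical sample data: (userid, datatype, recorddate).
ROWS = [
    (1, 0, "2019-12-01 08:00:00"),
    (1, 0, "2020-01-01 09:00:00"),
    (1, 0, "2020-01-01 10:00:00"),
    (1, 0, "2020-01-01 11:00:00"),
    (1, 0, "2020-03-01 12:00:00"),
    (2, 0, "2020-01-01 09:00:00"),
    (2, 0, "2020-01-01 10:00:00"),
    (2, 0, "2020-02-01 10:00:00"),
]
THRESHOLD = 2  # scaled-down stand-in for 100

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (userid INTEGER, datatype INTEGER, recorddate TEXT)")
conn.executemany("INSERT INTO data VALUES (?, ?, ?)", ROWS)

summary = conn.execute(
    """
    WITH daily AS (              -- one row per user per day, with its row count
        SELECT userid, date(recorddate) AS day, COUNT(*) AS cnt
        FROM data
        WHERE datatype = 0
        GROUP BY userid, date(recorddate)
    )
    SELECT userid,
           MIN(day) AS min_day,  -- over all of the user's days
           MAX(day) AS max_day,
           MAX(cnt) AS busiest_day_count
    FROM daily
    GROUP BY userid
    HAVING MAX(cnt) > ?          -- keep users with at least one day over the threshold
    """,
    (THRESHOLD,),
).fetchall()

print(summary)
```

Because the threshold is checked against the per-user maximum daily count, each user appears at most once, so no DISTINCT ON or second join is needed.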


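Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.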

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM