SQL - count unique first occurrence of value
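I have a log table containing user activity. I'm trying to create a query that shows both the number of unique users and the number of new users.
Sample data: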
| uid | act | tm |
| --- | --- | ------------------------ |
| 1 | l | 2019-01-02T00:00:00.000Z |
| 1 | l | 2019-01-05T00:00:00.000Z |
| 2 | l | 2019-02-02T00:00:00.000Z |
| 1 | l | 2019-02-03T00:00:00.000Z |
| 2 | l | 2019-02-04T00:00:00.000Z |
| 3 | l | 2019-02-05T00:00:00.000Z |
| 1 | l | 2019-03-02T00:00:00.000Z |
| 2 | l | 2019-03-02T00:00:00.000Z |
| 3 | l | 2019-03-02T00:00:00.000Z |
| 4 | l | 2019-03-02T00:00:00.000Z |
The first part is easy: count(distinct(uid)) as tot_users
But is there a way to do the second part, i.e. count the users who appear in that period but have not appeared before?
Here is what I have so far: https://www.db-fiddle.com/f/8EXsih1VAL1iWXKeauPQiB/1
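To make the target numbers concrete, here is a small pure-Python reference computation over the sample rows. It is only a sketch for checking expected results and is not part of any answer below; it assumes timestamps are ISO-8601 strings, so the first seven characters give the 'YYYY-MM' month. The helper name monthly_user_stats is hypothetical. A user counts as new only in the month of their earliest entry.

```python
from collections import defaultdict

# Sample rows from the question: (uid, act, tm) with ISO-8601 timestamps.
ROWS = [
    (1, "l", "2019-01-02T00:00:00.000Z"), (1, "l", "2019-01-05T00:00:00.000Z"),
    (2, "l", "2019-02-02T00:00:00.000Z"), (1, "l", "2019-02-03T00:00:00.000Z"),
    (2, "l", "2019-02-04T00:00:00.000Z"), (3, "l", "2019-02-05T00:00:00.000Z"),
    (1, "l", "2019-03-02T00:00:00.000Z"), (2, "l", "2019-03-02T00:00:00.000Z"),
    (3, "l", "2019-03-02T00:00:00.000Z"), (4, "l", "2019-03-02T00:00:00.000Z"),
]


def monthly_user_stats(rows):
    """Return {month: (tot_entries, tot_users, new_users)} keyed by 'YYYY-MM'."""
    # First pass: the earliest month in which each user appears.
    first_month = {}
    for uid, _act, tm in rows:
        month = tm[:7]  # ISO-8601 text, so string order is chronological
        if uid not in first_month or month < first_month[uid]:
            first_month[uid] = month

    # Second pass: total entries and distinct users per month.
    entries = defaultdict(int)
    users = defaultdict(set)
    for uid, _act, tm in rows:
        entries[tm[:7]] += 1
        users[tm[:7]].add(uid)

    # A user is "new" only in the month of their first appearance.
    return {
        month: (entries[month], len(users[month]),
                sum(1 for uid in users[month] if first_month[uid] == month))
        for month in sorted(entries)
    }


if __name__ == "__main__":
    for month, stats in monthly_user_stats(ROWS).items():
        print(month, stats)
```

On the sample data this yields 2019-01 (2, 1, 1), 2019-02 (4, 3, 2) and 2019-03 (4, 4, 1), which is what a correct SQL solution should reproduce.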
For future reference, I have updated the db-fiddle with the two suggested solutions. Both work well:
https://www.db-fiddle.com/f/8EXsih1VAL1iWXKeauPQiB/6
SELECT
    to_char(date_trunc('month', tm), 'YYYY-MM') as mnth,
    count(uid) as tot_entries,
    COUNT(DISTINCT uid) as tot_users,
    COUNT(DISTINCT
        CASE
            WHEN DATE_TRUNC('month', min_tm) = DATE_TRUNC('month', tm)
            THEN uid
        END) AS new_users
FROM (SELECT l.*, MIN(tm) OVER (PARTITION BY uid) min_tm FROM logs l) x
GROUP BY mnth
ORDER BY mnth;
SELECT
    to_char(date_trunc('month', l1.tm), 'YYYY-MM') mnth,
    count(l1.uid) tot_entries,
    count(DISTINCT l1.uid) tot_users,
    count(DISTINCT
        CASE
            WHEN NOT EXISTS (SELECT *
                             FROM logs l2
                             WHERE l2.uid = l1.uid
                               AND to_char(date_trunc('month', l2.tm), 'YYYY-MM')
                                   < to_char(date_trunc('month', l1.tm), 'YYYY-MM'))
            THEN l1.uid
        END) new_users
FROM logs l1
GROUP BY mnth
ORDER BY mnth;
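You can use conditional aggregation. In a CASE expression, check whether a log entry for the same user exists in an earlier month; unless such an entry is found, return the user's ID. Then use that expression as the argument to count().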
SELECT
    to_char(date_trunc('month', l1.tm), 'YYYY-MM') mnth,
    count(l1.uid) tot_entries,
    count(DISTINCT l1.uid) tot_users,
    count(DISTINCT
        CASE
            WHEN NOT EXISTS (SELECT *
                             FROM logs l2
                             WHERE l2.uid = l1.uid
                               AND to_char(date_trunc('month', l2.tm), 'YYYY-MM')
                                   < to_char(date_trunc('month', l1.tm), 'YYYY-MM'))
            THEN l1.uid
        END) new_users
FROM logs l1
GROUP BY mnth
ORDER BY mnth;
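To try the NOT EXISTS approach without a PostgreSQL server, here is a sketch that runs the same query shape on an in-memory SQLite database through Python's sqlite3 module. This is a swapped-in engine, not the answer's environment: SQLite has no date_trunc() or to_char(), so substr(tm, 1, 7) stands in for the 'YYYY-MM' month key, which assumes tm is stored as ISO-8601 text. The helper name monthly_counts_not_exists is hypothetical.

```python
import sqlite3

# Sample rows from the question: (uid, act, tm), tm stored as ISO-8601 text.
ROWS = [
    (1, "l", "2019-01-02T00:00:00.000Z"), (1, "l", "2019-01-05T00:00:00.000Z"),
    (2, "l", "2019-02-02T00:00:00.000Z"), (1, "l", "2019-02-03T00:00:00.000Z"),
    (2, "l", "2019-02-04T00:00:00.000Z"), (3, "l", "2019-02-05T00:00:00.000Z"),
    (1, "l", "2019-03-02T00:00:00.000Z"), (2, "l", "2019-03-02T00:00:00.000Z"),
    (3, "l", "2019-03-02T00:00:00.000Z"), (4, "l", "2019-03-02T00:00:00.000Z"),
]

# substr(tm, 1, 7) replaces to_char(date_trunc('month', tm), 'YYYY-MM').
# The CASE yields NULL for users seen in an earlier month; count() skips NULLs.
NOT_EXISTS_QUERY = """
SELECT substr(l1.tm, 1, 7) AS mnth,
       count(l1.uid) AS tot_entries,
       count(DISTINCT l1.uid) AS tot_users,
       count(DISTINCT CASE
                 WHEN NOT EXISTS (SELECT 1
                                  FROM logs l2
                                  WHERE l2.uid = l1.uid
                                    AND substr(l2.tm, 1, 7) < substr(l1.tm, 1, 7))
                 THEN l1.uid
             END) AS new_users
FROM logs l1
GROUP BY substr(l1.tm, 1, 7)
ORDER BY mnth
"""


def monthly_counts_not_exists(rows):
    """Load rows into an in-memory SQLite table and run the NOT EXISTS query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE logs (uid INTEGER, act TEXT, tm TEXT)")
        conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", rows)
        return conn.execute(NOT_EXISTS_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in monthly_counts_not_exists(ROWS):
        print(row)
```

The correlated subquery runs once per log row, so on large tables an index on (uid, tm) or the window-function approach may scale better.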
You can use a window function in a subquery to compute the timestamp of each user's first log entry, for example:
SELECT l.*, MIN(tm) OVER(PARTITION BY uid) min_tm FROM logs l
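To see what that subquery produces, the sketch below runs it against the sample rows in an in-memory SQLite database via Python's sqlite3. SQLite is a stand-in for PostgreSQL here and needs version 3.25 or later for window functions; MIN() over ISO-8601 text works because string order matches time order. The helper name rows_with_first_entry is hypothetical. Every row keeps its own tm and gains the user's earliest tm as min_tm.

```python
import sqlite3

# Sample rows from the question: (uid, act, tm), tm stored as ISO-8601 text.
ROWS = [
    (1, "l", "2019-01-02T00:00:00.000Z"), (1, "l", "2019-01-05T00:00:00.000Z"),
    (2, "l", "2019-02-02T00:00:00.000Z"), (1, "l", "2019-02-03T00:00:00.000Z"),
    (2, "l", "2019-02-04T00:00:00.000Z"), (3, "l", "2019-02-05T00:00:00.000Z"),
    (1, "l", "2019-03-02T00:00:00.000Z"), (2, "l", "2019-03-02T00:00:00.000Z"),
    (3, "l", "2019-03-02T00:00:00.000Z"), (4, "l", "2019-03-02T00:00:00.000Z"),
]

# The window function attaches each user's earliest timestamp to every row
# without collapsing rows the way GROUP BY would.
FIRST_ENTRY_QUERY = """
SELECT uid, tm, MIN(tm) OVER (PARTITION BY uid) AS min_tm
FROM logs
ORDER BY uid, tm
"""


def rows_with_first_entry(rows):
    """Return (uid, tm, min_tm) for every log row."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE logs (uid INTEGER, act TEXT, tm TEXT)")
        conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", rows)
        return conn.execute(FIRST_ENTRY_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in rows_with_first_entry(ROWS):
        print(row)
```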
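You can then analyze the result in the outer query: a user counts as new when the date of their first log entry falls within the analysis interval.
Assuming the parameters :start_tm and :end_tm represent the start and end of the analysis period, you would do the following: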
SELECT
    COUNT(DISTINCT uid) as tot_users,
    COUNT(DISTINCT CASE WHEN min_tm >= :start_tm AND min_tm < :end_tm THEN uid END) AS tot_new_users
FROM (SELECT l.*, MIN(tm) OVER (PARTITION BY uid) min_tm FROM logs l) x
WHERE tm >= :start_tm AND tm < :end_tm
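The period query can be exercised with Python's sqlite3, which happens to accept the same :start_tm / :end_tm named-parameter style. This sketch assumes SQLite 3.25+ (window functions) instead of PostgreSQL, and compares ISO-8601 text instead of real timestamps, so date-only bounds such as '2019-02-01' work as period edges. The helper name period_counts is hypothetical.

```python
import sqlite3

# Sample rows from the question: (uid, act, tm), tm stored as ISO-8601 text.
ROWS = [
    (1, "l", "2019-01-02T00:00:00.000Z"), (1, "l", "2019-01-05T00:00:00.000Z"),
    (2, "l", "2019-02-02T00:00:00.000Z"), (1, "l", "2019-02-03T00:00:00.000Z"),
    (2, "l", "2019-02-04T00:00:00.000Z"), (3, "l", "2019-02-05T00:00:00.000Z"),
    (1, "l", "2019-03-02T00:00:00.000Z"), (2, "l", "2019-03-02T00:00:00.000Z"),
    (3, "l", "2019-03-02T00:00:00.000Z"), (4, "l", "2019-03-02T00:00:00.000Z"),
]

# min_tm is computed inside the subquery, i.e. over ALL of a user's rows,
# before the outer WHERE restricts rows to the analysis period.
PERIOD_QUERY = """
SELECT COUNT(DISTINCT uid) AS tot_users,
       COUNT(DISTINCT CASE WHEN min_tm >= :start_tm AND min_tm < :end_tm
                           THEN uid END) AS tot_new_users
FROM (SELECT l.*, MIN(tm) OVER (PARTITION BY uid) AS min_tm FROM logs l) x
WHERE tm >= :start_tm AND tm < :end_tm
"""


def period_counts(rows, start_tm, end_tm):
    """Return (tot_users, tot_new_users) for the half-open period [start_tm, end_tm)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE logs (uid INTEGER, act TEXT, tm TEXT)")
        conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", rows)
        params = {"start_tm": start_tm, "end_tm": end_tm}
        return conn.execute(PERIOD_QUERY, params).fetchone()
    finally:
        conn.close()


if __name__ == "__main__":
    print(period_counts(ROWS, "2019-02-01", "2019-03-01"))
```

The subquery is what makes this work: if MIN(tm) were computed after the period filter, each user's earliest visible entry would fall inside the period and every user would look new.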
If you need to aggregate by month:
SELECT
    DATE_TRUNC('month', tm) AS my_month,
    COUNT(DISTINCT uid) as tot_users,
    COUNT(DISTINCT CASE WHEN DATE_TRUNC('month', min_tm) = DATE_TRUNC('month', tm) THEN uid END) AS tot_new_users
FROM (SELECT l.*, MIN(tm) OVER (PARTITION BY uid) min_tm FROM logs l) x
GROUP BY my_month
ORDER BY my_month
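A runnable sketch of the monthly version, again using SQLite through Python's sqlite3 as a stand-in for PostgreSQL (window functions need SQLite 3.25+). Since SQLite lacks DATE_TRUNC, substr(..., 1, 7) is used as the month key on ISO-8601 text; that substitution and the helper name monthly_counts_window are assumptions of this sketch.

```python
import sqlite3

# Sample rows from the question: (uid, act, tm), tm stored as ISO-8601 text.
ROWS = [
    (1, "l", "2019-01-02T00:00:00.000Z"), (1, "l", "2019-01-05T00:00:00.000Z"),
    (2, "l", "2019-02-02T00:00:00.000Z"), (1, "l", "2019-02-03T00:00:00.000Z"),
    (2, "l", "2019-02-04T00:00:00.000Z"), (3, "l", "2019-02-05T00:00:00.000Z"),
    (1, "l", "2019-03-02T00:00:00.000Z"), (2, "l", "2019-03-02T00:00:00.000Z"),
    (3, "l", "2019-03-02T00:00:00.000Z"), (4, "l", "2019-03-02T00:00:00.000Z"),
]

# A user is counted as new in a month when their first-ever entry (min_tm)
# falls in that same month; substr(x, 1, 7) stands in for DATE_TRUNC('month', x).
MONTHLY_WINDOW_QUERY = """
SELECT substr(tm, 1, 7) AS my_month,
       COUNT(DISTINCT uid) AS tot_users,
       COUNT(DISTINCT CASE WHEN substr(min_tm, 1, 7) = substr(tm, 1, 7)
                           THEN uid END) AS tot_new_users
FROM (SELECT l.*, MIN(tm) OVER (PARTITION BY uid) AS min_tm FROM logs l) x
GROUP BY substr(tm, 1, 7)
ORDER BY my_month
"""


def monthly_counts_window(rows):
    """Return (my_month, tot_users, tot_new_users) per month."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE logs (uid INTEGER, act TEXT, tm TEXT)")
        conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", rows)
        return conn.execute(MONTHLY_WINDOW_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in monthly_counts_window(ROWS):
        print(row)
```

Unlike the NOT EXISTS version, this scans the table once and computes each user's first entry in a single window pass.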
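You can use a HAVING clause or a self-join. You mentioned a period, so I'm not sure about the exact filter condition, but assuming this is a simple case, you could do something like the following: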
select
    uid,
    case when mintm < '2019-03-02T00:00:00.000Z' -- cutoff
         then 'old' else 'new'
    end flag
from (
    select uid, min(tm) mintm from logs
    group by uid
) as first_logins
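Here is a sketch of the first-login approach run on the sample rows with Python's sqlite3. SQLite replaces PostgreSQL, the table is assumed to be named logs as in the other answers, and the cutoff literal is turned into a hypothetical :cutoff parameter so it can be varied; the helper name flag_users is also hypothetical. Each user is labeled once, relative to a single cutoff.

```python
import sqlite3

# Sample rows from the question: (uid, act, tm), tm stored as ISO-8601 text.
ROWS = [
    (1, "l", "2019-01-02T00:00:00.000Z"), (1, "l", "2019-01-05T00:00:00.000Z"),
    (2, "l", "2019-02-02T00:00:00.000Z"), (1, "l", "2019-02-03T00:00:00.000Z"),
    (2, "l", "2019-02-04T00:00:00.000Z"), (3, "l", "2019-02-05T00:00:00.000Z"),
    (1, "l", "2019-03-02T00:00:00.000Z"), (2, "l", "2019-03-02T00:00:00.000Z"),
    (3, "l", "2019-03-02T00:00:00.000Z"), (4, "l", "2019-03-02T00:00:00.000Z"),
]

# One row per user: their first login, compared against the cutoff.
FIRST_LOGIN_QUERY = """
SELECT uid,
       CASE WHEN mintm < :cutoff THEN 'old' ELSE 'new' END AS flag
FROM (SELECT uid, min(tm) AS mintm FROM logs GROUP BY uid) AS first_logins
ORDER BY uid
"""


def flag_users(rows, cutoff):
    """Return (uid, 'old' | 'new') depending on whether the first login precedes cutoff."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE logs (uid INTEGER, act TEXT, tm TEXT)")
        conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", rows)
        return conn.execute(FIRST_LOGIN_QUERY, {"cutoff": cutoff}).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(flag_users(ROWS, "2019-03-02T00:00:00.000Z"))
```

Counting the 'new' rows gives the number of users whose first login is on or after the cutoff; unlike the window-function answers, this does not break the result down by month.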