How to get a cumulative total of users while ignoring users who already appeared on a previous day? Using BigQuery

Question

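I want to calculate the cumulative number of users per day, but a user who already appeared on an earlier day should not be counted again.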

date_key      user_id
2022-01-01     001
2022-01-01     002
2022-01-02     001
2022-01-02     003
2022-01-03     002
2022-01-03     003
2022-01-04     002
2022-01-04     004

Counting distinct users per day, we get:

date_key     total_user
2022-01-01      2
2022-01-02      2
2022-01-03      2
2022-01-04      2

If we simply take a running sum of these daily counts, we get 2, 4, 6, 8. The goal is a table like this:

date_key     total_user
2022-01-01      2
2022-01-02      3
2022-01-03      3
2022-01-04      4

I use the query below to get this result, but because the data is really large, the query takes forever to complete.

select b.date_key,count(distinct a.user_id) total_user
from t1 a
join t1 b 
   on b.date_key >= a.date_key 
   and date_trunc(a.date_key,month) = date_trunc(b.date_key,month)
group by 1
order by 1

And yes, the count should reset when the month changes.

By the way, I am using Google BigQuery.

Answer 1

Number each user's appearances in date order within the month, then count only the rows where the user is seen for the first time:

with data as (
    select *,
        row_number() over (partition by date_trunc(date_key, month), user_id
                           order by date_key) as rn
    from T
)
select date_key,
    sum(count(case when rn = 1 then 1 end)) -- or countif(rn = 1)
        over (partition by date_trunc(date_key, month)
              order by date_key) as cum_monthly_users
from data
group by date_key;

https://dbfiddle.uk/?rdbms=postgres_14&fiddle=dc426d79a7786fc8a5b25a22f0755e27
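To sanity-check the query's logic on the sample data, here is a minimal Python sketch of the same idea: mark each user's first appearance within a month, count those per day, and keep a running sum that resets at each month boundary. The function name `cumulative_new_users` and the use of ISO date strings are assumptions made for illustration; they are not part of the original answer.

```python
from collections import defaultdict

def cumulative_new_users(rows):
    """rows: iterable of (date_key, user_id), date_key as 'YYYY-MM-DD' (assumed format)."""
    seen_in_month = defaultdict(set)  # month -> users already counted in that month
    new_per_day = defaultdict(int)    # date_key -> users first seen on that day

    for date_key, user_id in sorted(rows):
        month = date_key[:7]
        new_per_day[date_key] += 0    # keep days that have no new users
        if user_id not in seen_in_month[month]:   # equivalent of rn = 1
            seen_in_month[month].add(user_id)
            new_per_day[date_key] += 1

    result, running, current_month = [], 0, None
    for date_key in sorted(new_per_day):
        month = date_key[:7]
        if month != current_month:    # reset when the month changes
            running, current_month = 0, month
        running += new_per_day[date_key]
        result.append((date_key, running))
    return result

rows = [
    ("2022-01-01", "001"), ("2022-01-01", "002"),
    ("2022-01-02", "001"), ("2022-01-02", "003"),
    ("2022-01-03", "002"), ("2022-01-03", "003"),
    ("2022-01-04", "002"), ("2022-01-04", "004"),
]
for date_key, total_user in cumulative_new_users(rows):
    print(date_key, total_user)
```

On the sample rows from the question this gives 2, 3, 3, 4, matching the target table.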

Answer 2

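1. Cumulative total of users, ignoring users who already appeared on a previous day

2. The count should reset when the month changes

3. The data is really large

Consider the approach below: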

select date_key, 
  ( select hll_count.merge(u) 
    from unnest(users) u
  ) as total_user
from (
  select date_key, date_trunc(date(date_key), month) year_month,
    array_agg(users) over(partition by date_trunc(date(date_key), month) order by date_key) users
  from (
    select date_key, hll_count.init(user_id) users
    from your_table
    group by date_key
  )
)

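If applied to the sample data in your question, the output is as expected.

Note: points #1 and #2 above are [obviously] satisfied, and because the query uses HyperLogLog++ functions, it should also handle #3 efficiently.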

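HLL++ functions are approximate aggregate functions. Compared with exact aggregate functions such as COUNT(DISTINCT), approximate aggregation typically requires less memory, but it also introduces statistical error. This makes HLL++ functions appropriate for large data streams for which linear memory usage is impractical, as well as for data that is already approximate.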
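The shape of the query above (build one sketch per day, then merge all sketches up to the current day within the same month) can be illustrated with a small Python sketch. As a loudly stated assumption, exact Python sets stand in for HLL++ sketches here, so the counts are exact and memory grows with the number of users; a real HLL++ sketch has a bounded size and returns an estimate. The function name `running_distinct_by_month` is hypothetical.

```python
from itertools import groupby

def running_distinct_by_month(rows):
    """rows: iterable of (date_key, user_id), date_key as 'YYYY-MM-DD' (assumed format)."""
    # Step 1: one "sketch" per day (stand-in for hll_count.init ... group by date_key).
    daily = {}
    for date_key, user_id in rows:
        daily.setdefault(date_key, set()).add(user_id)

    # Step 2: within each month, merge the sketches of all days up to the
    # current one (stand-in for array_agg(...) over(...) + hll_count.merge).
    out = []
    for _month, days in groupby(sorted(daily), key=lambda d: d[:7]):
        merged = set()                # a fresh set per month gives the reset
        for day in days:
            merged |= daily[day]
            out.append((day, len(merged)))
    return out

sample = [
    ("2022-01-01", "001"), ("2022-01-01", "002"),
    ("2022-01-02", "001"), ("2022-01-02", "003"),
    ("2022-01-03", "002"), ("2022-01-03", "003"),
    ("2022-01-04", "002"), ("2022-01-04", "004"),
]
for day, total_user in running_distinct_by_month(sample):
    print(day, total_user)
```

The point of the design is that only one small object per day is merged, rather than every raw row being re-joined against every later day as in the self-join from the question.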
