Distinct in Window Functions. BigQuery

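I am trying to run this in BigQuery: COUNT(DISTINCT user_id) OVER (PARTITION BY DATE_TRUNC(date, month), sample, app_id ORDER BY DATE RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as ACTIVE_USERS

In other words, I have a table with date, user id, sample and app id. For each day, I need the cumulative number of unique active users from the start of the month up to and including that day.

The function works fine without DISTINCT, but then it gives me the total number of users, which is not what I need.

I tried some tricks with dense_rank, but they do not work here either.

Is there any way to count distinct users using window functions?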
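To make the requirement concrete, here is a minimal sketch of the desired result computed without any window function, using a plain self-join as a reference baseline. The table `t` and its five sample rows are hypothetical and only stand in for the real data; the column names follow the description above.

```sql
#standardSQL
-- Hypothetical sample rows; not taken from the real tables.
WITH t AS (
  SELECT * FROM UNNEST([
    STRUCT(DATE '2024-01-01' AS date, 'u1' AS user_id, 's' AS sample, 'a' AS app_id),
    STRUCT(DATE '2024-01-01', 'u2', 's', 'a'),
    STRUCT(DATE '2024-01-02', 'u1', 's', 'a'),
    STRUCT(DATE '2024-01-02', 'u3', 's', 'a'),
    STRUCT(DATE '2024-01-03', 'u2', 's', 'a')
  ])
), days AS (
  -- One row per day / sample / app to report on.
  SELECT DISTINCT date, sample, app_id FROM t
)
SELECT
  d.date,
  d.sample,
  d.app_id,
  -- Distinct users seen from the start of the month up to and including d.date.
  COUNT(DISTINCT t.user_id) AS active_users
FROM days AS d
JOIN t
  ON t.sample = d.sample
  AND t.app_id = d.app_id
  AND DATE_TRUNC(t.date, MONTH) = DATE_TRUNC(d.date, MONTH)
  AND t.date <= d.date
GROUP BY d.date, d.sample, d.app_id
ORDER BY d.date
```

On these rows the counts should come out as 2, 3 and 3 for the three days: u1 and u2 on the first day, u3 added on the second, and nobody new on the third. The self-join is only a yardstick for checking other approaches and is likely to be expensive on large tables.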

------------- UPDATED ---------------- Here is the full query, so you can better understand what I need

with mtd1 as (select
'MonthToDate' as TIMELINE
,fd.date DATE
,td.SAMPLE as SAMPLE
,td.APPNAME as APP_ID 
,sum(fd.revenue) as REVENUE 
,td.user_id ACTIVE_USERS 
from DWH.DailyUser fd 
join DWH.Depositors td using (userid)
group by 1,2,3,4,6
),
mtd as (
select TIMELINE
,DATE
,SAMPLE
,APP_ID
,sum(revenue) over (partition by date_trunc(date, month), sample, app_id order by date range BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as REVENUE
,COUNT(distinct active_users) over (partition by date_trunc(date, month), sample, app_id order by date range BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as ACTIVE_USERS 
from mtd1
)
select * from mtd 
where extract(day from date) = extract(day from current_date)
group by 1,2,3,4,5,6 

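You can use ARRAY_AGG and then count the distinct elements in each array. Note, however, that the query will run out of memory if the arrays get too large.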

with mtd1 as (select  
'MonthToDate' as TIMELINE
,fd.date DATE
,td.SAMPLE as SAMPLE
,td.APPNAME as APP_ID 
,sum(fd.revenue) as REVENUE 
,td.user_id ACTIVE_USERS 
from DWH.DailyUser fd 
join DWH.Depositors td using (userid)
group by 1,2,3,4,6
),
mtd2 as (
select TIMELINE
,DATE
,SAMPLE
,APP_ID
,sum(revenue) over (partition by date_trunc(date, month), sample, app_id order by date range BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as REVENUE
,ARRAY_AGG(active_users) over (partition by date_trunc(date, month), sample, app_id order by date range BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as ACTIVE_USERS 
from mtd1
), mtd AS (
  SELECT * EXCEPT(ACTIVE_USERS),
    (SELECT COUNT(DISTINCT u) FROM UNNEST(ACTIVE_USERS) AS u) AS ACTIVE_USERS
   FROM mtd2
)
)
select * from mtd 
where extract(day from date) = extract(day from current_date)
group by 1,2,3,4,5,6
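The ARRAY_AGG idea can also be tried in isolation. The following is a minimal sketch, assuming a hypothetical table `t` with inline sample rows (the rows and the `users_so_far` alias are illustrative, not part of the original query):

```sql
#standardSQL
-- Hypothetical sample rows; not taken from the real tables.
WITH t AS (
  SELECT * FROM UNNEST([
    STRUCT(DATE '2024-01-01' AS date, 'u1' AS user_id, 's' AS sample, 'a' AS app_id),
    STRUCT(DATE '2024-01-01', 'u2', 's', 'a'),
    STRUCT(DATE '2024-01-02', 'u1', 's', 'a'),
    STRUCT(DATE '2024-01-02', 'u3', 's', 'a'),
    STRUCT(DATE '2024-01-03', 'u2', 's', 'a')
  ])
), agg AS (
  SELECT
    date,
    sample,
    app_id,
    -- Collect every user_id seen so far in the month, duplicates included.
    ARRAY_AGG(user_id) OVER (
      PARTITION BY DATE_TRUNC(date, MONTH), sample, app_id
      ORDER BY date
      RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS users_so_far
  FROM t
)
SELECT DISTINCT
  date,
  sample,
  app_id,
  -- De-duplicate inside the array, one array per row.
  (SELECT COUNT(DISTINCT u) FROM UNNEST(users_so_far) AS u) AS active_users
FROM agg
ORDER BY date
```

On these rows the counts should be 2, 3 and 3. Each row carries the full array of users seen so far in its partition, which is where the memory warning above comes from.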

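Distinct in Window Functions. BigQuery - Is there any way to count distinct users using window functions?

This particular question is a duplicate and has already been answered here

... Here is the full query ...

Below is how to apply the above to your particular query (not tested, based entirely on your code):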

#standardSQL
WITH mtd1 AS (
  SELECT  
    'MonthToDate' AS TIMELINE
    ,fd.date DATE
    ,td.SAMPLE AS SAMPLE
    ,td.APPNAME AS APP_ID 
    ,SUM(fd.revenue) AS REVENUE 
    ,td.user_id ACTIVE_USERS 
  FROM `DWH.DailyUser` fd 
  JOIN `DWH.Depositors` td USING (userid)
  GROUP BY 1,2,3,4,6
), mtd2 AS (
  SELECT 
    TIMELINE
    ,DATE
    ,SAMPLE
    ,APP_ID
    ,SUM(REVENUE) OVER (PARTITION BY DATE_TRUNC(DATE, MONTH), SAMPLE, APP_ID ORDER BY DATE RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS REVENUE
    ,ARRAY_AGG(ACTIVE_USERS) OVER (PARTITION BY DATE_TRUNC(DATE, MONTH), SAMPLE, APP_ID ORDER BY DATE RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS ACTIVE_USERS 
  FROM mtd1
), mtd AS (
  SELECT * REPLACE((SELECT COUNT(DISTINCT u) FROM UNNEST(ACTIVE_USERS) AS u) AS ACTIVE_USERS)
  FROM mtd2
)
SELECT * FROM mtd 
WHERE EXTRACT(day FROM DATE) = EXTRACT(day FROM CURRENT_DATE)
GROUP BY 1,2,3,4,5,6

One way to implement count(distinct) is to use row_number() and then count the rows where it equals 1:

select SUM(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END) OVER (PARTITION BY DATE_TRUNC(date, month), sample, app_id ORDER BY date) as Active_Users
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY DATE_TRUNC(date, month), sample, app_id, user_id ORDER BY DATE) as seqnum
      FROM t
     ) t
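A self-contained version of the same idea may make it easier to check. This is a sketch under assumptions: the table `t`, its sample rows and the `flagged` CTE name are hypothetical, and the outer SELECT DISTINCT is added here only to collapse the result to one row per day.

```sql
#standardSQL
-- Hypothetical sample rows; not taken from the real tables.
WITH t AS (
  SELECT * FROM UNNEST([
    STRUCT(DATE '2024-01-01' AS date, 'u1' AS user_id, 's' AS sample, 'a' AS app_id),
    STRUCT(DATE '2024-01-01', 'u2', 's', 'a'),
    STRUCT(DATE '2024-01-02', 'u1', 's', 'a'),
    STRUCT(DATE '2024-01-02', 'u3', 's', 'a'),
    STRUCT(DATE '2024-01-03', 'u2', 's', 'a')
  ])
), flagged AS (
  SELECT
    t.*,
    -- seqnum = 1 marks the first appearance of a user within the month.
    ROW_NUMBER() OVER (
      PARTITION BY DATE_TRUNC(date, MONTH), sample, app_id, user_id
      ORDER BY date
    ) AS seqnum
  FROM t
)
SELECT DISTINCT
  date,
  sample,
  app_id,
  -- Running total of first appearances = cumulative distinct users.
  SUM(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END) OVER (
    PARTITION BY DATE_TRUNC(date, MONTH), sample, app_id
    ORDER BY date
  ) AS active_users
FROM flagged
ORDER BY date
```

On these rows the counts should be 2, 3 and 3. With an ORDER BY and no explicit frame, the window defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so all rows sharing the same date get the same running total. Unlike the array approach, this keeps only one integer per row.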

