How to get cumulative total users, ignoring users who already appeared on a previous day, using BigQuery?

I want to calculate cumulative users per day, but users who already appeared on a previous day should not be counted again.

date_key      user_id
2022-01-01     001
2022-01-01     002
2022-01-02     001
2022-01-02     003
2022-01-03     002
2022-01-03     003
2022-01-04     002
2022-01-04     004

On a daily basis we get:

date_key     total_user
2022-01-01      2
2022-01-02      2
2022-01-03      2
2022-01-04      2

If we simply accumulate the daily counts, we get 2, 4, 6, 8 for the four days. The goal is a table like this:

date_key     total_user
2022-01-01      2
2022-01-02      3
2022-01-03      3
2022-01-04      4

I'm using the query below to get the result, but because the data is really huge, it takes forever to complete.

select b.date_key,count(distinct a.user_id) total_user
from t1 a
join t1 b 
   on b.date_key >= a.date_key 
   and date_trunc(a.date_key,month) = date_trunc(b.date_key,month)
group by 1
order by 1

And yes, the calculation should reset when the month changes.

By the way, I'm using Google BigQuery.

Number each user's appearances within the month in date order, then count only the rows where the user is seen for the first time:

with data as (
    select *,
        row_number() over (partition by date_trunc(date_key, month), user_id
                           order by date_key) as rn
    from T
)
select date_key,
    sum(count(case when rn = 1 then 1 end)) -- or countif(rn = 1)
        over (partition by date_trunc(date_key, month)
              order by date_key) as cum_monthly_users
from data
group by date_key;

https://dbfiddle.uk/?rdbms=postgres_14&fiddle=dc426d79a7786fc8a5b25a22f0755e27
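To check the logic of the query above against the sample data, here is a minimal plain-Python sketch. It is not BigQuery code; the function name `cumulative_monthly_users` and the in-memory `rows` list are made up for illustration. It mirrors the two steps: find each user's first appearance in the month (the `rn = 1` rows), then take a running sum of first appearances per day.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (date_key, user_id)
rows = [
    (date(2022, 1, 1), "001"), (date(2022, 1, 1), "002"),
    (date(2022, 1, 2), "001"), (date(2022, 1, 2), "003"),
    (date(2022, 1, 3), "002"), (date(2022, 1, 3), "003"),
    (date(2022, 1, 4), "002"), (date(2022, 1, 4), "004"),
]

def cumulative_monthly_users(rows):
    """Return {date_key: distinct users seen so far in that month}."""
    # Earliest date per (month, user): the row that would get rn = 1.
    first_seen = {}
    for day, user in rows:
        key = ((day.year, day.month), user)
        if key not in first_seen or day < first_seen[key]:
            first_seen[key] = day

    # First-time users per day; days with no new users stay at 0.
    new_users = defaultdict(int)
    for day, _ in rows:
        new_users[day] += 0
    for day in first_seen.values():
        new_users[day] += 1

    # Running sum in date order, restarting when the month changes.
    result, running, current_month = {}, 0, None
    for day in sorted(new_users):
        month = (day.year, day.month)
        if month != current_month:
            running, current_month = 0, month
        running += new_users[day]
        result[day] = running
    return result

result = cumulative_monthly_users(rows)
for day, total in result.items():
    print(day, total)
```

On the sample data this gives 2, 3, 3, 4 for the four days, which matches the table the question asks for.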

  1. Cumulative total users, ignoring the users who already appeared on a previous day
  2. The calculation should reset when the month changes
  3. The data is really huge

Consider the approach below:

select date_key, 
  ( select hll_count.merge(u) 
    from unnest(users) u
  ) as total_user
from (
  select date_key, date_trunc(date(date_key), month) year_month,
    array_agg(users) over(partition by date_trunc(date(date_key), month) order by date_key) users
  from (
    select date_key, hll_count.init(user_id) users
    from your_table
    group by date_key
  )
)      

If applied to the sample data in your question, the output is:

[image: query output]

Note: not only are requirements #1 and #2 above (obviously) met, with output as expected, but the HyperLogLog++ functions used here also effectively address #3. From the BigQuery documentation:

HLL++ functions are approximate aggregate functions. Approximate aggregation typically requires less memory than exact aggregation functions, like COUNT(DISTINCT), but also introduces statistical error. This makes HLL++ functions appropriate for large data streams for which linear memory usage is impractical, as well as for data that is already approximate.
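The shape of the HLL++ query can be sketched in plain Python, with exact `set` objects standing in for the sketches. This is an assumption made purely for illustration: a real HLL++ sketch is a small, bounded-size approximate structure, not a set. The function name `cumulative_by_merging` is hypothetical. Building one set per day plays the role of `hll_count.init`, and taking the running union within a month plays the role of `array_agg(...) over (...)` followed by `hll_count.merge`.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (date_key, user_id)
rows = [
    (date(2022, 1, 1), "001"), (date(2022, 1, 1), "002"),
    (date(2022, 1, 2), "001"), (date(2022, 1, 2), "003"),
    (date(2022, 1, 3), "002"), (date(2022, 1, 3), "003"),
    (date(2022, 1, 4), "002"), (date(2022, 1, 4), "004"),
]

def cumulative_by_merging(rows):
    """Return {date_key: size of the union of daily user sets so far in the month}."""
    # Step 1: one summary per day (stand-in for hll_count.init ... group by date_key).
    daily = defaultdict(set)
    for day, user in rows:
        daily[day].add(user)

    # Step 2: merge the daily summaries in date order, restarting each month
    # (stand-in for the windowed array_agg plus hll_count.merge).
    result, merged, current_month = {}, set(), None
    for day in sorted(daily):
        month = (day.year, day.month)
        if month != current_month:
            merged, current_month = set(), month
        merged |= daily[day]
        result[day] = len(merged)
    return result

totals = cumulative_by_merging(rows)
for day, total in totals.items():
    print(day, total)
```

The exact sets here grow with the number of users, which is the linear memory cost the quoted documentation mentions. Swapping them for mergeable sketches keeps each daily summary small, at the price of an approximate count.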

