
Count distinct IDs grouped by two kinds of dates

With the following table, I need to count the number of distinct IDs for each month with a rolling period of 30 days, grouped by the month in which the account was opened.

CREATE TABLE test (
  opening_account DATE,
  activity_date DATE,
  ID VARCHAR
);

INSERT INTO test (opening_account, activity_date,ID) VALUES 
('2022-01-01', '2022-01-01', '2DKJZINK'),
('2022-01-01', '2022-01-14', '2DKJZINK'),
('2022-01-01', '2022-01-24', '2DKJZINK'),
('2022-01-01', '2022-02-02', '2DKJZINK'),
('2022-01-01', '2022-02-04', '2DKJZINK'),
('2022-01-01', '2022-01-04', '3EFE'),
('2022-01-02', '2022-01-30', 'HZKZ'),
('2022-01-02', '2022-02-04', 'HZKZ'),
('2022-01-02', '2022-03-12', 'HZKZ'),
('2022-02-03', '2022-02-03', 'KDZL'),
('2022-02-03', '2022-03-03', 'KDZL'),
('2022-02-03', '2022-03-03', 'KDZL'),
('2022-02-12', '2022-02-14', 'ZOJZO'),
('2022-03-22', '2022-03-22', 'DZJA'),
('2022-03-22', '2022-03-22', 'DZAAA');

For example:

  • Looking at January, 3 people opened an account: 2DKJZINK, 3EFE, HZKZ. They were all active in January, so the total is 3.

  • In February, only 2DKJZINK and HZKZ were active, but because of the rolling period of 30 days (1 month), we also need to count 3EFE, which was active on the 4th of January. So the total for the January cohort in February is 3.

  • In March, only HZKZ was active. But because 2DKJZINK was also active in February, the total is 2.

The same process applies to the people who opened an account in February, then March, and so on.
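The rule in the examples above can be sketched in Python as a quick sanity check. This is a minimal sketch, not part of the original question: it assumes the "30 days" rolling period means "active in the same calendar month or in the previous one", which is how the worked examples read, and the helper names `month_index`, `month_label` and `rolling_counts` are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (opening_account, activity_date, ID).
ROWS = [
    ("2022-01-01", "2022-01-01", "2DKJZINK"),
    ("2022-01-01", "2022-01-14", "2DKJZINK"),
    ("2022-01-01", "2022-01-24", "2DKJZINK"),
    ("2022-01-01", "2022-02-02", "2DKJZINK"),
    ("2022-01-01", "2022-02-04", "2DKJZINK"),
    ("2022-01-01", "2022-01-04", "3EFE"),
    ("2022-01-02", "2022-01-30", "HZKZ"),
    ("2022-01-02", "2022-02-04", "HZKZ"),
    ("2022-01-02", "2022-03-12", "HZKZ"),
    ("2022-02-03", "2022-02-03", "KDZL"),
    ("2022-02-03", "2022-03-03", "KDZL"),
    ("2022-02-03", "2022-03-03", "KDZL"),
    ("2022-02-12", "2022-02-14", "ZOJZO"),
    ("2022-03-22", "2022-03-22", "DZJA"),
    ("2022-03-22", "2022-03-22", "DZAAA"),
]


def month_index(iso_day: str) -> int:
    """Map 'YYYY-MM-DD' to a month counter so adjacent months differ by 1."""
    d = date.fromisoformat(iso_day)
    return d.year * 12 + (d.month - 1)


def month_label(index: int) -> str:
    """Inverse of month_index, formatted as 'YYYY-MM'."""
    return f"{index // 12}-{index % 12 + 1:02d}"


def rolling_counts(rows):
    """Distinct IDs per (cohort month, activity month), active in M or M - 1."""
    active = defaultdict(set)  # (cohort, month) -> IDs active in that month
    for opened, activity, id_ in rows:
        active[(month_index(opened), month_index(activity))].add(id_)

    counts = {}
    for cohort, month in list(active):
        # Activity in `month` matters for `month` itself and for the next one.
        for m in (month, month + 1):
            ids = active.get((cohort, m), set()) | active.get((cohort, m - 1), set())
            counts[(month_label(cohort), month_label(m))] = len(ids)
    return counts


if __name__ == "__main__":
    for (cohort, month), n in sorted(rolling_counts(ROWS).items()):
        print(cohort, month, n)
```

Under that assumption, the January cohort comes out as 3, 3, 2 for January to March. The sketch also reports a count for April (IDs active in March), which a final query may or may not need to show.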

The expected result is the following. The table doesn't have to be pivoted like this; the layout is mostly to make the result easier to explain.

Month_opening_account   January_activity    February_activity   March_activity
January                        3                   3                    2
February                                           2                    2
March                                                                   2

A basic query gives me the count, but without taking the 30-day rolling period into account. I have tried adding a window function on top of it, but doing so produces duplicated values.

select
 FORMAT_DATE('%B', activity_date ) AS month_activity,
 FORMAT_DATE('%B', opening_account ) as month_opening_account,
 count(distinct ID) as count_id_previous
from test
group by 1,2

Any help would be appreciated.

Consider the approach below:

with temp as (
  select id,
    format_date('%b', date_trunc(date(opening_account), month)) as month_opening_account,
    format_date('%b', date_trunc(date(activity_date), month)) as activity_date,
    format_date('%b', date_trunc(date_add(date(activity_date), interval 1 month), month)) as rolling_activity_date
  from your_table
)
select * from (
  select id, month_opening_account, activity_date from temp union all
  select id, month_opening_account, rolling_activity_date as activity_date from temp
)
pivot (count(distinct id) as activity for activity_date in ('Jan','Feb','Mar','Apr','May','Jun','Jul'))             

If applied to the sample data in your question, the output is:

[image: output of the query above]
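As a cross-check of what the query should return on the sample data, the UNION ALL idea can be mirrored in plain Python: every activity row is emitted twice, once for its own month and once for the following month, and distinct IDs are then counted per (opening month, activity month). This is only an illustrative sketch. The names `next_month`, `label` and `union_all_counts` are hypothetical, and unlike the `'%b'` format in the query (which keeps only the month name), the sketch keeps the year so that months from different years would not be merged.

```python
from datetime import date

# Sample rows from the question: (opening_account, activity_date, ID).
ROWS = [
    ("2022-01-01", "2022-01-01", "2DKJZINK"),
    ("2022-01-01", "2022-01-14", "2DKJZINK"),
    ("2022-01-01", "2022-01-24", "2DKJZINK"),
    ("2022-01-01", "2022-02-02", "2DKJZINK"),
    ("2022-01-01", "2022-02-04", "2DKJZINK"),
    ("2022-01-01", "2022-01-04", "3EFE"),
    ("2022-01-02", "2022-01-30", "HZKZ"),
    ("2022-01-02", "2022-02-04", "HZKZ"),
    ("2022-01-02", "2022-03-12", "HZKZ"),
    ("2022-02-03", "2022-02-03", "KDZL"),
    ("2022-02-03", "2022-03-03", "KDZL"),
    ("2022-02-03", "2022-03-03", "KDZL"),
    ("2022-02-12", "2022-02-14", "ZOJZO"),
    ("2022-03-22", "2022-03-22", "DZJA"),
    ("2022-03-22", "2022-03-22", "DZAAA"),
]


def next_month(year: int, month: int) -> tuple:
    """Calendar month following (year, month), rolling over in December."""
    return (year + 1, 1) if month == 12 else (year, month + 1)


def label(year: int, month: int) -> str:
    return f"{year}-{month:02d}"


def union_all_counts(rows):
    """Count distinct IDs per (opening month, activity month)."""
    ids = {}  # (cohort label, month label) -> set of IDs
    for opened, activity, id_ in rows:
        o = date.fromisoformat(opened)
        a = date.fromisoformat(activity)
        own = (a.year, a.month)
        # Two branches of the UNION ALL: the activity month and the month after.
        for m in (own, next_month(*own)):
            ids.setdefault((label(o.year, o.month), label(*m)), set()).add(id_)
    return {key: len(members) for key, members in ids.items()}


if __name__ == "__main__":
    for (cohort, month), n in sorted(union_all_counts(ROWS).items()):
        print(cohort, month, n)
```

On the sample rows this gives 3, 3, 2 for the January cohort across January to March, 2 and 2 for the February cohort in February and March, and 2 for the March cohort in March, matching the expected table in the question. Because each row is also projected one month forward, an April column appears as well (1, 1 and 2 for the three cohorts).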
