Snowflake SQL aggregate based on multiple columns

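I have 2 tables with user IDs and emails. A user can change their email but keep the same user ID (rows 2 and 5 of the USER_PLAYS table). A user can also create a new user ID with an existing email (row 3 of the USER_PLAYS table). I want to sum this user's total plays into a single row. There is also another table with sales values, and I want the total sales as well. I'm thinking of somehow creating a unique ID that is the same across all of these fields, but I'm not sure how to implement it.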

Note that I've shown only 1 real person, but there are many more unique people in these tables.

I'm using Snowflake because that's where the data lives.

USER_PLAYS table:

|ROW|USER_ID    | EMAIL              |VIDEO_PLAYS|
|---|-----------|--------------------|-----------|
|1  | 1         |  ab@gmail.com      |    2      |
|2  | 1         |  cd@gmail.com      |    3      |
|3  | 3         |  cd@gmail.com      |    4      |
|4  | 4         |  cd@gmail.com      |    2      |
|5  | 4         |  ef@gmail.com      |    3      |

Sales table:

|NET_SALE   | EMAIL       |
|-----------|-------------|
|5          | cd@gmail.com|
|10         | ef@gmail.com|

Desired output:

|UNIQUE_ID  | PLAYS |NET_SALE|
|-----------|-------|--------|
| 1         |  14   |  15    |

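The grouping asked for here is a connected-components problem: treat every user ID and every email as a node, and let every USER_PLAYS row link its user ID to its email. The sketch below illustrates that idea in Python with a union-find structure. It is an alternative technique, not Snowflake SQL and not the method used in the answer. The function names, the choice of the lowest user ID as UNIQUE_ID, and skipping sales whose email never appears in USER_PLAYS are all assumptions.

```python
from collections import defaultdict

def find(parent, x):
    # Follow parent links to the root, halving the path as we go.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(parent, a, b):
    # Merge the groups containing a and b.
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra

def aggregate(plays, sales):
    """plays: (user_id, email, video_plays) rows; sales: (net_sale, email) rows.
    Returns {unique_id: (total_plays, total_net_sale)}. unique_id is the lowest
    user_id in each connected group (a hypothetical labelling choice)."""
    parent = {}
    for user_id, email, _ in plays:
        u, e = ("user", user_id), ("email", email)
        parent.setdefault(u, u)
        parent.setdefault(e, e)
        union(parent, u, e)  # each row links its user_id to its email
    # Label each group by the lowest user_id it contains.
    label = {}
    for user_id, _, _ in plays:
        root = find(parent, ("user", user_id))
        label[root] = min(user_id, label.get(root, user_id))
    totals = defaultdict(lambda: [0, 0])
    for user_id, _, video_plays in plays:
        totals[label[find(parent, ("user", user_id))]][0] += video_plays
    for net_sale, email in sales:
        key = ("email", email)
        if key in parent:  # assumption: sales for unknown emails are skipped
            totals[label[find(parent, key)]][1] += net_sale
    return {k: tuple(v) for k, v in totals.items()}

user_plays = [
    (1, "ab@gmail.com", 2),
    (1, "cd@gmail.com", 3),
    (3, "cd@gmail.com", 4),
    (4, "cd@gmail.com", 2),
    (4, "ef@gmail.com", 3),
]
sales = [(5, "cd@gmail.com"), (10, "ef@gmail.com")]
print(aggregate(user_plays, sales))  # {1: (14, 15)}
```

Because union-find follows links of any length, a chain of changed emails and reused emails ends up in one group no matter how many hops it has.
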
There may be room to make this more efficient, but I think this process gets you a unique identifier across your user_id / email combinations.

For this process, I added another column called COMMON_ID to the user_plays table. COMMON_ID is joined to the sales table on email, so the sales can be aggregated by COMMON_ID (see the results below):

-- Create the test case
create
or replace table user_plays (
    user_id integer not null,
    email varchar not null,
    video_plays integer not null,
    common_id integer default NULL
);
insert into
    user_plays
values
    (1, 'ab@gmail.com', 2, null),
    (1, 'cd@gmail.com', 3, null),
    (3, 'cd@gmail.com', 4, null),
    (4, 'cd@gmail.com', 2, null),
    (4, 'ef@gmail.com', 3, null),
    (5, 'jd@gmail.com', 10, null),
    (6, 'lk@gmail.com', 1, null),
    (6, 'zz@gmail.com', 2, null),
    (7, 'zz@gmail.com', 3, null);
create
    or replace table sales (net_sale integer, email varchar);
insert into
    sales
values
    (5, 'cd@gmail.com'),(10, 'ef@gmail.com');
-- Test run
-- Create view for User IDs with multiple emails
create
or replace view grp1 as (
    select
        user_id,
        count(*) as mult
    from
        user_plays
    group by
        user_id
    having
        count(*) > 1
);
-- Create view for Emails with multiple user IDs
create
or replace view grp2 as (
    select
        email,
        count(*) as mult
    from
        user_plays x
    group by
        email
    having
        count(*) > 1
);
EXECUTE IMMEDIATE $$
declare new_common_id integer;
counter integer;


Begin 
counter := 0;
new_common_id := 0;

-- Baseline: reset every common_id to NULL
update
    user_plays
set
    common_id = NULL;
    
-- Rows whose user_id and email each occur only once get common_id = user_id
update
    user_plays
set
    common_id = user_id
where
    email not in (
        select
            distinct email
        from
            grp2
    )
    and user_id not in (
        select
            distinct user_id
        from
            grp1
    );    

-- Repeatedly take the lowest user_id that still has no common_id and use it as the common_id for its group
LOOP
select count(*) into :counter
from
    user_plays
where
    common_id is null;
if (counter = 0) then BREAK;
end if;
select
    min(user_id) into :new_common_id
from
    user_plays
where
    common_id is null;
-- Assign new_common_id to that user_id's rows and to any unassigned rows that share one of its emails
update
    user_plays
set
    common_id = :new_common_id
where
    common_id is null and 
    (user_id = :new_common_id
    or email in (
        select
            email
        from
            user_plays
        where
            user_id = :new_common_id
    ));

END LOOP;

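-- Fix up chains (e.g. a user_id that shares an email with an earlier group but then changed to another email):
-- give every row of a user_id the lowest common_id found on any of its rows.
-- This is a single pass, so longer chains of shared emails can still end up split across common_ids.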

UPDATE user_plays vp
 set vp.common_id = vp2.common_id
 from (select user_id, min(common_id) as common_id from user_plays group by user_id) vp2
 where vp.user_id = vp2.user_id;

END;
$$;
-- See results
select
    *
from
    user_plays;
select
    vps.common_id,
    vps.video_plays,
    sum(s.net_sale) as net_sale
from
    (
        -- total plays per common_id
        select
            common_id,
            sum(video_plays) as video_plays
        from
            user_plays
        group by
            common_id
    ) vps,
    (
        -- exactly one common_id per email (an email can appear on several rows)
        select
            email,
            max(common_id) as common_id
        from
            user_plays
        group by
            email
    ) em,
    sales s
where
    vps.common_id = em.common_id
    and s.email = em.email
group by
    vps.common_id,
    vps.video_plays;

COMMON_ID assignment results:

|USER_ID|EMAIL        |VIDEO_PLAYS|COMMON_ID|
|-------|-------------|-----------|---------|
|1      |ab@gmail.com |2          |1        |
|1      |cd@gmail.com |3          |1        |
|3      |cd@gmail.com |4          |1        |
|4      |cd@gmail.com |2          |1        |
|4      |ef@gmail.com |3          |1        |
|5      |jd@gmail.com |10         |5        |
|6      |lk@gmail.com |1          |6        |
|6      |zz@gmail.com |2          |6        |
|7      |zz@gmail.com |3          |6        |

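To check how the script arrives at these COMMON_ID values, its steps can be mirrored in plain Python on (user_id, email) pairs. This is a hypothetical re-implementation for reasoning about the logic, not code that runs in Snowflake. `assign_common_ids` is an illustrative name, and user_id is treated as an integer so that `min()` compares numerically.

```python
from collections import Counter

def assign_common_ids(rows):
    """Mirror the Snowflake script on (user_id, email) rows; returns one common_id per row."""
    user_rows = Counter(u for u, _ in rows)    # grp1: user_ids on more than one row
    email_rows = Counter(e for _, e in rows)   # grp2: emails on more than one row
    # Rows whose user_id and email both occur once keep their own user_id.
    common = [u if user_rows[u] == 1 and email_rows[e] == 1 else None for u, e in rows]
    # LOOP: take the lowest unassigned user_id and claim its rows plus any
    # unassigned rows that share one of that user_id's emails.
    while None in common:
        new_id = min(u for (u, _), c in zip(rows, common) if c is None)
        emails = {e for u, e in rows if u == new_id}
        common = [new_id if c is None and (u == new_id or e in emails) else c
                  for (u, e), c in zip(rows, common)]
    # Final UPDATE: each user_id takes the lowest common_id seen on any of its rows.
    lowest = {}
    for (u, _), c in zip(rows, common):
        lowest[u] = min(c, lowest.get(u, c))
    return [lowest[u] for u, _ in rows]

rows = [(1, "ab@gmail.com"), (1, "cd@gmail.com"), (3, "cd@gmail.com"),
        (4, "cd@gmail.com"), (4, "ef@gmail.com"), (5, "jd@gmail.com"),
        (6, "lk@gmail.com"), (6, "zz@gmail.com"), (7, "zz@gmail.com")]
print(assign_common_ids(rows))  # [1, 1, 1, 1, 1, 5, 6, 6, 6]
```

Running it on the chain `[(1, "a"), (2, "a"), (2, "b"), (3, "b"), (3, "c"), (4, "c")]` returns `[1, 1, 1, 2, 2, 3]`. All six rows are linked, but the single fix-up pass only merges labels within a user_id. Chains this long would need both a per-user_id and a per-email update, repeated until no row changes.
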
Final result:

|COMMON_ID|VIDEO_PLAYS|NET_SALE|
|---------|-----------|--------|
|1        |14         |15      |

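The aggregation step can be checked outside Snowflake with Python's built-in `sqlite3` module. The sketch below is a SQLite approximation, not Snowflake. It loads the answer's test rows with the COMMON_ID values from the assignment table. The query joins sales through a one-row-per-email mapping (`max(common_id)` grouped by email) instead of grouping by `(email, net_sale)`. Because of that, two identical sales rows are both counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table user_plays (user_id integer, email text, video_plays integer, common_id integer);
    create table sales (net_sale integer, email text);
    insert into user_plays values
        (1, 'ab@gmail.com', 2, 1), (1, 'cd@gmail.com', 3, 1), (3, 'cd@gmail.com', 4, 1),
        (4, 'cd@gmail.com', 2, 1), (4, 'ef@gmail.com', 3, 1), (5, 'jd@gmail.com', 10, 5),
        (6, 'lk@gmail.com', 1, 6), (6, 'zz@gmail.com', 2, 6), (7, 'zz@gmail.com', 3, 6);
    insert into sales values (5, 'cd@gmail.com'), (10, 'ef@gmail.com');
""")

# Plays are summed per common_id first; sales are attached through one common_id per email.
FINAL_QUERY = """
    select vps.common_id, vps.video_plays, sum(s.net_sale) as net_sale
    from (select common_id, sum(video_plays) as video_plays
          from user_plays group by common_id) vps
    join (select email, max(common_id) as common_id
          from user_plays group by email) em on em.common_id = vps.common_id
    join sales s on s.email = em.email
    group by vps.common_id, vps.video_plays
    order by vps.common_id
"""
print(conn.execute(FINAL_QUERY).fetchall())  # [(1, 14, 15)]

# A second, identical sale is counted too, instead of being collapsed.
conn.execute("insert into sales values (5, 'cd@gmail.com')")
print(conn.execute(FINAL_QUERY).fetchall())  # [(1, 14, 20)]
```

The inner joins keep only groups that have at least one sale, which matches the single-row result shown above. A left join to sales would also list groups 5 and 6, with a NULL net sale.
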
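Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.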

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM