Snowflake SQL aggregate based on multiple columns
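I have 2 tables of user IDs and emails. A user can change their email but keep the same user ID (rows 2 and 5 of the USER_PLAYS table). A user can also create a new user ID with an existing email (row 3 of the USER_PLAYS table). I want to be able to aggregate this user's total plays into one row. There is also another table with sales values, and I would like to get total sales from it. I am thinking of somehow creating a unique ID that would be the same across all of these fields, but I am not sure how to implement it.
Note that I only show 1 real person here, but there are many more unique people in these tables.
I am using Snowflake, since that is where the data lives.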
USER_PLAYS table:
|ROW|USER_ID | EMAIL |VIDEO_PLAYS|
|---|-----------|--------------------|-----------|
|1 | 1 | ab@gmail.com | 2 |
|2 | 1 | cd@gmail.com | 3 |
|3 | 3 | cd@gmail.com | 4 |
|4 | 4 | cd@gmail.com | 2 |
|5 | 4 | ef@gmail.com | 3 |
Sales Table:
|NET_SALE | EMAIL |
|-----------|-------------|
|5 | cd@gmail.com|
|10 | ef@gmail.com|
Desired Output:
|UNIQUE_ID | PLAYS |NET_SALE|
|-----------|-------|--------|
| 1 | 14 | 15 |
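There is probably room to make this more efficient, but I think this process gets you a unique identifier across your user_id / email combinations.
For this process, I added another column named COMMON_ID to the user_plays table. Joining user_plays to the sales table on email then lets you aggregate sales by COMMON_ID (see results below):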
-- Create the test case
create
or replace table user_plays (
-- integer, so that min(user_id) below compares numerically rather than as text
user_id integer not null,
email varchar not null,
video_plays integer not null,
common_id integer default NULL
);
insert into
user_plays
values
(1, 'ab@gmail.com', 2, null),
(1, 'cd@gmail.com', 3, null),
(3, 'cd@gmail.com', 4, null),
(4, 'cd@gmail.com', 2, null),
(4, 'ef@gmail.com', 3, null),
(5, 'jd@gmail.com', 10, null),
(6, 'lk@gmail.com', 1, null),
(6, 'zz@gmail.com', 2, null),
(7, 'zz@gmail.com', 3, null);
create
or replace table sales (net_sale integer, email varchar);
insert into
sales
values
(5, 'cd@gmail.com'),(10, 'ef@gmail.com');
-- Test run
-- Create view for User IDs with multiple emails
create
or replace view grp1 as (
select
user_id,
count(*) as mult
from
user_plays
group by
user_id
having
count(*) > 1
);
-- Create view for Emails with multiple user IDs
create
or replace view grp2 as (
select
email,
count(*) as mult
from
user_plays x
group by
email
having
count(*) > 1
);
EXECUTE IMMEDIATE $$
declare new_common_id integer;
counter integer;
Begin
counter := 0;
new_common_id := 0;
-- Baseline: reset common_id to NULL
update
user_plays
set
common_id = NULL;
-- Mark all unique entries with a common_id = user_id
update
user_plays
set
common_id = user_id
where
email not in (
select
distinct email
from
grp2
)
and user_id not in (
select
distinct user_id
from
grp1
);
-- Set a common_id to the lowest user_id value for each user_id with multiple emails
LOOP
select count(*) into :counter
from
user_plays
where
common_id is null;
if (counter = 0) then BREAK;
end if;
select
min(user_id) into :new_common_id
from
user_plays
where
common_id is null;
-- first pass
update
user_plays
set
common_id = :new_common_id
where
common_id is null and
(user_id = :new_common_id
or email in (
select
email
from
user_plays
where
user_id = :new_common_id
));
END LOOP;
-- Update the chain where an account using a changed email created a new user_id to match up with prior group.
UPDATE user_plays vp
set vp.common_id = vp2.common_id
from (select user_id, min(common_id) as common_id from user_plays group by user_id) vp2
where vp.user_id = vp2.user_id;
END;
$$;
-- See results
select
*
from
user_plays;
select
x.common_id,
vps.video_plays,
sum(x.net_sale) as net_sale
from
(
select
common_id,
sum(video_plays) as video_plays
from
user_plays
group by
common_id
) vps,
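(
-- Map each email to one common_id first, so that repeated sales rows
-- (same email and same amount) are each counted in the sum
select
e.common_id,
s.net_sale
from
sales s
join (
select
email,
max(common_id) as common_id
from
user_plays
group by
email
) e on e.email = s.email
) x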
where
vps.common_id = x.common_id
group by
x.common_id,
vps.video_plays;
Common ID assignment results:
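|USER_ID | EMAIL |VIDEO_PLAYS|COMMON_ID|
|-----------|--------------------|-----------|---------|
| 1 | ab@gmail.com | 2 | 1 |
| 1 | cd@gmail.com | 3 | 1 |
| 3 | cd@gmail.com | 4 | 1 |
| 4 | cd@gmail.com | 2 | 1 |
| 4 | ef@gmail.com | 3 | 1 |
| 5 | jd@gmail.com | 10 | 5 |
| 6 | lk@gmail.com | 1 | 6 |
| 6 | zz@gmail.com | 2 | 6 |
| 7 | zz@gmail.com | 3 | 6 |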
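One way to cross-check this assignment outside Snowflake is to treat every user ID and every email as a node in a graph and to link the two nodes on each row; each connected group is then one person. The sketch below is not the script above but a swapped-in technique (union-find) written in Python, and the function name `assign_common_ids` is made up for illustration. It assumes, like the script, that the smallest user ID in a group serves as the common ID. As far as I can tell, union-find also merges chains of any length, whereas the script repairs the chain with a single final pass.

```python
# Rows copied from the test case: (user_id, email, video_plays)
USER_PLAYS = [
    (1, "ab@gmail.com", 2),
    (1, "cd@gmail.com", 3),
    (3, "cd@gmail.com", 4),
    (4, "cd@gmail.com", 2),
    (4, "ef@gmail.com", 3),
    (5, "jd@gmail.com", 10),
    (6, "lk@gmail.com", 1),
    (6, "zz@gmail.com", 2),
    (7, "zz@gmail.com", 3),
]


def assign_common_ids(rows):
    """Return {user_id: common_id}; common_id is the smallest user_id in each linked group."""
    parent = {}

    def find(node):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving keeps trees shallow
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Each row links one user node to one email node.
    for user_id, email, _ in rows:
        union(("user", user_id), ("email", email))

    # Record the smallest user_id seen in each connected group.
    smallest = {}
    for user_id, _, _ in rows:
        root = find(("user", user_id))
        smallest[root] = min(smallest.get(root, user_id), user_id)

    return {user_id: smallest[find(("user", user_id))] for user_id, _, _ in rows}


common_ids = assign_common_ids(USER_PLAYS)
print(common_ids)  # {1: 1, 3: 1, 4: 1, 5: 5, 6: 6, 7: 6}
```

The mapping agrees with the COMMON_ID column in the table above. Loading such a mapping back into Snowflake (for example as a small lookup table) is left out here, since how to do that depends on your setup.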
Final result:
|COMMON_ID | VIDEO_PLAYS |NET_SALE|
|-----------|-------------|--------|
| 1 | 14 | 15 |
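To make the arithmetic behind this row easy to check, here is a minimal Python sketch of the same aggregation. It takes the COMMON_ID values from the assignment results as given, sums plays per common ID, and routes each sale to a common ID through its email. The function name `summarize` is hypothetical. Like the final query, it behaves as an inner join, so groups without any sales (common IDs 5 and 6 here) do not appear in the output.

```python
from collections import defaultdict

# (email, common_id, video_plays), taken from the common ID assignment results
USER_PLAYS = [
    ("ab@gmail.com", 1, 2),
    ("cd@gmail.com", 1, 3),
    ("cd@gmail.com", 1, 4),
    ("cd@gmail.com", 1, 2),
    ("ef@gmail.com", 1, 3),
    ("jd@gmail.com", 5, 10),
    ("lk@gmail.com", 6, 1),
    ("zz@gmail.com", 6, 2),
    ("zz@gmail.com", 6, 3),
]

# (net_sale, email), copied from the sales table
SALES = [(5, "cd@gmail.com"), (10, "ef@gmail.com")]


def summarize(user_plays, sales):
    """Return {common_id: (total_plays, total_net_sale)} for groups with at least one sale."""
    plays = defaultdict(int)
    email_to_common = {}
    for email, common_id, video_plays in user_plays:
        plays[common_id] += video_plays
        email_to_common[email] = common_id

    net_sale = defaultdict(int)
    for amount, email in sales:
        # Sales whose email is unknown to user_plays are skipped (inner join).
        if email in email_to_common:
            net_sale[email_to_common[email]] += amount

    return {cid: (plays[cid], total) for cid, total in net_sale.items()}


result = summarize(USER_PLAYS, SALES)
print(result)  # {1: (14, 15)}
```

If you also want rows for people with no sales, iterating over `plays` instead of `net_sale` (the equivalent of a left join) would keep them, with a net sale of 0.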