Snowflake SQL aggregate based on multiple columns

I have two tables containing user IDs and emails. A user can change their email but keep the same user ID (rows 2 and 5 of the USER_PLAYS table). A user can also create a new user ID with an existing email (row 3 of the USER_PLAYS table). I want to sum the total plays for this user into a single row. There is also a second table with sales values, and I would like the total sales for the user as well. I am thinking of creating a unique ID that is shared across all of these rows, but I am not sure how to implement it.

Note that I have shown only one actual person here; the tables contain many more distinct people.

I am using Snowflake as that is where the data is.

USER_PLAYS table:

|ROW|USER_ID    | EMAIL              |VIDEO_PLAYS|
|---|-----------|--------------------|-----------|
|1  | 1         |  ab@gmail.com      |    2      |
|2  | 1         |  cd@gmail.com      |    3      |
|3  | 3         |  cd@gmail.com      |    4      |
|4  | 4         |  cd@gmail.com      |    2      |
|5  | 4         |  ef@gmail.com      |    3      |

SALES table:

|NET_SALE   | EMAIL       |
|-----------|-------------|
|5          | cd@gmail.com|
|10         | ef@gmail.com|

Desired output:

|UNIQUE_ID  | PLAYS |NET_SALE|
|-----------|-------|--------|
| 1         |  14   |  15    |

There may be more efficient ways to do this, but I think the following process gives you a unique identifier across your user_id / email combinations.

For this process I added another column, COMMON_ID, to the user_plays table. Joining user_plays to the sales table on email then lets you aggregate the sales against the COMMON_ID (see results below):

-- Create the test case
create
or replace table user_plays (
    -- integer rather than varchar, so min(user_id) compares numerically (10 sorts after 2)
    user_id integer not null,
    email varchar not null,
    video_plays integer not null,
    common_id integer default NULL
);
insert into
    user_plays
values
    (1, 'ab@gmail.com', 2, null),
    (1, 'cd@gmail.com', 3, null),
    (3, 'cd@gmail.com', 4, null),
    (4, 'cd@gmail.com', 2, null),
    (4, 'ef@gmail.com', 3, null),
    (5, 'jd@gmail.com', 10, null),
    (6, 'lk@gmail.com', 1, null),
    (6, 'zz@gmail.com', 2, null),
    (7, 'zz@gmail.com', 3, null);
create
    or replace table sales (net_sale integer, email varchar);
insert into
    sales
values
    (5, 'cd@gmail.com'),(10, 'ef@gmail.com');
-- Test run
-- Create view for User IDs with multiple emails
create
or replace view grp1 as (
    select
        user_id,
        count(*) as mult
    from
        user_plays
    group by
        user_id
    having
        count(*) > 1
);
-- Create view for Emails with multiple user IDs
create
or replace view grp2 as (
    select
        email,
        count(*) as mult
    from
        user_plays
    group by
        email
    having
        count(*) > 1
);
EXECUTE IMMEDIATE $$
declare
    new_common_id integer;
    counter integer;
begin
    counter := 0;
    new_common_id := 0;

-- Baseline common_id to NULL
update
    user_plays
set
    common_id = NULL;
    
-- Mark all unique entries with a common_id = user_id
update
    user_plays
set
    common_id = user_id
where
    email not in (
        select
            distinct email
        from
            grp2
    )
    and user_id not in (
        select
            distinct user_id
        from
            grp1
    );    

-- Repeatedly take the lowest user_id that is still unassigned and use it as the common_id
-- for every unassigned row that has that user_id or shares one of its emails
LOOP
select count(*) into :counter
from
    user_plays
where
    common_id is null;
if (counter = 0) then BREAK;
end if;
select
    min(user_id) into :new_common_id
from
    user_plays
where
    common_id is null;
-- first pass
update
    user_plays
set
    common_id = :new_common_id
where
    common_id is null and 
    (user_id = :new_common_id
    or email in (
        select
            email
        from
            user_plays
        where
            user_id = :new_common_id
    ));

END LOOP;

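-- Close the chain: when a user_id ended up with more than one common_id (for example, a
-- changed email that was also used to create a new user_id), keep the lowest one so the
-- rows rejoin the earlier group.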

UPDATE user_plays vp
 set vp.common_id = vp2.common_id
 from (select user_id, min(common_id) as common_id from user_plays group by user_id) vp2
 where vp.user_id = vp2.user_id;

END;
$$;
-- See results
select
    *
from
    user_plays;
select
    x.common_id,
    vps.video_plays,
    sum(x.net_sale) as net_sale
from
    (
        select
            common_id,
            sum(video_plays) as video_plays
        from
            user_plays
        group by
            common_id
    ) vps,
    (
        select
            s.email,
            s.net_sale,
            max(up.common_id) as common_id
        from
            sales s,
            user_plays up
        where
            up.email = s.email
        group by
            s.email,
            s.net_sale
    ) x
where
    vps.common_id = x.common_id
group by
    x.common_id,
    vps.video_plays;
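
As a quick sanity check on the assignment, the query below is a minimal sketch that lists any email or user_id still attached to more than one COMMON_ID. The column aliases key_type, key_value and n_common_ids are illustrative names, not part of the script above. An empty result suggests that every email and every user_id landed in exactly one group, which is what the test data should produce.

```sql
-- Sanity check (illustrative): each email and each user_id should map to one common_id
select
    'email' as key_type,
    email as key_value,
    count(distinct common_id) as n_common_ids
from
    user_plays
group by
    email
having
    count(distinct common_id) > 1
union all
select
    'user_id' as key_type,
    cast(user_id as varchar) as key_value,
    count(distinct common_id) as n_common_ids
from
    user_plays
group by
    user_id
having
    count(distinct common_id) > 1;
```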

Common ID assignment Results:

USER_ID EMAIL           VIDEO_PLAYS COMMON_ID
1       ab@gmail.com    2           1
1       cd@gmail.com    3           1
3       cd@gmail.com    4           1
4       cd@gmail.com    2           1
4       ef@gmail.com    3           1
5       jd@gmail.com    10          5
6       lk@gmail.com    1           6
6       zz@gmail.com    2           6
7       zz@gmail.com    3           6
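
The loop in the script resolves the chains in this test data, but it appears to follow only a limited number of links. A longer chain, such as users 1 and 2 sharing one email, users 2 and 3 sharing a second, and users 3 and 4 sharing a third, may end up split across several COMMON_IDs. The following is a minimal sketch of an alternative, assuming user_id values are numeric: label every row with its own user_id, then repeatedly push the smallest label across rows that share an email or a user_id until nothing changes. The variable name changed is hypothetical; SQLROWCOUNT is the Snowflake Scripting variable that holds the number of rows affected by the last DML statement.

```sql
EXECUTE IMMEDIATE $$
declare
    changed integer default 1;  -- rows relabelled in the latest pass (hypothetical name)
begin
    -- Start with every row in its own group, labelled by its user_id
    update user_plays set common_id = user_id;

    while (changed > 0) do
        -- Rows sharing an email take the smallest label seen for that email
        update user_plays up
        set common_id = m.min_id
        from (
            select email, min(common_id) as min_id
            from user_plays
            group by email
        ) m
        where up.email = m.email
          and up.common_id > m.min_id;
        changed := sqlrowcount;

        -- Rows sharing a user_id take the smallest label seen for that user_id
        update user_plays up
        set common_id = m.min_id
        from (
            select user_id, min(common_id) as min_id
            from user_plays
            group by user_id
        ) m
        where up.user_id = m.user_id
          and up.common_id > m.min_id;
        changed := changed + sqlrowcount;
    end while;
end;
$$;
```

Each pass can only lower a label, so the loop stops once a full pass changes no rows. On very long chains this may take many passes, so it is worth timing on real data before relying on it.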

Final Results:

COMMON_ID   VIDEO_PLAYS NET_SALE
1           14          15
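
The final query returns only the groups that have at least one matching sale, and because its inner query groups by email and net_sale, two sales with the same email and the same amount would be counted once. The sketch below is one possible alternative using explicit joins: it maps each email to a single COMMON_ID first, sums the sales per group, and then left joins so that groups without sales appear with 0. The CTE names plays, email_group and sales_by_group are illustrative.

```sql
with plays as (
    -- total plays per group
    select common_id, sum(video_plays) as video_plays
    from user_plays
    group by common_id
),
email_group as (
    -- one row per email, so a sale is never multiplied by repeated emails
    select email, max(common_id) as common_id
    from user_plays
    group by email
),
sales_by_group as (
    -- total sales per group; every sales row is counted exactly once
    select eg.common_id, sum(s.net_sale) as net_sale
    from sales s
    join email_group eg on eg.email = s.email
    group by eg.common_id
)
select
    p.common_id,
    p.video_plays,
    coalesce(sg.net_sale, 0) as net_sale
from plays p
left join sales_by_group sg on sg.common_id = p.common_id
order by p.common_id;
```

With the test data above this should give 14 plays and 15 in sales for COMMON_ID 1, with COMMON_IDs 5 and 6 showing 0 sales.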
