How to calculate the output below in SQL in AWS Athena

I have the result set below.

with dataset AS (
    select 1 as total_users, ARRAY['google', 'meta', 'attentive', 'meta'] as path_list
    UNION ALL
    select 1, ARRAY['google', 'attentive', 'Direct Traffic', 'Direct Traffic', 'Direct Traffic', 'meta']
    UNION ALL
    select 4 , ARRAY ['google','meta', 'google']
    UNION ALL
    select 1, ARRAY['google', 'meta', 'meta', 'Direct Traffic' , 'meta']
    UNION ALL
    select 1, ARRAY['google', 'meta', 'meta']
    UNION ALL
    select 1, ARRAY['google', 'Direct Traffic', 'Direct Traffic','attentiva', 'attentiva', 'attentiva', 'Direct Traffic', 'meta']
)

SELECT path_list, total_users, path_list[2] as second_click, 
CASE WHEN CARDINALITY(path_list) > 2 THEN path_list[3] ELSE NULL END as third_click from dataset

The total_users column indicates the number of users who traversed a particular path.

I want to calculate the following output:

  • second_click medium
  • second_click_percentage
  • third_click medium
  • third_click_percentage

for each medium.

The result set looks like this:

path_list | total_users | second_click | third_click
[google, Direct Traffic, Direct Traffic, attentiva, attentiva, attentiva, Direct Traffic, meta] | 1 | Direct Traffic | Direct Traffic
[google, meta, attentive, meta] | 1 | meta | attentive
[google, attentive, Direct Traffic, Direct Traffic, Direct Traffic, meta] | 1 | attentive | Direct Traffic
[google, meta, meta, Direct Traffic, meta] | 1 | meta | meta
[google, meta, meta] | 1 | meta | meta
[google, meta, google] | 4 | meta | google

Now I need to calculate what percentage of users had meta as the second click, what percentage had attentive as the second click, and so on.

Similarly, I need to identify the third-click percentages for the different mediums.

How can I write a SQL solution for this?

You can group the data and calculate the percentages:

-- sample data
with dataset(total_users, path_list) AS (
    values (1, ARRAY['google', 'meta', 'attentive', 'meta']),
    (1, ARRAY['google', 'attentive', 'Direct Traffic', 'Direct Traffic', 'Direct Traffic', 'meta']),
    (4, ARRAY ['google','meta', 'google']),
    (1, ARRAY['google', 'meta', 'meta', 'Direct Traffic' , 'meta']),
    (1, ARRAY['google', 'meta', 'meta']),
    (1, ARRAY['google', 'Direct Traffic', 'Direct Traffic','attentiva', 'attentiva', 'attentiva', 'Direct Traffic', 'meta'])
),

-- query parts
clicks as (
SELECT total_users,
       path_list[2] as second_click,
       try(path_list[3]) as third_click, -- try() yields NULL if the path has no third entry
       sum(total_users) over () as sum_total
from dataset)

select second_click, sum(total_users) * 1.0 / arbitrary(sum_total)
from clicks
group by second_click;

Output:

second_click    _col1
meta            0.8
Direct Traffic  0.1
attentive       0.1
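The same pattern appears to carry over to the third click. The following is a minimal sketch rather than part of the original answer: it assumes `element_at`, which returns NULL for an index past the end of the array, in place of the subscript, and the alias `third_click_percentage` is a name chosen here for illustration.

```sql
-- sample data (same rows as in the question)
with dataset(total_users, path_list) AS (
    values (1, ARRAY['google', 'meta', 'attentive', 'meta']),
    (1, ARRAY['google', 'attentive', 'Direct Traffic', 'Direct Traffic', 'Direct Traffic', 'meta']),
    (4, ARRAY['google', 'meta', 'google']),
    (1, ARRAY['google', 'meta', 'meta', 'Direct Traffic', 'meta']),
    (1, ARRAY['google', 'meta', 'meta']),
    (1, ARRAY['google', 'Direct Traffic', 'Direct Traffic', 'attentiva', 'attentiva', 'attentiva', 'Direct Traffic', 'meta'])
),

-- one row per path, with the third click and the grand total of users
clicks as (
    select total_users,
           element_at(path_list, 3) as third_click, -- NULL when the path has fewer than 3 entries
           sum(total_users) over () as sum_total
    from dataset
)

-- share of users per third-click medium (alias is an illustrative assumption)
select third_click,
       sum(total_users) * 1.0 / arbitrary(sum_total) as third_click_percentage
from clicks
group by third_click;
```

Counting by hand from the sample data, the shares should be google 4/9, meta 2/9, Direct Traffic 2/9 and attentive 1/9, before any rounding applied by the engine. Paths shorter than three clicks would end up in a NULL group; the sample data has none.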

If you want both groupings returned in one query, you can look into grouping sets:

select second_click, third_click, sum(total_users) * 1.0 / arbitrary(sum_total)
from clicks
group by GROUPING SETS (
    (second_click),
    (third_click));

But I would argue that this will not be that convenient, because each result row populates only one of second_click or third_click and leaves the other NULL.
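If a single result set is still wanted, one possible alternative is to stack the two groupings with UNION ALL and label each row with its click position. This is only a sketch under assumptions: the column names `click_position`, `medium` and `percentage` are hypothetical, and `element_at` is used so that short paths produce NULL instead of an error.

```sql
-- sample data (same rows as in the question)
with dataset(total_users, path_list) AS (
    values (1, ARRAY['google', 'meta', 'attentive', 'meta']),
    (1, ARRAY['google', 'attentive', 'Direct Traffic', 'Direct Traffic', 'Direct Traffic', 'meta']),
    (4, ARRAY['google', 'meta', 'google']),
    (1, ARRAY['google', 'meta', 'meta', 'Direct Traffic', 'meta']),
    (1, ARRAY['google', 'meta', 'meta']),
    (1, ARRAY['google', 'Direct Traffic', 'Direct Traffic', 'attentiva', 'attentiva', 'attentiva', 'Direct Traffic', 'meta'])
),

clicks as (
    select total_users,
           element_at(path_list, 2) as second_click,
           element_at(path_list, 3) as third_click, -- NULL when there is no third click
           sum(total_users) over () as sum_total
    from dataset
)

-- long format: one row per (click position, medium)
select 'second' as click_position,
       second_click as medium,
       sum(total_users) * 1.0 / arbitrary(sum_total) as percentage
from clicks
group by second_click

union all

select 'third',
       third_click,
       sum(total_users) * 1.0 / arbitrary(sum_total)
from clicks
group by third_click

order by click_position, medium;
```

Compared with grouping sets, every row here carries an explicit position label, so there is no need to infer the grouping from which column is NULL.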
