
SQL query for aggregating multiple columns

I would like to write a query in Presto SQL.

The table:

words     id1  id2   id2_like  rank
baseball  28   2756  1         6
baseball  28   3180  0         5
baseball  28   8161  0         17
baseball  11   1723  0         22
baseball  11   5329  1         29
football  19   3210  1         2
football  19   5519  0         18
football  19   6257  1         3

id2_like depends on id2 and can only be 1 or 0.

I would like to get some aggregation results from the above table within one SQL query.

For each value in words, we need to get:

  1. Total number of rows with id2_like = 1
  2. Percentage of id2_like = 0 out of all id2_like values
  3. Number of id1 where id2_like = 0
  4. Average over id1 of the max rank among rows with id2_like = 0
  5. Average percentage of id2 as 0 over id1 (in case some id2_like = 1 and some are 0)

I know how to write a query for each one, but I am not sure how to get all of them within a single SQL query.

Expected results:

words     id1_cnt_for_id2_as_1  perc_id2_as_0  id1_cnt_for_id2_as_0_perc  max_rank_id2_as_0  avg_perc_id2_as_0
baseball  2                     3/5            2                          (17+22)/2          (2/3+1/2)/2
football  2                     2/3            1                          18                 1/3
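
The five definitions can be worked through by hand on the sample rows. The following is a minimal Python sketch of that arithmetic, not a Presto query. The function name `summarize` and the result keys are made up for illustration. Item 5 is read here as "average, over every id1 of a word, of the share of that id1's rows with id2_like = 0", which is an assumption that matches the `(2/3+1/2)/2` shown for baseball.

```python
from collections import defaultdict
from fractions import Fraction

# Sample rows from the question: (words, id1, id2, id2_like, rank)
ROWS = [
    ("baseball", 28, 2756, 1, 6),
    ("baseball", 28, 3180, 0, 5),
    ("baseball", 28, 8161, 0, 17),
    ("baseball", 11, 1723, 0, 22),
    ("baseball", 11, 5329, 1, 29),
    ("football", 19, 3210, 1, 2),
    ("football", 19, 5519, 0, 18),
    ("football", 19, 6257, 1, 3),
]


def summarize(rows):
    """Return {word: metrics} for the five items, using exact fractions.

    Hypothetical helper for illustration; the key names are not from the post.
    """
    # Group rows as word -> id1 -> list of (id2_like, rank).
    grouped = defaultdict(lambda: defaultdict(list))
    for word, id1, _id2, like, rank in rows:
        grouped[word][id1].append((like, rank))

    result = {}
    for word, by_id1 in grouped.items():
        likes = [like for group in by_id1.values() for like, _ in group]
        # Ranks of the id2_like = 0 rows, per id1 (empty list if there are none).
        zero_ranks = {
            id1: [rank for like, rank in group if like == 0]
            for id1, group in by_id1.items()
        }
        # Item 4 input: highest zero-row rank for each id1 that has a zero row.
        max_ranks = [max(ranks) for ranks in zero_ranks.values() if ranks]
        # Item 5 input (assumed reading): share of zero rows within each id1.
        shares = [
            Fraction(len(zero_ranks[id1]), len(group))
            for id1, group in by_id1.items()
        ]
        result[word] = {
            "likes_total": sum(likes),                                # item 1
            "share_zero": Fraction(likes.count(0), len(likes)),       # item 2
            "id1_with_zero": len(max_ranks),                          # item 3
            "avg_max_rank_zero": (
                Fraction(sum(max_ranks), len(max_ranks)) if max_ranks else None
            ),                                                        # item 4
            "avg_share_zero_per_id1": sum(shares) / len(shares),      # item 5
        }
    return result


if __name__ == "__main__":
    for word, metrics in summarize(ROWS).items():
        print(word, metrics)
```

On the sample rows, `share_zero` for football evaluates to 1/3 (one id2_like = 0 row out of three), and `avg_share_zero_per_id1` for baseball evaluates to 7/12, which is (2/3+1/2)/2.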
   

If I understand correctly, here is what you want; however, I didn't understand what you want for number 5.

-- inner query: one row per (words, id1); outer query: roll up to one row per words
select words
    , sum(id1_cnt_for_id2_as_1) as id1_cnt_for_id2_as_1
    , sum(sum_perc_id2_as_0) * 100.0 / sum(cnt_perc_id2_as_0) as perc_id2_as_0
    , sum(id1_cnt_for_id2_as_0_perc) as id1_cnt_for_id2_as_0_perc
    , avg(max_rank_id2_as_0) as max_rank_id2_as_0
    , avg(avg_perc_id2_as_0) as avg_perc_id2_as_0
from (
    select words
        , sum(id2_like) as id1_cnt_for_id2_as_1
        , sum(case when id2_like = 0 then 1 end) as sum_perc_id2_as_0
        , count(*) as cnt_perc_id2_as_0
        , count(distinct case when id2_like = 0 then id1 end) as id1_cnt_for_id2_as_0_perc
        -- highest rank among this id1's id2_like = 0 rows
        , max(case when id2_like = 0 then rank end) as max_rank_id2_as_0
        , sum(case when id2_like = 0 then 1 end) * 100.0 / count(*) as avg_perc_id2_as_0
    from data
    group by words, id1
) t
group by words
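
A two-level aggregation of this shape (first per words and id1, then per words) can be sanity-checked against the sample rows without a Presto cluster. The sketch below loads the rows into an in-memory SQLite database through Python's `sqlite3` module and runs a query with the same structure. SQLite is only a stand-in for Presto here, and the helper `run_check` and the aliases `per_id1`, `likes`, `zeros`, `n`, `has_zero` and `max_rank_zero` are illustrative names, not part of the original answer. The column `rank` is double-quoted as a precaution, since it is also a function name.

```python
import sqlite3

# Sample rows from the question: (words, id1, id2, id2_like, rank)
ROWS = [
    ("baseball", 28, 2756, 1, 6),
    ("baseball", 28, 3180, 0, 5),
    ("baseball", 28, 8161, 0, 17),
    ("baseball", 11, 1723, 0, 22),
    ("baseball", 11, 5329, 1, 29),
    ("football", 19, 3210, 1, 2),
    ("football", 19, 5519, 0, 18),
    ("football", 19, 6257, 1, 3),
]

# Inner query: one row per (words, id1). Outer query: one row per words.
QUERY = """
SELECT words,
       SUM(likes)                  AS id1_cnt_for_id2_as_1,
       100.0 * SUM(zeros) / SUM(n) AS perc_id2_as_0,
       SUM(has_zero)               AS id1_cnt_for_id2_as_0_perc,
       AVG(max_rank_zero)          AS max_rank_id2_as_0,
       AVG(100.0 * zeros / n)      AS avg_perc_id2_as_0
FROM (
    SELECT words,
           id1,
           SUM(id2_like)                                 AS likes,
           SUM(CASE WHEN id2_like = 0 THEN 1 ELSE 0 END) AS zeros,
           COUNT(*)                                      AS n,
           MAX(CASE WHEN id2_like = 0 THEN 1 ELSE 0 END) AS has_zero,
           MAX(CASE WHEN id2_like = 0 THEN "rank" END)   AS max_rank_zero
    FROM data
    GROUP BY words, id1
) AS per_id1
GROUP BY words
ORDER BY words
"""


def run_check(rows=ROWS):
    """Load the sample rows into SQLite and return one result tuple per word."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            'CREATE TABLE data ('
            'words TEXT, id1 INTEGER, id2 INTEGER, id2_like INTEGER, "rank" INTEGER)'
        )
        conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_check():
        print(row)
```

Percentages are returned on a 0 to 100 scale. `AVG` skips NULLs, so an id1 with no id2_like = 0 rows does not contribute to `max_rank_id2_as_0`. In this sketch such an id1 counts as 0 percent in `avg_perc_id2_as_0`, which is a design choice rather than something the question specifies.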

db<>fiddle here

I hope this gives you some idea of what to do. It was tested in AWS Athena (which is pretty much Presto under the hood). I did not understand the fifth item.

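SELECT
    words,
    item_1,
    item_1 / CAST(size AS decimal(10,4)) * 100 AS item_2,  -- percentage of rows with id2_like = 1
    size - item_1 AS item_3,                               -- number of rows with id2_like = 0
    max_rank AS item_4
FROM (
    SELECT
        words,
        SUM(id2_like) AS item_1,  -- number of rows with id2_like = 1
        COUNT(*) AS size,         -- total number of rows for this word
        -- id1 divided by the overall maximum rank among id2_like = 0 rows, averaged per word
        AVG(id1 / CAST((SELECT MAX(rank) FROM tb WHERE id2_like = 0) AS decimal(10,4))) AS max_rank
    FROM tb
    GROUP BY 1
)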


 