
Complex SQL query aggregation and grouping on Athena

I have a table like this:

|     db      |      chat_id      |   Admin    |     user      |
+-------------+-------------------+------------+---------------+
|    db_1     |      chat_id1     |    max     |     greg      |
|    db_1     |      chat_id2     |    max     |     bob       |
|    db_1     |      chat_id3     |    max     |     greg      |
|    db_1     |      chat_id2     |    helen   |     greg      |
|    db_2     |      chat_id1     |    alan    |     greg      |

I would like to retrieve the number of chats performed by each user for each database (db) and, in the last part where I fail, also retrieve a list of all admins (mentors) for each user.

The final output should look like this, for example (notice that max appears only once for greg in the admins column):

|     db      |      user     |  nb_of_chat  |     admins    |
+-------------+---------------+--------------+---------------+
|    db_1     |      greg     |      3       |   max, helen  |
|    db_1     |      bob      |      1       |      max      |
|    db_2     |      greg     |      1       |      alan     |

I wrote the following query, but it doesn't aggregate the admins, so the chat counts end up split across one row per admin.

SELECT db, user, COUNT(chat_id), admins
FROM "chat_db"."chats" 
GROUP BY db, user, admins;

As expected, I get the following result, but I want a single row per db/user, with the admins grouped together in one column:

|     db      |      user     |  nb_of_chat  |     admins    |
+-------------+---------------+--------------+---------------+
|    db_1     |      greg     |      2       |       max     |
|    db_1     |      greg     |      1       |      helen    |
|    ...      |      ...      |     ...      |      ...      |

Do you have an idea how to do this?

Thank you for your time!

Regards.

Try using array_agg():

select db, user, count(chat_id), array_agg(admins)
from  "chat_db"."chats" 
group by db, user;

If you want one row per db:

select db, count(*) as num_chats, count(distinct user) as num_users, array_agg(admins)
from  "chat_db"."chats" 
group by db;

First, remove admins from the group by clause, since you want to aggregate it. Then, in Presto, you can do string aggregation as follows:

select db,user, count(*) no_of_chats, 
    array_join(array_agg(admins), ', ') all_admins
from  "chat_db"."chats" 
group by db, user;

You can add an order by clause to array_agg() if needed:

select db,user, count(*) no_of_chats, 
    array_join(array_agg(admins order by admins), ', ') all_admins
from  "chat_db"."chats" 
group by db, user;
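
One caveat on the queries above: array_agg() keeps duplicates, so greg in db_1 would get max listed twice, whereas the question asks for max to appear only once. Here is a minimal sketch of a deduplicated version, assuming your Athena engine version accepts DISTINCT together with ORDER BY inside array_agg(). The sample rows are copied from the question and inlined as a VALUES list (the alias t is just for illustration) so the query can be tried without the real table:

```sql
-- Sample rows from the question, inlined so the query is self-contained.
-- "user" is double-quoted only as a precaution, since it is a keyword in some engines.
select db,
       "user",
       count(*) as nb_of_chat,
       -- distinct drops the repeated admin; order by makes the list deterministic
       array_join(array_agg(distinct admins order by admins), ', ') as admins
from (
    values
        ('db_1', 'chat_id1', 'max',   'greg'),
        ('db_1', 'chat_id2', 'max',   'bob'),
        ('db_1', 'chat_id3', 'max',   'greg'),
        ('db_1', 'chat_id2', 'helen', 'greg'),
        ('db_2', 'chat_id1', 'alan',  'greg')
) as t (db, chat_id, admins, "user")
group by db, "user"
order by db, "user";
```

Because the list is sorted alphabetically, greg's admins in db_1 come out as helen, max rather than max, helen. If the DISTINCT plus ORDER BY combination is rejected on your engine, array_join(array_sort(array_distinct(array_agg(admins))), ', ') should give the same result.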

Note that I changed count(chat_id) to count(*): the two are equivalent as long as chat_id is never null (which is probably the case here), and count(*) is (slightly) more efficient and, in my opinion, makes the intent clearer.
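
To illustrate when the two counts differ, here is a small sketch on made-up values (the alias t and the null row are hypothetical, not taken from the question's table): count(*) counts every row, while count(chat_id) skips rows where chat_id is null.

```sql
-- Three rows, one of which has a null chat_id.
select count(*)       as all_rows,           -- counts every row: 3
       count(chat_id) as non_null_chat_ids   -- ignores the null row: 2
from (
    values ('chat_id1'), (null), ('chat_id3')
) as t (chat_id);
```

So the swap is only safe if chat_id can never be null, or if rows with a null chat_id should be counted as chats too.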
