
Aggregating items in AWS Athena SQL

Given a database that looks like:

item    vietnamese  cost  unique_id
fruits  trai cay    10    abc123
fruits  trai cay    8     foo99
fruits  trai cay    9     foo99
fruits  trai cay    12    abc123
fruits  trai cay    14    abc123
vege    rau         3     rr1239
vege    rau         3     rr1239

When querying through AWS Athena as such:

SELECT item, 
    sum(cost) as sum_cost, 
    avg(cost) as avg_cost, 
    array_agg(vietnamese) as vietnamese,
    array_agg(cost) as costs,
    array_agg(unique_id) as unique_ids
FROM foodtable
GROUP BY item
ORDER BY avg_cost

I get an array of repeated Vietnamese translations:

item    vietnamese
fruits  [trai cay, trai cay, trai cay, trai cay, trai cay]

Is there a way to just keep the last/first value from the vietnamese column?

Also, with the query above, the unique_ids value would look like:

item    unique_ids
fruits  [abc123, foo99, foo99, abc123, abc123]

Is there a way to aggregate the counts and keep a counter column, to achieve the following?

item    unique_ids
fruits  [abc123:3, foo99:2]

Currently, I post-process the results after the SQL query returns, deduplicating with set(vietnamese) and counting with collections.Counter(unique_ids). But if it's possible to do that in the SQL query itself, that would be more desirable.

Athena has many functions that operate on arrays, such as filter, element_at, cardinality, and reduce, as well as functions that create and process maps. You can use these to process the aggregated arrays.
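As a minimal sketch of applying an array function to the vietnamese column, assuming the foodtable table from the question: element_at can pick a single element out of the aggregated array, and array_distinct collapses duplicates. The output column names (first_vietnamese, last_vietnamese, distinct_vietnamese) are made up for illustration.

```sql
-- Sketch: keep a single translation per item instead of the repeated array.
-- array_agg does not guarantee element order unless one is specified,
-- so "first" and "last" mean first/last in whatever order the engine produced.
SELECT item,
    element_at(array_agg(vietnamese), 1)  AS first_vietnamese,    -- index 1 is the first element
    element_at(array_agg(vietnamese), -1) AS last_vietnamese,     -- negative index counts from the end
    array_distinct(array_agg(vietnamese)) AS distinct_vietnamese  -- array with duplicates removed
FROM foodtable
GROUP BY item
```

If every row of an item carries the same translation, as in the sample data, all three columns hold the same information and any one of them should do.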

For example, to count the number of occurrences of each unique ID you can do something like this:

SELECT
  item,
  transform_values(multimap_agg(unique_id, 1), (k, v) -> cardinality(v)) AS unique_ids
FROM foodtable
GROUP BY item
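For comparison, here is a sketch of the question's full query with both changes folded in. It swaps in two different aggregate functions from the ones used above, so treat it as an alternative rather than the answer's method: arbitrary returns one non-null value from the group, and histogram returns a map from each value to its number of occurrences. The alias unique_id_counts is an assumption, not a name from the question.

```sql
-- Sketch: one translation per item plus a per-ID counter, in a single pass.
SELECT item,
    sum(cost) AS sum_cost,
    avg(cost) AS avg_cost,
    arbitrary(vietnamese) AS vietnamese,      -- any one value from the group
    array_agg(cost) AS costs,
    histogram(unique_id) AS unique_id_counts  -- map of unique_id -> occurrence count
FROM foodtable
GROUP BY item
ORDER BY avg_cost
```

With the sample data, the fruits row should map abc123 to 3 and foo99 to 2, which matches the counter layout asked for in the question.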
