Aggregating items in AWS Athena SQL
Given a table that looks like this:
item | vietnamese | cost | unique_id |
---|---|---|---|
fruits | trai cay | 10 | abc123 |
fruits | trai cay | 8 | foo99 |
fruits | trai cay | 9 | foo99 |
fruits | trai cay | 12 | abc123 |
fruits | trai cay | 14 | abc123 |
vege | rau | 3 | rr1239 |
vege | rau | 3 | rr1239 |
When I query it through AWS Athena like this:
SELECT item,
sum(cost) as sum_cost,
avg(cost) as avg_cost,
array_agg(vietnamese) as vietnamese,
array_agg(cost) as costs,
array_agg(unique_id) as unique_ids
FROM foodtable
GROUP BY item
ORDER BY avg_cost
I get an array of repeated Vietnamese translations:
item | vietnamese |
---|---|
fruits | [trai cay, trai cay, trai cay, trai cay, trai cay] |
Is there a way to keep just the first or last value from the `vietnamese` column?
Also, with the query above, the `unique_ids` value looks like this:
item | unique_ids |
---|---|
fruits | [abc123, foo99, foo99, abc123, abc123] |
Is there a way to aggregate the counts and keep a counter column, to get something like this?
item | unique_ids |
---|---|
fruits | [abc123:3, foo99:2] |
Currently, I post-process the output after the SQL query returns, de-duplicating with `set(vietnamese)` and counting with `collections.Counter(unique_ids)`. But if it's possible to do that in the SQL query itself, that would be more desirable.
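
For reference, that post-processing step looks roughly like the sketch below. The `rows` list is a hypothetical stand-in for what the query returns (fetching the rows from Athena is left out), and `summarize` is just an illustrative helper name.

```python
from collections import Counter

# Hypothetical stand-in for one row of the Athena result set;
# in practice these arrays would come from the query client.
rows = [
    {
        "item": "fruits",
        "vietnamese": ["trai cay"] * 5,
        "unique_ids": ["abc123", "foo99", "foo99", "abc123", "abc123"],
    },
]


def summarize(row):
    """Collapse repeated translations and count each unique ID."""
    return {
        "item": row["item"],
        # set() drops duplicates; sorted() makes the result deterministic.
        "vietnamese": sorted(set(row["vietnamese"])),
        # Counter maps each ID to its number of occurrences.
        "unique_ids": dict(Counter(row["unique_ids"])),
    }


summaries = [summarize(row) for row in rows]
print(summaries)
```

This works, but every repeated value still has to travel from Athena to the client before it is collapsed.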
Athena has many functions that operate on arrays, such as `filter`, `element_at`, `cardinality`, and `reduce`, as well as functions that create and process maps. You can use these to process the aggregated arrays.
For example, to count the number of occurrences of each unique ID, you can do something like this:
SELECT
  item,
  transform_values(multimap_agg(unique_id, 1), (k, v) -> cardinality(v)) AS unique_id_counts
FROM foodtable
GROUP BY item
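
To make the two steps in that expression easier to follow, here is a rough Python emulation of the same logic. It only illustrates what the functions compute and says nothing about how Athena executes the query; `pairs` and `count_ids_per_item` are hypothetical names, and the data is copied from the table in the question.

```python
from collections import defaultdict

# (item, unique_id) pairs copied from the sample table in the question.
pairs = [
    ("fruits", "abc123"), ("fruits", "foo99"), ("fruits", "foo99"),
    ("fruits", "abc123"), ("fruits", "abc123"),
    ("vege", "rr1239"), ("vege", "rr1239"),
]


def count_ids_per_item(pairs):
    # Step 1, like multimap_agg(unique_id, 1) under GROUP BY item:
    # for each item, map every unique_id to a list holding one 1 per row.
    multimaps = defaultdict(lambda: defaultdict(list))
    for item, unique_id in pairs:
        multimaps[item][unique_id].append(1)
    # Step 2, like transform_values(..., (k, v) -> cardinality(v)):
    # replace each list with its length, which is the occurrence count.
    return {
        item: {key: len(values) for key, values in id_map.items()}
        for item, id_map in multimaps.items()
    }


counts = count_ids_per_item(pairs)
print(counts)
```

For the sample data this yields a per-item map of ID to count, which is the shape asked for in the question (`abc123` three times and `foo99` twice for `fruits`).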