Aggregating items in AWS Athena SQL

Question

Given a table that looks like this:

item	vietnamese	cost	unique_id
fruits	trai cay	10	abc123
fruits	trai cay	8	foo99
fruits	trai cay	9	foo99
fruits	trai cay	12	abc123
fruits	trai cay	14	abc123
vege	rau	3	rr1239
vege	rau	3	rr1239

When I query it in AWS Athena like this:

SELECT item, 
    sum(cost) as sum_cost, 
    avg(cost) as avg_cost, 
    array_agg(vietnamese) as vietnamese,
    array_agg(cost) as costs,
    array_agg(unique_id) as unique_ids
FROM foodtable
GROUP BY item
ORDER BY avg_cost

I get an array of repeated Vietnamese translations:

item	vietnamese
fruits	[trai cay, trai cay, trai cay, trai cay, trai cay]

Is there a way to keep just the first (or last) value from the vietnamese column?

Also, with the query above, the unique_ids value looks like this:

item	unique_ids
fruits	[abc123, foo99, foo99, abc123, abc123]

Is there a way to aggregate the counts and keep a counter per ID, to get something like this?

item	unique_ids
fruits	[abc123:3, foo99:2]

Currently, I post-process the results in Python after the SQL query returns, deduplicating with set(vietnamese) and counting with collections.Counter(unique_ids). But if it's possible to do this in the SQL query itself, that would be preferable.

Answer 1

Athena has many functions that operate on arrays, such as filter, element_at, cardinality, and reduce, as well as functions that create and process maps. You can use these to process the aggregated arrays.
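For the first part of the question, here is a minimal sketch, assuming Athena's Presto/Trino-based SQL engine. The WITH clause only recreates the sample rows so the query runs on its own; against the real data you would drop it and query foodtable directly. Because vietnamese is the same for every row of an item, the arbitrary() aggregate (which returns any one value from the group) is enough. element_at() with index 1 or -1 picks the first or last element of the aggregated array, and array_distinct() removes the repeats.

```sql
-- Sketch: one vietnamese value per item (Presto/Trino SQL, as used by Athena).
-- The WITH clause only recreates the sample rows from the question.
WITH foodtable AS (
  SELECT *
  FROM (
    VALUES
      ('fruits', 'trai cay', 10, 'abc123'),
      ('fruits', 'trai cay', 8,  'foo99'),
      ('fruits', 'trai cay', 9,  'foo99'),
      ('fruits', 'trai cay', 12, 'abc123'),
      ('fruits', 'trai cay', 14, 'abc123'),
      ('vege',   'rau',      3,  'rr1239'),
      ('vege',   'rau',      3,  'rr1239')
  ) AS t (item, vietnamese, cost, unique_id)
)
SELECT
  item,
  -- any single value from the group; fine because vietnamese is constant per item
  arbitrary(vietnamese) AS vietnamese,
  -- first / last element of the aggregated array (a negative index counts from the end)
  element_at(array_agg(vietnamese), 1) AS first_vietnamese,
  element_at(array_agg(vietnamese), -1) AS last_vietnamese,
  -- the aggregated array with duplicates removed
  array_distinct(array_agg(vietnamese)) AS distinct_vietnamese
FROM foodtable
GROUP BY item;
```

Note that array_agg() does not guarantee element order on its own, so "first" and "last" are only meaningful if the order is pinned, for example with an ordered aggregate such as array_agg(vietnamese ORDER BY cost), where the engine version supports it.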

For example, to count the number of occurrences of each unique ID, you can do something like this:

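SELECT
  item,
  -- multimap_agg builds map(unique_id -> array of 1s); cardinality counts each array
  transform_values(multimap_agg(unique_id, 1), (k, v) -> cardinality(v)) AS unique_id_counts
FROM foodtable
GROUP BY item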
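Presto/Trino (and therefore Athena) also has a histogram() aggregate that returns a map from each distinct value to the number of times it occurs, which gives the same counts without the multimap_agg/transform_values step. The sketch below folds it, together with arbitrary() for the vietnamese column, into the query from the question; it assumes the foodtable columns shown there. For the sample rows, the fruits group should map abc123 to 3 and foo99 to 2.

```sql
-- Sketch: the question's query with one vietnamese value and per-ID counts.
-- Assumes the foodtable columns from the question (item, vietnamese, cost, unique_id).
SELECT
  item,
  sum(cost) AS sum_cost,
  avg(cost) AS avg_cost,
  arbitrary(vietnamese) AS vietnamese,       -- one value instead of a repeated array
  array_agg(cost) AS costs,
  histogram(unique_id) AS unique_id_counts   -- map(unique_id -> number of rows)
FROM foodtable
GROUP BY item
ORDER BY avg_cost;
```

The counts come back as a map rather than an array of "id:count" strings, which makes it the SQL-side counterpart of collections.Counter.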

Answer 1 (accepted), 2021-10-11 18:48:59