AWS ATHENA Transpose Columns into Rows
I have a CSV file uploaded to an S3 bucket, which I pick up with AWS Glue and then query using Athena. The CSV table is in the format below:
| Item | Country | Category | 2017 | 2018 | 2019 | 2020 |
|---|---|---|---|---|---|---|
| Item1 | CA | Network | 128 | 129 | 130 | 129 |
| Item2 | CA | Desktop | 128 | 129 | 130 | 129 |
| Item3 | CA | Apps | 128 | 129 | 130 | 129 |
I want to convert that format into:
| Item | Country | Category | Year | Value |
|---|---|---|---|---|
| Item1 | CA | Network | 2017 | 128 |
| Item1 | CA | Network | 2018 | 129 |
| Item1 | CA | Network | 2019 | 130 |
| Item1 | CA | Network | 2020 | 129 |
| Item2 | CA | Desktop | 2017 | 128 |
| Item2 | CA | Desktop | 2018 | 129 |
| Item2 | CA | Desktop | 2019 | 130 |
| Item2 | CA | Desktop | 2020 | 129 |
| Item3 | CA | Apps | 2017 | 128 |
| Item3 | CA | Apps | 2018 | 129 |
| Item3 | CA | Apps | 2019 | 130 |
| Item3 | CA | Apps | 2020 | 129 |
How do I accomplish that using SQL in Athena?
I tried this, but it doesn't work for me: Simple way to transpose columns and rows in SQL?
Any help is appreciated. Thanks!
UNION ALL provides one option here:
SELECT Item, Country, Category, 2017 AS Year, "2017" AS Value FROM yourTable
UNION ALL
SELECT Item, Country, Category, 2018, "2018" FROM yourTable
UNION ALL
SELECT Item, Country, Category, 2019, "2019" FROM yourTable
UNION ALL
SELECT Item, Country, Category, 2020, "2020" FROM yourTable
ORDER BY Item, Country, Category, Year, Value;
This approach is not robust to a dynamic number of year columns, because every year has to be listed in the query by hand. But then again, you should not be going with that design anyway, since it is not normalized. So, hopefully you can use the above query, or a slight variant of it, to get your data normalized as it appears in the expected output.
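If the set of year columns changes over time, one way to cope is to generate the UNION ALL text from a list of column names instead of writing it by hand. The following is a minimal Python sketch of that idea and is not part of the original answer: `build_unpivot_sql` is a hypothetical helper, and the in-memory SQLite table only stands in for the Athena table so the generated SQL can be checked locally. SQLite, like Athena, reads double-quoted names such as "2017" as column identifiers.

```python
import sqlite3


def build_unpivot_sql(table, key_cols, year_cols):
    """Build one SELECT per year column and join them with UNION ALL.

    Hypothetical helper: table and column names are assumed to be trusted.
    """
    keys = ", ".join(key_cols)
    selects = [
        f'SELECT {keys}, {int(year)} AS Year, "{year}" AS Value FROM {table}'
        for year in year_cols
    ]
    return "\nUNION ALL\n".join(selects) + f"\nORDER BY {keys}, Year"


# Local stand-in for the Athena table, filled with the rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (Item TEXT, Country TEXT, Category TEXT, "
    '"2017" INT, "2018" INT, "2019" INT, "2020" INT)'
)
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("Item1", "CA", "Network", 128, 129, 130, 129),
        ("Item2", "CA", "Desktop", 128, 129, 130, 129),
        ("Item3", "CA", "Apps", 128, 129, 130, 129),
    ],
)

sql = build_unpivot_sql(
    "yourTable",
    ["Item", "Country", "Category"],
    ["2017", "2018", "2019", "2020"],
)
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

The generated text has the same shape as the query above, so it can be pasted into Athena or submitted through whatever client you already use. The list of year columns still has to come from somewhere, for example the table's schema.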
You can do this with one scan by creating an array and then unnesting the array:
select t.item, t.country, t.category, r.year, r.value
from t cross join
unnest( array[ cast(row(2017, t."2017") as row(year int, value int)),
cast(row(2018, t."2018") as row(year int, value int)),
cast(row(2019, t."2019") as row(year int, value int)),
cast(row(2020, t."2020") as row(year int, value int))
]
) u(r);
If your table is really a view or a complex query, the performance gain can be significant, since the UNION ALL version evaluates the source once per year column.
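The same single-scan idea can also be written by passing two parallel arrays to UNNEST, one holding the year labels and one holding the values, which avoids the row casts. The Python sketch below only builds the query text and has not been run against Athena here; `build_unnest_sql` is a hypothetical helper, and the alias `t` and the column names follow the query above. As far as I know, newer Trino-based engine versions expand an array of rows into one column per field, in which case the single alias `u(r)` would need to become `u(year, value)`. Treat that as something to check on your own workgroup.

```python
def build_unnest_sql(table, key_cols, year_cols):
    """Build a single-scan unpivot using UNNEST over two parallel arrays.

    Hypothetical helper: all year columns are assumed to share one type,
    because the elements of an ARRAY must have a common type.
    """
    keys = ", ".join(f"t.{col}" for col in key_cols)
    years = ", ".join(str(int(year)) for year in year_cols)
    values = ", ".join(f't."{year}"' for year in year_cols)
    return (
        f"SELECT {keys}, u.year, u.value\n"
        f"FROM {table} AS t\n"
        f"CROSS JOIN UNNEST(ARRAY[{years}], ARRAY[{values}]) AS u (year, value)"
    )


sql = build_unnest_sql(
    "yourTable",
    ["item", "country", "category"],
    ["2017", "2018", "2019", "2020"],
)
print(sql)
```

With two arrays of equal length, UNNEST yields one output row per position, pairing each year label with the value from the matching column.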