AWS Athena: Transpose Columns to Rows

Question

I have a CSV file in an S3 bucket that I pick up with AWS Glue and then query using Athena. The CSV table is in the format below:

Item	Country	Category	2017	2018	2019	2020
Item1	CA	Network	128	129	130	129
Item2	CA	Desktop	128	129	130	129
Item3	CA	Apps	128	129	130	129

I want to convert that format into:

Item	Country	Category	Year	Value
Item1	CA	Network	2017	128
Item1	CA	Network	2018	129
Item1	CA	Network	2019	130
Item1	CA	Network	2020	129
Item2	CA	Desktop	2017	128
Item2	CA	Desktop	2018	129
Item2	CA	Desktop	2019	130
Item2	CA	Desktop	2020	129
Item3	CA	Apps	2017	128
Item3	CA	Apps	2018	129
Item3	CA	Apps	2019	130
Item3	CA	Apps	2020	129

How do I accomplish that using SQL in Athena?

I tried the approach from "Simple way to transpose columns and rows in SQL?", but it doesn't work for me.

Any help is appreciated. Thanks!

Answer 1

UNION ALL provides one option here:

SELECT Item, Country, Category, 2017 AS Year, "2017" AS Value FROM yourTable
UNION ALL
SELECT Item, Country, Category, 2018, "2018" FROM yourTable
UNION ALL
SELECT Item, Country, Category, 2019, "2019" FROM yourTable
UNION ALL
SELECT Item, Country, Category, 2020, "2020" FROM yourTable
ORDER BY Item, Country, Category, Year, Value;

This approach is not robust to a dynamic number of year columns: every new year column needs another SELECT branch. But then again, you should not keep that design anyway, since it is not normalized. So hopefully you can use the above query, or a slight variant of it, to normalize your data into the shape shown in the expected output.
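If the set of year columns does change, one workaround is to generate the UNION ALL text instead of editing it by hand. The sketch below uses a hypothetical Python helper, `build_union_all_query` (not part of Athena or Glue), that emits the same query shape for any list of years, and checks it locally against the question's sample rows. SQLite is only an assumed local stand-in for Athena here; the generated SQL would still have to be run in Athena itself.

```python
import sqlite3


def build_union_all_query(table, years):
    """Build the UNION ALL unpivot for an arbitrary list of year columns.

    Hypothetical helper: `years` is assumed to be a trusted list of integers,
    since they are interpolated directly into the SQL text.
    """
    branches = []
    for i, year in enumerate(years):
        # Only the first branch needs aliases; UNION ALL takes its column names from it.
        year_expr = f"{year} AS Year" if i == 0 else str(year)
        value_expr = f'"{year}" AS Value' if i == 0 else f'"{year}"'
        branches.append(
            f"SELECT Item, Country, Category, {year_expr}, {value_expr} FROM {table}"
        )
    return "\nUNION ALL\n".join(branches) + "\nORDER BY Item, Country, Category, Year, Value;"


# Local check against the sample data from the question (SQLite stands in for Athena).
YEARS = [2017, 2018, 2019, 2020]
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE yourTable (Item TEXT, Country TEXT, Category TEXT, '
    '"2017" INTEGER, "2018" INTEGER, "2019" INTEGER, "2020" INTEGER)'
)
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("Item1", "CA", "Network", 128, 129, 130, 129),
        ("Item2", "CA", "Desktop", 128, 129, 130, 129),
        ("Item3", "CA", "Apps", 128, 129, 130, 129),
    ],
)
rows = conn.execute(build_union_all_query("yourTable", YEARS)).fetchall()
for row in rows:
    print(row)
```

Adding a 2021 column then only means appending 2021 to `YEARS`; the rest of the query is regenerated unchanged.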

Answer 2

You can do this with a single scan by building an array and then unnesting it:

select t.item, t.country, t.category, u.year, u.value
from t cross join
     -- each unnested row is expanded into two columns, named year and value below
     unnest( array[ cast(row(2017, t."2017") as row(year int, value int)),
                    cast(row(2018, t."2018") as row(year int, value int)),
                    cast(row(2019, t."2019") as row(year int, value int)),
                    cast(row(2020, t."2020") as row(year int, value int))
                  ]
           ) u(year, value);

This reads the source once instead of once per year column, so if your table is really a view or a complex query, the performance gain can be significant.
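The single-scan argument can be illustrated with a plain-Python analogy. This is not how Athena executes queries, and `CountingSource`, `unpivot_union_all` and `unpivot_unnest` are hypothetical names. The UNION ALL shape reads the source once per year column, while the array-and-unnest shape reads it once and fans each row out into one output row per array element. The sketch only counts reads of an in-memory list; it says nothing about how Athena plans or bills a real query.

```python
class CountingSource:
    """Hypothetical stand-in for a table or view: counts how often it is scanned."""

    def __init__(self, rows):
        self.rows = rows
        self.scans = 0

    def scan(self):
        # Each call models one full read of the underlying table or view.
        self.scans += 1
        yield from self.rows


def unpivot_union_all(source, years):
    """UNION ALL shape: one full scan of the source per year column."""
    out = []
    for year in years:
        for r in source.scan():
            out.append((r["Item"], r["Country"], r["Category"], year, r[str(year)]))
    return sorted(out)


def unpivot_unnest(source, years):
    """Array + UNNEST shape: scan once, build an array of (year, value) pairs per row,
    then emit one output row per array element (the CROSS JOIN UNNEST step)."""
    out = []
    for r in source.scan():
        pairs = [(year, r[str(year)]) for year in years]
        for year, value in pairs:
            out.append((r["Item"], r["Country"], r["Category"], year, value))
    return sorted(out)


YEARS = [2017, 2018, 2019, 2020]
SAMPLE = [
    {"Item": "Item1", "Country": "CA", "Category": "Network",
     "2017": 128, "2018": 129, "2019": 130, "2020": 129},
    {"Item": "Item2", "Country": "CA", "Category": "Desktop",
     "2017": 128, "2018": 129, "2019": 130, "2020": 129},
    {"Item": "Item3", "Country": "CA", "Category": "Apps",
     "2017": 128, "2018": 129, "2019": 130, "2020": 129},
]

union_source = CountingSource(SAMPLE)
unnest_source = CountingSource(SAMPLE)
union_rows = unpivot_union_all(union_source, YEARS)
unnest_rows = unpivot_unnest(unnest_source, YEARS)
print(union_rows == unnest_rows, union_source.scans, unnest_source.scans)  # True 4 1
```

Both shapes produce the same 12 rows; only the number of passes over the source differs, and that gap grows with the number of year columns.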

2 answers

Answer 1: 1 vote, accepted, 2021-02-02 08:54:22

Answer 2: 1 vote, 2021-02-02 12:51:11
