
AWS Athena: Transpose Columns into Rows

I have a CSV file uploaded to an S3 bucket, which I pick up with AWS Glue and then query using Athena. The CSV table is in the format below:

Item   Country  Category  2017  2018  2019  2020
Item1  CA       Network   128   129   130   129
Item2  CA       Desktop   128   129   130   129
Item3  CA       Apps      128   129   130   129

I want to convert that format into:

Item   Country  Category  Year  Value
Item1  CA       Network   2017  128
Item1  CA       Network   2018  129
Item1  CA       Network   2019  130
Item1  CA       Network   2020  129
Item2  CA       Desktop   2017  128
Item2  CA       Desktop   2018  129
Item2  CA       Desktop   2019  130
Item2  CA       Desktop   2020  129
Item3  CA       Apps      2017  128
Item3  CA       Apps      2018  129
Item3  CA       Apps      2019  130
Item3  CA       Apps      2020  129

How do I accomplish that using SQL in Athena?

I tried the approach from "Simple way to transpose columns and rows in SQL?", but it doesn't work for me.

Any help is appreciated. Thanks!

UNION ALL provides one option here:

SELECT Item, Country, Category, 2017 AS Year, "2017" AS Value FROM yourTable
UNION ALL
SELECT Item, Country, Category, 2018, "2018" FROM yourTable
UNION ALL
SELECT Item, Country, Category, 2019, "2019" FROM yourTable
UNION ALL
SELECT Item, Country, Category, 2020, "2020" FROM yourTable
ORDER BY Item, Country, Category, Year, Value;

This approach does not adapt to a changing number of year columns: every new year needs another UNION ALL branch. But then again, you should not keep that design anyway, since it is not normalized. Hopefully you can use the above query, or a slight variant of it, to get your data normalized as it appears in the expected output.
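If the goal is to store the normalized shape rather than recompute it on every query, one possible approach is to wrap the same UNION ALL in an Athena CTAS (CREATE TABLE AS SELECT) statement. This is only a sketch: the target name yourTable_long is hypothetical, and Parquet is an assumed format choice, not something the question specifies.

```sql
-- Hypothetical target table name; Parquet is an assumed storage format.
CREATE TABLE yourTable_long
WITH (format = 'PARQUET') AS
SELECT Item, Country, Category, 2017 AS Year, "2017" AS Value FROM yourTable
UNION ALL
SELECT Item, Country, Category, 2018, "2018" FROM yourTable
UNION ALL
SELECT Item, Country, Category, 2019, "2019" FROM yourTable
UNION ALL
SELECT Item, Country, Category, 2020, "2020" FROM yourTable;
```

Later queries can then read yourTable_long directly, and a new year of data becomes new rows instead of a new column.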

You can do this with one scan by creating an array and then unnesting the array:

select t.item, t.country, t.category, r.year, r.value
from t cross join
     unnest( array[ cast(row(2017, t."2017") as row(year int, value int)),
                    cast(row(2018, t."2018") as row(year int, value int)),
                    cast(row(2019, t."2019") as row(year int, value int)),
                    cast(row(2020, t."2020") as row(year int, value int))
                  ]
           ) u(r);

If your table is really a view or a complex query, the performance gain can be significant, because the underlying query is evaluated once rather than once per UNION ALL branch.
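A variant of the same single-scan idea passes two parallel arrays to UNNEST instead of one array of rows. It is offered as a sketch, not as the answerer's query. It avoids the ROW casts, and it does not depend on whether a given Athena engine version unnests an array of rows into a single row-typed column or into one column per field, which may differ between versions. The inline VALUES data is an assumption that stands in for the real table, copied from the sample in the question.

```sql
-- Sample data standing in for the real table (assumed from the question).
WITH t AS (
    SELECT *
    FROM (
        VALUES
            ('Item1', 'CA', 'Network', 128, 129, 130, 129),
            ('Item2', 'CA', 'Desktop', 128, 129, 130, 129),
            ('Item3', 'CA', 'Apps',    128, 129, 130, 129)
    ) AS v (item, country, category, "2017", "2018", "2019", "2020")
)
SELECT t.item, t.country, t.category, u.year, u.value
FROM t
CROSS JOIN UNNEST(
    ARRAY[2017, 2018, 2019, 2020],                   -- year labels
    ARRAY[t."2017", t."2018", t."2019", t."2020"]    -- matching values, same order
) AS u (year, value)
ORDER BY t.item, u.year;
```

The two arrays must list the years in the same order, since UNNEST pairs their elements by position.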
