I have a CSV file uploaded to an S3 bucket, which I catalog with AWS Glue and then query using Athena. The table is in the format below:
Item | Country | Category | 2017 | 2018 | 2019 | 2020 |
---|---|---|---|---|---|---|
Item1 | CA | Network | 128 | 129 | 130 | 129 |
Item2 | CA | Desktop | 128 | 129 | 130 | 129 |
Item3 | CA | Apps | 128 | 129 | 130 | 129 |
I want to convert that format into:
Item | Country | Category | Year | Value |
---|---|---|---|---|
Item1 | CA | Network | 2017 | 128 |
Item1 | CA | Network | 2018 | 129 |
Item1 | CA | Network | 2019 | 130 |
Item1 | CA | Network | 2020 | 129 |
Item2 | CA | Desktop | 2017 | 128 |
Item2 | CA | Desktop | 2018 | 129 |
Item2 | CA | Desktop | 2019 | 130 |
Item2 | CA | Desktop | 2020 | 129 |
Item3 | CA | Apps | 2017 | 128 |
Item3 | CA | Apps | 2018 | 129 |
Item3 | CA | Apps | 2019 | 130 |
Item3 | CA | Apps | 2020 | 129 |
How do I accomplish that using SQL in Athena?
I tried the approach from "Simple way to transpose columns and rows in SQL?" but it doesn't work for me.
Any help is appreciated. Thanks!
UNION ALL provides one option here:
SELECT Item, Country, Category, 2017 AS Year, "2017" AS Value FROM yourTable
UNION ALL
SELECT Item, Country, Category, 2018, "2018" FROM yourTable
UNION ALL
SELECT Item, Country, Category, 2019, "2019" FROM yourTable
UNION ALL
SELECT Item, Country, Category, 2020, "2020" FROM yourTable
ORDER BY Item, Country, Category, Year, Value;
This approach does not handle a dynamic number of year columns: every additional year needs its own SELECT branch. But then again, you should not be going with that design anyway, since it is not normalized. Hopefully you can use the above query, or a slight variant of it, to get your data normalized as it appears in the expected output.
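If the goal is to store the normalized shape rather than recompute it on every query, one possibility is to wrap the same UNION ALL in a CREATE TABLE AS SELECT statement. This is only a sketch: the target name yourTable_long and the Parquet output format are assumptions, and depending on how the workgroup is configured an external_location property may also be needed in the WITH clause.

```sql
-- Sketch: materialize the long format once with CTAS.
-- yourTable_long is a hypothetical name; PARQUET is an assumed output format.
CREATE TABLE yourTable_long
WITH (format = 'PARQUET') AS
SELECT Item, Country, Category, 2017 AS Year, "2017" AS Value FROM yourTable
UNION ALL
SELECT Item, Country, Category, 2018, "2018" FROM yourTable
UNION ALL
SELECT Item, Country, Category, 2019, "2019" FROM yourTable
UNION ALL
SELECT Item, Country, Category, 2020, "2020" FROM yourTable;
```

When a new year column shows up in the CSV, another UNION ALL branch still has to be added by hand and the table rebuilt.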
You can do this with one scan by creating an array and then unnesting the array:
select t.item, t.country, t.category, r.year, r.value
from t cross join
     unnest(array[
         cast(row(2017, t."2017") as row(year int, value int)),
         cast(row(2018, t."2018") as row(year int, value int)),
         cast(row(2019, t."2019") as row(year int, value int)),
         cast(row(2020, t."2020") as row(year int, value int))
     ]) u(r);
If your table is really a view or complex query, the performance gain can be significant.
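A variant of the same single-scan idea, offered as a sketch rather than a replacement: pass two parallel arrays to UNNEST, one holding the year labels and one holding the values. As far as I know, newer Trino-based engine versions expand an array of rows into one column per field, so the single alias u(r) may need to become u(year, value) there. The two-array form avoids that question. It assumes all four year columns share one data type, and t is the same placeholder table name used above.

```sql
-- Sketch: unpivot with two parallel arrays instead of an array of rows.
-- Assumes t."2017" .. t."2020" all have the same type.
SELECT t.item, t.country, t.category, u.year, u.value
FROM t
CROSS JOIN UNNEST(
    ARRAY[2017, 2018, 2019, 2020],                    -- year labels
    ARRAY[t."2017", t."2018", t."2019", t."2020"]     -- matching values
) AS u (year, value)
ORDER BY t.item, u.year;
```

The arrays are matched position by position, so the order of the labels has to line up with the order of the columns.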