
Update AWS Athena data & table to rename columns

Today I ran into a simple problem: renaming a column of an Athena (Glue) table from its old name to a new one.

First, I searched here and tried several solutions, like this, this, and many others. Unfortunately, none of them worked, so I decided to rely on my own knowledge and imagination.

I'm posting this question to share my approach, but also to learn how others have done it, and maybe to find out that I reinvented the wheel. So if you know how to do it, please share your way too.

My setup: an Athena JSON table partitioned by day, holding a large amount of valuable data. The infrastructure is defined and updated through CloudFormation.

How to rename an Athena column and still keep the data?

I'll explain it without all the CloudFormation infrastructure.

Imagine a table containing:

  • userId
  • score
  • otherColumns
  • eventDateUtc
  • dt_utc

The table is partitioned by dt_utc and stored in JSON format. We need to rename the column score to deltaScore.

Keep in mind that, although I haven't tested other formats or configurations, this should apply to any configuration Athena supports, because we let the Athena engine itself do the transformation for us.

How to do it

If you run the CloudFormation migration first, you will "lose" access to the dropped column's data. The files in S3 are untouched, though, so you can simply rename the column back and the data reappears.

These are the steps required to rename a column of an AWS Athena table:

  1. Create a temporary table that maps the old column name to the new one:
    This can be done with CREATE TABLE AS (CTAS); read more in the AWS docs.
    With this command, the Athena engine applies the transformation to the files of the original table for us and writes the result to s3://bucket_name/A_folder/temp_table_rename/, the external_location in the query.
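CREATE TABLE "temp_table_rename"
WITH (
  format = 'JSON',
  external_location = 's3://bucket_name/A_folder/temp_table_rename/',
  partitioned_by = ARRAY['dt_utc']
)
AS
SELECT DISTINCT -- DISTINCT also collapses identical rows; drop it to copy every row as-is
  userid,
  score AS deltascore, -- old name -> new name
  otherColumns,
  eventDateUtc,
  "dt_utc" -- the partition column must come last in a partitioned CTAS
FROM "my_database"."original_table"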
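Athena limits a single CTAS or INSERT INTO query to writing 100 partitions. On a table with a long daily history, the query above may therefore fail. The AWS docs describe a workaround: create the table with CTAS for a first batch of partitions, then append the remaining batches with INSERT INTO.

The Python below is a minimal sketch that only generates that SQL text. It rests on these assumptions:
  • dt_utc is a string partition column holding ISO dates such as '2020-01-31'.
  • You have already listed every existing value, for example with SHOW PARTITIONS, with the dt_utc= prefix stripped.
  • The function name batched_rename_statements is hypothetical.
It leaves out DISTINCT so that rows are copied one-for-one.

```python
from typing import List, Sequence

# Athena's per-query limit on partitions written by CTAS / INSERT INTO
MAX_PARTITIONS_PER_QUERY = 100

# Projection from the question: rename score -> deltascore, partition column last
PROJECTION = 'userid, score AS deltascore, otherColumns, eventDateUtc, "dt_utc"'
SOURCE = '"my_database"."original_table"'


def batched_rename_statements(partitions: Sequence[str], location: str,
                              size: int = MAX_PARTITIONS_PER_QUERY) -> List[str]:
    """One CTAS for the first batch, then one INSERT INTO per later batch.

    Assumes `partitions` lists every existing dt_utc value as a sortable
    string (e.g. ISO dates), so each BETWEEN range covers exactly one batch.
    """
    ordered = sorted(set(partitions))
    statements = []
    for start in range(0, len(ordered), size):
        batch = ordered[start:start + size]
        select = (f"SELECT {PROJECTION} FROM {SOURCE} "
                  f"WHERE \"dt_utc\" BETWEEN '{batch[0]}' AND '{batch[-1]}'")
        if start == 0:
            # The first batch creates the table and its external location
            statements.append(
                'CREATE TABLE "temp_table_rename" WITH ('
                f"format = 'JSON', external_location = '{location}', "
                "partitioned_by = ARRAY['dt_utc']) AS " + select)
        else:
            # Later batches append to the table created above
            statements.append('INSERT INTO "temp_table_rename" ' + select)
    return statements


if __name__ == "__main__":
    days = [f"2020-01-{d:02d}" for d in range(1, 32)]
    for sql in batched_rename_statements(
            days, "s3://bucket_name/A_folder/temp_table_rename/", size=10):
        print(sql)
```

Run the statements in order. The CTAS must succeed first, because every INSERT INTO appends to the table it creates.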
  2. Apply the column rename to the original table, either by deploying the CloudFormation changes or in whatever way you manage the table.
    At this point, you can even drop original_table and create it again with the right column name.
    After the rename, you will notice that the renamed column has no data.

  3. Remove the data of the original table by deleting its S3 source, after checking that temp_table_rename contains all the data.

  4. Copy the data from the temp table's S3 location to the original table's S3 location.
    I prefer to use an AWS CLI command for this, since there can be thousands of files to copy:

aws s3 cp s3://bucket_name/A_folder/temp_table_rename/ s3://bucket_name/A_folder/original_table/ --recursive
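Steps 3 and 4 can also be scripted. The sketch below only builds the AWS CLI argument lists: aws s3 rm --recursive, then aws s3 cp --recursive. It defaults to --dryrun so you can review what would be deleted and copied before running it for real.

The overlap check is an added safeguard, not part of the original answer. If the temporary location sat inside the original table's prefix, the rm would delete the temporary copy as well. The function name swap_commands is hypothetical.

```python
import shlex
from typing import List


def _normalize(prefix: str) -> str:
    """Require an s3:// URI and ensure it ends with a slash."""
    if not prefix.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {prefix!r}")
    return prefix if prefix.endswith("/") else prefix + "/"


def swap_commands(temp_location: str, original_location: str,
                  dry_run: bool = True) -> List[List[str]]:
    """CLI argument lists for step 3 (delete) and step 4 (copy)."""
    temp = _normalize(temp_location)
    original = _normalize(original_location)
    # Deleting the original prefix must not also delete the temp copy
    if temp.startswith(original) or original.startswith(temp):
        raise ValueError("temp and original locations must not overlap")
    extra = ["--dryrun"] if dry_run else []
    return [
        ["aws", "s3", "rm", original, "--recursive", *extra],
        ["aws", "s3", "cp", temp, original, "--recursive", *extra],
    ]


if __name__ == "__main__":
    for cmd in swap_commands("s3://bucket_name/A_folder/temp_table_rename/",
                             "s3://bucket_name/A_folder/original_table/"):
        print(shlex.join(cmd))
```

Once the dry-run output looks right, each list can be passed to subprocess.run(cmd, check=True) with dry_run=False.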

  5. Restore the partition index of the original table:

MSCK REPAIR TABLE "my_database"."original_table"
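MSCK REPAIR TABLE only discovers partitions whose S3 keys follow the Hive-style dt_utc=<value>/ layout. A CTAS with partitioned_by writes exactly that layout, so the copied files should be found as-is.

On tables with many partitions, MSCK REPAIR can be slow. An alternative is to register the partitions explicitly with ALTER TABLE ... ADD IF NOT EXISTS PARTITION. The minimal sketch below assumes you already have the copied object keys, for example from aws s3 ls --recursive. The function names are hypothetical.

```python
import re
from typing import Iterable, List

# Hive-style partition segment, e.g. ".../dt_utc=2020-01-31/part-0000"
PARTITION_RE = re.compile(r"(?:^|/)dt_utc=([^/]+)/")


def partition_values(keys: Iterable[str]) -> List[str]:
    """Distinct dt_utc values found in S3 object keys, sorted."""
    found = set()
    for key in keys:
        match = PARTITION_RE.search(key)
        if match:
            found.add(match.group(1))
    return sorted(found)


def add_partitions_sql(table: str, location: str, values: Iterable[str]) -> str:
    """One ALTER TABLE ... ADD IF NOT EXISTS statement covering all values."""
    base = location.rstrip("/")
    clauses = [f"PARTITION (dt_utc = '{v}') LOCATION '{base}/dt_utc={v}/'"
               for v in values]
    if not clauses:
        raise ValueError("no partitions to add")
    return f"ALTER TABLE {table} ADD IF NOT EXISTS\n  " + "\n  ".join(clauses)


if __name__ == "__main__":
    keys = ["A_folder/original_table/dt_utc=2020-01-01/part-0",
            "A_folder/original_table/dt_utc=2020-01-02/part-0"]
    print(add_partitions_sql("my_database.original_table",
                             "s3://bucket_name/A_folder/original_table/",
                             partition_values(keys)))
```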

Done.

Final notes:

Using CREATE TABLE AS for the transformation lets you do much more than just rename a column. For example, you can split the data of one column into two new columns, or merge several columns into a single one.
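To illustrate that flexibility, here is a small Python sketch that assembles the SELECT part of such a CTAS from (expression, alias) pairs. It always appends the partition column last, which Athena requires for a partitioned CTAS.

The columns fullName, city and country are hypothetical and not part of the question's table. split_part and concat are Athena (Presto/Trino) SQL functions. The helper name ctas_select is also hypothetical.

```python
from typing import List, Optional, Tuple

# (SQL expression, alias or None); aliases rename or name derived columns
Column = Tuple[str, Optional[str]]


def ctas_select(columns: List[Column], partition_col: str, source: str) -> str:
    """Build a CTAS SELECT with the partition column placed last."""
    items = []
    for expr, alias in columns:
        if alias is not None and alias.lower() == partition_col.lower():
            raise ValueError("the partition column is appended automatically")
        items.append(f"{expr} AS {alias}" if alias else expr)
    items.append(f'"{partition_col}"')  # Athena expects partition columns last
    return "SELECT " + ",\n       ".join(items) + f"\nFROM {source}"


if __name__ == "__main__":
    columns = [
        ("userid", None),
        ("split_part(fullName, ' ', 1)", "firstName"),   # split one column...
        ("split_part(fullName, ' ', 2)", "lastName"),    # ...into two
        ("concat(city, ', ', country)", "cityCountry"),  # merge two into one
    ]
    print(ctas_select(columns, "dt_utc", '"my_database"."original_table"'))
```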

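Statement: technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.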

 