
ETL jobs in AWS Glue: possible to overwrite data?

I am trying to write an AWS Glue ETL job that updates the schema based on the most recent schema version.

I know this is not typically desirable behavior, but to minimize the number of output files, is it possible to do the transformations directly on the source data so that the transformed data is loaded back to the same path?

Or is it possible to delete the data in the source path and then rewrite it to the same destination?

You do not need an ETL job to edit the schema unless you want to automate the process. You can use the edit schema feature of the Data Catalog tables generated by the AWS Glue crawler.

  • Navigate to the tables in AWS Glue
  • Choose the table whose schema you want to change
  • Use the edit schema button inside the table
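
If you do want to automate the edit instead of clicking through the steps above, the same change can be made through the Glue Data Catalog API. The following is only a minimal sketch, assuming a boto3 Glue client is passed in; `build_table_input` and `retype_column` are hypothetical helper names, and the key allow-list is a conservative subset of what `update_table` accepts in `TableInput`.

```python
import copy

# Keys from the get_table response that are safe to send back as TableInput.
# Read-only fields (DatabaseName, CreateTime, VersionId, ...) are dropped,
# because update_table rejects them.
TABLE_INPUT_KEYS = {
    "Name", "Description", "Owner", "Retention", "StorageDescriptor",
    "PartitionKeys", "TableType", "Parameters",
}


def build_table_input(table, column_name, new_type):
    """Return a TableInput dict with the named column's type changed."""
    table_input = {
        key: copy.deepcopy(value)
        for key, value in table.items()
        if key in TABLE_INPUT_KEYS
    }
    columns = table_input["StorageDescriptor"]["Columns"]
    matches = [col for col in columns if col["Name"] == column_name]
    if not matches:
        raise KeyError(f"column {column_name!r} not found")
    for col in matches:
        col["Type"] = new_type
    return table_input


def retype_column(glue_client, database, table_name, column_name, new_type):
    """Fetch the table, change one column type, and write the table back."""
    table = glue_client.get_table(DatabaseName=database, Name=table_name)["Table"]
    table_input = build_table_input(table, column_name, new_type)
    glue_client.update_table(DatabaseName=database, TableInput=table_input)
    return table_input
```

A call might look like `retype_column(boto3.client("glue"), "my_db", "my_table", "price", "double")`, where the database, table, and column names are placeholders. Note that this only edits the catalog metadata; it does not rewrite the underlying data files.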

After editing the schema, you can see the versions of the table, which AWS Glue maintains automatically.

Furthermore, you can also compare the versions of the table.
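
The version comparison can also be scripted. This is a rough sketch, assuming a boto3 Glue client and its `get_table_versions` call; the helper names `fetch_versions`, `columns_of`, and `diff_versions` are made up for illustration, and only column names and types are compared.

```python
def fetch_versions(glue_client, database, table_name):
    """Collect every stored version of a table, following NextToken pages."""
    versions = []
    kwargs = {"DatabaseName": database, "TableName": table_name}
    while True:
        page = glue_client.get_table_versions(**kwargs)
        versions.extend(page["TableVersions"])
        token = page.get("NextToken")
        if not token:
            return versions
        kwargs["NextToken"] = token


def columns_of(version):
    """Map column name to column type for one table version."""
    columns = version["Table"]["StorageDescriptor"]["Columns"]
    return {col["Name"]: col["Type"] for col in columns}


def diff_versions(old, new):
    """Report columns added, removed, or retyped between two versions."""
    old_cols, new_cols = columns_of(old), columns_of(new)
    shared = old_cols.keys() & new_cols.keys()
    return {
        "added": sorted(new_cols.keys() - old_cols.keys()),
        "removed": sorted(old_cols.keys() - new_cols.keys()),
        "retyped": sorted(name for name in shared if old_cols[name] != new_cols[name]),
    }
```

Each entry returned by `fetch_versions` carries a `VersionId`, so you can pick the two versions you care about and pass them to `diff_versions`.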


 