
AWS Glue - Merge SQL on S3 table

I am developing the ETL for my DWH pipeline using AWS Glue.

My staging data contains updated rows that need to be merged into my dimension tables.

Example, "User" dimension: in the S3 table "Dim_User" I have user A with the field "team" equal to 'Sales'. Today my pipeline read data from the sources, and the AWS Glue job wrote to my S3 table "staging_dim_user" that user A has 'New Sales Dept' in the field "team".

Using AWS Glue, how can I merge this into "Dim_User"? Is it possible to implement my MERGE SQL on S3 through AWS Glue? What are the best practices for AWS Glue and S3 tables in that case?

You may need to use Athena to merge that data with a query.

Athena is unable to merge two sources saved in different folders into one table, so you might need to use a query to merge the data:

SELECT * FROM "database"."table_name"
UNION
SELECT * FROM "database"."table_name"
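
A plain UNION of the dimension and staging tables would return both the old and the new row for user A. One possible way to get merge-like behaviour from a query is to take every staging row, then add only those dimension rows that have no match in staging. The sketch below is not from the original answer: it assumes a hypothetical key column `user_id`, adds an illustrative user B, and uses Python's built-in `sqlite3` only so that the query can be tried locally. The SELECT itself is standard SQL and should carry over to Athena with little change.

```python
import sqlite3

# Assumed schema: both tables share a key column "user_id" (hypothetical name).
MERGE_QUERY = """
    SELECT user_id, team FROM staging_dim_user      -- new and updated rows win
    UNION ALL
    SELECT d.user_id, d.team                        -- untouched rows are kept
    FROM dim_user AS d
    LEFT JOIN staging_dim_user AS s ON d.user_id = s.user_id
    WHERE s.user_id IS NULL
    ORDER BY 1
"""


def merge_dimension(conn):
    """Return the merged dimension rows, with staging overriding existing rows."""
    return conn.execute(MERGE_QUERY).fetchall()


# In-memory stand-ins for the two S3 tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_user (user_id TEXT PRIMARY KEY, team TEXT);
    CREATE TABLE staging_dim_user (user_id TEXT PRIMARY KEY, team TEXT);
    INSERT INTO dim_user VALUES ('A', 'Sales');
    INSERT INTO dim_user VALUES ('B', 'Finance');   -- illustrative extra user
    INSERT INTO staging_dim_user VALUES ('A', 'New Sales Dept');
""")

merged = merge_dimension(conn)
print(merged)
```

The query only reads; the result would still have to be written out as the new version of the dimension, for example to a separate S3 location that then replaces the old one.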


 