AWS Glue - Read from a SQL Server table and write to S3 as a custom CSV file

Question

I have been working with Glue since January and have built multiple POCs and production data lakes using AWS Glue, Databricks, EMR, etc. I have used AWS Glue to read data from S3 and perform ETL before loading into Redshift, Aurora, etc.

I now need to read data from a source table on SQL Server and write it to an S3 bucket as a CSV file with a custom (user-defined) name, say employee.csv.

I am looking for some pointers on how to do this, please.

Thanks

Answer 1

You can connect over JDBC by specifying connection_type="sqlserver" to get a dynamic frame from SQL Server. See the GlueContext docs.

# getSource returns a DataSource; getFrame() reads it into a DynamicFrame
dynF = glueContext.getSource(connection_type="sqlserver", url=..., dbtable=..., user=..., password=...).getFrame()
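To end up with a single file named employee.csv, one common workaround is to write one part file to a temporary prefix and then copy it to the final key. The sketch below is only an illustration under assumptions: the helper names, the `tmp/export/` prefix, and the host/database values are hypothetical, and `coalesce(1)` is only reasonable for tables small enough to fit in one partition.

```python
def sqlserver_jdbc_url(host, database, port=1433):
    """Build a SQL Server JDBC URL (host, database and port are placeholders)."""
    return f"jdbc:sqlserver://{host}:{port};databaseName={database}"


def find_part_file(keys):
    """Return the single Spark part file among S3 keys, ignoring markers such as _SUCCESS."""
    parts = [k for k in keys if k.rsplit("/", 1)[-1].startswith("part-")]
    if len(parts) != 1:
        raise ValueError(f"expected exactly one part file, found {len(parts)}")
    return parts[0]


def export_table_to_csv(glue_context, url, table, user, password,
                        bucket, final_key, tmp_prefix="tmp/export/"):
    """Read one SQL Server table and leave it in S3 as s3://bucket/final_key."""
    import boto3  # imported here so the helpers above work outside a Glue job

    dyf = glue_context.create_dynamic_frame.from_options(
        connection_type="sqlserver",
        connection_options={"url": url, "dbtable": table,
                            "user": user, "password": password},
    )
    # coalesce(1) forces a single output file; Spark still picks the file name
    (dyf.toDF().coalesce(1).write.mode("overwrite")
        .option("header", "true").csv(f"s3://{bucket}/{tmp_prefix}"))

    # Copy the generated part file to the desired name, then clean up
    s3 = boto3.client("s3")
    listing = s3.list_objects_v2(Bucket=bucket, Prefix=tmp_prefix)
    keys = [obj["Key"] for obj in listing.get("Contents", [])]
    s3.copy_object(Bucket=bucket, Key=final_key,
                   CopySource={"Bucket": bucket, "Key": find_part_file(keys)})
    s3.delete_objects(Bucket=bucket, Delete={"Objects": [{"Key": k} for k in keys]})
```

Inside a Glue job this might be called as `export_table_to_csv(glueContext, sqlserver_jdbc_url("db.example.com", "hr"), "dbo.employee", user, password, "my-bucket", "exports/employee.csv")`, where every argument value is a placeholder.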

Answer 2

This task fits the AWS DMS (Database Migration Service) use case. DMS is designed either to migrate data from one data store to another or to keep them in sync. It can keep your source (i.e., MSSQL) in sync with your target (i.e., S3) and transform the data along the way.

There is one non-negligible constraint in your case, though. Ongoing sync with an MSSQL source only works if your license is the Enterprise or Developer Edition, and only for versions 2016-2019.
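If you go the DMS route, the replication task needs a table-mapping document that says which tables to include. As a rough sketch, the snippet below builds a selection rule for one table; the helper names and the `dbo` / `employee` values are assumptions for illustration, not something taken from the question.

```python
import json


def selection_rule(schema, table, rule_id=1):
    """Build a DMS selection rule that includes a single table."""
    return {
        "rule-type": "selection",
        "rule-id": str(rule_id),
        "rule-name": f"include-{schema}-{table}",
        "object-locator": {"schema-name": schema, "table-name": table},
        "rule-action": "include",
    }


def table_mappings(rules):
    """Serialize rules into the JSON string a DMS task expects."""
    return json.dumps({"rules": rules}, indent=2)


if __name__ == "__main__":
    # Hypothetical schema and table names
    print(table_mappings([selection_rule("dbo", "employee")]))
```

The resulting string would be passed as the `TableMappings` argument when creating the replication task, for example via boto3's `create_replication_task`.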

Answer 1
0 2018-09-18 07:15:24

Answer 2
0 2021-03-19 09:47:36
