
AWS Glue - read from a SQL Server table and write to S3 as a custom CSV file

I have been working with Glue since January and have built multiple POCs and production data lakes using AWS Glue, Databricks, EMR, etc. So far I have used AWS Glue to read data from S3 and perform ETL before loading into Redshift, Aurora, etc.

I now need to read data from a source table that lives on SQL Server and write it to an S3 bucket as a CSV file with a custom (user-defined) name, say employee.csv.

I am looking for some pointers on how to do this.

Thanks

You can connect over JDBC by specifying connection_type="sqlserver", which gives you a dynamic frame backed by the SQL Server table. See the GlueContext docs for details.

dynF = glueContext.getSource(connection_type="sqlserver", url=..., dbtable=..., user=..., password=...).getFrame()
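The read above only covers half of the question. Here is a rough sketch of the full round trip, including the custom file name. Spark always writes files named part-..., so one common workaround is to write a single part file to a temporary prefix and then copy it to the key you want with boto3. The function names, the bucket, the prefixes, and the contents of jdbc_options are all placeholders I made up for illustration. It assumes the table is small enough to coalesce into one partition.

```python
def find_part_file(keys):
    """Return the single Spark part file among the listed S3 keys."""
    parts = [
        k for k in keys
        if k.rsplit("/", 1)[-1].startswith("part-") and k.endswith(".csv")
    ]
    if len(parts) != 1:
        raise ValueError(f"expected exactly one part file, found {len(parts)}")
    return parts[0]


def export_table_to_csv(glue_context, s3_client, jdbc_options,
                        bucket, tmp_prefix, final_key):
    """Hypothetical helper: dump a SQL Server table to one named CSV in S3.

    glue_context is an awsglue GlueContext, s3_client is boto3.client("s3"),
    jdbc_options holds url / dbtable / user / password for the source table.
    """
    # Read the source table as a DynamicFrame over JDBC.
    dyf = glue_context.create_dynamic_frame.from_options(
        connection_type="sqlserver",
        connection_options=jdbc_options,
    )

    # coalesce(1) forces a single output file; Spark still names it part-...
    (dyf.toDF()
        .coalesce(1)
        .write.mode("overwrite")
        .option("header", "true")
        .csv(f"s3://{bucket}/{tmp_prefix}"))

    # Locate the part file Spark produced under the temporary prefix.
    listing = s3_client.list_objects_v2(Bucket=bucket, Prefix=tmp_prefix)
    keys = [obj["Key"] for obj in listing.get("Contents", [])]
    part_key = find_part_file(keys)

    # S3 has no rename, so copy to the desired key and delete the temp objects.
    s3_client.copy_object(
        Bucket=bucket,
        Key=final_key,
        CopySource={"Bucket": bucket, "Key": part_key},
    )
    for key in keys:
        s3_client.delete_object(Bucket=bucket, Key=key)
```

Inside a Glue job you would call it with something like export_table_to_csv(glueContext, boto3.client("s3"), {...}, "my-bucket", "tmp/employee/", "exports/employee.csv"). Keep the trailing slash on the temporary prefix so the listing does not pick up unrelated keys. For large tables a single partition can be slow, so treat this as a starting point rather than a finished job.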

This task fits the AWS DMS (Database Migration Service) use case. DMS is designed either to migrate data from one data store to another or to keep the two in sync. It can certainly keep your source (i.e., MSSQL) in sync with your target (i.e., S3), and transform the data along the way.

There is one non-negligible constraint in your case, though. Ongoing sync from an MSSQL source only works if your license is the Enterprise or Developer Edition, and only for versions 2016-2019.

