
Data Pipeline (DynamoDB to S3) - How to format S3 file?

I have a Data Pipeline that exports my DynamoDB table to an S3 bucket so I can use the S3 file for services like QuickSight, Athena and Forecast.

However, for my S3 file to work with these services, I need the file to be formatted as CSV, like so:

date, journal, id
1589529457410, PLoS Genetics, 10.1371/journal.pgen.0030110
1589529457410, PLoS Genetics, 10.1371/journal.pgen.1000047

But instead, my exported file looks like this:

{"date":{"s":"1589529457410"},"journal":{"s":"PLoS Genetics"},"id":{"s":"10.1371/journal.pgen.0030110"}}
{"date":{"s":"1589833552714"},"journal":{"s":"PLoS Genetics"},"id":{"s":"10.1371/journal.pgen.1000047"}}

How can I specify the format of my exported file in S3 so it works with services like QuickSight, Athena and Forecast? Preferably I'd do the data transformation with Data Pipeline as well.

Athena can read JSON data.
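For example, one option (not from the original answer; table, bucket and prefix names are placeholders) is to declare each exported attribute as a struct that mirrors the DynamoDB type wrapper and let Athena's JSON SerDe read the file as-is. A minimal sketch with boto3, assuming the export really is one JSON object per line as shown above:

    import boto3

    athena = boto3.client("athena")

    # DDL for an external table over the exported JSON lines; `date` is a reserved
    # word in Athena DDL, hence the backticks. Location and table name are placeholders.
    DDL = """
    CREATE EXTERNAL TABLE IF NOT EXISTS ddb_export (
      `date`  struct<s:string>,
      journal struct<s:string>,
      id      struct<s:string>
    )
    ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
    LOCATION 's3://my-export-bucket/my-export-prefix/'
    """

    athena.start_query_execution(
        QueryString=DDL,
        ResultConfiguration={"OutputLocation": "s3://my-athena-query-results/"},
    )

    # A query can then unwrap the type descriptors:
    #   SELECT "date".s AS date, journal.s AS journal, id.s AS id FROM ddb_export;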

You can also use DynamoDB Streams to stream the data to S3. Here is a link to a blog post with best practices and design patterns for streaming data from DynamoDB to S3 for use with Athena.
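As a rough illustration of that pattern (a sketch, not the blog post's code, with a placeholder delivery stream name): a Lambda function subscribed to the table's stream can forward each new or updated item to a Kinesis Data Firehose delivery stream, which buffers the records and delivers the batches to S3 for Athena.

    import json

    import boto3

    firehose = boto3.client("firehose")

    def handler(event, context):
        # Triggered by the table's DynamoDB stream.
        for record in event["Records"]:
            if record["eventName"] == "REMOVE":
                continue  # deletions are skipped in this sketch
            image = record["dynamodb"]["NewImage"]  # e.g. {"date": {"S": "1589529457410"}, ...}
            # Strip the DynamoDB type descriptors so S3 ends up with plain JSON lines.
            item = {name: next(iter(value.values())) for name, value in image.items()}
            firehose.put_record(
                DeliveryStreamName="my-delivery-stream",  # placeholder
                Record={"Data": (json.dumps(item) + "\n").encode("utf-8")},
            )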

You can use DynamoDB Streams to trigger an AWS Lambda function, which can transform the data and store it in Amazon S3, Amazon Redshift, etc. With AWS Lambda you could also trigger Amazon Forecast to retrain, or pass the data to Amazon Forecast for a prediction.
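For the Forecast part, a hedged sketch of what such a Lambda could call once fresh data is sitting in S3: start an Amazon Forecast dataset import job so the predictor can later be retrained on the new data. All ARNs, names and the IAM role below are placeholders.

    import time

    import boto3

    forecast = boto3.client("forecast")

    def refresh_forecast_data(bucket, key):
        # Point Forecast at the newly written S3 object.
        response = forecast.create_dataset_import_job(
            DatasetImportJobName=f"import-{int(time.time())}",
            DatasetArn="arn:aws:forecast:us-east-1:123456789012:dataset/my_dataset",  # placeholder
            DataSource={
                "S3Config": {
                    "Path": f"s3://{bucket}/{key}",
                    "RoleArn": "arn:aws:iam::123456789012:role/ForecastS3AccessRole",  # placeholder
                }
            },
            TimestampFormat="yyyy-MM-dd HH:mm:ss",
        )
        return response["DatasetImportJobArn"]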

Alternatively, you could use AWS Data Pipeline to write the data to an S3 bucket as you currently do. Then use a scheduled CloudWatch Events rule, or an S3 event notification, to run a Lambda function. The Lambda function can transform the file and store it in another S3 bucket for further processing.
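A minimal sketch of such a transform Lambda, assuming the export file is JSON lines in the shape shown in the question, the function is wired to an S3 ObjectCreated event notification, and the destination bucket and column order are placeholders:

    import csv
    import io
    import json
    import urllib.parse

    import boto3

    s3 = boto3.client("s3")
    COLUMNS = ["date", "journal", "id"]    # assumed column order
    DEST_BUCKET = "my-transformed-bucket"  # placeholder

    def handler(event, context):
        # Triggered by an S3 ObjectCreated event on the export bucket.
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

            out = io.StringIO()
            writer = csv.DictWriter(out, fieldnames=COLUMNS)
            writer.writeheader()
            for line in body.splitlines():
                if not line.strip():
                    continue
                item = json.loads(line)
                # Unwrap the type descriptor, e.g. {"s": "PLoS Genetics"} -> "PLoS Genetics".
                writer.writerow({col: next(iter(item[col].values())) for col in COLUMNS})

            s3.put_object(
                Bucket=DEST_BUCKET,
                Key=key.rsplit("/", 1)[-1] + ".csv",
                Body=out.getvalue().encode("utf-8"),
            )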
