
AWS Glue ETL Job triggered on batches of S3 Events

I have an S3 bucket that gets many files dropped in it (1000 records/min). I want to trigger a Glue ETL job on batches of these dropped files.

I have looked at using Firehose to aggregate the events into batches, but that requires a lot of chained resources, like S3 -> Lambda -> Firehose -> ...

What is the best way to process my data in batches?

You can use AWS Glue job triggers, which let you run the Glue job at scheduled intervals rather than firing it on every S3 event. Each scheduled run then processes whatever files have accumulated since the previous run.
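As a rough illustration of the scheduled-trigger suggestion, here is a minimal sketch using boto3's Glue `create_trigger` call. The job name `my-batch-etl-job`, the trigger naming scheme, and the 15-minute interval are all assumptions made for the example and do not come from the question. As far as I know, Glue does not accept schedules faster than every 5 minutes, so the helper rejects smaller intervals; check the current limits before relying on that.

```python
def build_scheduled_trigger(job_name: str, every_minutes: int = 15) -> dict:
    """Build keyword arguments for Glue's CreateTrigger API call.

    job_name and the trigger name format are hypothetical placeholders.
    """
    # Assumption: Glue schedules cannot run more often than every 5 minutes.
    if not 5 <= every_minutes <= 59:
        raise ValueError("every_minutes must be between 5 and 59")
    return {
        "Name": f"{job_name}-every-{every_minutes}-min",
        "Type": "SCHEDULED",
        # AWS cron format: minutes hours day-of-month month day-of-week year
        "Schedule": f"cron(0/{every_minutes} * * * ? *)",
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


def create_trigger(params: dict) -> dict:
    """Send the request to Glue. Requires boto3 and AWS credentials."""
    import boto3  # imported here so the helper above works without boto3

    glue = boto3.client("glue")
    return glue.create_trigger(**params)


if __name__ == "__main__":
    params = build_scheduled_trigger("my-batch-etl-job", every_minutes=15)
    print(params)
    # Uncomment to actually create the trigger in your account:
    # create_trigger(params)
```

The request building is kept separate from the API call so the schedule can be inspected before anything is created in the account.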

Are you processing streaming data? With the limited information you have given, I don't see a use case for Firehose.

