
AWS Kinesis Data Firehose and Lambda

I have different data sources and I need to publish their data to S3 in real time. I also need to process and validate the data before delivering it to the S3 buckets, so I have to use AWS Lambda for validation. The question is: what is the difference between using AWS Kinesis Data Firehose and using AWS Lambda to store data directly in an S3 bucket? Specifically, what are the advantages of Kinesis Data Firehose, given that we can already use AWS Lambda to put records into S3 directly?

We might want to clarify what "near real time" means; for me, it is anything below one second.

In this case, Kinesis Firehose will batch the records before delivering them to S3, which results in more records per S3 object. You can configure how often (and at what buffer size) the data is flushed, as in the sketch below. You can also attach a Lambda function to the Firehose stream, so the data can be processed and validated before it is delivered to S3. Kinesis Firehose also scales automatically.
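A minimal sketch of what that configuration could look like with boto3; the stream name, bucket, role, and Lambda ARNs are placeholders, and the 5 MiB / 60 second buffering values are just example settings:

```python
import boto3

firehose = boto3.client("firehose")

# Create a delivery stream that buffers incoming records and flushes them to
# S3 whenever either 5 MiB or 60 seconds is reached, whichever comes first.
# Records are optionally passed through a Lambda transform for validation.
firehose.create_delivery_stream(
    DeliveryStreamName="example-stream",  # hypothetical name
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-role",  # placeholder
        "BucketARN": "arn:aws:s3:::example-bucket",                 # placeholder
        "BufferingHints": {"SizeInMBs": 5, "IntervalInSeconds": 60},
        "ProcessingConfiguration": {
            "Enabled": True,
            "Processors": [
                {
                    "Type": "Lambda",
                    "Parameters": [
                        {
                            "ParameterName": "LambdaArn",
                            # placeholder ARN of the validation function
                            "ParameterValue": "arn:aws:lambda:us-east-1:123456789012:function:validate",
                        }
                    ],
                }
            ],
        },
    },
)
```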

Note that each PUT to S3 has a cost associated with it.
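A back-of-the-envelope comparison with assumed numbers (100 events per second, Firehose flushing one batch every 60 seconds) shows how much the batching reduces the PUT count:

```python
# Assumed numbers, purely illustrative.
events_per_second = 100
seconds_per_month = 60 * 60 * 24 * 30  # 2,592,000

direct_puts = events_per_second * seconds_per_month  # one PUT per event
firehose_puts = seconds_per_month // 60              # one PUT per 60-second batch

print(f"Direct Lambda writes: {direct_puts:,} S3 PUTs/month")   # 259,200,000
print(f"Firehose batched:     {firehose_puts:,} S3 PUTs/month") # 43,200
```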

If you connect your data source directly to AWS Lambda, then each event will trigger the function (unless you have a batching mechanism in place, which you didn't mention), and for each event you will make a PUT request to S3. This results in a lot of small objects in S3 and therefore a lot of S3 PUT API calls. Also, depending on the number of events received per second, Lambda may not be able to scale, and the associated cost will increase.
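For comparison, a minimal sketch of that "Lambda writes directly to S3" path (the bucket name and key scheme are hypothetical): every invocation produces one small object, so a high event rate translates directly into a high S3 PUT rate.

```python
import json
import uuid
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # One object per event: fine at low volume, expensive and
    # fragmenting at high volume.
    key = f"raw/{uuid.uuid4()}.json"
    s3.put_object(
        Bucket="example-bucket",  # placeholder
        Key=key,
        Body=json.dumps(event).encode("utf-8"),
    )
    return {"written": key}
```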


