
Amazon Kinesis Firehose Buffering to S3

I'm attempting to price out a streaming data / analytics application deployed to AWS, and I'm looking at using Kinesis Firehose to dump the data into S3.

My question is: when pricing out the S3 costs for this, I need to figure out how many PUTs I will need.

I know that Firehose buffers the data and then flushes it out to S3. However, I'm unclear on whether it will write a single "file" containing all of the records accumulated up to that point, or whether it will write each record individually.

So, assuming I set the buffer size / interval to an optimal amount based on the size of the records, does the number of S3 PUTs equal the number of records, or the number of flushes that Firehose performs?

Having read a substantial amount of AWS documentation, I respectfully disagree with the assertion that S3 will not charge you.

You will be billed separately for charges associated with Amazon S3 and Amazon Redshift usage including storage and read/write requests. However, you will not be billed for data transfer charges for the data that Amazon Kinesis Firehose loads into Amazon S3 and Amazon Redshift. For further details, see Amazon S3 pricing and Amazon Redshift pricing. [emphasis mine]

https://aws.amazon.com/kinesis/firehose/pricing/

What they are saying is that Kinesis Firehose will not charge you anything additional for the transfers beyond the $0.035/GB, but you will still pay for the interactions with your bucket. (Data inbound to a bucket is always free of actual per-gigabyte transfer charges.)

In the final analysis, though, you appear to be in control of the rough number of PUT requests against your bucket, based on some tunable parameters:

Q: What is buffer size and buffer interval?

Amazon Kinesis Firehose buffers incoming streaming data to a certain size or for a certain period of time before delivering it to destinations. You can configure buffer size and buffer interval while creating your delivery stream. Buffer size is in MBs and ranges from 1MB to 128MB. Buffer interval is in seconds and ranges from 60 seconds to 900 seconds.

https://aws.amazon.com/kinesis/firehose/faqs/#creating-delivery-streams
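To get a feel for how those two settings bound the number of deliveries, here is a minimal Python sketch. It assumes a steady ingest rate, that Firehose flushes as soon as either the buffer size or the buffer interval is reached, and that each flush produces exactly one S3 object. None of that is confirmed by the FAQ text quoted above, and the function name and parameters are my own invention for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_flushes_per_day(throughput_mb_per_s, buffer_size_mb, buffer_interval_s):
    """Rough number of buffer flushes per day for one delivery stream.

    Assumptions (not taken from AWS documentation): constant throughput,
    a flush happens when EITHER limit is hit first, one S3 object per flush.
    """
    if throughput_mb_per_s <= 0:
        # No incoming data: assume nothing is delivered.
        return 0.0
    # How long it takes the incoming data to fill the size buffer.
    seconds_to_fill = buffer_size_mb / throughput_mb_per_s
    # Whichever limit is reached first triggers the flush.
    seconds_per_flush = min(seconds_to_fill, buffer_interval_s)
    return SECONDS_PER_DAY / seconds_per_flush


# High volume: the 128 MB size limit is hit before the 900 s interval.
print(estimate_flushes_per_day(1.0, 128, 900))
# Low volume: the 900 s interval expires before the buffer fills.
print(estimate_flushes_per_day(0.01, 128, 900))
```

Under these assumptions, the flush count depends only on the data volume and the two buffer settings, not on how many individual records make up that volume.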

Unless it is collecting and aggregating the records into large files, I don't see what the point of the buffer size and buffer interval would be. However, without firing up the service and taking it for a spin, I can (unfortunately) only speculate.

I don't believe you pay anything extra for the write operation to S3 from Firehose.

You will be billed separately for charges associated with Amazon S3 and Amazon Redshift usage including storage and read/write requests. However, you will not be billed for data transfer charges for the data that Amazon Kinesis Firehose loads into Amazon S3 and Amazon Redshift. For further details, see Amazon S3 pricing and Amazon Redshift pricing.

https://aws.amazon.com/kinesis/firehose/pricing/

The cost is one S3 PUT per delivery operation performed by Kinesis, not one per record. So one Firehose flush is one PUT:

https://docs.aws.amazon.com/whitepapers/latest/building-data-lakes/data-ingestion-methods.html

https://forums.aws.amazon.com/thread.jspa?threadID=219275&tstart=0
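To see why the per-flush versus per-record distinction matters for the bill, here is a small Python sketch comparing the two cases. The record rate, the 60-second flush interval, and the PUT price of $0.005 per 1,000 requests are illustrative assumptions only; check the current S3 pricing page for your region before relying on any of these numbers. The helper name is hypothetical.

```python
def monthly_put_cost(puts_per_month, price_per_1000_puts):
    """S3 request cost in dollars for a given number of PUT requests."""
    return puts_per_month / 1000 * price_per_1000_puts


SECONDS_PER_MONTH = 30 * 24 * 60 * 60   # a 30-day month
ASSUMED_PUT_PRICE = 0.005               # assumed $ per 1,000 PUTs, not an official quote

records_per_second = 1000               # hypothetical ingest rate
flush_interval_s = 60                   # shortest buffer interval from the FAQ

# Case 1: every record is its own PUT (what the question worries about).
puts_if_per_record = records_per_second * SECONDS_PER_MONTH
# Case 2: one PUT per flush (what this answer describes).
puts_if_per_flush = SECONDS_PER_MONTH // flush_interval_s

print(f"per record: ${monthly_put_cost(puts_if_per_record, ASSUMED_PUT_PRICE):,.2f}")
print(f"per flush:  ${monthly_put_cost(puts_if_per_flush, ASSUMED_PUT_PRICE):,.2f}")
```

With these assumed inputs, the per-record case comes to roughly $12,960 a month in PUT requests, while the per-flush case stays well under a dollar.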
