
Controlling Lambda + Kinesis Costs

We have a .NET client application that uploads files to S3. There is an event notification registered on the bucket which triggers a Lambda to process the file. If we need to do maintenance, we suspend our processing by removing the event notification and adding it back later when we're ready to resume processing.
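
A minimal sketch of that suspend/resume step, assuming the boto3 SDK and hypothetical bucket and function names (not from the original post): writing an empty notification configuration detaches the Lambda trigger, and writing the configuration back re-enables processing while uploads keep landing in S3.

```python
import boto3

s3 = boto3.client("s3")

BUCKET = "my-upload-bucket"  # hypothetical bucket name
LAMBDA_ARN = "arn:aws:lambda:us-east-1:123456789012:function:process-s3-file"  # hypothetical

def suspend_processing():
    # An empty configuration removes the event notification, so new uploads
    # stop triggering the processing Lambda but keep accumulating in S3.
    s3.put_bucket_notification_configuration(
        Bucket=BUCKET, NotificationConfiguration={}
    )

def resume_processing():
    # Re-register the Lambda trigger for newly created objects.
    s3.put_bucket_notification_configuration(
        Bucket=BUCKET,
        NotificationConfiguration={
            "LambdaFunctionConfigurations": [
                {"LambdaFunctionArn": LAMBDA_ARN, "Events": ["s3:ObjectCreated:*"]}
            ]
        },
    )
```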

To process the backlog of files that queue up in S3 while the event notification is disabled, we write a record containing each file's S3 key to a Kinesis stream, and an event source mapping lets Lambda consume each Kinesis record. This works great for us because it lets us control our concurrency when processing a large backlog by controlling the number of shards in the stream. We were originally using SNS, but when we had thousands of files that needed to be reprocessed, SNS would keep starting Lambdas until we hit our concurrent executions threshold, which is why we switched to Kinesis.
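
As a rough sketch of that reprocessing path (boto3, hypothetical stream name): each backlogged S3 key becomes one Kinesis record, and because the event source mapping drains each shard with at most one concurrent invocation, the shard count caps how many Lambdas run in parallel.

```python
import boto3

kinesis = boto3.client("kinesis")
STREAM_NAME = "s3-backlog"  # hypothetical stream name

def enqueue_backlog_key(s3_key: str) -> None:
    # One record per backlogged file; the partition key determines the shard,
    # and each shard is processed serially by the Lambda event source mapping,
    # so total concurrency is bounded by the number of shards.
    kinesis.put_record(
        StreamName=STREAM_NAME,
        Data=s3_key.encode("utf-8"),
        PartitionKey=s3_key,
    )
```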

The problem we're facing right now is that the cost of Kinesis is killing us, even though we barely use it. We get 150 - 200 files uploaded per minute, and our Lambda takes about 15 seconds to process each one. If we suspend processing for a few hours we end up with thousands of files to process. We could easily reprocess them with a 128-shard stream; however, that would cost us $1,400 / month. The current cost for running our Lambda each month is less than $300. It seems terrible that we have to increase our COGS by 400% just to be able to control our concurrency level during a recovery scenario.

I could attempt to keep the stream small by default and then resize it on the fly before we reprocess a large backlog; however, resizing a stream from 1 shard up to 128 takes an incredibly long time (see the sketch after the questions below). If we're trying to recover from an unplanned outage, we can't afford to sit around waiting for the stream to resize before we can use it. So my questions are:

  1. Can anyone recommend an alternative pattern to Kinesis shards for controlling the upper bound on the number of concurrent Lambdas draining a queue?

  2. Is there something I am missing that would allow us to use Kinesis more cost-efficiently?
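
For context on the resize pain point mentioned above, here is a rough sketch (boto3, hypothetical stream name) of scaling a stream on the fly. UpdateShardCount can only roughly double the open shard count per call, so going from 1 shard to 128 means about seven reshard operations, each of which must complete before the next can start, which is why an on-the-fly resize is so slow during a recovery.

```python
import time
import boto3

kinesis = boto3.client("kinesis")

def scale_stream(stream_name: str, target_shards: int) -> None:
    # Repeatedly double the shard count until the target is reached; each
    # UpdateShardCount call is limited to roughly 2x the current count, and the
    # stream must return to ACTIVE before the next reshard can begin.
    while True:
        summary = kinesis.describe_stream_summary(StreamName=stream_name)[
            "StreamDescriptionSummary"
        ]
        if summary["StreamStatus"] != "ACTIVE":
            time.sleep(15)  # wait out the in-progress reshard
            continue
        current = summary["OpenShardCount"]
        if current >= target_shards:
            return
        kinesis.update_shard_count(
            StreamName=stream_name,
            TargetShardCount=min(current * 2, target_shards),
            ScalingType="UNIFORM_SCALING",
        )
```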

You can use SQS with Lambda or Worker EC2s.

Here is how it can be achieved (2 approaches):

1. Serverless Approach

  • S3 -> SNS -> SQS -> Lambda Scheduler -> Lambda

  • Use SQS instead of Kinesis for storing S3 Paths

  • Use a Lambda Scheduler to keep polling messages (S3 paths) from SQS

  • Invoke the processing Lambda function from the Lambda scheduler (a rough sketch follows this list)
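
A minimal sketch of that scheduler, assuming boto3 and hypothetical queue and function names: it pulls a bounded batch of S3 paths from SQS and asynchronously invokes the processing Lambda for each one, so the batch size (times how often the scheduler runs) sets the concurrency ceiling rather than the queue depth.

```python
import json
import boto3

sqs = boto3.client("sqs")
lambda_client = boto3.client("lambda")

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/s3-backlog"  # hypothetical
PROCESSOR_FUNCTION = "process-s3-file"  # hypothetical
MAX_BATCH = 10  # upper bound on files handed out per scheduler run

def handler(event, context):
    # Scheduler Lambda (run on a timer): hand out at most MAX_BATCH S3 paths
    # per run, fanning each out to the processing Lambda asynchronously.
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=MAX_BATCH)
    for msg in resp.get("Messages", []):
        lambda_client.invoke(
            FunctionName=PROCESSOR_FUNCTION,
            InvocationType="Event",  # async; don't hold the scheduler for ~15 s per file
            Payload=json.dumps({"s3_key": msg["Body"]}).encode("utf-8"),
        )
        # Deleting here keeps the sketch short; a real implementation would
        # delete only after the processing Lambda reports success.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```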

2. EC2 Approach

  • S3 -> SNS -> SQS -> Beanstalk Worker

  • Use SQS instead of Kinesis for storing S3 Paths

  • Use a Beanstalk Worker environment, which polls SQS automatically

  • Implement the application (processing logic) in the Beanstalk worker, hosted locally on an HTTP server in the same EC2 (a rough sketch follows this list)
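
A minimal sketch of that worker-side endpoint, assuming Flask and a hypothetical bucket name: in a Beanstalk worker environment, the built-in daemon polls SQS and POSTs each message body to a local HTTP route, and a 200 response tells it the message was handled.

```python
import boto3
from flask import Flask, request

application = Flask(__name__)  # Elastic Beanstalk looks for "application" by default
s3 = boto3.client("s3")

BUCKET = "my-upload-bucket"  # hypothetical bucket name

@application.route("/", methods=["POST"])
def process_file():
    # The worker daemon POSTs the SQS message body (here, an S3 key) to this route.
    s3_key = request.get_data(as_text=True)
    obj = s3.get_object(Bucket=BUCKET, Key=s3_key)
    # ... run the ~15-second processing logic on obj["Body"] here ...
    return "", 200  # success: the daemon deletes the message from the queue
```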
