
Kinesis Analytics Destination Guidance: Lambda vs Kinesis Stream to Lambda

After Kinesis Analytics does its job, the next step is to send that information off to a destination. AWS currently offers 3 destination choices:

  • Kinesis stream
  • Kinesis Firehose delivery stream
  • AWS Lambda function

For my use case, a Kinesis Firehose delivery stream is not what I want, so I am left with:

  • Kinesis stream
  • AWS Lambda function

If I set the destination to a Kinesis stream, I would then attach a Lambda to that stream to process the records.
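For the Kinesis stream option, the attached Lambda receives records in the Kinesis event-source-mapping format: a `Records` list where each payload is base64-encoded under `kinesis.data`. A minimal sketch in Python, assuming the analytics application writes JSON rows; `decode_kinesis_event` and `stream_handler` are hypothetical names.

```python
import base64
import json


def decode_kinesis_event(event):
    """Decode every record in a Kinesis event-source-mapping batch.

    Assumption: each record payload is a JSON document.
    """
    return [
        json.loads(base64.b64decode(record["kinesis"]["data"]))
        for record in event["Records"]
    ]


def stream_handler(event, context):
    """Lambda attached to the Kinesis stream (hypothetical handler name)."""
    for item in decode_kinesis_event(event):
        # Placeholder: real per-record processing goes here.
        pass
```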

AWS also offers the ability to set the destination to a Lambda directly, bypassing the Kinesis stream step of this process. While digging through the docs, I found this:

Using a Lambda Function as Output

Specifically, under "Lambda Output Invocation Frequency", those docs say:

If records are emitted to the destination in-application stream within the data analytics application as a continuous query or a sliding window, the AWS Lambda destination function is invoked approximately once per second.

My Kinesis Analytics output falls under this scenario, so I can assume that my Lambda will be invoked "approximately once per second".
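With the Lambda destination, the function receives a different event shape than a stream-triggered Lambda, and it must report a per-record result back to Kinesis Analytics. The sketch below is based on my reading of the Kinesis Data Analytics (SQL applications) Lambda output format: a lowercase `records` list with `recordId` and base64 `data`, answered with `{"records": [{"recordId": ..., "result": "Ok" | "DeliveryFailed"}]}`. Verify the field names against the current docs; `destination_handler` and `process_row` are hypothetical names.

```python
import base64
import json


def process_row(row):
    """Hypothetical placeholder for the real per-row work."""
    return None


def destination_handler(event, context):
    """Lambda configured directly as the Kinesis Analytics output."""
    results = []
    for record in event["records"]:
        try:
            # Assumption: the in-application stream emits JSON rows.
            row = json.loads(base64.b64decode(record["data"]))
            process_row(row)
            status = "Ok"
        except ValueError:
            # Covers bad base64 (binascii.Error) and bad JSON (JSONDecodeError);
            # the record is reported back as not delivered.
            status = "DeliveryFailed"
        results.append({"recordId": record["recordId"], "result": status})
    return {"records": results}
```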

I'm trying to understand the difference between these two destinations as it pertains to using a Lambda.

The Using AWS Lambda with Kinesis docs state:

You can subscribe Lambda functions to automatically read batches of records off your Kinesis stream and process them if records are detected on the stream. AWS Lambda then polls the stream periodically (once per second) for new records.

So it sounds like the invocation interval is the same in either case: approximately 1 second.
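The stream-side polling is done by Lambda internally, and you never write it yourself. Purely to illustrate the "poll once per second" pattern, here is a sketch of a single-shard poller built on the boto3 Kinesis `get_shard_iterator` and `get_records` calls. The client and `sleep` are passed in (a hypothetical design choice) so it runs without AWS, and `poll_shard` is a hypothetical name. Because polling happens per shard, a stream with N shards can produce up to N such batches per second.

```python
import time


def poll_shard(kinesis, stream_name, shard_id, on_batch,
               interval=1.0, max_polls=None, sleep=time.sleep):
    """Poll one shard about once per `interval` seconds.

    Non-empty batches are handed to `on_batch`, which plays the role of
    one Lambda invocation. Returns the number of polls performed.
    """
    iterator = kinesis.get_shard_iterator(
        StreamName=stream_name, ShardId=shard_id, ShardIteratorType="LATEST"
    )["ShardIterator"]
    polls = 0
    while iterator and (max_polls is None or polls < max_polls):
        response = kinesis.get_records(ShardIterator=iterator, Limit=100)
        if response["Records"]:
            on_batch(response["Records"])
        iterator = response.get("NextShardIterator")
        polls += 1
        sleep(interval)
    return polls
```

In real code `kinesis` would be `boto3.client("kinesis")`; empty polls simply produce no invocation.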

So I think the guidance is:

If the next stage in the pipeline only needs one consumer, then use the AWS Lambda function destination. If, however, you need to use multiple different consumers to do different things for the same data sent to the destination, then a Kinesis Stream is more appropriate.

Is this a correct assumption about how to choose a destination? Again, for my use case I am excluding the Kinesis Firehose delivery stream.

If the next stage in the pipeline only needs one consumer, then use the AWS Lambda function destination. If, however, you need to use multiple different consumers to do different things for the same data sent to the destination, then a Kinesis Stream is more appropriate.

• I would always use a Kinesis stream with one shard and batch size = 1 (for example) if I wanted the items to be consumed one by one with no concurrency.
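The one-shard, batch-size-1 setup maps onto two boto3 calls: `kinesis.create_stream` (`ShardCount=1`) and `lambda.create_event_source_mapping` (`BatchSize=1`). The sketch only builds the keyword arguments so it runs without AWS; the stream name, ARN and function name are hypothetical placeholders. `ParallelizationFactor=1` (the default) keeps at most one batch per shard in flight, which is what gives the in-order, no-concurrency behaviour.

```python
def sequential_consumer_config(stream_name, stream_arn, function_name):
    """Build kwargs for a one-shard stream consumed one record at a time."""
    create_stream_kwargs = {
        "StreamName": stream_name,
        "ShardCount": 1,               # a single shard: one ordered sequence
    }
    mapping_kwargs = {
        "EventSourceArn": stream_arn,
        "FunctionName": function_name,
        "BatchSize": 1,                # one record per invocation
        "ParallelizationFactor": 1,    # one concurrent batch per shard (default)
        "StartingPosition": "LATEST",  # only read records written from now on
    }
    return create_stream_kwargs, mapping_kwargs


# Usage (requires AWS credentials; names are hypothetical):
#   import boto3
#   stream_kw, mapping_kw = sequential_consumer_config(
#       "analytics-out", "arn:aws:kinesis:...:stream/analytics-out", "consumer-fn")
#   boto3.client("kinesis").create_stream(**stream_kw)
#   boto3.client("lambda").create_event_source_mapping(**mapping_kw)
```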

If there are multiple consumers, increase the number of shards: when there are items to process, one Lambda is launched in parallel for each shard. If it makes sense, also increase the batch size.
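To see why adding shards adds parallelism, here is a simplified simulation. Kinesis hashes each partition key with MD5 to a 128-bit integer and routes the record to the shard whose hash key range contains it; the sketch assumes evenly split ranges (as for a freshly created stream that was never resharded) and then cuts each shard's records into batches. Each shard's batches run in order, one at a time, so up to `shard_count` invocations can run concurrently. Real Lambda batches can be smaller than `BatchSize`; `shard_for` and `plan_invocations` are hypothetical names.

```python
import hashlib
from collections import defaultdict

HASH_SPACE = 2 ** 128  # MD5 of the partition key gives a 128-bit integer


def shard_for(partition_key, shard_count):
    """Approximate shard index, assuming evenly split hash key ranges."""
    hashed = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    return hashed * shard_count // HASH_SPACE


def plan_invocations(records, shard_count, batch_size):
    """Group (partition_key, payload) pairs into per-shard batches.

    Order is preserved within a shard, so records sharing a partition
    key are always processed in the order they were written.
    """
    per_shard = defaultdict(list)
    for key, payload in records:
        per_shard[shard_for(key, shard_count)].append(payload)
    return {
        shard: [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
        for shard, items in per_shard.items()
    }
```

With `shard_count=1, batch_size=1`, three records become three sequential single-record invocations on shard 0, matching the "one by one with no concurrency" setup.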

But read the highlighted phrase below again:

If, however, you need to use multiple different consumers to do different things for the same data sent to the destination, then a Kinesis Stream is more appropriate.

If you have one or more producers and many consumers of exactly the same item, I guess you need to use SNS. The producer writes the item to one topic, then all the Lambdas subscribed to that topic will process that item.
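A sketch of that SNS fan-out, assuming boto3 on the producer side: `sns.publish(TopicArn=..., Message=...)` sends one message, and every Lambda subscribed to the topic receives its own copy wrapped in `Records[].Sns.Message`. The client is passed in so the snippet runs without AWS; the topic ARN, `publish_item` and `sns_handler` are hypothetical names.

```python
import json


def publish_item(sns_client, topic_arn, item):
    """Producer: publish one item; each subscriber gets its own copy."""
    return sns_client.publish(TopicArn=topic_arn, Message=json.dumps(item))


def sns_handler(event, context):
    """One of several subscribed Lambdas: unwrap each SNS notification.

    Each subscriber would do its own, different work on the same items;
    the list is returned here only so the unwrapping can be checked.
    """
    return [json.loads(record["Sns"]["Message"]) for record in event["Records"]]
```

In real code `sns_client` would be `boto3.client("sns")`.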

If this does not answer your question, please clarify it; there is a little ambiguity.
