
How to fan out an AWS Kinesis stream?

I'd like to fan out (chain/replicate) an input AWS Kinesis stream to N new Kinesis streams, so that each record written to the input stream appears in each of the N streams.

Is there an AWS service or an open-source solution for this?

I'd prefer not to write code if there's a ready-made solution. AWS Kinesis Firehose is not a solution because it can't output to Kinesis. Perhaps an AWS Lambda solution, if that won't be too expensive to run?

There are two ways you could accomplish fan-out of an Amazon Kinesis stream:

  • Use Amazon Kinesis Analytics to copy records to additional streams
  • Trigger an AWS Lambda function to copy records to another stream

Option 1: Using Amazon Kinesis Analytics to fan out

You can use Amazon Kinesis Analytics to generate a new stream from an existing stream.

From the Amazon Kinesis Analytics documentation:

Amazon Kinesis Analytics applications continuously read and process streaming data in real-time. You write application code using SQL to process the incoming streaming data and produce output. Then, Amazon Kinesis Analytics writes the output to a configured destination.

[Figure: Amazon Kinesis Analytics flow diagram]

Fan-out is mentioned in the Application Code section:

You can also write SQL queries that run independent of each other. For example, you can write two SQL statements that query the same in-application stream, but send output into different in-application streams.

I managed to implement this as follows:

  • Created three streams: input, output1, output2
  • Created two Amazon Kinesis Analytics applications: copy1, copy2

The Amazon Kinesis Analytics SQL application looks like this:

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM"
(log VARCHAR(16));

CREATE OR REPLACE PUMP "COPY_PUMP1" AS
  INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM "log" FROM "SOURCE_SQL_STREAM_001";

This code creates a pump (think of it as a continual SELECT statement) that selects from the input stream and outputs to the output1 stream. I created another identical application that outputs to the output2 stream.

To test, I sent data to the input stream:

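#!/usr/bin/env python
# Sends one JSON record to the "input" stream every two seconds (boto 2 API).

import json, time
from boto import kinesis

# Use a separate name for the connection so the module is not shadowed
conn = kinesis.connect_to_region("us-west-2")
i = 0

while True:
  data = {'log': 'Record ' + str(i)}
  i += 1
  print(data)
  # Arguments: stream name, payload, partition key
  conn.put_record("input", json.dumps(data), "key")
  time.sleep(2)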

I let it run for a while, then displayed the output using this code:

from boto import kinesis

conn = kinesis.connect_to_region("us-west-2")
# Start reading at the oldest record in the first shard of output1
iterator = conn.get_shard_iterator('output1', 'shardId-000000000000', 'TRIM_HORIZON')['ShardIterator']
records = conn.get_records(iterator, 5)  # fetch at most 5 records
print([r['Data'] for r in records['Records']])

The output was:

[u'{"LOG":"Record 0"}', u'{"LOG":"Record 1"}', u'{"LOG":"Record 2"}', u'{"LOG":"Record 3"}', u'{"LOG":"Record 4"}']

I ran it again for output2 and the identical output was shown.

Option 2: Using AWS Lambda

If you are fanning out to many streams, a more efficient method might be to create an AWS Lambda function:

  • Triggered by Amazon Kinesis stream records
  • That writes records to multiple Amazon Kinesis 'output' streams

You could even have the Lambda function self-discover the output streams based on a naming convention (e.g. any stream named app-output-*).
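As a rough illustration of Option 2, here is a minimal sketch of such a Lambda function, assuming Python and boto3. The helper names (discover_output_streams, fan_out, handler) and the app-output-* pattern are illustrative choices, not part of any AWS-provided solution. It raises on partial failure so that Lambda retries the batch, which means downstream consumers may see duplicates.

```python
import base64
import fnmatch

# Naming convention from the text above; adjust to your own stream names.
OUTPUT_PATTERN = "app-output-*"
MAX_PUT_RECORDS = 500  # PutRecords accepts at most 500 records per call


def discover_output_streams(client, pattern=OUTPUT_PATTERN):
    """List all stream names and keep those matching the pattern."""
    names, kwargs = [], {}
    while True:
        page = client.list_streams(**kwargs)
        names.extend(page["StreamNames"])
        if not page.get("HasMoreStreams") or not page["StreamNames"]:
            break
        kwargs = {"ExclusiveStartStreamName": page["StreamNames"][-1]}
    return [n for n in names if fnmatch.fnmatch(n, pattern)]


def fan_out(event, client, streams):
    """Copy every record in a Kinesis-triggered event to each output stream."""
    entries = [
        {
            # Lambda delivers the Kinesis payload base64-encoded
            "Data": base64.b64decode(r["kinesis"]["data"]),
            "PartitionKey": r["kinesis"]["partitionKey"],
        }
        for r in event["Records"]
    ]
    for stream in streams:
        for start in range(0, len(entries), MAX_PUT_RECORDS):
            batch = entries[start:start + MAX_PUT_RECORDS]
            resp = client.put_records(StreamName=stream, Records=batch)
            if resp.get("FailedRecordCount", 0):
                # Fail the invocation so Lambda retries the whole batch
                raise RuntimeError("partial failure writing to " + stream)
    return len(entries) * len(streams)


def handler(event, context):
    # Imported lazily so the helpers above can be exercised without AWS
    import boto3
    client = boto3.client("kinesis")
    return fan_out(event, client, discover_output_streams(client))
```

Discovering the streams on every invocation keeps the sketch short; in practice you might cache the list between invocations to save ListStreams calls.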

There is a GitHub repo from AWS Labs that provides fan-out using Lambda: https://github.com/awslabs/aws-lambda-fanout. Also read "Transforming a synchronous Lambda invocation into an asynchronous one" at https://medium.com/retailmenot-engineering/building-a-high-throughput-data-pipeline-with-kinesis-lambda-and-dynamodb-7d78e992a02d, which is critical for building truly asynchronous processing.

There are two AWS-native solutions for fanning out Kinesis streams that don't require AWS Firehose or AWS Lambda.

  1. Similar to Kafka consumer groups, Kinesis has the application name. Every consumer of the stream can provide a unique application name. If two consumers have the same application name, messages are distributed between them. To fan out the stream, provide a different application name for each consumer that should receive the same messages. Under the hood, a DynamoDB table is created for each application to keep track of its consumers' progress, so that applications can consume messages at different rates.
  2. Use Kinesis Enhanced Fan-Out for higher throughput: each registered consumer gets up to 2 MiB per second per shard, and this does not count towards the read limit shared by the stream's other consumers. At the time of writing, there is a limit of 20 "enhanced fan-out" consumers per stream.
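The application-name behaviour in the first option can be pictured with a small toy model. This is plain Python, not KCL code: assign_shards and the application names "billing" and "audit" are made up for illustration, and the round-robin split is only a simplification of how shards get divided between workers.

```python
from itertools import cycle


def assign_shards(shards, applications):
    """Model which worker reads which shard.

    `applications` maps an application name to its list of worker ids.
    Every application sees every shard; inside one application the
    shards are split between its workers (round-robin in this model).
    """
    plan = {}
    for app, workers in applications.items():
        per_worker = {w: [] for w in workers}
        for shard, worker in zip(shards, cycle(workers)):
            per_worker[worker].append(shard)
        plan[app] = per_worker
    return plan


shards = ["shardId-000000000000", "shardId-000000000001"]
plan = assign_shards(shards, {"billing": ["w1", "w2"], "audit": ["w3"]})
print(plan)
```

Here the two "billing" workers share the stream between them, while the single "audit" worker independently receives both shards, which is the fan-out effect described above.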

One caveat, as far as I am aware, is that both options require the Kinesis Client Library (KCL) rather than the raw AWS SDK.
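Whichever library ends up doing the reading, each enhanced fan-out consumer has to be registered against the stream first. A minimal sketch of that registration step, assuming boto3's register_stream_consumer call; the helper name register_consumers and the constant are illustrative, the limit of 20 is the figure quoted above and should be checked against current AWS quotas, and the sketch does not handle consumers that are already registered.

```python
MAX_EFO_CONSUMERS = 20  # per-stream limit quoted in the text; verify before relying on it


def register_consumers(client, stream_arn, consumer_names):
    """Register one enhanced fan-out consumer per name and return their ARNs."""
    if len(consumer_names) > MAX_EFO_CONSUMERS:
        raise ValueError("too many enhanced fan-out consumers for one stream")
    arns = {}
    for name in consumer_names:
        resp = client.register_stream_consumer(
            StreamARN=stream_arn, ConsumerName=name
        )
        arns[name] = resp["Consumer"]["ConsumerARN"]
    return arns


# Example call (needs AWS credentials; the ARN below is a made-up placeholder):
#   import boto3
#   register_consumers(
#       boto3.client("kinesis"),
#       "arn:aws:kinesis:us-west-2:123456789012:stream/input",
#       ["output1-reader", "output2-reader"],
#   )
```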
