How to fan out an AWS Kinesis stream?
I'd like to fan out/chain/replicate an input AWS Kinesis stream to N new Kinesis streams, so that each record written to the input stream appears in each of the N streams.

Is there an AWS service or an open-source solution?

I'd prefer not to write code for this if there's a ready-made solution. AWS Kinesis Firehose is not a solution because it can't output to Kinesis. Perhaps an AWS Lambda solution, if that won't be too expensive to run?
There are two ways you could accomplish fan-out of an Amazon Kinesis stream:
Option 1: Using Amazon Kinesis Analytics to fan-out
You can use Amazon Kinesis Analytics to generate a new stream from an existing stream.

From the Amazon Kinesis Analytics documentation:
Amazon Kinesis Analytics applications continuously read and process streaming data in real time. You write application code using SQL to process the incoming streaming data and produce output. Then, Amazon Kinesis Analytics writes the output to a configured destination.
Fan-out is mentioned in the Application Code section:
You can also write SQL queries that run independent of each other. For example, you can write two SQL statements that query the same in-application stream, but send output into different in-application streams.
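Concretely, that pattern is two pumps in one application, both reading the same source stream but inserting into different in-application output streams. A sketch (the destination stream and pump names here are illustrative, not from the original answer):

```sql
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM_1" (log VARCHAR(16));
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM_2" (log VARCHAR(16));

-- Two independent pumps querying the same source stream
CREATE OR REPLACE PUMP "PUMP_1" AS
    INSERT INTO "DESTINATION_SQL_STREAM_1"
    SELECT STREAM "log" FROM "SOURCE_SQL_STREAM_001";

CREATE OR REPLACE PUMP "PUMP_2" AS
    INSERT INTO "DESTINATION_SQL_STREAM_2"
    SELECT STREAM "log" FROM "SOURCE_SQL_STREAM_001";
```

Each in-application destination stream would then be mapped to its own Kinesis output stream in the application's output configuration.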
I managed to implement this as follows:

The Amazon Kinesis Analytics SQL application looks like this:
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM"
(log VARCHAR(16));
CREATE OR REPLACE PUMP "COPY_PUMP1" AS
INSERT INTO "DESTINATION_SQL_STREAM"
SELECT STREAM "log" FROM "SOURCE_SQL_STREAM_001";
This code creates a pump (think of it as a continual SELECT statement) that selects from the input stream and outputs to the output1 stream. I created another identical application that outputs to the output2 stream.
To test, I sent data to the input stream:
#!/usr/bin/env python
# Continuously write numbered records to the "input" stream (Python 2 / boto).
import json, time
from boto import kinesis

kinesis = kinesis.connect_to_region("us-west-2")
i = 0
while True:
    data = {}
    data['log'] = 'Record ' + str(i)
    i += 1
    print data
    kinesis.put_record("input", json.dumps(data), "key")
    time.sleep(2)
I let it run for a while, then displayed the output using this code:
from boto import kinesis

# Read the first batch of records from output1's only shard (Python 2 / boto).
kinesis = kinesis.connect_to_region("us-west-2")
iterator = kinesis.get_shard_iterator('output1', 'shardId-000000000000', 'TRIM_HORIZON')['ShardIterator']
records = kinesis.get_records(iterator, 5)
print [r['Data'] for r in records['Records']]
The output was:
[u'{"LOG":"Record 0"}', u'{"LOG":"Record 1"}', u'{"LOG":"Record 2"}', u'{"LOG":"Record 3"}', u'{"LOG":"Record 4"}']
I ran it again for output2 and the identical output was shown.
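Note that a single get_records call only returns one batch; to read everything written so far you follow the NextShardIterator returned with each response. A small sketch of that loop, written for the same boto-style client interface (the helper name drain_shard is mine, and stopping at the first empty batch is a simplification for a bounded test read, since an open shard never truly ends):

```python
def drain_shard(client, stream_name, shard_id, batch_size=100):
    """Follow NextShardIterator from TRIM_HORIZON, collecting record payloads
    until a batch comes back empty (a simplification for a bounded test read)."""
    iterator = client.get_shard_iterator(
        stream_name, shard_id, "TRIM_HORIZON")["ShardIterator"]
    payloads = []
    while iterator:
        batch = client.get_records(iterator, batch_size)
        if not batch["Records"]:
            break
        payloads.extend(r["Data"] for r in batch["Records"])
        iterator = batch.get("NextShardIterator")
    return payloads
```

Against a real stream you would also enumerate the shards with describe_stream instead of hard-coding shardId-000000000000.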
Option 2: Using AWS Lambda
If you are fanning out to many streams, a more efficient method might be to create an AWS Lambda function that is triggered by the input stream and re-sends each incoming record to every output stream.
You could even have the Lambda function self-discover the output streams based on a naming convention (e.g. any stream named app-output-*).
There is a GitHub repo from AWS Labs providing fan-out using Lambda: https://github.com/awslabs/aws-lambda-fanout. Also read "Transforming a synchronous Lambda invocation into an asynchronous one" in https://medium.com/retailmenot-engineering/building-a-high-throughput-data-pipeline-with-kinesis-lambda-and-dynamodb-7d78e992a02d, which is critical to building truly asynchronous processing.
There are two AWS-native solutions for fanning out Kinesis streams that require neither AWS Firehose nor AWS Lambda.

One caveat I am aware of with these two options is that you need to use the Kinesis Client Library (KCL), not the raw AWS SDK.