
Getting Data from two different Streams in Kinesis?

I am trying to build a Kinesis consumer client. To prepare, I went through the Kinesis Developer Guide and the AWS document http://docs.aws.amazon.com/kinesis/latest/dev/kinesis-record-processor-implementation-app-java.html .

I was wondering: is it possible to get data from two different streams and process each accordingly?

Say I have two different streams, stream1 and stream2 .

Is it possible to get data from both streams and process each individually?

Why not? Just call get_records on both streams.

If each of your streams has only a single shard, you will also see all of the events, since it is recommended to process each shard with a single worker. If your logic requires joining events from different sources/streams, you can implement it with a single worker that reads from both streams.
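The single-worker approach above can be sketched roughly as follows, assuming boto3 and two single-shard streams; the stream names, region, and polling loop are illustrative placeholders, not details from the question:

```python
# Minimal sketch of one worker polling two single-shard Kinesis streams.
# Assumes boto3 is installed and AWS credentials are configured;
# stream names and region below are placeholders.

def shard_iterator(client, stream_name):
    """Return a shard iterator for the first (only) shard of a stream."""
    shard_id = client.describe_stream(StreamName=stream_name)[
        "StreamDescription"]["Shards"][0]["ShardId"]
    return client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]


def poll_both(client, stream1="stream1", stream2="stream2"):
    """Yield (stream_name, record) pairs, alternating between both streams."""
    iterators = {s: shard_iterator(client, s) for s in (stream1, stream2)}
    while True:
        for name in (stream1, stream2):
            resp = client.get_records(ShardIterator=iterators[name], Limit=100)
            iterators[name] = resp["NextShardIterator"]
            for record in resp["Records"]:
                yield name, record


if __name__ == "__main__":
    import boto3  # imported here so the sketch parses without boto3 installed
    kinesis = boto3.client("kinesis", region_name="us-east-1")
    for stream, record in poll_both(kinesis):
        print(stream, record["Data"])
```

In a real consumer you would also handle closed shards, checkpointing, and the per-shard read limits mentioned below; the KCL handles those concerns for you in the Java client from the linked guide.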

Note that if your streams have multiple shards, each of your workers will see only a subset of the events. You have the following options:

  • Both streams have a single shard each - in this case you can read from both streams with a single worker and see all events from both streams. You can add timestamps or other keys to let you "join" these events in the worker.

  • One stream ( stream1 ) with one shard and a second stream ( stream2 ) with multiple shards - in this case every worker can read from stream1 while also processing a single shard of stream2 . Each worker will see all the events of stream1 plus its share of the events of stream2 . Note that a single shard limits the rate at which you can read from stream1 (2 MB/second or 5 reads/second), and if stream2 has many shards, this can become a real bottleneck.

  • Both streams have multiple shards - in this case it is more complex to ensure that you can "join" these events, as you need to synchronize both the writes and the reads to these streams. You could read from all shards of both streams with a single worker, but this is not a good practice, as it limits your ability to scale: you no longer have a distributed system. Another option is to use the same partition_key in both streams, with the same number of shards and the same partition definition for both, and verify that each worker reads from the "right" shard of each stream - and that this still holds whenever one of your workers fails and restarts, which can be a bit complex.
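For the single-shard cases above, the "join" inside the worker can be as simple as buffering events from each stream and pairing them on a shared field. A minimal sketch, where the key field name event_id is a hypothetical example rather than anything mandated by Kinesis:

```python
from collections import defaultdict

def join_events(events1, events2, key="event_id"):
    """Pair up events from two streams that share the same key value.

    events1/events2 are lists of dicts already decoded from Kinesis records;
    returns a list of (event_from_stream1, event_from_stream2) pairs.
    Events without a partner on the other stream are left out.
    """
    by_key = defaultdict(dict)
    for e in events1:
        by_key[e[key]]["left"] = e
    for e in events2:
        by_key[e[key]]["right"] = e
    return [(v["left"], v["right"])
            for v in by_key.values() if "left" in v and "right" in v]
```

A production join would also need a time-bounded buffer (events from the two streams rarely arrive at the same moment), but the pairing logic is the same.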

Another option to consider is writing both types of events into a single stream, again using the same partition_key , and then filtering them on the reader side if you need to process them differently (for example, to write them to different log files in S3).
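That single-stream, filter-on-read approach might look like the sketch below; the "type" field inside the payload and the two handlers are assumptions made for illustration:

```python
import json

def dispatch(records, handlers):
    """Route records from one stream to different handlers by a 'type' field.

    records: Kinesis-style dicts with a JSON payload under "Data".
    handlers: mapping of type name -> callable; unknown types are skipped.
    """
    for record in records:
        event = json.loads(record["Data"])
        handler = handlers.get(event.get("type"))
        if handler:
            handler(event)
```

Each handler could, for example, append to a different S3 log file, matching the scenario in the answer.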
