
Getting Data from two different Streams in Kinesis?

I am trying to build a Kinesis consumer client. To prepare, I went through the Kinesis Developer Guide and the AWS document http://docs.aws.amazon.com/kinesis/latest/dev/kinesis-record-processor-implementation-app-java.html .

I was wondering: is it possible to get data from two different streams and process each accordingly?

Say I have two different streams, stream1 and stream2.

Is it possible to get data from both streams and process each individually?

Why not? Call get_records on both streams.

If each of your streams has only a single shard, you will see all the events, since the recommendation is to process each shard with a single worker. If your logic is to join events from different sources/streams, you can implement it with a single worker reading from both streams.
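Here is a minimal sketch (Python, boto3) of one worker polling both streams; the stream names, the single-shard assumption, and the process function are illustrative assumptions, not part of the original answer:

```python
import time
import boto3

kinesis = boto3.client("kinesis")

def first_shard_iterator(stream_name):
    # Assumes the stream has exactly one shard; take the first one.
    shard_id = kinesis.list_shards(StreamName=stream_name)["Shards"][0]["ShardId"]
    return kinesis.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]

iterators = {name: first_shard_iterator(name) for name in ("stream1", "stream2")}

while True:
    for name, iterator in iterators.items():
        resp = kinesis.get_records(ShardIterator=iterator, Limit=100)
        for record in resp["Records"]:
            process(name, record["Data"])  # process() is a hypothetical per-stream handler
        iterators[name] = resp["NextShardIterator"]
    time.sleep(0.5)  # stay under the 5 reads/second per-shard limit
```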

Note that if your streams have multiple shards, each of your workers will see only a part of the events. You have the following options:

  • Both streams have a single shard each - in this case you can read from both streams with a single worker and see all events from both streams. You can add timestamps or other keys to allow you to "join" these events in the worker (see the join sketch after this list).

  • One stream (stream1) with one shard and the second stream (stream2) with multiple shards - in this case you can read stream1 from all of your workers, and each worker will also process a single shard of stream2. Each worker will see all the events of stream1 plus its share of the events of stream2. Note that the single shard limits the speed at which you can read events from stream1 (2 MB/second or 5 reads/second), and if you have many shards in stream2, this can become a real bottleneck.

  • Both streams have multiple shards - in this case it is more complex to ensure that you can "join" these events, as you need to synchronize both the writes and the reads to these streams. You could read all shards of both streams with a single worker, but this is not good practice, since it limits your ability to scale: you no longer have a distributed system. Another option is to use the same partition_key in both streams, with the same number of shards and the same partition definition for both, and verify that each worker reads from the "right" shard of each stream, and that this mapping is restored correctly whenever a worker fails and restarts, which can be a bit complex.
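As a sketch of the "join" idea mentioned above: the worker can buffer events by a shared key until the matching event from the other stream arrives. The join_key and handle_pair functions below are hypothetical application logic, assumed only for illustration:

```python
from collections import defaultdict

# One buffer per stream, keyed by the join key.
buffers = {"stream1": defaultdict(list), "stream2": defaultdict(list)}

def on_event(stream, event):
    key = join_key(event)  # hypothetical: e.g. an id carried by events in both streams
    other = "stream2" if stream == "stream1" else "stream1"
    if buffers[other][key]:
        match = buffers[other][key].pop(0)
        handle_pair(key, event, match)      # both sides seen: emit the joined pair
    else:
        buffers[stream][key].append(event)  # otherwise wait for the matching event
```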

Another option you can consider is to write both types of events to a single stream, again using the same partition_key, and then filter them on the reader side if you need to process them differently (for example, to write them to different log files in S3). A sketch of this variant follows.
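A minimal sketch of the single-stream variant, assuming each event carries a "type" field and a hypothetical stream named "events"; the handler names are assumptions as well:

```python
import json
import boto3

kinesis = boto3.client("kinesis")

def put_event(event_type, payload, partition_key):
    # Tag each event with its type on write; both types share the partition key.
    kinesis.put_record(
        StreamName="events",  # hypothetical stream name
        Data=json.dumps({"type": event_type, "payload": payload}),
        PartitionKey=partition_key,
    )

def route(record):
    # On the reader side, filter by the type tag.
    event = json.loads(record["Data"])
    if event["type"] == "type1":
        handle_type1(event["payload"])  # hypothetical handlers
    else:
        handle_type2(event["payload"])
```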
