
Getting Data from two different Streams in Kinesis?

I am trying to build a Kinesis consumer client. To prepare, I went through the Kinesis Developer Guide and the AWS document http://docs.aws.amazon.com/kinesis/latest/dev/kinesis-record-processor-implementation-app-java.html .

I was wondering: is it possible to get data from two different streams and process each accordingly?

Say I have two different streams, stream1 and stream2.

Is it possible to get data from both streams and process each individually?

Why not? Call get_records on both streams.

If each of your streams has only a single shard, you will see all the events, since the recommendation is to process each shard with a single worker. If your logic is to join events from different sources/streams, you can implement it with a single worker reading from both streams.
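Here is a minimal sketch (Python, boto3) of one worker polling both streams; the stream names, the single-shard assumption, and the process function are illustrative assumptions, not part of the original answer:

```python
import time
import boto3

kinesis = boto3.client("kinesis")

def first_shard_iterator(stream_name):
    # Assumes the stream has exactly one shard; take the first one.
    shard_id = kinesis.list_shards(StreamName=stream_name)["Shards"][0]["ShardId"]
    return kinesis.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]

iterators = {name: first_shard_iterator(name) for name in ("stream1", "stream2")}

while True:
    for name, iterator in iterators.items():
        resp = kinesis.get_records(ShardIterator=iterator, Limit=100)
        for record in resp["Records"]:
            process(name, record["Data"])  # process() is a hypothetical per-stream handler
        iterators[name] = resp["NextShardIterator"]
    time.sleep(0.5)  # stay under the 5 reads/second per-shard limit
```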

Note that if your streams have multiple shards, each of your workers will see only a part of the events. You have the following options:

  • Both streams have a single shard each - in this case you can read from both streams with a single worker and see all events from both streams. You can add timestamps or other keys to allow you to "join" these events in the worker (see the join sketch after this list).

  • One stream (stream1) with one shard and the second stream (stream2) with multiple shards - in this case you can read stream1 from all of your workers, and each worker will also process a single shard of stream2. Each worker will see all the events of stream1 plus its share of the events of stream2. Note that the single shard limits the speed at which you can read events from stream1 (2 MB/second or 5 reads/second), and if you have many shards in stream2, this can become a real bottleneck.

  • Both streams have multiple shards - in this case it is more complex to ensure that you can "join" these events, as you need to synchronize both the writes and the reads to these streams. You could read all shards of both streams with a single worker, but this is not good practice, since it limits your ability to scale: you no longer have a distributed system. Another option is to use the same partition_key in both streams, with the same number of shards and the same partition definition for both, and verify that each worker reads from the "right" shard of each stream, and that this mapping is restored correctly whenever a worker fails and restarts, which can be a bit complex.
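As a sketch of the "join" idea mentioned above: the worker can buffer events by a shared key until the matching event from the other stream arrives. The join_key and handle_pair functions below are hypothetical application logic, assumed only for illustration:

```python
from collections import defaultdict

# One buffer per stream, keyed by the join key.
buffers = {"stream1": defaultdict(list), "stream2": defaultdict(list)}

def on_event(stream, event):
    key = join_key(event)  # hypothetical: e.g. an id carried by events in both streams
    other = "stream2" if stream == "stream1" else "stream1"
    if buffers[other][key]:
        match = buffers[other][key].pop(0)
        handle_pair(key, event, match)      # both sides seen: emit the joined pair
    else:
        buffers[stream][key].append(event)  # otherwise wait for the matching event
```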

Another option you can consider is to write both types of events to a single stream, again using the same partition_key, and then filter them on the reader side if you need to process them differently (for example, to write them to different log files in S3). A sketch of this variant follows.
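A minimal sketch of the single-stream variant, assuming each event carries a "type" field and a hypothetical stream named "events"; the handler names are assumptions as well:

```python
import json
import boto3

kinesis = boto3.client("kinesis")

def put_event(event_type, payload, partition_key):
    # Tag each event with its type on write; both types share the partition key.
    kinesis.put_record(
        StreamName="events",  # hypothetical stream name
        Data=json.dumps({"type": event_type, "payload": payload}),
        PartitionKey=partition_key,
    )

def route(record):
    # On the reader side, filter by the type tag.
    event = json.loads(record["Data"])
    if event["type"] == "type1":
        handle_type1(event["payload"])  # hypothetical handlers
    else:
        handle_type2(event["payload"])
```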
