I built a Spring Boot Kinesis consumer with the following components:
I consume events from a Kinesis stream with 1 shard. This Spring Boot consumer application also runs on the Pivotal Cloud Foundry (PCF) platform.
I tried the scenario locally (with kinesalite) and in PCF (with a real Kinesis stream) before posting this question. Can you please confirm whether my understanding is right? I went through the Spring Cloud Stream documentation ( https://docs.spring.io/spring-cloud-stream/docs/current/reference/htmlsingle/ and https://github.com/spring-cloud/spring-cloud-stream-binder-aws-kinesis/blob/master/spring-cloud-stream-binder-kinesis-docs/src/main/asciidoc/overview.adoc ). Although the documentation is exhaustive, concurrency and high availability are not explained in detail.
Let's say I have 3 instances of the consumer deployed to PCF (by setting the instances attribute to 3 in the manifest.yml file used during cf push).
All 3 instances have the following properties:

```properties
spring.cloud.stream.bindings..consumer.concurrency=5
spring.cloud.stream.bindings..group=my-consumer-group
spring.cloud.stream.kinesis.binder.checkpoint.table=my-metadata-dynamodb-table
spring.cloud.stream.kinesis.binder.locks.table=my-locks-dynamodb-table
```
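One way to picture what the shared group and locks table are for is a lock table in which a shard can be owned by only one instance at a time. The Java sketch below is a hypothetical, in-memory model of that idea. It assumes that the binder uses the DynamoDB table named by the locks.table property to give one member of the consumer group exclusive access to a shard. ShardLockSketch, tryLock, unlock and consumersOf are illustrative names, not binder APIs, and a ConcurrentHashMap stands in for DynamoDB.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory stand-in for a shared lock table (assumption: the real
// setup uses the DynamoDB table named by ...kinesis.binder.locks.table).
class ShardLockSketch {

    // shardId -> id of the instance currently holding the lock
    private final ConcurrentMap<String, String> lockTable = new ConcurrentHashMap<>();

    // Atomic "write only if absent", similar in spirit to a conditional put.
    boolean tryLock(String shardId, String instanceId) {
        String owner = lockTable.putIfAbsent(shardId, instanceId);
        return owner == null || owner.equals(instanceId);
    }

    // Releases the lock only if this instance is the current owner.
    void unlock(String shardId, String instanceId) {
        lockTable.remove(shardId, instanceId);
    }

    // Returns the instances that win the lock and would therefore consume the shard.
    static List<String> consumersOf(ShardLockSketch locks, String shardId, List<String> instances) {
        List<String> active = new ArrayList<>();
        for (String instance : instances) {
            if (locks.tryLock(shardId, instance)) {
                active.add(instance);
            }
        }
        return active;
    }

    public static void main(String[] args) {
        ShardLockSketch locks = new ShardLockSketch();
        String shard = "shardId-000000000000";
        List<String> instances = List.of("instance-0", "instance-1", "instance-2");
        System.out.println(consumersOf(locks, shard, instances)); // [instance-0]
        // Simulate instance-0 going away: its lock is released, so another instance can take over.
        locks.unlock(shard, "instance-0");
        System.out.println(consumersOf(locks, shard, List.of("instance-1", "instance-2"))); // [instance-1]
    }
}
```

In this model the three instances do not share the single shard. One instance owns it, and the others stay on standby until the lock is released.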
Let's say the events were sent to Kinesis by the producer in this order:
event5 (most recent event in the stream) - event4 - event3 - event2 - event1 (first event in the stream)
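The ordering within a single shard can be modeled as an append-only log in which each put receives an increasing sequence number. The following is a hypothetical in-memory sketch, not the AWS SDK. InMemoryShard, put and readAfter are illustrative names, and the small consecutive sequence numbers are a simplification: real Kinesis sequence numbers are much larger and are not consecutive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of a single Kinesis shard (not the AWS SDK):
// an append-only log in which every record gets a strictly increasing sequence number.
class ShardLogSketch {

    static final class ShardRecord {
        final long sequenceNumber;
        final String data;

        ShardRecord(long sequenceNumber, String data) {
            this.sequenceNumber = sequenceNumber;
            this.data = data;
        }
    }

    static final class InMemoryShard {
        private final List<ShardRecord> records = new ArrayList<>();
        private long nextSequenceNumber = 1; // simplification: real numbers are not consecutive

        // Producer side: append at the tail and return the assigned sequence number.
        long put(String data) {
            long seq = nextSequenceNumber++;
            records.add(new ShardRecord(seq, data));
            return seq;
        }

        // Consumer side: records after the given sequence number, oldest first.
        List<ShardRecord> readAfter(long sequenceNumber) {
            List<ShardRecord> result = new ArrayList<>();
            for (ShardRecord r : records) {
                if (r.sequenceNumber > sequenceNumber) {
                    result.add(r);
                }
            }
            return result;
        }
    }

    // Writes event1..event5 in producer order and returns the order a reader sees.
    static List<String> readOrder() {
        InMemoryShard shard = new InMemoryShard();
        for (int i = 1; i <= 5; i++) {
            shard.put("event" + i);
        }
        List<String> order = new ArrayList<>();
        for (ShardRecord r : shard.readAfter(0)) {
            order.add(r.data);
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(readOrder()); // [event1, event2, event3, event4, event5]
    }
}
```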
For such a configuration, I have explained my understanding below. Can you confirm whether this is right?
Please see the concurrency option JavaDocs in the KinesisMessageDrivenChannelAdapter:
```java
/**
 * The maximum number of concurrent {@link ConsumerInvoker}s running.
 * The {@link ShardConsumer}s are evenly distributed between {@link ConsumerInvoker}s.
 * Messages from within the same shard will be processed sequentially.
 * In other words each shard is tied with the particular thread.
 * By default the concurrency is unlimited and shard
 * is processed in the {@link #consumerExecutor} directly.
 * @param concurrency the concurrency maximum number
 */
public void setConcurrency(int concurrency) {
    // ... (method body not shown in this excerpt)
}
```
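So, since you have only one shard in that one stream, there is going to be only one active thread, which iterates over ShardIterators on that single shard. The same applies across your 3 instances: because they share the same group and the same locks table, they compete for the lock on that shard, and only one of them consumes it at a time.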
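To see why concurrency=5 does not give five active threads with one shard, the hypothetical sketch below assigns shards round-robin to at most concurrency worker slots, in the spirit of the JavaDoc's "evenly distributed" wording. It is not the adapter's source code. ConcurrencySketch, distribute and busyWorkers are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "ShardConsumers are evenly distributed between
// ConsumerInvokers": shards are assigned round-robin to at most `concurrency`
// worker slots. This is NOT the KinesisMessageDrivenChannelAdapter source code.
class ConcurrencySketch {

    // Returns one list of shard ids per worker slot; slots with no shard stay empty.
    static List<List<String>> distribute(List<String> shardIds, int concurrency) {
        if (concurrency < 1) {
            throw new IllegalArgumentException("concurrency must be >= 1");
        }
        List<List<String>> workers = new ArrayList<>();
        for (int i = 0; i < concurrency; i++) {
            workers.add(new ArrayList<>());
        }
        for (int i = 0; i < shardIds.size(); i++) {
            // Each shard is pinned to exactly one worker, so its records stay sequential.
            workers.get(i % concurrency).add(shardIds.get(i));
        }
        return workers;
    }

    // Number of worker slots that actually have at least one shard to poll.
    static long busyWorkers(List<String> shardIds, int concurrency) {
        return distribute(shardIds, concurrency).stream()
                .filter(shards -> !shards.isEmpty())
                .count();
    }

    public static void main(String[] args) {
        List<String> oneShard = List.of("shardId-000000000000");
        // Prints 1: with a single shard, four of the five slots have nothing to do.
        System.out.println(busyWorkers(oneShard, 5));
    }
}
```

In this model, extra concurrency only pays off once the stream has more shards than busy workers. With seven shards and concurrency 5, for example, all five slots are busy and two of them own two shards each.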
The point is that records from a single shard must always be processed in a single thread. This guarantees proper ordering, and the checkpoint is stored for the highest processed sequence number.
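The following is a minimal sketch of that guarantee. It assumes one single-thread executor per shard, with an in-memory long standing in for the checkpoint table entry. Records are handled in sequence-number order, and the checkpoint only moves forward. SingleThreadShardSketch and consume are hypothetical names, not binder APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: one shard is drained by exactly one thread, so records are
// processed in sequence-number order and the checkpoint only ever moves forward.
class SingleThreadShardSketch {

    static final class Result {
        final List<String> processed = new ArrayList<>();
        // Stands in for the sequence number a real consumer would store in the checkpoint table.
        long checkpoint = 0;
    }

    static Result consume(List<String> eventsInShardOrder) {
        Result result = new Result();
        ExecutorService shardThread = Executors.newSingleThreadExecutor();
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (int i = 0; i < eventsInShardOrder.size(); i++) {
                final long sequenceNumber = i + 1;
                final String event = eventsInShardOrder.get(i);
                pending.add(shardThread.submit(() -> {
                    result.processed.add(event);
                    // Checkpoint the highest sequence number handled so far.
                    result.checkpoint = Math.max(result.checkpoint, sequenceNumber);
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // waits for each task and makes its writes visible to this thread
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            shardThread.shutdown();
        }
        return result;
    }

    public static void main(String[] args) {
        Result r = consume(List.of("event1", "event2", "event3", "event4", "event5"));
        // prints [event1, event2, event3, event4, event5] checkpoint=5
        System.out.println(r.processed + " checkpoint=" + r.checkpoint);
    }
}
```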
Please read more about what AWS Kinesis is and how it works.