
Spring Kafka: Difference between 3 apps with setConcurrency(1) vs 1 app with setConcurrency(3)

A small question about Kafka concurrency with Spring Kafka.

I have one Kafka topic, theimportanttopic, over which many messages are sent. Hard fact: this topic has three partitions (call them theimportanttopic-0, theimportanttopic-1 and theimportanttopic-2).

It is known that Kafka assigns each partition to at most one consumer within a consumer group, i.e., no two consumers in the same group can both consume from theimportanttopic-0.
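The one-consumer-per-partition rule can be illustrated with a small, self-contained Java model. It is a hypothetical simplification: it deals partitions out round-robin, whereas real Kafka clients use a configurable assignor (for example RangeAssignor or CooperativeStickyAssignor), so the exact mapping may differ. What it does show is the invariant: with three partitions, a fourth consumer in the same group receives nothing.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of partition assignment within ONE consumer group.
// It is NOT Kafka's real assignor; it only illustrates the invariant that
// each partition is owned by at most one consumer of the group.
public class GroupAssignmentSketch {

    // Deals partitions out to consumers round-robin; returns consumer -> partitions.
    static Map<String, List<Integer>> assign(List<String> consumers, int partitionCount) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        if (consumers.isEmpty()) {
            return assignment; // no members: every partition stays unconsumed
        }
        for (int p = 0; p < partitionCount; p++) {
            String owner = consumers.get(p % consumers.size());
            assignment.get(owner).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Three partitions, four consumers in the same group: one consumer stays idle.
        System.out.println(assign(List.of("c1", "c2", "c3", "c4"), 3));
        // prints {c1=[0], c2=[1], c3=[2], c4=[]}
    }
}
```

The names GroupAssignmentSketch and assign are illustrative only; they are not part of Kafka or Spring Kafka.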

My Spring Kafka application code is as follows:

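import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.config.KafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;
import org.springframework.kafka.listener.ConcurrentMessageListenerContainer;
import org.springframework.stereotype.Component;

@Configuration
class KafkaConsumerConfig {

    @Bean
    public Map<String, Object> consumerConfigs() {
        Map<String, Object> props = new HashMap<>();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "mykafka.com:9092");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        // value.deserializer has no default; without it the KafkaConsumer cannot be created
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        return props;
    }

    @Bean
    public ConsumerFactory<String, String> consumerFactory() {
        return new DefaultKafkaConsumerFactory<>(consumerConfigs());
    }

    @Bean
    public KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<String, String>> kafkaListenerContainerFactory() {
        ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory());
        factory.setConcurrency(1); // HERE: one consumer (thread) per application instance
        return factory;
    }
}

@Component
class KafkaListenersExample {

    private static final Logger LOG = LoggerFactory.getLogger(KafkaListenersExample.class);

    // Every instance/thread uses groupId "uniquegroup", so Kafka splits the partitions among them
    @KafkaListener(topics = "theimportanttopic", groupId = "uniquegroup")
    void listener(String data) {
        LOG.info(data);
        doSomethingImportantWithTheData(data);
    }

    // Placeholder: the actual processing is not shown in the question
    private void doSomethingImportantWithTheData(String data) {
    }
}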

Given this, I am having a hard time understanding the difference between the following two designs.

Suppose this application is already dockerized and a cloud environment is ready for use.

I can either give design 1 three instances with 1 CPU + 1 GB memory each, or give design 2 a single instance with 3 CPUs + 3 GB memory.

Design number 1: since this application runs in a container deployed in the cloud, e.g. on Kubernetes, I spin up three instances of it. I will then have three of those "apps", and each app will consume from one of the three partitions.

kubectl get pods
my-app-AaAaAaAaAa-AaAaA
my-app-BbBbBbBbBb-BbBbB
my-app-CcCcCcCcCc-CcCcC

(and hypothetically, my-app-AaAaAaAaAa-AaAaA consumes theimportanttopic-0, my-app-BbBbBbBbBb-BbBbB consumes theimportanttopic-1, and my-app-CcCcCcCcCc-CcCcC consumes theimportanttopic-2)

Design number 2: on the other hand, I can run one, and only one, instance of my-app in a container and set the concurrency to 3 (same code as above, just a one-line change):

    @Bean
    public KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<String, String>> kafkaListenerContainerFactory() {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory());
        factory.setConcurrency(3);
        return factory;
    }
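To compare the two designs from Kafka's point of view, the sketch below counts group members rather than pods. It assumes, as Spring Kafka's ConcurrentMessageListenerContainer does, that each unit of concurrency runs its own KafkaConsumer in the group; the class name, member ids and round-robin assignment are hypothetical simplifications, and Kafka's real assignor may order members differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: each app instance contributes `concurrency` consumers
// to the same group. Assignment is simplified to round-robin.
public class DesignComparisonSketch {

    // Builds member ids like "my-app-A#0" for every instance/thread pair.
    static List<String> members(List<String> instances, int concurrency) {
        List<String> ids = new ArrayList<>();
        for (String instance : instances) {
            for (int t = 0; t < concurrency; t++) {
                ids.add(instance + "#" + t);
            }
        }
        return ids;
    }

    // Round-robin: partition p goes to member p % memberCount.
    static Map<String, List<Integer>> assign(List<String> members, int partitions) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        members.forEach(m -> result.put(m, new ArrayList<>()));
        for (int p = 0; p < partitions && !members.isEmpty(); p++) {
            result.get(members.get(p % members.size())).add(p);
        }
        return result;
    }

    public static void main(String[] args) {
        // Design 1: three pods, setConcurrency(1) each.
        System.out.println(assign(members(List.of("my-app-A", "my-app-B", "my-app-C"), 1), 3));
        // prints {my-app-A#0=[0], my-app-B#0=[1], my-app-C#0=[2]}

        // Design 2: one pod, setConcurrency(3).
        System.out.println(assign(members(List.of("my-app-A"), 3), 3));
        // prints {my-app-A#0=[0], my-app-A#1=[1], my-app-A#2=[2]}
    }
}
```

Under these assumptions both designs end up with three group members, each owning one partition, so the consumption parallelism is the same. What differs is the packaging: three separate failure domains with 1 CPU each, versus one failure domain whose 3 CPUs are shared by three consumer threads.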

What are the differences between the two designs?

Which one is preferred, and why?

This is not an opinion-based question. What are the performance, the cost, and the pros and cons of design number 1 versus design number 2?

Thank you

The difference is high availability.

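If a single pod consumes all three partitions and it stops, nothing consumes the topic until that pod is restarted, so you depend on Kubernetes restart configuration (the pod's restartPolicy) to bring it back.

Alternatively, run three replicas (e.g., a ReplicaSet, usually managed by a Deployment, with replicas: 3); the Kafka consumer group then rebalances the partitions whenever any one of them starts or stops.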
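The availability argument can be made concrete with a simplified model (not Kafka's actual rebalance protocol; class, method and member names are hypothetical). When a pod dies, its consumers leave the group and the survivors' assignment is recomputed. In design 2 the only pod holds every consumer, so nobody is left to take over.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified rebalance model: recompute a round-robin assignment over the
// members that are still alive, then report partitions nobody owns.
public class RebalanceSketch {

    // Round-robin: partition p goes to member p % memberCount.
    static Map<String, List<Integer>> assign(List<String> members, int partitions) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        members.forEach(m -> result.put(m, new ArrayList<>()));
        for (int p = 0; p < partitions && !members.isEmpty(); p++) {
            result.get(members.get(p % members.size())).add(p);
        }
        return result;
    }

    // Returns the partitions that no surviving member owns.
    static List<Integer> unconsumed(Map<String, List<Integer>> assignment, int partitions) {
        List<Integer> orphaned = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            final int partition = p;
            boolean owned = assignment.values().stream().anyMatch(ps -> ps.contains(partition));
            if (!owned) {
                orphaned.add(p);
            }
        }
        return orphaned;
    }

    public static void main(String[] args) {
        // Design 1: pod B dies; pods A and C stay in the group and rebalance.
        Map<String, List<Integer>> design1 = assign(List.of("my-app-A#0", "my-app-C#0"), 3);
        System.out.println(design1 + " unconsumed=" + unconsumed(design1, 3));
        // prints {my-app-A#0=[0, 2], my-app-C#0=[1]} unconsumed=[]

        // Design 2: the single pod dies, taking all three consumer threads with it.
        Map<String, List<Integer>> design2 = assign(List.of(), 3);
        System.out.println(design2 + " unconsumed=" + unconsumed(design2, 3));
        // prints {} unconsumed=[0, 1, 2]
    }
}
```

In design 1 the surviving pods absorb the failed pod's partition (at the cost of more load per pod); in design 2 all three partitions stay unconsumed until Kubernetes restarts the pod.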

You can also look into KEDA to autoscale based on consumer lag.
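Consumer lag is the gap between a partition's log-end offset and the group's committed offset. The sketch below computes it and derives a replica count from a hypothetical per-replica lag budget, capped at the partition count because extra consumers in the same group would sit idle. It only illustrates the idea; it is not KEDA's actual scaling algorithm, and the offsets in main are made-up sample values.

```java
import java.util.Map;

// Hypothetical lag-based scaling rule, for illustration only (not KEDA's algorithm).
public class LagScalingSketch {

    // Sums (logEndOffset - committedOffset) over all partitions.
    // A missing commit is treated as offset 0 here (an assumption; a real
    // consumer would fall back to its auto.offset.reset setting).
    static long totalLag(Map<Integer, Long> logEndOffsets, Map<Integer, Long> committedOffsets) {
        long lag = 0;
        for (Map.Entry<Integer, Long> e : logEndOffsets.entrySet()) {
            long committed = committedOffsets.getOrDefault(e.getKey(), 0L);
            lag += Math.max(0, e.getValue() - committed);
        }
        return lag;
    }

    // ceil(totalLag / lagPerReplica), clamped to [1, partitionCount].
    // Assumes lagPerReplica > 0; keeping at least one replica is a choice made here.
    static int desiredReplicas(long totalLag, long lagPerReplica, int partitionCount) {
        long wanted = (totalLag + lagPerReplica - 1) / lagPerReplica;
        return (int) Math.max(1, Math.min(wanted, partitionCount));
    }

    public static void main(String[] args) {
        Map<Integer, Long> end = Map.of(0, 1_500L, 1, 1_200L, 2, 900L);
        Map<Integer, Long> committed = Map.of(0, 1_000L, 1, 1_000L, 2, 900L);
        long lag = totalLag(end, committed); // 500 + 200 + 0 = 700
        System.out.println("lag=" + lag + " replicas=" + desiredReplicas(lag, 100, 3));
        // prints lag=700 replicas=3 (7 wanted, capped at the 3 partitions)
    }
}
```

The cap is the practical link back to the question: with three partitions, scaling either design past three consumers in the group adds no throughput.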
