
Creating one Kafka consumer for several topics

I want to create a single Kafka consumer for several topics. The consumer API lets me pass a list of topics to `subscribe`, like this:

private Consumer<String, byte[]> createConsumer() {
    Properties props = getConsumerProps();
    Consumer<String, byte[]> consumer = new KafkaConsumer<>(props);
    // Prefix each configured topic with the system id, then subscribe to all of them at once
    List<String> topicMISL = new ArrayList<>();
    for (String s : Connect2Redshift.kafkaTopics) {
        topicMISL.add(systemID + "." + s);
    }
    consumer.subscribe(topicMISL);
    return consumer;
}


private boolean consumeMessages(Duration duration, Consumer<String, byte[]> consumer) {
    long start = System.currentTimeMillis();
    ConsumerRecords<String, byte[]> consumerRecords = consumer.poll(duration);
    // ... process the records here, then report whether anything was consumed
    return !consumerRecords.isEmpty();
}

Afterwards I want to poll records from Kafka into a stream every 3 seconds and process them, but I wonder what happens inside this consumer: how are records from different topics polled? First one topic, then another, or in parallel? Could it happen that a topic with a large volume of messages is processed all the time while another topic with few messages has to wait?
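One way to sketch the 3-second cadence is a single-threaded scheduler. The Kafka calls are stubbed out below (a `Runnable` stands in for `consumer.poll` plus record processing), because `KafkaConsumer` is not thread-safe and all of its methods must stay on one thread; a plain `while` loop around a blocking `poll(Duration.ofSeconds(3))` is an equally common pattern. Class and method names here are made up for illustration:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PollLoop {
    // Runs `work` every `periodMillis` ms for roughly `runMillis` ms on a single
    // thread and returns how many iterations completed. In a real consumer loop
    // the work would be consumer.poll(...) followed by record processing.
    static int runFixedRate(Runnable work, long periodMillis, long runMillis) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger iterations = new AtomicInteger();
        scheduler.scheduleAtFixedRate(() -> {
            work.run();                 // stand-in for poll + process
            iterations.incrementAndGet();
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        Thread.sleep(runMillis);        // let the loop run for a while
        scheduler.shutdownNow();
        return iterations.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // 300 ms period here to keep the demo short; 3000 ms in the scenario above
        int n = runFixedRate(() -> { /* consumer.poll + processing would go here */ }, 300, 1000);
        System.out.println("poll iterations: " + n);
    }
}
```

Note that because the scheduler uses a single thread, the fixed-rate task never overlaps itself, which matches the one-thread-per-consumer constraint.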

In general it depends on your topic settings. Kafka scales by using multiple partitions per topic.

  • If you have 3 partitions on 1 topic, Kafka can read from them in parallel
  • The same is true for multiple topics: reading can happen in parallel

If one partition receives far more messages than the others, you may run into consumer lag on that particular partition. Tweaking the batch size and consumer settings may help, as may compressing messages. Ideally, distributing the load evenly across partitions avoids this scenario altogether.
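The batch-size tuning mentioned above is done through consumer configuration. A minimal sketch, using standard Kafka consumer property names with purely illustrative values (not recommendations):

```java
import java.util.Properties;

public class ConsumerTuning {
    // Builds a Properties object with a few settings that influence how many
    // records each poll() returns and how much data is fetched per partition.
    static Properties tunedConsumerProps(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        // Cap the number of records a single poll() may return.
        props.setProperty("max.poll.records", "500");
        // Maximum data fetched per partition per request, in bytes.
        props.setProperty("max.partition.fetch.bytes", "1048576");
        // Trade latency for larger batches: the broker waits until at least
        // fetch.min.bytes are available, or fetch.max.wait.ms elapses.
        props.setProperty("fetch.min.bytes", "1");
        props.setProperty("fetch.max.wait.ms", "500");
        return props;
    }

    public static void main(String[] args) {
        Properties props = tunedConsumerProps("localhost:9092", "my-group");
        System.out.println(props.getProperty("max.poll.records"));
    }
}
```

Message compression, by contrast, is configured on the producer side (`compression.type`); consumers decompress transparently.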

Have a look at this blog article; it gave me a good understanding of the internals: https://www.confluent.io/blog/configure-kafka-to-minimize-latency/

ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (TopicPartition partition : records.partitions()) {
        List<ConsumerRecord<String, String>> partitionRecords = records.records(partition);
        for (ConsumerRecord<String, String> record : partitionRecords) {
            // process each record
        }
        // commit the offset of the last processed record for this partition
        long lastOffset = partitionRecords.get(partitionRecords.size() - 1).offset();
        consumer.commitSync(Collections.singletonMap(
                partition, new OffsetAndMetadata(lastOffset + 1)));
    }

You also need to commit the offsets: determine the last processed offset for each partition and commit it using consumer.commitSync.
