
Why are my Kafka consumers with the same group id not being balanced?

I'm writing a proof-of-concept application that consumes messages from Apache Kafka 0.9.0.0, to see whether I can use it in place of a common JMS message broker because of the benefits Kafka provides. This is my base code, using the new consumer API:

import static java.util.Arrays.asList;

import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class Main implements Runnable {

    public static final long DEFAULT_POLL_TIME = 300;
    public static final String DEFAULT_GROUP_ID = "ltmjTest";

    volatile boolean keepRunning = true;
    private KafkaConsumer<String, Object> consumer;
    private String servers;
    private String groupId = DEFAULT_GROUP_ID;
    private long pollTime = DEFAULT_POLL_TIME;
    private String[] topics;

    public Main() {
    }

    //getters and setters...
    //run() (not shown) polls the consumer while keepRunning is true

    public void createConsumer() {
        Map<String, Object> configs = new HashMap<>();
        configs.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, servers);
        // both test consumers end up in the same group by default
        configs.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);

        configs.put("enable.auto.commit", "true");
        configs.put("auto.commit.interval.ms", "1000");
        configs.put("session.timeout.ms", "30000");

        configs.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        configs.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumer = new KafkaConsumer<>(configs);
        consumer.subscribe(asList(topics));
    }

    public static void main(String[] args) {
        Main main = new Main();
        if (args != null && args.length > 0) {
            for (String arg : args) {
                String[] realArg = arg.trim().split("=", 2);
                String argKey = realArg[0].toLowerCase();
                String argValue = realArg[1];
                switch (argKey) {
                case "polltime":
                    main.setPollTime(Long.parseLong(argValue));
                    break;
                case "groupid":
                    main.setGroupId(argValue);
                    break;
                case "servers":
                    main.setServers(argValue);
                    break;
                case "topics":
                    main.setTopics(argValue.split(","));
                    break;
                }
            }
        }
        main.createConsumer();
        new Thread(main).start();
        try (Scanner scanner = new Scanner(System.in)) {
            while(true) {
                String line = scanner.nextLine();
                if (line.equals("stop")) {
                    main.setKeepRunning(false);
                    break;
                }
            }
        }
    }
}

I've started a Kafka server using default settings and a Kafka producer using the shell tool kafka-console-producer.sh to write messages to my topic. Then I connected two consumers using this code, passing the proper server to connect to and the topic to subscribe to, with everything else left at its default values, which means both consumers have the same group id. I noticed that only one of my consumers consumes all the data. I've read in the official tutorial that the default behaviour should be that the consumers are balanced by the server:

If all the consumer instances have the same consumer group, then this works just like a traditional queue balancing load over the consumers.

How can I get the consumers to behave like the default? Or am I missing something?

There is a trait, kafka.consumer.PartitionAssignor, that defines how partitions should be assigned to consumers. It has two implementations: RoundRobinAssignor and RangeAssignor. The default one is RangeAssignor.

This can be changed by setting the parameter "partition.assignment.strategy".
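For example, a minimal sketch of switching the consumer in the question to the round robin assignor, assuming the 0.9 new-consumer API where the property takes the assignor's fully qualified class name, would be one extra entry in the config map:

    // Replace the default RangeAssignor with the RoundRobinAssignor;
    // the value is the assignor's fully qualified class name
    configs.put("partition.assignment.strategy",
            "org.apache.kafka.clients.consumer.RoundRobinAssignor");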

Round Robin documentation:

The roundrobin assignor lays out all the available partitions and all the available consumers. It then proceeds to do a roundrobin assignment from partition to consumer. If the subscriptions of all consumer instances are identical, then the partitions will be uniformly distributed. (i.e., the partition ownership counts will be within a delta of exactly one across all consumers.) For example, suppose there are two consumers C0 and C1, two topics t0 and t1, and each topic has 3 partitions, resulting in partitions t0p0, t0p1, t0p2, t1p0, t1p1, and t1p2. The assignment will be: C0: [t0p0, t0p2, t1p1] C1: [t0p1, t1p0, t1p2]

Range Assignor documentation:

The range assignor works on a per-topic basis. For each topic, we lay out the available partitions in numeric order and the consumers in lexicographic order. We then divide the number of partitions by the total number of consumers to determine the number of partitions to assign to each consumer. If it does not evenly divide, then the first few consumers will have one extra partition. For example, suppose there are two consumers C0 and C1, two topics t0 and t1, and each topic has 3 partitions, resulting in partitions t0p0, t0p1, t0p2, t1p0, t1p1, and t1p2. The assignment will be: C0: [t0p0, t0p1, t1p0, t1p1] C1: [t0p2, t1p2]

So, if all our topics have only one partition, only one consumer will work.
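A quick way to check this from the question's code is to ask the consumer for the topic's partition metadata (a sketch using KafkaConsumer.partitionsFor from the same 0.9 API; it assumes the consumer and topics fields of the Main class above, and needs java.util.List and org.apache.kafka.common.PartitionInfo imported):

    // With a single partition, the range assignor can only ever hand
    // work to one member of the group
    for (String topic : topics) {
        List<PartitionInfo> partitions = consumer.partitionsFor(topic);
        System.out.println(topic + " has " + partitions.size() + " partition(s)");
    }

If this reports a single partition, adding partitions to the topic (e.g. with kafka-topics.sh --alter --partitions 2) should let the second consumer in the group receive work.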
