
Kafka consumer on multiple partitions

My consumer does not get every message at once. I have a Kafka cluster with 3 brokers (3 servers), a topic with 3 partitions, and replication factor 3.

Topic: my-topic       Partition: 0    Leader: 2       Replicas: 2,1,3 Isr: 3,1,2
Topic: my-topic       Partition: 1    Leader: 3       Replicas: 3,2,1 Isr: 3,2,1
Topic: my-topic       Partition: 2    Leader: 1       Replicas: 1,3,2 Isr: 1,3,2

I have a consumer in Java with max poll records set to 50000 and the fetch-bytes configs set to 50 MB. The application polls once a minute. When I send 10 messages to the topic "my-topic", the consumer does not give me all of them: it returns only some and delivers the rest in the next run. The messages are produced by a script while my application sleeps. Could this be caused by the partitions, i.e. a single poll returns only the messages from the server/partition that responds first, and the rest arrive in the next run?

Consumer:

Map<String, Object> configurations = new HashMap<>();
configurations.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, servers);
configurations.put(ConsumerConfig.ALLOW_AUTO_CREATE_TOPICS_CONFIG, "true");
configurations.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
configurations.put(ConsumerConfig.CLIENT_ID_CONFIG, groupId);
configurations.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, JsonDeserializer.class);
configurations.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
configurations.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
configurations.put(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, "52428800");
configurations.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, "52428800");
configurations.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "3600000");
configurations.put(JsonDeserializer.TRUSTED_PACKAGES, "my.package.model");
configurations.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "50000");

consumer = new KafkaConsumer<Object, Object>(configurations);
consumer.subscribe(Collections.singletonList("my-topic"));

while(true) {
    ConsumerRecords<Object, Object> records = consumer.poll(Duration.ofMillis(10000));

    if(records.count() > 0) {
        LOGGER.debug("records count: {}", records.count());
        handleMessages(records);
        consumer.commitSync();
    }
    sleep(60000);
}

In the handle method I log the messages; client and ts (timestamp) are data carried in each message. The consumer first gave me only 3 messages with consecutive offsets (I would say messages from one server/partition), and after the one-minute sleep it gave me the rest with 2 different offset ranges (I would say the two other servers/partitions).

2021-11-23 08:27:14.851 [DEBUG] --- records count: 3
2021-11-23 08:27:14.853 [DEBUG] --- offset=1175, client=test-27, ts=1637652419417
2021-11-23 08:27:14.857 [DEBUG] --- offset=1176, client=test-28, ts=1637652419418
2021-11-23 08:27:14.860 [DEBUG] --- offset=1177, client=test-29, ts=1637652419418

2021-11-23 08:28:14.924 [DEBUG] --- records count: 7
2021-11-23 08:28:14.925 [DEBUG] --- offset=232304, client=test-20, ts=1637652419406
2021-11-23 08:28:14.929 [DEBUG] --- offset=232305, client=test-21, ts=1637652419407
2021-11-23 08:28:14.933 [DEBUG] --- offset=232306, client=test-24, ts=1637652419411
2021-11-23 08:28:14.937 [DEBUG] --- offset=1141, client=test-22, ts=1637652419408
2021-11-23 08:28:14.941 [DEBUG] --- offset=1142, client=test-23, ts=1637652419410
2021-11-23 08:28:14.944 [DEBUG] --- offset=1143, client=test-25, ts=1637652419414
2021-11-23 08:28:14.949 [DEBUG] --- offset=1144, client=test-26, ts=1637652419415

Does anyone know what I am doing wrong, or whether I missed some parameter in the config, and how to fix it?

Thanks

This is normal; Kafka doesn't guarantee that you get all messages in one batch.
You would not typically sleep between polls, as this can cause the client to time out and be removed from the consumer group. Instead, rely on the poll duration to prevent busy spinning.
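To make the effect concrete, here is a toy simulation (plain Java, not the real Kafka client; all names are invented for illustration) of why one poll can return only part of the messages: a fetch response may cover only the partitions whose leaders answered first, so draining a topic spread over several brokers can take several polls.

```java
import java.util.*;

// Toy model: 3 "partitions" and a "poll" that, like a single fetch
// response, returns data from only one partition at a time. Draining
// all 10 messages therefore takes several polls, which is analogous
// to the behavior described in the question.
public class FetchSimulation {

    // Build 3 partitions holding 10 messages distributed round-robin.
    static List<Deque<String>> newTopic() {
        List<Deque<String>> parts = new ArrayList<>();
        for (int p = 0; p < 3; p++) parts.add(new ArrayDeque<>());
        for (int i = 0; i < 10; i++) parts.get(i % 3).add("msg-" + i);
        return parts;
    }

    // A "poll" that returns records from only the first partition
    // that has data ready; the others wait for the next poll.
    static List<String> poll(List<Deque<String>> parts) {
        List<String> batch = new ArrayList<>();
        for (Deque<String> q : parts) {
            if (!q.isEmpty()) {
                while (!q.isEmpty()) batch.add(q.poll());
                return batch;
            }
        }
        return batch;
    }

    // Keep polling until the topic is drained; count the polls needed.
    static int pollsToDrain() {
        List<Deque<String>> parts = newTopic();
        int polls = 0, total = 0;
        while (total < 10) {
            total += poll(parts).size();
            polls++;
        }
        return polls;
    }

    public static void main(String[] args) {
        System.out.println("polls needed to drain: " + pollsToDrain());
    }
}
```

In this toy model, 10 messages over 3 partitions take 3 polls to drain, which is why the common pattern is a tight `while (true) { consumer.poll(...) }` loop with no sleep: repeated polls quickly pick up whatever the previous fetch response left behind.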
