
Kafka 0.10.2 consumers getting massive number of duplicates

I have a fairly simple Kafka setup: 1 producer, 1 topic, 10 partitions, and 10 KafkaConsumers all with the same group ID, all running on a single machine. When I process a file, the producer quickly creates 3269 messages, which the consumers happily start consuming. Everything runs fine for a while, but at a certain point the consumers start consuming duplicates - LOTS of duplicates. In fact, it looks like they just start consuming the message queue over again. If I let it run for a long time, the database starts receiving the same data entries six or more times. After some tests with logging, it looks like the consumers are re-consuming the same messages with the same unique message names.

As far as I can tell, no re-balancing is happening. Consumers are not dying or being added. It's the same 10 consumers, consuming the same 3269 messages over and over until I kill the process. If I just let it run, the consumers write tens of thousands of records, massively inflating the amount of data that should actually be going into the database.

I'm fairly new to Kafka, but I'm at a loss as to why this is happening. I know Kafka doesn't guarantee exactly-once processing, and I'm OK with a few duplicates here and there; I have code to prevent persisting the same records twice. However, I'm not sure why the consumers would re-consume the entire queue over and over. I know that Kafka messages aren't deleted after they are consumed, but if all the consumers are in the same group, the committed offsets should prevent this, right? I understand a little about how offsets work, but as far as I know they shouldn't get reset if there is no re-balancing, and the messages aren't timing out as far as I can tell. Is there a way to get my consumers to consume everything in the queue once-ish and then wait for more messages without re-consuming the same stuff forever?
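For reference, this is the at-least-once pattern I'm aiming for - a minimal sketch rather than my actual code (the topic name is a placeholder, and auto-commit is turned off here so offsets only advance after the database write succeeds):

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class ManualCommitConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "MyGroup");
            props.put("enable.auto.commit", "false"); // commit offsets ourselves, after processing
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic name
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(100); // 0.10.2 API: timeout in ms
                    for (ConsumerRecord<String, String> record : records) {
                        // persist record.value() to the database here
                    }
                    consumer.commitSync(); // advance offsets only once the batch is processed
                }
            }
        }
    }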

Here are the properties I pass in to the producer and the consumers:

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("acks", "all");                 // producer: wait for all in-sync replicas to acknowledge
    props.put("retries", 0);
    props.put("batch.size", 16384);
    props.put("linger.ms", 1);
    props.put("buffer.memory", 33554432);
    props.put("group.id", "MyGroup");         // consumer: group ID shared by all 10 consumers
    props.put("num.partitions", 10);          // note: this is a broker-side setting; producer and consumer clients ignore it
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

    MyIngester ingester = new MyIngester(args[0], props);
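If it matters, the same props object is shared between the producer and the consumers. Split apart, the relevant settings would look roughly like this (a sketch, not my exact code):

    // Producer-side settings only
    Properties producerProps = new Properties();
    producerProps.put("bootstrap.servers", "localhost:9092");
    producerProps.put("acks", "all");
    producerProps.put("retries", 0);
    producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

    // Consumer-side settings only
    Properties consumerProps = new Properties();
    consumerProps.put("bootstrap.servers", "localhost:9092");
    consumerProps.put("group.id", "MyGroup");
    consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");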

To me this seems to be an issue with committing the offsets (acknowledging receipt of the messages). Try the following properties:

    props.put("enable.auto.commit", "true");
    props.put("auto.commit.interval.ms", "100");
