
Kafka Consumer - Read 2000 LATEST messages from each partition

I want to verify events (using Java) created in a Kafka topic with three partitions; the events are generated by a mobile app. I have the consumer properties below:

        Properties props = new Properties();
        props.put("bootstrap.servers", "e1.com:6767,e4.com:6767");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "true");
        // fresh group id per run, so the consumer never resumes committed offsets
        props.put("group.id", "QAAutomation_" + com.ltree.core.Utility.generateRandomString(8));
        props.put("auto.offset.reset", "latest");
        props.put("security.protocol", "SASL_SSL");
        props.put("max.poll.records", "2000");
        props.put("max.partition.fetch.bytes", "10485760");
        System.setProperty("java.security.auth.login.config", "./src/test/resources/kafka-jaas.conf");
        // note: the JDK system property is "java.security.krb5.conf", not "java.security.krb5.config"
        System.setProperty("java.security.krb5.conf", "DS1.TEST");
        System.setProperty("java.security.krb5.kdc", "DS1.TEST");
        System.setProperty("java.security.krb5.realm", "DS1.TEST");
        return new KafkaConsumer<>(props);

The goal is to verify that a particular event was produced to Kafka by the mobile app. My logic is to fetch the 2000 latest events from each partition and iterate over each record to see if record.value() contains searchString.

Due to the high volume of events from the producer (the mobile app), the expected event is sometimes not among the latest 2000; it might be, say, the 3500th from the end, i.e. in the window just before the one the first iteration read. The problem I'm running into is:

Iteration 1: Partition 0: offsets 9500 down to 7500
Iteration 1: Partition 1: offsets 12500 down to 10500
Iteration 1: Partition 2: offsets 10500 down to 8500
Iteration 2: Partition 0: offsets 11500 down to 9500  <- here I want to read from 7499, where the previous iteration left off. How do I do this?

I used

current = consumer.position(topicPartition);
consumer.seek(topicPartition, current-2000); 

This moves the starting position from, say, 6000 back to 4000 and gives me those records, but the next iteration does not continue from where this one left off, so I end up missing records.
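To make iteration 2 continue below where iteration 1 stopped (e.g. from 7499 downward), one option is to remember, per partition, the lowest offset already read and compute the next seek target from that, instead of re-anchoring at consumer.position() every time. Below is a minimal sketch of just that bookkeeping in plain Java; the class and method names are mine for illustration, not part of the Kafka API, and no broker is needed to exercise the logic:

```java
import java.util.HashMap;
import java.util.Map;

public class OffsetWindow {
    // lowest offset already read per partition; empty until the first iteration
    private final Map<Integer, Long> lowWatermark = new HashMap<>();

    /**
     * Returns the offset to seek to for the next batch of batchSize records
     * on the given partition. The first call anchors at the partition's end
     * offset; later calls walk backwards from the previous batch's low end,
     * so windows never overlap and never skip records. The result is clamped
     * at beginningOffset so we never seek before the partition's first record.
     */
    public long nextSeekOffset(int partition, long endOffset,
                               long beginningOffset, long batchSize) {
        long upper = lowWatermark.getOrDefault(partition, endOffset);
        long start = Math.max(upper - batchSize, beginningOffset);
        lowWatermark.put(partition, start);
        return start;
    }
}
```

In the consumer loop you would feed this the values from consumer.endOffsets(...) and consumer.beginningOffsets(...), then call consumer.seek(topicPartition, nextSeekOffset(...)) before each poll, so with endOffset 9500 and batchSize 2000 the first pass starts at 7500 and the second at 5500.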

The group id is randomly generated (a collision with an existing id is possible, but unlikely); every run creates a new group id.

Any suggestions on how to go about this?

After playing with it, I kept the same properties above but changed how the consumer seeks, using the code below. This is not an exact solution, but it works for my situation: it collects the latest 3000 records from each partition every time.

        TopicPartition tp = new TopicPartition(topic, i);
        // jump to the end of the partition, then step back 3000 records
        con.seekToEnd(Arrays.asList(tp));
        long current = con.position(tp);
        LOG.info("Partition " + i + ": offset: " + current);
        con.seek(tp, current - 3000);
        counter = 0;


Here i is the partition index.
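One caveat with this approach: on a partition that holds fewer than 3000 records, current - 3000 is negative, which makes seek() target an invalid offset. A tiny clamp helper avoids that; this is a plain-Java sketch (the class and method names are mine, not from the Kafka API), where the beginning offset would come from consumer.beginningOffsets(...):

```java
public class SeekUtil {
    /**
     * Start offset for the window covering the last windowSize records,
     * clamped so it never falls before the partition's beginning offset.
     */
    public static long lastWindowStart(long endOffset, long beginningOffset,
                                       long windowSize) {
        return Math.max(endOffset - windowSize, beginningOffset);
    }
}
```

With this, the seek above becomes con.seek(tp, SeekUtil.lastWindowStart(current, beginning, 3000)), which is safe even for young or compacted partitions.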
