
Topic and partition discovery for Kafka consumer

I am fairly new to Flink and Kafka. I have some data aggregation jobs written in Scala that run in Apache Flink; the jobs consume data from Kafka, perform aggregations, and produce the results back to Kafka.

I need the jobs to consume data from any new Kafka topic that matches a pattern and is created while the job is running. I got this working by setting the following properties for my consumer:

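import java.util.Properties
import java.util.regex.Pattern
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011

val properties = new Properties()
properties.setProperty("bootstrap.servers", "my-kafka-server")
properties.setProperty("group.id", "my-group-id")
// Only the Kafka 0.8 connector requires zookeeper.connect; kept here as in my setup
properties.setProperty("zookeeper.connect", "my-zookeeper-server")
properties.setProperty("security.protocol", "PLAINTEXT")
// Look for new topics/partitions matching the pattern every 500 ms
properties.setProperty("flink.partition-discovery.interval-millis", "500")
properties.setProperty("enable.auto.commit", "true")
properties.setProperty("auto.offset.reset", "earliest")

// Subscribe by regular expression instead of a fixed topic list
val consumer = new FlinkKafkaConsumer011[String](Pattern.compile("my-topic-start-.*"), new SimpleStringSchema(), properties)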

The consumer works fine and consumes data from existing topics whose names start with "my-topic-start-".
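To check which topic names the pattern above would pick up, here is a minimal sketch using only java.util.regex. It assumes the connector applies full-match semantics (the whole topic name must match the regex), which is my understanding of how the pattern subscription behaves; the object name, the isSubscribed helper and the sample topic names are made up for illustration.

```scala
import java.util.regex.Pattern

object TopicPatternCheck {
  // Same regular expression as the one passed to the consumer above.
  val topicPattern: Pattern = Pattern.compile("my-topic-start-.*")

  // Assumption: full-match semantics, i.e. the entire topic name must match.
  def isSubscribed(topic: String): Boolean =
    topicPattern.matcher(topic).matches()

  def main(args: Array[String]): Unit = {
    // Hypothetical topic names, used only to exercise the pattern.
    val candidates = Seq(
      "my-topic-start-test1",
      "my-topic-start-",
      "other-my-topic-start-x",
      "my-topic-star"
    )
    candidates.foreach(t => println(s"$t -> ${isSubscribed(t)}"))
  }
}
```

Under that assumption, a name only qualifies if it begins with the prefix; a name that merely contains "my-topic-start-" somewhere in the middle would not be subscribed.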

When I publish data to a new topic for the first time, for example "my-topic-start-test1", my consumer does not recognise the topic until about 500 milliseconds after the topic was created, which matches the discovery interval set in the properties. Once the consumer has identified the topic, it does not read the first record published and only starts reading from the subsequent records, so effectively I lose the first record every time data is published to a new topic.

Is there a setting I am missing or is it how Kafka works? Any help would be appreciated.

Thanks, Shravan

I think part of the issue is that my producer was creating the topic and publishing the message in one go, so by the time the consumer discovers the new partition, that message has already been produced.

As a temporary solution, I updated my producer to create the topic if it does not exist and then publish the message (making it a two-step process), and this works.

It would be nice to have a more robust consumer-side solution, though :)
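For anyone wanting to try the same workaround, a rough sketch of the two-step producer might look like the following. This is not my exact code: it assumes the kafka-clients library is on the classpath, the topic name, partition count and replication factor are placeholders, shouldCreate is a made-up helper, and the pause between the two steps is my own assumption (chosen to be longer than the 500 ms discovery interval) rather than something I have confirmed is required.

```scala
import java.util.{Collections, Properties}
import org.apache.kafka.clients.admin.{AdminClient, NewTopic}
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

object TwoStepPublish {
  // Hypothetical helper: true when the topic is not in the broker's topic list.
  def shouldCreate(existing: java.util.Set[String], topic: String): Boolean =
    !existing.contains(topic)

  def main(args: Array[String]): Unit = {
    val topic = "my-topic-start-test1" // placeholder topic name

    // Step 1: create the topic only if it does not exist yet.
    val adminProps = new Properties()
    adminProps.setProperty("bootstrap.servers", "my-kafka-server")
    val admin = AdminClient.create(adminProps)
    try {
      val existing = admin.listTopics().names().get()
      if (shouldCreate(existing, topic)) {
        // 1 partition and replication factor 1 are placeholder values.
        val newTopic = new NewTopic(topic, 1, 1.toShort)
        admin.createTopics(Collections.singleton(newTopic)).all().get()
      }
    } finally admin.close()

    // Assumption: give the Flink consumer time to discover the new topic
    // (longer than flink.partition-discovery.interval-millis = 500).
    Thread.sleep(1000)

    // Step 2: publish the first record once the topic exists.
    val producerProps = new Properties()
    producerProps.setProperty("bootstrap.servers", "my-kafka-server")
    producerProps.setProperty("key.serializer", classOf[StringSerializer].getName)
    producerProps.setProperty("value.serializer", classOf[StringSerializer].getName)
    val producer = new KafkaProducer[String, String](producerProps)
    try {
      // Block until the broker acknowledges the record.
      producer.send(new ProducerRecord[String, String](topic, "first record")).get()
    } finally producer.close()
  }
}
```

Note that the check-then-create step is not atomic: if two producers run it at the same time, one of the createTopics calls can fail because the topic already exists, so that case would need handling in real code.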


 