Topic and partition discovery for Kafka consumer
I am fairly new to Flink and Kafka, and I have some data aggregation jobs written in Scala which run in Apache Flink. The jobs consume data from Kafka, perform aggregations, and produce results back to Kafka.
I need the jobs to consume data from any new Kafka topic that matches a pattern, including topics created while the job is running. I got this working by setting the following properties for my consumer:
val properties = new Properties()
properties.setProperty("bootstrap.servers", "my-kafka-server")
properties.setProperty("group.id", "my-group-id")
properties.setProperty("zookeeper.connect", "my-zookeeper-server")
properties.setProperty("security.protocol", "PLAINTEXT")
properties.setProperty("flink.partition-discovery.interval-millis", "500")
properties.setProperty("enable.auto.commit", "true")
properties.setProperty("auto.offset.reset", "earliest")

val consumer = new FlinkKafkaConsumer011[String](Pattern.compile("my-topic-start-.*"), new SimpleStringSchema(), properties)
The consumer works fine and consumes data from existing topics whose names start with "my-topic-start-".
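As a side note, the topic pattern passed to the consumer is a plain `java.util.regex.Pattern`, so which topic names it will pick up can be checked independently of Flink or Kafka. A minimal, hypothetical check (the object and method names here are illustrative, not from the original job):

```scala
import java.util.regex.Pattern

object TopicPatternCheck {
  // Same pattern the consumer subscribes with
  val topicPattern: Pattern = Pattern.compile("my-topic-start-.*")

  // True when the full topic name matches the subscription pattern
  def matches(topic: String): Boolean = topicPattern.matcher(topic).matches()

  def main(args: Array[String]): Unit = {
    println(matches("my-topic-start-test1")) // newly created topic: matched
    println(matches("other-topic"))          // unrelated topic: not matched
  }
}
```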
When I publish data to a new topic, say "my-topic-start-test1", for the first time, my consumer does not recognise the topic until up to 500 milliseconds after the topic was created, which matches the discovery interval set in the properties. However, when the consumer does identify the topic, it does not read the first record published; it starts reading from subsequent records. So effectively I lose the first data record every time data is published to a brand-new topic.
Is there a setting I am missing, or is this just how Kafka works? Any help would be appreciated.
Thanks, Shravan
I think part of the issue is that my producer was creating the topic and publishing the message in one go, so by the time the consumer discovered the new partition, that message had already been produced.
As a temporary solution, I updated my producer to create the topic if it does not exist and only then publish the message (making it a two-step process), and this works.
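The two-step flow can be sketched as below. This is only a sketch of the logic with an in-memory class standing in for the broker; in the real producer, step 1 would be an `AdminClient.createTopics` call (from `kafka-clients`), ideally followed by a wait of at least the partition-discovery interval before step 2, the normal send. All names here are hypothetical:

```scala
import scala.collection.mutable

// In-memory stand-in for the broker's topic metadata, for illustration only.
class FakeBroker {
  val topics: mutable.Map[String, mutable.Buffer[String]] = mutable.Map.empty

  def topicExists(topic: String): Boolean = topics.contains(topic)

  def createTopic(topic: String): Unit =
    topics.getOrElseUpdate(topic, mutable.Buffer.empty)

  def send(topic: String, record: String): Unit =
    topics.getOrElseUpdate(topic, mutable.Buffer.empty) += record
}

object TwoStepProducer {
  // Step 1: create the topic if it is missing (real code: AdminClient.createTopics),
  // optionally waiting out the consumer's discovery interval.
  // Step 2: publish the record as usual.
  def publish(broker: FakeBroker, topic: String, record: String,
              discoveryWaitMillis: Long = 0L): Unit = {
    if (!broker.topicExists(topic)) {
      broker.createTopic(topic)
      if (discoveryWaitMillis > 0) Thread.sleep(discoveryWaitMillis)
    }
    broker.send(topic, record)
  }
}
```

With a wait roughly equal to `flink.partition-discovery.interval-millis` between the two steps, the consumer has a chance to discover the new partition before the first record lands.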
It would be nice to have a more robust consumer-side solution, though :)