
How to get the processing Kafka topic name dynamically in Flink Kafka Consumer?

Currently, I have a Flink cluster that consumes Kafka topics matching a pattern. This way, we don't need to maintain a hard-coded Kafka topic list.

import java.util.regex.Pattern;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
...
private static final Pattern topicPattern = Pattern.compile("DC_TEST_([A-Z0-9_]+)");
...
FlinkKafkaConsumer010<KafkaMessage> kafkaConsumer = new FlinkKafkaConsumer010<>(
          topicPattern, deserializerClazz.newInstance(), kafkaConsumerProps);
DataStream<KafkaMessage> input = env.addSource(kafkaConsumer);

What I'd like to know is: using the approach above, how can I find out the real Kafka topic name during processing? Thanks.

--Update-- The reason I need the topic information is that we need the topic name as a parameter in the upcoming Flink sink part.

There are two ways to do that.

Option 1:

You can use the kafka-clients library to access the Kafka metadata and get the topic list. Add the Maven dependency (or equivalent).

<!-- https://mvnrepository.com/artifact/org.apache.kafka/kafka-clients -->
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>2.3.0</version>
</dependency>

You can fetch the topics from the Kafka cluster and filter them with your regex, as given below:

import java.util.Collection;
import java.util.List;
import java.util.Properties;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.ListTopicsOptions;
import org.apache.kafka.clients.admin.TopicListing;
...
private static final Pattern topicPattern = Pattern.compile("DC_TEST_([A-Z0-9_]+)");
...
Properties properties = new Properties();
properties.put("bootstrap.servers", "localhost:9092");
properties.put("client.id", "java-admin-client");
try (AdminClient client = AdminClient.create(properties)) {
    ListTopicsOptions options = new ListTopicsOptions();
    options.listInternal(false); // skip internal topics such as __consumer_offsets
    Collection<TopicListing> listings = client.listTopics(options).listings().get();
    List<String> allTopicsList = listings.stream()
            .map(TopicListing::name)
            .collect(Collectors.toList());
    List<String> matchedTopics = allTopicsList.stream()
            .filter(topicPattern.asPredicate())
            .collect(Collectors.toList());
    // matchedTopics now holds every topic name matching the pattern.
} catch (Exception e) {
    e.printStackTrace();
}

Once you have the matchedTopics list, you can pass it to FlinkKafkaConsumer.
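For example, a minimal sketch of wiring the matched list into the consumer, assuming the deserializerClazz and kafkaConsumerProps from the question:

FlinkKafkaConsumer010<KafkaMessage> kafkaConsumer = new FlinkKafkaConsumer010<>(
        matchedTopics, deserializerClazz.newInstance(), kafkaConsumerProps);
DataStream<KafkaMessage> input = env.addSource(kafkaConsumer);

Note that with a fixed topic list, topics created after job submission are not picked up; that is what option 2 addresses.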

Option 2:

FlinkKafkaConsumer011 in Flink release 1.8 supports dynamic topic and partition discovery based on a pattern. Note that discovery of newly created topics only happens if flink.partition-discovery.interval-millis is set to a non-negative value in the consumer properties. Below is an example:

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

Pattern topicPattern = Pattern.compile("DC_TEST_([A-Z0-9_]+)");

Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "localhost:9092");
properties.setProperty("group.id", "test");
// Re-evaluate the pattern every 30 seconds so newly created topics are picked up.
properties.setProperty("flink.partition-discovery.interval-millis", "30000");

FlinkKafkaConsumer011<String> myConsumer = new FlinkKafkaConsumer011<>(
        topicPattern,
        new SimpleStringSchema(),
        properties);

Link: https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/connectors/kafka.html#kafka-consumers-topic-and-partition-discovery

In your case, option 2 suits best.

Since you want to access the topic metadata as part of KafkaMessage, you need to implement the KafkaDeserializationSchema interface, as given below:

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.connectors.kafka.KafkaDeserializationSchema;
import org.apache.kafka.clients.consumer.ConsumerRecord;

public class CustomKafkaDeserializationSchema implements KafkaDeserializationSchema<KafkaMessage> {

    /**
     * Deserializes one Kafka record.
     *
     * @param record the ConsumerRecord carrying key, value, topic, partition and offset.
     * @return The deserialized message as an object (null if the message cannot be deserialized).
     */
    @Override
    public KafkaMessage deserialize(ConsumerRecord<byte[], byte[]> record) throws Exception {
        // record.key(), record.value(), record.topic(), record.partition() and
        // record.offset() are all available here, so the topic name can be captured.
        KafkaMessage kafkaMessage = new KafkaMessage();
        kafkaMessage.setTopic(record.topic());
        // Build the rest of your KafkaMessage from record.value() here.
        return kafkaMessage;
    }

    @Override
    public boolean isEndOfStream(KafkaMessage nextElement) {
        return false;
    }

    @Override
    public TypeInformation<KafkaMessage> getProducedType() {
        return TypeInformation.of(KafkaMessage.class);
    }
}
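KafkaMessage is the question's own type; for completeness, a minimal hypothetical POJO that would satisfy the schema above could look like:

public class KafkaMessage {
    private String topic;

    public String getTopic() {
        return topic;
    }

    public void setTopic(String topic) {
        this.topic = topic;
    }

    // Add your payload fields and accessors here.
}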

And then call:

FlinkKafkaConsumer010<KafkaMessage> kafkaConsumer = new FlinkKafkaConsumer010<>(
        topicPattern, new CustomKafkaDeserializationSchema(), kafkaConsumerProps);
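Downstream operators can then read the topic name off each element. A small sketch, assuming KafkaMessage exposes a getTopic() getter matching the setter used above:

DataStream<KafkaMessage> input = env.addSource(kafkaConsumer);
// Print the topic each message originated from.
input.map(KafkaMessage::getTopic).print();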

You can implement your own custom KafkaDeserializationSchema, like this:

import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.typeutils.TupleTypeInfo;
import org.apache.flink.streaming.connectors.kafka.KafkaDeserializationSchema;
import org.apache.kafka.clients.consumer.ConsumerRecord;

public class CustomKafkaDeserializationSchema implements KafkaDeserializationSchema<Tuple2<String, String>> {
    @Override
    public boolean isEndOfStream(Tuple2<String, String> nextElement) {
        return false;
    }

    @Override
    public Tuple2<String, String> deserialize(ConsumerRecord<byte[], byte[]> record) throws Exception {
        // Pair each record's topic name with its UTF-8-decoded payload.
        return new Tuple2<>(record.topic(), new String(record.value(), "UTF-8"));
    }

    @Override
    public TypeInformation<Tuple2<String, String>> getProducedType() {
        return new TupleTypeInfo<>(BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO);
    }
}

With the custom KafkaDeserializationSchema, you can create a DataStream whose elements contain the topic info. In my demo case the element type is Tuple2<String, String>, so you can access the topic name via Tuple2#f0.

FlinkKafkaConsumer010<Tuple2<String, String>> kafkaConsumer = new FlinkKafkaConsumer010<>(
        topicPattern, new CustomKafkaDeserializationSchema(), kafkaConsumerProps);
DataStream<Tuple2<String, String>> input = env.addSource(kafkaConsumer);

input.process(new ProcessFunction<Tuple2<String,String>, String>() {
            @Override
            public void processElement(Tuple2<String, String> value, Context ctx, Collector<String> out) throws Exception {
                String topicName = value.f0;
                // your processing logic here.
                out.collect(value.f1);
            }
        });
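And, per the question's update, the topic name can then drive the sink stage. A hypothetical sketch (the SinkFunction body is an assumption, not part of the original answer):

import org.apache.flink.streaming.api.functions.sink.SinkFunction;
...
input.addSink(new SinkFunction<Tuple2<String, String>>() {
    @Override
    public void invoke(Tuple2<String, String> value) {
        String topicName = value.f0; // the topic name, used as the sink parameter
        // Route or write value.f1 based on topicName here.
    }
});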
