How to get the processing Kafka topic name dynamically in Flink Kafka Consumer?
I currently have a Flink cluster and want to consume Kafka topics by pattern, so that we do not need to maintain a hard-coded list of Kafka topics.
import java.util.regex.Pattern;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
...
private static final Pattern topicPattern = Pattern.compile("DC_TEST_([A-Z0-9_]+)");
...
FlinkKafkaConsumer010<KafkaMessage> kafkaConsumer = new FlinkKafkaConsumer010<>(
topicPattern, deserializerClazz.newInstance(), kafkaConsumerProps);
DataStream<KafkaMessage> input = env.addSource(kafkaConsumer);
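Given the setup above, how can I find out the actual Kafka topic name of each record during processing? Thanks.
-- Update -- We need the topic information because the topic name will be used as a parameter later on, in the Flink sink.
There are two ways to do this.
Option 1:
You can use the kafka-clients library to access the Kafka metadata and get the list of topics. Add the Maven dependency or its equivalent.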
<!-- https://mvnrepository.com/artifact/org.apache.kafka/kafka-clients -->
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>2.3.0</version>
</dependency>
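You can fetch the topics from the Kafka cluster and filter them with the regular expression, as shown below:
import java.util.Collection;
import java.util.List;
import java.util.Properties;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.ListTopicsOptions;
import org.apache.kafka.clients.admin.TopicListing;
...
private static final Pattern topicPattern = Pattern.compile("DC_TEST_([A-Z0-9_]+)");
...
Properties properties = new Properties();
properties.put("bootstrap.servers", "localhost:9092");
properties.put("client.id", "java-admin-client");
try (AdminClient client = AdminClient.create(properties)) {
    ListTopicsOptions options = new ListTopicsOptions();
    options.listInternal(false); // skip Kafka's internal topics
    Collection<TopicListing> listings = client.listTopics(options).listings().get();
    List<String> allTopicsList = listings.stream().map(TopicListing::name)
            .collect(Collectors.toList());
    // asPredicate() keeps every topic name in which the pattern is found
    List<String> matchedTopics = allTopicsList.stream()
            .filter(topicPattern.asPredicate())
            .collect(Collectors.toList());
    // use matchedTopics here, inside the try block
} catch (Exception e) {
    e.printStackTrace();
}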
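Once you have the list of matched topics, you can pass it to the FlinkKafkaConsumer.
Option 2:
FlinkKafkaConsumer011 in Flink 1.8 supports dynamic topic and partition discovery based on a pattern. Here is an example: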
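One detail of the filtering step in option 1 is worth checking in isolation: Pattern#asPredicate() accepts a name as soon as the pattern is found anywhere inside it, which is not the same as matching the whole name. The sketch below uses only the JDK; the class name TopicFilterDemo and the sample topic names are made up for illustration, and the unbalanced parenthesis in the posted pattern is assumed to be a typo.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical demo class; not part of Flink or kafka-clients.
final class TopicFilterDemo {
    static final Pattern TOPIC_PATTERN = Pattern.compile("DC_TEST_([A-Z0-9_]+)");

    // Partial match: keeps any name that contains a match (Matcher#find semantics).
    static List<String> filterByFind(List<String> topics) {
        return topics.stream()
                .filter(TOPIC_PATTERN.asPredicate())
                .collect(Collectors.toList());
    }

    // Full match: keeps only names that match the pattern from start to end.
    static List<String> filterByFullMatch(List<String> topics) {
        return topics.stream()
                .filter(t -> TOPIC_PATTERN.matcher(t).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample names, invented for this sketch.
        List<String> topics = Arrays.asList(
                "DC_TEST_ORDERS", "OLD_DC_TEST_ORDERS", "DC_TEST_orders", "PROD_ORDERS");
        System.out.println(filterByFind(topics));      // [DC_TEST_ORDERS, OLD_DC_TEST_ORDERS]
        System.out.println(filterByFullMatch(topics)); // [DC_TEST_ORDERS]
    }
}
```

If a topic such as OLD_DC_TEST_ORDERS should not be consumed, the full-match variant is probably the safer choice.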
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
final Pattern topicPattern = Pattern.compile("DC_TEST_([A-Z0-9_]+)");
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "localhost:9092");
properties.setProperty("group.id", "test");
// Discovery of new topics and partitions is off by default; a non-negative interval (in ms) turns it on.
properties.setProperty("flink.partition-discovery.interval-millis", "30000");
FlinkKafkaConsumer011<String> myConsumer = new FlinkKafkaConsumer011<>(
        topicPattern,
        new SimpleStringSchema(),
        properties);
Link: https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/connectors/kafka.html#kafka-consumers-topic-and-partition-discovery
In your case, option 2 is the best fit.
Since you want to access the topic metadata as part of KafkaMessage, you need to implement the KafkaDeserializationSchema interface, as shown below:
public class CustomKafkaDeserializationSchema implements KafkaDeserializationSchema<KafkaMessage> {
    /**
     * Deserializes the Kafka record.
     *
     * @param record the Kafka ConsumerRecord; it exposes key(), value(), topic(), partition() and offset().
     *
     * @return The deserialized message as an object (null if the message cannot be deserialized).
     */
    @Override
    public KafkaMessage deserialize(ConsumerRecord<byte[], byte[]> record) throws IOException {
        // record.key(), record.value(), record.topic(), record.partition() and record.offset() are all available here.
        KafkaMessage kafkaMessage = new KafkaMessage();
        kafkaMessage.setTopic(record.topic());
        // Build your Kafka message here and assign the other values like above.
        return kafkaMessage;
    }

    @Override
    public boolean isEndOfStream(KafkaMessage nextElement) {
        return false;
    }

    @Override
    public TypeInformation<KafkaMessage> getProducedType() {
        return TypeInformation.of(KafkaMessage.class);
    }
}
Then call:
FlinkKafkaConsumer010<KafkaMessage> kafkaConsumer = new FlinkKafkaConsumer010<>(
        topicPattern, new CustomKafkaDeserializationSchema(), kafkaConsumerProps);
You can implement your own custom KafkaDeserializationSchema, like this:
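The KafkaMessage class itself is not shown in the question. For the deserializer above to compile, it needs at least a no-argument constructor and a setTopic method. A minimal sketch of what such a class might look like follows; the payload field and all accessors other than setTopic are assumptions made for illustration, not the asker's actual class.

```java
// Hypothetical stand-in for the asker's KafkaMessage; only setTopic(...) is
// implied by the answer, everything else here is an assumption.
// For Flink to handle it as a POJO it would normally be declared public,
// in its own file.
class KafkaMessage {
    private String topic;   // name of the Kafka topic the record came from
    private String payload; // assumed: the record value decoded as text

    // No-argument constructor, as used by "new KafkaMessage()" in the deserializer.
    KafkaMessage() {
    }

    public String getTopic() {
        return topic;
    }

    public void setTopic(String topic) {
        this.topic = topic;
    }

    public String getPayload() {
        return payload;
    }

    public void setPayload(String payload) {
        this.payload = payload;
    }

    @Override
    public String toString() {
        return "KafkaMessage{topic=" + topic + ", payload=" + payload + "}";
    }
}
```

With the topic stored on the message, any downstream operator or sink can read it through getTopic().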
public class CustomKafkaDeserializationSchema implements KafkaDeserializationSchema<Tuple2<String, String>> {
    @Override
    public boolean isEndOfStream(Tuple2<String, String> nextElement) {
        return false;
    }

    @Override
    public Tuple2<String, String> deserialize(ConsumerRecord<byte[], byte[]> record) throws Exception {
        return new Tuple2<>(record.topic(), new String(record.value(), "UTF-8"));
    }

    @Override
    public TypeInformation<Tuple2<String, String>> getProducedType() {
        return new TupleTypeInfo<>(BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO);
    }
}
With the custom KafkaDeserializationSchema, you can create a DataStream whose elements carry the topic information. In my demo the element type is Tuple2<String, String>, so you can read the topic name through Tuple2#f0.
FlinkKafkaConsumer010<Tuple2<String, String>> kafkaConsumer = new FlinkKafkaConsumer010<>(
        topicPattern, new CustomKafkaDeserializationSchema(), kafkaConsumerProps);
DataStream<Tuple2<String, String>> input = env.addSource(kafkaConsumer);
input.process(new ProcessFunction<Tuple2<String, String>, String>() {
    @Override
    public void processElement(Tuple2<String, String> value, Context ctx, Collector<String> out) throws Exception {
        String topicName = value.f0;
        // your processing logic here.
        out.collect(value.f1);
    }
});
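Since the question says the topic name is needed as a parameter for the sink, here is one way the name read from value.f0 could be turned into such a parameter, using the capture group of the topic pattern. This is only a sketch in plain Java: the class TopicRouting, the "sink_" naming rule and the "sink_unmatched" fallback are hypothetical and would be replaced by whatever the real sink expects.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of Flink.
final class TopicRouting {
    static final Pattern TOPIC_PATTERN = Pattern.compile("DC_TEST_([A-Z0-9_]+)");

    // Returns the part after "DC_TEST_" when the whole topic name matches the pattern.
    static Optional<String> suffixOf(String topicName) {
        Matcher m = TOPIC_PATTERN.matcher(topicName);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Assumed naming rule: "sink_" + lower-cased suffix, with a fallback for other topics.
    static String sinkTargetFor(String topicName) {
        return suffixOf(topicName)
                .map(suffix -> "sink_" + suffix.toLowerCase(Locale.ROOT))
                .orElse("sink_unmatched");
    }

    public static void main(String[] args) {
        System.out.println(sinkTargetFor("DC_TEST_ORDERS")); // sink_orders
        System.out.println(sinkTargetFor("PROD_ORDERS"));    // sink_unmatched
    }
}
```

Inside processElement, a call such as TopicRouting.sinkTargetFor(value.f0) would then give the per-record target.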