I'm new to Flink. All I want to do is write my protobuf POJO to Kafka as a byte array. My FlinkKafkaProducer
looks like this:
FlinkKafkaProducer<String> flinkKafkaProducer = createStringProducer(outputTopic, address);
stringInputStream
    .map(/* returns byte[] here */)
    .addSink(flinkKafkaProducer);

public static FlinkKafkaProducer<String> createStringProducer(String topic, String kafkaAddress) {
    return new FlinkKafkaProducer<>(kafkaAddress, topic, new SimpleStringSchema());
}
This works fine, but the output is a String. I tried to use TypeInformationSerializationSchema
instead of new SimpleStringSchema()
to change the output type, but I can't figure out how to configure it correctly, and I can't find any tutorial. Could someone help?
I finally figured out how to write protobuf to Kafka as a byte array. The problem was with serialization. For types it cannot analyze itself, such as protobuf-generated classes, Flink falls back to the Kryo library
for de/serialization. The best way to handle protobuf is to register ProtobufSerializer.class
with Kryo. In this example I read a String message from Kafka and write it back as a byte array. Gradle dependencies:
compile (group: 'com.twitter', name: 'chill-protobuf', version: '0.7.6') {
    exclude group: 'com.esotericsoftware.kryo', module: 'kryo'
}
implementation 'com.google.protobuf:protobuf-java:3.11.0'
Registration:
StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
environment.getConfig().registerTypeWithKryoSerializer(MyProto.class, ProtobufSerializer.class);
Kafka serializer class:
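import lombok.Data;
import lombok.RequiredArgsConstructor;
import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema;
import org.apache.kafka.clients.producer.ProducerRecord;

@Data
@RequiredArgsConstructor
public class MyProtoKafkaSerializer implements KafkaSerializationSchema<MyProto> {
    private final String topic;
    private final byte[] key; // record key; may be null

    @Override
    public ProducerRecord<byte[], byte[]> serialize(MyProto element, Long timestamp) {
        // toByteArray() returns the protobuf binary encoding of the message
        return new ProducerRecord<>(topic, key, element.toByteArray());
    }
}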
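The serializer above hands Kafka the raw bytes from `element.toByteArray()` instead of a String. A minimal, dependency-free sketch of why that matters: protobuf output is arbitrary binary, and pushing it through a UTF-8 String (as a String-based schema would) can change the bytes. The class and method names here are hypothetical and only illustrate the effect; they are not part of Flink or protobuf.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustration only: shows why binary payloads should not pass through a String.
class Utf8RoundTripDemo {

    // Decodes the bytes as UTF-8 text and encodes them again,
    // which is what a String-based pipeline effectively does.
    static byte[] throughString(byte[] payload) {
        String asText = new String(payload, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0x08 0x96 0x01 is the protobuf wire encoding of "field 1 = 150" (a varint).
        byte[] payload = {0x08, (byte) 0x96, 0x01};
        byte[] roundTripped = throughString(payload);
        System.out.println("unchanged: " + Arrays.equals(payload, roundTripped));
    }
}
```

The byte 0x96 is not valid UTF-8 on its own, so decoding replaces it with U+FFFD and the re-encoded array no longer matches the input. Keeping the value as `byte[]` end to end avoids this.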
Job:
public static FlinkKafkaProducer<MyProto> createProtoProducer(String topic, String kafkaAddress) {
    // The serializer's constructor takes (topic, key); null here means the records carry no key.
    MyProtoKafkaSerializer myProtoKafkaSerializer = new MyProtoKafkaSerializer(topic, null);
    Properties props = new Properties();
    props.setProperty("bootstrap.servers", kafkaAddress);
    // group.id is a consumer setting, so the producer does not need it.
    return new FlinkKafkaProducer<>(topic, myProtoKafkaSerializer, props, FlinkKafkaProducer.Semantic.AT_LEAST_ONCE);
}
public static FlinkKafkaConsumer<String> createProtoConsumerForTopic(String topic, String kafkaAddress, String kafkaGroup) {
    Properties props = new Properties();
    props.setProperty("bootstrap.servers", kafkaAddress);
    props.setProperty("group.id", kafkaGroup);
    return new FlinkKafkaConsumer<>(topic, new SimpleStringSchema(), props);
}

DataStream<String> stringInputStream = environment.addSource(flinkKafkaConsumer);
FlinkKafkaProducer<MyProto> flinkKafkaProducer = createProtoProducer(outputTopic, address);
stringInputStream
    .map(hashtagMapFunction)
    .addSink(flinkKafkaProducer);
environment.execute("My test job");
It does seem hard to find documentation on this. I'll assume you use Flink >= 1.9. In that case, the following should work:
private static class PojoKafkaSerializationSchema implements KafkaSerializationSchema<YourPojo> {
    // open() is only declared on this interface in newer Flink releases;
    // drop this override if your version does not have it.
    @Override
    public void open(SerializationSchema.InitializationContext context) throws Exception {}

    @Override
    public ProducerRecord<byte[], byte[]> serialize(YourPojo element, @Nullable Long timestamp) {
        // serialize your POJO here and return a Kafka `ProducerRecord`
        return null;
    }
}
// Elsewhere:
PojoKafkaSerializationSchema schema = new PojoKafkaSerializationSchema();
// The producer's type parameter must match the schema's element type (YourPojo).
FlinkKafkaProducer<YourPojo> kafkaProducer = new FlinkKafkaProducer<>(
    "test-topic",
    schema,
    properties,
    FlinkKafkaProducer.Semantic.AT_LEAST_ONCE
);
This code is mostly inspired by this test case, but I didn't have time to actually run it.
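The `serialize` stub above leaves the actual encoding open. As a rough sketch of one dependency-free option, the POJO's fields can be written to a `byte[]` with `DataOutputStream`. The `id` and `name` fields of `YourPojo` and the `PojoBytes` helper are assumptions made up for this illustration; for protobuf-generated classes, `toByteArray()` already does this job.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical POJO; the fields are assumptions for illustration.
class YourPojo {
    final int id;
    final String name;

    YourPojo(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Hypothetical helper that turns the POJO into bytes and back.
class PojoBytes {

    // Layout: 4-byte big-endian int id, then the name as written by writeUTF
    // (2-byte length prefix followed by modified UTF-8 bytes).
    static byte[] toBytes(YourPojo pojo) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(pojo.id);
            out.writeUTF(pojo.name);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads the fields back in the same order they were written.
    static YourPojo fromBytes(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return new YourPojo(in.readInt(), in.readUTF());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Under these assumptions, the body of `serialize` could be something like `return new ProducerRecord<>("test-topic", PojoBytes.toBytes(element));`, and a downstream consumer would call `PojoBytes.fromBytes` on the record value.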