Is it safe for a Flink application to have multiple data/key streams in a job all sharing the same Kafka source and sink?
(Goal updated) My goal for each data stream is a fan-out -> fan-in, like a DAG:
var fanoutStreamOne = new StreamComponents(/*filter, flatmap, etc*/);
var fanoutStreamTwo = new StreamComponents(/*filter, flatmap, etc*/);
var fanoutStreamThree = new StreamComponents(/*filter, flatmap, etc*/);
var fanoutStreams = Set.of(fanoutStreamOne, fanoutStreamTwo, fanoutStreamThree);
var source = new FlinkKafkaConsumer<>(...);
var sink = new FlinkKafkaProducer<>(...);
// creates streams from same source to same sink (Using union())
new streamingJob(source, sink, fanoutStreams).execute();
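Ignoring Flink's runtime, the shape of that job can be sketched in plain Java. The three branch transforms below are made-up stand-ins for the `StreamComponents` filter/flatMap chains, and `runJob` models what `union()` does logically (all branch outputs feed one sink; real Flink gives no ordering guarantee across branches):

```java
// Plain-Java sketch of the fan-out -> fan-in topology; no Flink dependency.
// Each branch stands in for one StreamComponents pipeline, and the sink
// receives the union of all branch outputs.
public class FanOutFanIn {
    static String branchOne(String r)   { return r.toUpperCase(); } // chain 1 (illustrative)
    static String branchTwo(String r)   { return r + "!"; }         // chain 2 (illustrative)
    static String branchThree(String r) { return "<" + r + ">"; }   // chain 3 (illustrative)

    // Applies every branch to every source record and unions the results
    // into one comma-joined "sink" string.
    public static String runJob(String[] source) {
        StringBuilder sink = new StringBuilder();
        for (String r : source) sink.append(branchOne(r)).append(",");
        for (String r : source) sink.append(branchTwo(r)).append(",");
        for (String r : source) sink.append(branchThree(r)).append(",");
        return sink.length() == 0 ? "" : sink.substring(0, sink.length() - 1);
    }
}
```

In real Flink this is one job graph, so all branches share the same checkpoint barriers flowing from the single Kafka source.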
I am just curious whether this affects recovery/checkpoints or the performance of the Flink application.
Has anyone had success with this implementation?
And should I have the watermark strategy up front, before filtering?
Thanks in advance!
Okay, I think having different time gaps is not possible. I tried it a year ago, with Flink 1.7, and I couldn't do it. The watermark is global to the application.
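One concrete consequence for the fan-in: in Flink, an operator with several inputs (such as the one after a `union()`) tracks event time as the minimum of its inputs' watermarks, so the slowest branch holds back event time for the whole unioned stream. A plain-Java sketch of that rule (not Flink API):

```java
// Sketch of event-time behavior after fan-in: a multi-input operator's
// watermark is the minimum of its inputs' watermarks, so one slow or
// idle branch delays timers and windows for the entire unioned stream.
public class WatermarkMin {
    public static long downstreamWatermark(long[] branchWatermarks) {
        long min = Long.MAX_VALUE;
        for (long w : branchWatermarks) {
            min = Math.min(min, w);
        }
        return min;
    }
}
```

This is also an argument for assigning the watermark strategy at the source, before filtering: a filter that drops all records of one branch can otherwise leave that branch without watermark progress.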
As for the other problems: if you are using Kafka, you can read from several topics using a regex, and recover the topic name by using the proper deserialization schema (here).
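The Flink Kafka consumer can subscribe by pattern (e.g. `new FlinkKafkaConsumer<>(Pattern.compile("events-.*"), schema, props)`), and the pattern is an ordinary `java.util.regex.Pattern`. The sketch below only shows which topic names such a regex would match; the topic names are made up:

```java
import java.util.regex.Pattern;

// Shows which Kafka topic names a subscription regex would select.
// The same java.util.regex semantics apply when the pattern is passed
// to the Flink Kafka consumer's pattern-based constructor.
public class TopicPattern {
    // Returns the matching topic names, comma-joined, in input order.
    public static String matchingTopics(String regex, String[] topics) {
        Pattern pattern = Pattern.compile(regex);
        StringBuilder matched = new StringBuilder();
        for (String topic : topics) {
            if (pattern.matcher(topic).matches()) {
                if (matched.length() > 0) matched.append(",");
                matched.append(topic);
            }
        }
        return matched.toString();
    }
}
```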
To filter the messages, I think you can use filter functions together with side output streams :) (here)
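The idea behind side outputs: instead of running several independent `filter()` branches over the same input, a single routing pass sends each record to exactly one named output (in Flink this is a `ProcessFunction` emitting to an `OutputTag` via `ctx.output(tag, value)`). A plain-Java sketch, with a made-up routing rule (prefix before `:`):

```java
// Sketch of side-output routing without Flink: one pass over the input
// splits records into a "main" output and a "side" output, rather than
// filtering the whole stream once per branch.
public class SideOutputRouter {
    // Records matching wantedPrefix go to main; everything else to side.
    public static String route(String[] records, String wantedPrefix) {
        StringBuilder main = new StringBuilder();
        StringBuilder side = new StringBuilder();
        for (String r : records) {
            if (r.startsWith(wantedPrefix + ":")) main.append(r).append(";");
            else side.append(r).append(";");
        }
        return "main=" + main + "|side=" + side;
    }
}
```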