
Is it safe for a Flink application to have multiple data/key streams in a job, all sharing the same Kafka source and sink?

(Goal updated) My goal for each data stream is:

  • filter different messages
  • apply a different event-time session window gap
  • consume from one topic and produce to another topic

A fan-out -> fan-in style DAG.

var fanoutStreamOne = new StreamComponents(/* filter, flatmap, etc. */);
var fanoutStreamTwo = new StreamComponents(/* filter, flatmap, etc. */);
var fanoutStreamThree = new StreamComponents(/* filter, flatmap, etc. */);
var fanoutStreams = Set.of(fanoutStreamOne, fanoutStreamTwo, fanoutStreamThree);
var source = new FlinkKafkaConsumer<>(...);
var sink = new FlinkKafkaProducer<>(...);

// creates streams from the same source to the same sink (using union())
new StreamingJob(source, sink, fanoutStreams).execute();
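Expanded, a `StreamingJob` like the one above might look something like the following sketch, assuming Flink's DataStream API. This is only an illustration of the fan-out/fan-in shape described in the question; the filter predicates, key extractor, gap values, and topic contents are all hypothetical, not from the original post:

```java
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

DataStream<String> input = env.addSource(source);  // the shared FlinkKafkaConsumer

// Branch one: its own filter and its own session gap (illustrative values).
DataStream<String> branchOne = input
        .filter(msg -> msg.contains("typeA"))
        .keyBy(msg -> msg.split(":")[0])           // hypothetical key extraction
        .window(EventTimeSessionWindows.withGap(Time.minutes(5)))
        .reduce((a, b) -> a + "," + b);

// Branch two: a different filter and a different session gap.
DataStream<String> branchTwo = input
        .filter(msg -> msg.contains("typeB"))
        .keyBy(msg -> msg.split(":")[0])
        .window(EventTimeSessionWindows.withGap(Time.minutes(30)))
        .reduce((a, b) -> a + "," + b);

// Fan-in: union the branches and write them through the shared producer.
branchOne.union(branchTwo).addSink(sink);

env.execute("fan-out-fan-in");
```

Because all branches hang off the same source operator inside one job, a single checkpoint covers every branch; the trade-off is that a failure in any branch restarts the whole job.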

I am just curious whether this affects recovery/checkpointing or the performance of the Flink application.

Has anyone had success with this kind of implementation?

And should I assign the watermark strategy up front, before filtering?

Thanks in advance!

Okay, I think the different session gaps are not possible. I tried it a year ago, with Flink 1.7, and I couldn't do it. The watermark is global to the application.

As for the other problems: if you are using Kafka, you can read from several topics using a regex, and get the topic name by using the proper deserialization schema (here).
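The regex subscription the answer mentions can be sketched with the `FlinkKafkaConsumer` constructor that accepts a `java.util.regex.Pattern`. The broker address, group id, and topic pattern below are illustrative assumptions:

```java
import java.util.Properties;
import java.util.regex.Pattern;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092");  // illustrative address
props.setProperty("group.id", "fanout-job");               // illustrative group id

// Subscribes to every topic whose name matches the pattern. Whether topics
// created after startup are picked up depends on the connector's
// topic/partition discovery configuration.
FlinkKafkaConsumer<String> consumer = new FlinkKafkaConsumer<>(
        Pattern.compile("events-.*"),   // illustrative topic pattern
        new SimpleStringSchema(),
        props);
```

To also know which topic each record came from, a `KafkaDeserializationSchema` can be implemented instead of a plain `DeserializationSchema`, since it exposes the full `ConsumerRecord` (topic, partition, offset); that matches the "get the topic using the proper deserialization schema" suggestion in the answer.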

To filter the messages, I think you can use filter functions together with side output streams :) (here)
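Splitting one stream with a side output, as the answer suggests, can be sketched like this; the tag name and the routing rule are illustrative assumptions:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

// The OutputTag must be an anonymous subclass so Flink can keep the type info.
final OutputTag<String> typeB = new OutputTag<String>("type-b") {};

SingleOutputStreamOperator<String> main = input
        .process(new ProcessFunction<String, String>() {
            @Override
            public void processElement(String msg, Context ctx, Collector<String> out) {
                if (msg.startsWith("B:")) {   // illustrative routing rule
                    ctx.output(typeB, msg);   // routed to the side output
                } else {
                    out.collect(msg);         // routed to the main output
                }
            }
        });

DataStream<String> typeBStream = main.getSideOutput(typeB);
```

Compared with hanging several independent `filter()` operators off the same source, a single `ProcessFunction` reads each record once and routes it, which can matter as the number of fan-out branches grows.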
