
Merge output of multiple Flink jobs and return a single output

Original question: 2022-09-20 06:21:45. Tags: scala, apache-flink, flink-streaming

I have multiple Flink jobs that all read from the same Kafka topic as their input source, and their output format is also the same.

Source -> Flink job 1 -> output
Source -> Flink job 2 -> output
Source -> Flink job 3 -> output
Source -> Flink job 4 -> output
...
Source -> Flink job n -> output

The output format is like Object(pk: String, variable1: String, variable2: Boolean).

I want to consume all of these outputs and combine them into a single output, say a JSON array of the outputs.

Final required output: (pk: String, variable1: List[String], variable2: List[Boolean])
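To make the required shape concrete, here is a minimal plain-Scala sketch of the merge itself, without any Flink API. The names JobOutput, CombinedOutput, and CombineOutputs are hypothetical and chosen only for illustration; the question only gives the field layout.

```scala
// Hypothetical names: JobOutput and CombinedOutput are not from the original post,
// they only mirror the field layout described in the question.
final case class JobOutput(pk: String, variable1: String, variable2: Boolean)
final case class CombinedOutput(pk: String, variable1: List[String], variable2: List[Boolean])

object CombineOutputs {
  // Group the per-job records by pk and collect each field into a list.
  // Results are sorted by pk only to make the output order deterministic.
  def combine(outputs: Seq[JobOutput]): List[CombinedOutput] =
    outputs
      .groupBy(_.pk)
      .map { case (pk, records) =>
        CombinedOutput(pk, records.map(_.variable1).toList, records.map(_.variable2).toList)
      }
      .toList
      .sortBy(_.pk)
}
```

This works on a finite collection. In a streaming job the same grouping would have to be done per key with some rule for deciding when a key is complete.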

PS: Depending on the logic implemented in each job, some Flink jobs might not return an output for a given input. I am using Scala.

1 answer

I managed to solve this by creating one more Flink job that acts as a master job. The input to this job is the output of the other N jobs. Since those jobs had a filter(condition), I added one more datastream with filter(!condition) to make sure every job returns an output. I also added a datastream in the master job that maintains the total job count and connected it with the master job's datastream. The following diagram represents this.

[Figure: Flow of solution]
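The buffering logic of such a master job might look roughly like the plain-Scala sketch below. It is not the answerer's code and uses no Flink API. JobResult, Merged, and MasterMerger are hypothetical names, the job count is passed in as a constructor parameter instead of arriving on a connected stream, and None stands in for the placeholder record a job emits through its filter(!condition) branch. In a real Flink job the per-pk buffer would presumably live in keyed state.

```scala
import scala.collection.mutable

// Hypothetical model: `value` is None when the job emitted only a placeholder
// via its filter(!condition) branch, and Some(...) when it produced real output.
final case class JobResult(pk: String, value: Option[(String, Boolean)])
final case class Merged(pk: String, variable1: List[String], variable2: List[Boolean])

// Buffers results per pk and emits a merged record once all `totalJobs` jobs
// have reported for that pk. `totalJobs` is an assumption standing in for the
// job-count stream described in the answer.
final class MasterMerger(totalJobs: Int) {
  private val pending = mutable.Map.empty[String, Vector[JobResult]]

  def onResult(result: JobResult): Option[Merged] = {
    val seen = pending.getOrElse(result.pk, Vector.empty) :+ result
    if (seen.size < totalJobs) {
      // Still waiting for more jobs to report for this pk.
      pending.update(result.pk, seen)
      None
    } else {
      // Every job has reported: drop the buffer and keep only real outputs.
      pending.remove(result.pk)
      val values = seen.flatMap(_.value)
      Some(Merged(result.pk, values.map(_._1).toList, values.map(_._2).toList))
    }
  }
}
```

Because every job always reports something for each input, counting reports per pk is enough to know when a merged record can be emitted. Without the placeholder records the merger could not tell a filtered-out input from a slow job.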