
Flink: Broadcasted Operator chaining

Assume that I have a DataStream of events and I want to broadcast it to a (rich) map operator (map1) that is chained to another (rich) map operator (map2). The parallelism of the two maps is the same. What I want is for the output of each parallel instance of map1 to go to exactly one parallel instance of map2 (i.e., no broadcasting between the two maps). Here's what I've done so far, but I'm not sure whether it is logically correct. Is it OK?

val trainedStream = events.broadcast.map(new Mapper1(...)).setParallelism(par)
trainedStream.startNewChain.map(new Mapper2(...)).setParallelism(par)

Follow-up question: Is the subtask index (obtained from RuntimeContext.getIndexOfThisSubtask) the same for two chained subtasks/parallel instances of map1 and map2? Is there a way to check this?

The code is in Scala, but I guess the same applies to Java.

Chaining happens automatically in Flink whenever possible. So, in your example, it's enough to just use

val trainedStream = events.broadcast.map(new Mapper1(...)).map(new Mapper2(...))

I'd then set the parallelism on the env instead of on each operator, so that both maps get the same value.

By the way, are you sure you want to broadcast the events? A DataStream is processed in parallel by default. Broadcasting events is very unusual, because every event is then sent to every parallel instance of map1 and so is processed as many times as the parallelism.

Follow-up question: Is the subtask index (obtained from RuntimeContext.getIndexOfThisSubtask) the same for two chained subtasks/parallel instances of map1 and map2? Is there a way to check this?

The subtask index is the same for chained operators, because they reside in the same task (hence they cannot even have different indices). You can tell that chaining was successful if the job shows a single task named mapper1 -> mapper2.
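One way to check the indices empirically is to have the first map attach its own subtask index to each record and let the second map compare it with its own. The following is only a minimal sketch, assuming the Flink 1.x Scala DataStream API; `TagWithSubtask`, `CompareSubtask`, the sample elements, and the parallelism of 4 are hypothetical stand-ins, not the Mapper1/Mapper2 from the question.

```scala
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.streaming.api.scala._

// Hypothetical stand-in for Mapper1: tags each event with the index
// of the map1 subtask that processed it.
class TagWithSubtask extends RichMapFunction[Int, (Int, Int)] {
  override def map(event: Int): (Int, Int) =
    (event, getRuntimeContext.getIndexOfThisSubtask)
}

// Hypothetical stand-in for Mapper2: compares the upstream index
// with the index of the map2 subtask it is running in.
class CompareSubtask extends RichMapFunction[(Int, Int), String] {
  override def map(in: (Int, Int)): String = {
    val (event, map1Index) = in
    val map2Index = getRuntimeContext.getIndexOfThisSubtask
    s"event=$event map1=$map1Index map2=$map2Index same=${map1Index == map2Index}"
  }
}

object SubtaskIndexCheck {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(4) // assumed value; applies to both maps

    env.fromElements(1, 2, 3) // sample data, assumed for illustration
      .broadcast              // every map1 subtask receives every event
      .map(new TagWithSubtask)
      .map(new CompareSubtask)
      .print()

    env.execute("subtask-index-check")
  }
}
```

Note that matching indices alone do not prove the operators are chained: a forward connection between two operators of equal parallelism also pairs subtask i with subtask i. To confirm chaining itself, look at the task names in the job graph (for example in the web UI).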
