简体   繁体   中英

flink - cluster not using cluster

Ive setup a 3 node cluster that was distributing tasks (steps? jobs?) pretty evenly until the most recent which has all been assigned to one machine.

Topology (do we still use this term for flink?):

kafka (3 topics on different feeds) -> flatmap -> union -> map

在此处输入图片说明

Is there something about this setup that would tell the cluster manager to put everything on one machine?

Also - what are the 'not set' values in the image? Some step I've missed? Or some to-be-implemented UI feature?

It is actually on purpose that Flink schedules your job on a single TaskManager. In order to understand it let me quickly explain Flink's resource scheduling algorithm.

First of all, in the Flink world a slot can accommodate more than one task (parallel instance of an operator). In fact, it can accommodate one parallel instance of each operator. The reason for this is that Flink not only executes streaming jobs in a streaming fashion but also batch jobs. With streaming fashion I mean that Flink brings all operators of your dataflow graph online so that intermediate results can be streamed directly to downstream operators where they are consumed. Per default Flink tries to combine one task of each operator in one slot.

When Flink schedules the tasks to the different slots, then it tries to co-locate the tasks with their inputs to avoid unnecessary network communication. For sources, the co-location depends on the implementation. For file-based sources, for example, Flink tries to assign local file input splits to the different tasks.

So if we apply this to your job, then we see the following. You have three different sources with parallelism 1. All sources belong to the same resource sharing group, thus the single task of each operator will deployed to the same slot. The initial slot is randomly chosen from the available instances (actually it depends on the order of the TaskManager registration at the JobManager ) and then filled up. Let's say the chosen slot is on machine node1 .

Next we have the three flat map operators which have a parallelism of 2. Here again one of the two sub-tasks of each flat map operator can be deployed to the same slot which already accommodates the three sources. The second sub-task, however, has to placed in a new slot. When this happens Flink tries to choose a free slot which is co-located to a slot in which one of the task's inputs is deployed (again to reduce network communication). Since only one slot of node1 is occupied and thus 31 are still free, it will deploy the 2nd sub-task of each flatMap operator also to node1 .

The same now applies to the tumbling window reduce operation. Flink tries to co-locate all the tasks of the window operator with it's inputs. Since all of its inputs run on node1 and node1 has enough free slots to accommodate 6 sub-tasks of the window operator, they will be scheduled to node1 . It's important to note, that 1 window task will run in the slots which contains the three sources and one task of each flatMap operator.

I hope this explains why Flink only uses the slots of a single machine for the execution of your job.

The problem is that you are building a global window on an unkeyed (ungrouped) stream, so the window has to run on one machine.

Maybe you can also express your application logic differently so that you can group the stream.

The "(not set)" part is probably an issue in Flink's DataStream API, which is not setting default operator names. Jobs implemented against the DataSet API will look like this: 在此处输入图片说明

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM