
flink - cluster not using cluster

Posted 2015-11-22 13:35:39 · Tags: java, apache-kafka, apache-flink, flink-streaming

I've set up a 3-node cluster that was distributing tasks (steps? jobs?) pretty evenly until the most recent one, which has been assigned entirely to one machine.

Topology (do we still use this term for Flink?):

kafka (3 topics on different feeds) -> flatmap -> union -> map

Is there something about this setup that would tell the cluster manager to put everything on one machine?

Also - what are the 'not set' values in the image? Some step I've missed? Or some to-be-implemented UI feature?

2 answers

Flink actually schedules your job on a single TaskManager on purpose. To understand why, let me quickly explain Flink's resource scheduling algorithm.

First of all, in the Flink world a slot can accommodate more than one task (a task being a parallel instance of an operator). In fact, it can accommodate one parallel instance of each operator. The reason for this is that Flink executes not only streaming jobs but also batch jobs in a streaming fashion. By streaming fashion I mean that Flink brings all operators of your dataflow graph online, so that intermediate results can be streamed directly to the downstream operators where they are consumed. By default, Flink tries to combine one task of each operator in one slot.
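To make the slot-sharing rule concrete, here is a minimal sketch in plain Java. It is a toy model, not Flink's scheduler code: the class, the method names and the operator names are all made up for illustration, and the parallelism values mirror the job discussed in this answer (three sources with parallelism 1, three flat maps with parallelism 2, one window operator with parallelism 6). Assuming every operator sits in the same sharing group, the job needs as many slots as its most parallel operator has instances.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of slot sharing. Hypothetical names; not Flink's actual scheduler. */
class SlotSharingSketch {

    /**
     * With all operators in one sharing group, a slot can hold one parallel
     * instance of each operator, so the slot count equals the highest parallelism.
     */
    static int requiredSlots(Map<String, Integer> parallelismByOperator) {
        int max = 0;
        for (int parallelism : parallelismByOperator.values()) {
            max = Math.max(max, parallelism);
        }
        return max;
    }

    /** For comparison: without sharing, every parallel instance needs its own slot. */
    static int requiredSlotsWithoutSharing(Map<String, Integer> parallelismByOperator) {
        int sum = 0;
        for (int parallelism : parallelismByOperator.values()) {
            sum += parallelism;
        }
        return sum;
    }

    /** Operator names are illustrative; the parallelism values follow the answer. */
    static Map<String, Integer> exampleJob() {
        Map<String, Integer> job = new LinkedHashMap<>();
        job.put("source-1", 1);
        job.put("source-2", 1);
        job.put("source-3", 1);
        job.put("flatMap-1", 2);
        job.put("flatMap-2", 2);
        job.put("flatMap-3", 2);
        job.put("window-reduce", 6);
        return job;
    }

    public static void main(String[] args) {
        Map<String, Integer> job = exampleJob();
        System.out.println("slots with sharing:    " + requiredSlots(job));
        System.out.println("slots without sharing: " + requiredSlotsWithoutSharing(job));
    }
}
```

Under these assumptions the job fits into 6 shared slots, whereas it would need 15 if every task had a slot of its own.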

When Flink schedules tasks to the different slots, it tries to co-locate each task with its inputs to avoid unnecessary network communication. For sources, the co-location depends on the implementation. For file-based sources, for example, Flink tries to assign local file input splits to the different tasks.

If we apply this to your job, we see the following. You have three different sources with parallelism 1. All sources belong to the same resource sharing group, so the single task of each source will be deployed to the same slot. The initial slot is chosen more or less randomly from the available instances (it actually depends on the order in which the TaskManagers registered at the JobManager) and then filled up. Let's say the chosen slot is on machine node1.

Next we have the three flat map operators, which have a parallelism of 2. Here again, one of the two sub-tasks of each flat map operator can be deployed to the slot that already accommodates the three sources. The second sub-task, however, has to be placed in a new slot. When this happens, Flink tries to choose a free slot that is co-located with a slot in which one of the task's inputs is deployed (again, to reduce network communication). Since only one slot of node1 is occupied, and 31 are therefore still free, it will also deploy the second sub-task of each flatMap operator to node1.

The same now applies to the tumbling window reduce operation. Flink tries to co-locate all tasks of the window operator with its inputs. Since all of its inputs run on node1, and node1 has enough free slots to accommodate the 6 sub-tasks of the window operator, they will be scheduled to node1. It's important to note that one window task will run in the slot that contains the three sources and one task of each flatMap operator.
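The placement walked through above can be sketched as a small simulation. This is a deliberately simplified model and not Flink's implementation: the class and method names are hypothetical, the "preferred node" rule is reduced to "the node that received the first slot", and the assumption that every node offers 32 slots is inferred only from the "31 are still free" remark about node1.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy simulation of input co-location. Hypothetical names; not Flink's scheduler. */
class CoLocationSketch {

    /**
     * Places the requested number of shared slots on nodes.
     * Returns, for each slot, the index of the node that hosts it.
     */
    static List<Integer> placeSharedSlots(int slotsNeeded, int[] freeSlotsPerNode) {
        int[] free = freeSlotsPerNode.clone(); // do not modify the caller's array
        List<Integer> placement = new ArrayList<>();
        int preferred = -1; // node hosting the inputs; unknown until the first slot is placed

        for (int slot = 0; slot < slotsNeeded; slot++) {
            int node = -1;
            if (preferred >= 0 && free[preferred] > 0) {
                // Stay next to the inputs as long as that node has room.
                node = preferred;
            } else {
                // Otherwise fall back to the first node that still has a free slot.
                for (int n = 0; n < free.length; n++) {
                    if (free[n] > 0) {
                        node = n;
                        break;
                    }
                }
            }
            if (node < 0) {
                throw new IllegalStateException("not enough free slots in the cluster");
            }
            free[node]--;
            placement.add(node);
            if (preferred < 0) {
                preferred = node; // the sources land here, so later tasks prefer it
            }
        }
        return placement;
    }

    public static void main(String[] args) {
        // Three nodes with 32 slots each (assumed), six shared slots needed.
        System.out.println(placeSharedSlots(6, new int[] {32, 32, 32}));
        // Same job on a cluster whose first node only has 4 free slots.
        System.out.println(placeSharedSlots(6, new int[] {4, 32, 32}));
    }
}
```

In this model all six slots end up on the first node when it has room, which matches the behaviour described above. Work only spills over to a second node when the first one runs out of free slots.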

I hope this explains why Flink only uses the slots of a single machine for the execution of your job.

The problem is that you are building a global window on an unkeyed (ungrouped) stream, so the window has to run on one machine.

Maybe you can also express your application logic differently so that you can group the stream.
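As a rough illustration of why grouping helps, the sketch below routes elements to parallel window instances by key. It is a simplification with hypothetical names: in the DataStream API grouping is done with `keyBy`, and Flink's real key-to-subtask assignment uses its own hashing scheme rather than the plain `hashCode` modulo shown here. The point is only that a key always maps to the same instance, so different keys can be processed in parallel, while an unkeyed global window has a single instance.

```java
/** Toy model of key-based routing. Hypothetical names; Flink's real hashing differs. */
class KeyPartitionSketch {

    /** Keyed stream: the key alone decides which parallel instance handles the element. */
    static int subtaskForKey(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    /** Unkeyed global window: there is only one instance, so everything goes to it. */
    static int subtaskForUnkeyed() {
        return 0;
    }

    public static void main(String[] args) {
        int parallelism = 6;
        // The key names are made up; in the question they might be feed or topic identifiers.
        for (String key : new String[] {"feed-a", "feed-b", "feed-c"}) {
            System.out.println(key + " -> window instance " + subtaskForKey(key, parallelism));
        }
        System.out.println("unkeyed -> window instance " + subtaskForUnkeyed());
    }
}
```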

The "(not set)" part is probably an issue in Flink's DataStream API, which is not setting default operator names. Jobs implemented against the DataSet API will look like this: