
Can Flink run multiple identical jobs to achieve pseudo dynamic scaling?

We are working on how to do dynamic scaling of Flink tasks. The task reads a stream from a Kafka topic, does ..., then sinks the results to another Kafka topic. We know that a Flink job must be stopped before its parallelism can be modified, which is not what we want.
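For reference, the standard way to change parallelism is the stop-with-savepoint workflow, where downtime is limited to the restart. A minimal CLI sketch (the job id, savepoint directory, jar name, and new parallelism below are placeholders):

```shell
# Stop the running job and write a savepoint (directory is a placeholder)
./bin/flink stop --savepointPath /tmp/savepoints <job-id>

# Restart the same job from that savepoint with a higher parallelism
./bin/flink run -s /tmp/savepoints/savepoint-<id> -p 8 my-streaming-job.jar
```

Flink redistributes the keyed state in the savepoint across the new number of parallel subtasks on restart.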

Since we can't dynamically add resources to a task without stopping the Flink job, can we duplicate the Flink job (consuming the same Kafka topic with the same group id) to increase performance? Also, is it possible to use YARN or Kubernetes to manage those jobs and achieve pseudo-dynamic scaling for such a Flink task (with Kafka)?

Is there a reason why you don't want to modify parallelism by stopping the job?

You could do this; however, you would effectively be splitting your data across the various jobs. So not only would you incur the cost of having to understand your throughput across multiple jobs in order to autoscale efficiently, but any stateful processing would also produce incorrect or inconsistent results.
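A toy simulation (plain Python, hypothetical user/amount events, no real Kafka involved) of why duplicated jobs break stateful results: with two identical jobs in one consumer group, Kafka assigns each job a disjoint subset of the topic's partitions, so each job's state only ever reflects a fragment of the stream.

```python
from collections import Counter

# Hypothetical events written to Kafka WITHOUT a message key, so the
# producer spreads them round-robin across partitions. (With keyed
# messages the split would differ, but state would still be per-job.)
events = [("alice", 1), ("alice", 2), ("bob", 3),
          ("alice", 4), ("bob", 5), ("bob", 6)]

# Round-robin the stream into 2 partitions.
partitions = [events[0::2], events[1::2]]

# Two duplicated jobs in the same consumer group: each job is assigned
# one partition and builds its own per-user count from it.
job_counts = [Counter(user for user, _ in p) for p in partitions]

# The correct result a single job (at any parallelism) would compute.
global_counts = Counter(user for user, _ in events)

print(job_counts[0])    # each job's per-user count is incomplete
print(job_counts[1])
print(global_counts)    # the true per-user totals
```

Each job reports a different, partial count for every user, and neither matches the global total, which is the incorrect/inconsistent stateful result described above.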
