How to restart flink job to use added TaskManager
I'm testing the elasticity features in Flink 1.3.0. I have a job with checkpointing enabled and a fixed-delay restart strategy. When I kill one of the TaskManager JVMs, the job correctly restarts on the remaining node after a while. However, when I add a new node, the job is not restarted automatically to make use of it.
I tried to use

bin/flink stop <jobId>

but it always gives me java.lang.IllegalStateException: Job with ID <jobId> is not stoppable.
How can I restart the job to make use of the additional node?
Flink 1.3 does not support dynamic rescaling and won't automatically restart a job to take advantage of newly available resources. To restart a job in this scenario, you should take a savepoint, increase the parallelism, and restart the job from that savepoint. You can cancel a job with a savepoint like this:
flink cancel -s [targetDirectory] <jobID>
and then restart it via
flink run -s <savepointPath> ...
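Putting the steps together, a typical rescale looks like the following sketch. The savepoint directory, savepoint path, jar name, and parallelism value are placeholders for illustration; substitute the ones from your own setup (the CLI prints the actual savepoint path when the cancel completes):

```shell
# 1. Cancel the job and trigger a savepoint in one step.
#    The CLI prints the path of the savepoint it wrote.
flink cancel -s hdfs:///flink/savepoints <jobId>

# 2. Resubmit the job from that savepoint with a higher
#    parallelism (-p) so it can use the new TaskManager's slots.
#    "savepoint-abc123" and "myJob.jar" are placeholder names.
flink run -s hdfs:///flink/savepoints/savepoint-abc123 -p 8 myJob.jar
```

Note that the new parallelism must not exceed the maximum parallelism the job was originally configured with, or the restore from the savepoint will fail.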
See the CLI docs and savepoint docs for more details on savepoints, but you can think of a savepoint as a user-triggered checkpoint.
Apache Flink® at MediaMath: Rescaling Stateful Applications in Production is a recent blog post from data Artisans with a lot of detail about how rescaling works internally.