
Flink Yarn infinite restart on task failure

I am running a Flink streaming job on an AWS YARN cluster with the following configuration:

Master Node - 1, Core Node - 1, Task Nodes - 3

I also enabled:

jobmanager.execution.failover-strategy: region

One of my task nodes is failing, and Flink tries to restart at the region level (in my case, at the task node level). I set the restart strategy to fixedDelayRestart with 5 attempts and a 5-minute delay, and my checkpoints are disabled.
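For reference, the setup described above could be written in flink-conf.yaml roughly as follows. This is a sketch, not the asker's actual file: the question does not say whether the restart strategy was set in code or in configuration, and the delay format ("5 min") is an assumption about the duration syntax accepted by the Flink version in use.

```yaml
# Restart only the affected pipelined region instead of the whole job
jobmanager.execution.failover-strategy: region

# Fixed-delay restart strategy: give up after 5 failed restart attempts,
# waiting 5 minutes between attempts (values taken from the question)
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 5
restart-strategy.fixed-delay.delay: 5 min
```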

Reference image

As the image shows, the job is restarting more often than expected.

Can anybody help me understand why it behaves like this?

The documentation has a section about the "Restart Pipelined Region Failover Strategy" [1]. The bottom line is that if you have a streaming job with an operator that physically partitions the stream, such as keyBy, every upstream task is connected to every downstream task, so all tasks end up in the same region and are therefore restarted as a whole. For batch jobs, you need to configure the ExecutionMode [2] to be BATCH or BATCH_FORCED.
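To illustrate why a keyBy collapses everything into one region, here is a simplified model in Python. It is not Flink's implementation: it treats a region as a connected component of tasks linked by pipelined edges, and the function names (pipelined_regions, build_job) and the source/map/sink job layout are hypothetical, chosen only for this sketch.

```python
from itertools import product


def pipelined_regions(tasks, edges):
    """Group tasks into regions: connected components over pipelined edges."""
    parent = {t: t for t in tasks}

    def find(t):
        # Follow parent links to the component root, shortening the path as we go
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in edges:
        parent[find(a)] = find(b)

    regions = {}
    for t in tasks:
        regions.setdefault(find(t), set()).add(t)
    return list(regions.values())


def build_job(parallelism, use_key_by):
    """Build a toy source -> map -> sink job as (tasks, edges)."""
    ops = ["source", "map", "sink"]
    tasks = [(op, i) for op in ops for i in range(parallelism)]
    # source -> map is a forward (one-to-one) connection
    edges = [(("source", i), ("map", i)) for i in range(parallelism)]
    if use_key_by:
        # keyBy: every map subtask may send to every sink subtask (all-to-all)
        edges += [(("map", i), ("sink", j))
                  for i, j in product(range(parallelism), repeat=2)]
    else:
        # no repartitioning: map -> sink stays one-to-one
        edges += [(("map", i), ("sink", i)) for i in range(parallelism)]
    return tasks, edges


forward_regions = pipelined_regions(*build_job(3, use_key_by=False))
keyed_regions = pipelined_regions(*build_job(3, use_key_by=True))

print(len(forward_regions))  # 3: each parallel pipeline is its own region
print(len(keyed_regions))    # 1: the all-to-all edges merge everything
```

In this model, a failure in any task of the keyed job touches the single region that contains all nine tasks, so a region-level restart is effectively a full job restart.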

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/task_failure_recovery.html#restart-pipelined-region-failover-strategy

[2] https://ci.apache.org/projects/flink/flink-docs-release-1.9/api/java/org/apache/flink/api/common/ExecutionMode.html
