
Canceling a YARN step in EMR

I have a long-running YARN application on an EMR cluster. According to Canceling EMR Steps, running steps can be canceled with the command aws emr cancel-steps as long as Amazon EMR version 5.28.0 or later is in use (which is the case for me). However, when I issue that command against my running step, it never kills the actual YARN application. I can see the step change its status to Canceled in the UI, but if I ssh into the EMR cluster and execute yarn application -list, I can still see my application alive and well :) In the logs I see:

INFO waitProcessCompletion ended with exit code 137 : hadoop jar /var/lib/aws/emr/step-runner/hadoop-...
INFO total process run time: 344 seconds
2020-12-30T23:13:42.362Z INFO Step created jobs: 
2020-12-30T23:13:42.362Z WARN Step failed with exitCode 137 and took 344 seconds

Based on my understanding, exit code 137 (128 + 9) means that the container did receive SIGKILL. Can someone advise why the application is still not being killed?

P.S. I am using the TERMINATE_PROCESS cancellation option when executing the cancel-steps command.
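For reference, the invocation described above can be sketched as follows. This is a minimal illustration, not the exact command from the question: the helper name build_cancel_steps_command and the cluster and step IDs are placeholders, and the function only assembles the argument list for aws emr cancel-steps without running it.

```python
import shlex

# Cancellation options accepted by `aws emr cancel-steps`.
VALID_OPTIONS = ("SEND_INTERRUPT", "TERMINATE_PROCESS")


def build_cancel_steps_command(cluster_id, step_ids, option="TERMINATE_PROCESS"):
    """Return the argv list for an `aws emr cancel-steps` call (hypothetical helper)."""
    if option not in VALID_OPTIONS:
        raise ValueError(f"unsupported cancellation option: {option}")
    if not step_ids:
        raise ValueError("at least one step ID is required")
    return [
        "aws", "emr", "cancel-steps",
        "--cluster-id", cluster_id,
        "--step-ids", *step_ids,
        "--step-cancellation-option", option,
    ]


if __name__ == "__main__":
    # Placeholder IDs, not taken from the question.
    argv = build_cancel_steps_command("j-EXAMPLECLUSTER", ["s-EXAMPLESTEP"])
    print(shlex.join(argv))
```

The resulting list can be passed to subprocess.run on a machine where the AWS CLI is configured.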

Thank you!

One aspect you might want to look at is whether the step is submitted in cluster mode or client mode. If you are using cluster mode, you can't kill the step through the AWS SDK, since it has no control over the actual YARN application. For this to work, you need to run your steps in client mode.
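As a rough sketch of what running a step in client mode could look like, the snippet below builds a step definition in the shape accepted by the EMR AddJobFlowSteps API. It assumes the application is a Spark job launched through spark-submit via command-runner.jar, which the thread does not state; the helper name build_client_mode_step and the S3 path are hypothetical.

```python
def build_client_mode_step(name, app_path, app_args=()):
    """Build an EMR step definition that submits a job in client deploy mode.

    Assumption: the job is a Spark application started with spark-submit.
    """
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode", "client",  # driver runs inside the step process
                app_path,
                *app_args,
            ],
        },
    }


if __name__ == "__main__":
    # Placeholder name and path, for illustration only.
    step = build_client_mode_step("long-running-job", "s3://example-bucket/app.py")
    print(step["HadoopJarStep"]["Args"])
```

With the driver living inside the step's own process, terminating that process also takes the driver down, which is the behavior the answer relies on.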

Please also check the other implications of running in cluster vs. client mode. If you are doing this for development, client mode is fine.
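If the step has to stay in cluster mode, one possible workaround (not part of the answer above) is to kill the leftover application directly on the master node with yarn application -kill. The sketch below pulls application IDs out of the text printed by yarn application -list and builds the matching kill commands. The helper names and the sample listing are hypothetical, and nothing is executed.

```python
import re

# YARN application IDs have the form application_<cluster timestamp>_<sequence>.
APP_ID_PATTERN = re.compile(r"\bapplication_\d+_\d+\b")


def find_application_ids(listing):
    """Return unique application IDs found in `yarn application -list` output."""
    seen = []
    for app_id in APP_ID_PATTERN.findall(listing):
        if app_id not in seen:  # the same ID can also appear in the tracking URL
            seen.append(app_id)
    return seen


def build_kill_commands(app_ids):
    """Return one `yarn application -kill` argv list per application ID."""
    return [["yarn", "application", "-kill", app_id] for app_id in app_ids]


if __name__ == "__main__":
    # Hypothetical listing, shaped like typical output but not from the question.
    sample = (
        "application_1609370000000_0001\tmy-app\tRUNNING\t"
        "http://master:20888/proxy/application_1609370000000_0001/\n"
    )
    for command in build_kill_commands(find_application_ids(sample)):
        print(" ".join(command))
```

In practice the IDs should be filtered by application name before killing anything, since the listing can contain unrelated applications.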
