Oozie workflow throws a socket error but submits the workflow twice after 10 minutes

I am facing a very weird issue. I have a workflow XML that contains about 20 fork-join nodes, and each of them contains 4-8 actions. When I submit this workflow, it waits for about 5-6 minutes and then throws:

"Error: IO_ERROR : java.net.SocketException: Connection reset"

But what actually happens in the background is that it submits one workflow after 10 minutes and another one after 12 minutes, so the workflow ends up being triggered twice.

I validated my XML and the validation returned "OK". Since the submit command never returns a workflow ID, I am unable to debug it. To be honest, I am not sure where to even start debugging.
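Because the submit command never returns a job ID, one way to find the two workflows it actually created is to list jobs by application name through the Oozie Web Services API (GET /oozie/v1/jobs with a name filter; the CLI's `oozie jobs -filter` queries the same list). The Python below is a minimal sketch. It assumes an unsecured Oozie server that plain urllib can reach (a kerberized server needs SPNEGO authentication, which urllib does not do), and `my-wf` is a placeholder application name.

```python
import json
import urllib.parse
import urllib.request


def jobs_url(oozie_url: str, app_name: str, length: int = 50) -> str:
    """Build a /v1/jobs URL listing workflow jobs whose app name matches."""
    query = urllib.parse.urlencode({
        "jobtype": "wf",               # workflow jobs (not coordinators/bundles)
        "filter": f"name={app_name}",  # encoded as name%3D<app_name>
        "len": length,                 # page size
    })
    return f"{oozie_url.rstrip('/')}/v1/jobs?{query}"


def submissions(payload: dict) -> list[tuple[str, str, str]]:
    """Extract (id, status, createdTime) for each workflow in a /v1/jobs response."""
    return [
        (wf["id"], wf.get("status", ""), wf.get("createdTime", ""))
        for wf in payload.get("workflows", [])
    ]


def list_submissions(oozie_url: str, app_name: str) -> list[tuple[str, str, str]]:
    """Query the server and return one tuple per submitted workflow."""
    with urllib.request.urlopen(jobs_url(oozie_url, app_name), timeout=30) as resp:
        return submissions(json.load(resp))
```

Calling `list_submissions("http://localhost:11000/oozie", "my-wf")` should then show both submissions with their creation times, and either ID can be passed to `oozie job -info` or `-log`.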

I have similar workflows with fewer forks (6), and they all work fine. I am not sure why this one causes all the trouble.
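For scale: 20 fork-join pairs with 4-8 actions each means roughly 80-160 actions in a single definition, against 6 forks in the workflows that work. A minimal Python sketch for putting numbers on that difference is to count the fork, join and action nodes in each workflow XML and compare them. It only parses the definition, so it says nothing about why the server is slow; it is a way to quantify "bigger", not a diagnosis.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def node_counts(workflow_xml: str) -> Counter:
    """Count fork, join and action nodes in an Oozie workflow definition."""
    root = ET.fromstring(workflow_xml)
    counts: Counter = Counter()
    for elem in root.iter():
        # Workflow elements are namespaced, e.g. "{uri:oozie:workflow:0.5}fork";
        # keep only the local name so any schema version is counted the same way.
        local = elem.tag.rsplit("}", 1)[-1]
        if local in ("fork", "join", "action"):
            counts[local] += 1
    return counts
```

For example, `node_counts(pathlib.Path("workflow.xml").read_text())` run on the failing and the working definitions (the file name is a placeholder) gives directly comparable counts.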

The error you posted above looks like it comes from the client side. I think it would be a good idea to check the server logs instead.

oozie job -oozie http://localhost:11000/oozie -info <wfid>
oozie job -oozie http://localhost:11000/oozie -log <wfid>
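If these checks need to be scripted, the two CLI calls correspond to the Web Services API endpoints /v1/job/<job-id>?show=info and ?show=log under the same base URL. A minimal Python sketch, again assuming an unsecured server; the helper names are hypothetical and only `info` and `log` are accepted here.

```python
import urllib.request


def job_url(oozie_url: str, job_id: str, show: str) -> str:
    """Build /v1/job/<id>?show=<info|log>, the REST counterpart of -info/-log."""
    if show not in ("info", "log"):
        raise ValueError(f"unsupported show value: {show}")
    return f"{oozie_url.rstrip('/')}/v1/job/{job_id}?show={show}"


def fetch(url: str) -> str:
    """GET the URL and return the body as text (JSON for info, plain text for log)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")
```

Usage: `print(fetch(job_url("http://localhost:11000/oozie", "<wfid>", "log")))`, with `<wfid>` replaced by one of the IDs found for the two submissions.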

It is also possible that you are using an invalid Oozie URL. For instance, if your cluster is kerberized, you have to use the Oozie URL that matches the Kerberos principal. If you are running from a kerberized environment, try running kinit with your principal and keytab (kinit -k -t key_tab user_principal) and then use the fully qualified domain name (FQDN) of the Oozie server in the command, like this:

oozie job -oozie http://node_name.domain:11000/oozie -config xxxx -run
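Why the host part matters: in the usual SPNEGO setup the client asks for a service ticket for HTTP/<host>@<REALM>, where <host> is taken from the URL, so a short name or an alias will not match the server's principal. Writing the host as node_name@domain is also a trap, because in a URL the part before "@" is parsed as a user name. The Python sketch below checks both points; the realm EXAMPLE.COM is a placeholder and the HTTP/<host>@<REALM> form is the common convention, not something the answer states.

```python
from urllib.parse import urlparse


def spnego_principal(oozie_url: str, realm: str) -> str:
    """Return the HTTP service principal a SPNEGO client would typically request.

    Assumes the conventional HTTP/<host>@<REALM> form, with <host> taken
    verbatim from the URL, so the URL should use the server's FQDN.
    """
    parsed = urlparse(oozie_url)
    if parsed.username is not None:
        # "http://node_name@domain:11000" parses as user "node_name" on host "domain".
        raise ValueError(f"URL contains user info ({parsed.username!r}); use the FQDN instead")
    if parsed.hostname is None:
        raise ValueError("URL has no host")
    return f"HTTP/{parsed.hostname}@{realm}"
```

Comparing the result with the principal configured on the Oozie server (for example, with the entries in its keytab) shows quickly whether the URL is the one Kerberos expects.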

Those logs did not provide any meaningful information, so I split my workflow into two XML files and called the second workflow from the last action of the first one. It works well without any issues.
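The post does not say how the second workflow was called. Assuming Oozie's sub-workflow action was used, and given that workflows this large are often generated rather than hand-written, the Python sketch below builds that final action as an XML fragment. The action name, the app path and the ok/error targets are hypothetical. The fragment carries no namespace, so it takes on the default xmlns of the workflow-app it is pasted into.

```python
import xml.etree.ElementTree as ET


def sub_workflow_action(name: str, app_path: str, ok_to: str, error_to: str) -> ET.Element:
    """Build an Oozie <action> that runs another workflow as a sub-workflow."""
    action = ET.Element("action", name=name)
    sub = ET.SubElement(action, "sub-workflow")
    # HDFS directory holding the second workflow.xml.
    ET.SubElement(sub, "app-path").text = app_path
    # Hand the parent's job configuration (nameNode, jobTracker, ...) to the child.
    ET.SubElement(sub, "propagate-configuration")
    ET.SubElement(action, "ok", to=ok_to)
    ET.SubElement(action, "error", to=error_to)
    return action
```

`ET.tostring(sub_workflow_action("call-part2", "${nameNode}/apps/part2", "end", "fail"), encoding="unicode")` returns the fragment as a string, ready to become the last action before the end node of the first workflow.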
