Can't submit job with Flink 1.5 cluster

I am trying to move from Flink 1.3.2 to 1.5. Our cluster is deployed with Kubernetes. Everything works fine with 1.3.2, but I cannot submit a job with 1.5. When I try, the spinner just spins forever, and the same happens via the REST API. I can't even submit the WordCount example job. It seems my taskmanagers cannot connect to the jobmanager: I can see them in the Flink UI, but in the logs I see:

level=WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with org.apache.flink.shaded.akka.org.jboss.netty.channel.ConnectTimeoutException: connection timed out: flink-jobmanager-nonprod-2.rpds.svc.cluster.local/25.0.84.226:6123

level=WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@flink-jobmanager-nonprod-2.rpds.svc.cluster.local:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@flink-jobmanager-nonprod-2.rpds.svc.cluster.local:6123]] Caused by: [No response from remote for outbound association. Associate timed out after [20000 ms].]

level=WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with org.apache.flink.shaded.akka.org.jboss.netty.channel.ConnectTimeoutException: connection timed out: flink-jobmanager-nonprod-2.rpds.svc.cluster.local/25.0.84.226:6123

But I can telnet from a taskmanager to the jobmanager.
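The telnet check can also be scripted so it is easy to repeat from each pod. The following is only a minimal sketch: the function name can_connect is hypothetical, and the host and port in the usage comment are taken from the log lines in this question. A successful TCP connection only shows that the port is reachable from where the script runs; it does not by itself show that the Akka association succeeds.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    can_connect is a hypothetical helper name, not part of Flink.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Example call (host and port as they appear in the logs above):
# can_connect("flink-jobmanager-nonprod-2.rpds.svc.cluster.local", 6123)
```

Running the same check from both a taskmanager pod and the jobmanager pod may help narrow down which side cannot reach the address.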

Moreover, everything works locally if I start Flink in cluster mode (jobmanager + taskmanager). In the 1.5 documentation I found the mode option, which switches between flip6 and legacy (default: flip6), but if I set mode: legacy, I don't see my taskmanagers registered at all.

Is there something specific to a k8s deployment with 1.5 that I need to do? I checked the 1.5 k8s config and it looks much the same as ours, but we use a customized Docker image for Flink (security, HA, checkpointing).

Thank you.

The issue is with jobmanager connectivity. The jobmanager Docker image cannot connect to the "flink-jobmanager" (${JOB_MANAGER_RPC_ADDRESS}) address.

Just use the afilichkin/flink-k8s Docker image instead of flink:latest.

I fixed it by adding a new host to the jobmanager Docker image. You can see it in my GitHub project:

https://github.com/Aleksandr-Filichkin/flink-k8s/tree/master
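To check whether the jobmanager container can resolve its own RPC address, as this answer suggests it cannot, something like the following could be run inside that container. This is a sketch under assumptions: resolve_host is a hypothetical helper name, and it only tests name resolution, not whether Flink can bind or connect to the address.

```python
import socket
from typing import List


def resolve_host(hostname: str) -> List[str]:
    """Return the sorted, de-duplicated IP addresses `hostname` resolves to.

    Returns an empty list when the name cannot be resolved.
    resolve_host is a hypothetical helper name, not part of Flink.
    """
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


# Example call inside the jobmanager container (assumes the variable is set):
# import os
# resolve_host(os.environ["JOB_MANAGER_RPC_ADDRESS"])
```

An empty result would be consistent with the missing host described above; a non-empty result means the cause is more likely elsewhere.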
