
Apache Flink Job cluster rpc.address binding to localhost on kubernetes

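I am trying to run a Flink Job cluster (1.8.1) in a Kubernetes environment. I built the Docker image with my job jar by following this doc.

I then followed the kubefiles to create the job, the job manager, and the task managers. The problem is that the task managers cannot connect to the job manager and keep crashing.

While debugging the job manager logs, I found that jobmanager.rpc.address is bound to "localhost".

However, I have passed the args in the kube files as described in this doc.

I also tried setting jobmanager.rpc.address through an environment variable (FLINK_ENV_JAVA_OPTS):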

  env:
          - name: FLINK_ENV_JAVA_OPTS
            value: "-Djobmanager.rpc.address=flink-job-cluster"

Job manager console log:

Starting the job-cluster
Starting standalonejob as a console application on host flink-job-cluster-bbxrn.
2019-07-16 17:31:10,759 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - --------------------------------------------------------------------------------
2019-07-16 17:31:10,760 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Starting StandaloneJobClusterEntryPoint (Version: <unknown>, Rev:4caec0d, Date:03.04.2019 @ 13:25:54 PDT)
2019-07-16 17:31:10,760 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  OS current user: flink
2019-07-16 17:31:10,761 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Current Hadoop/Kerberos user: <no hadoop dependency found>
2019-07-16 17:31:10,761 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  JVM: OpenJDK 64-Bit Server VM - IcedTea - 1.8/25.212-b04
2019-07-16 17:31:10,761 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Maximum heap size: 989 MiBytes
2019-07-16 17:31:10,761 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  JAVA_HOME: /usr/lib/jvm/java-1.8-openjdk/jre
2019-07-16 17:31:10,761 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  No Hadoop Dependency available
2019-07-16 17:31:10,761 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  JVM Options:
2019-07-16 17:31:10,761 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     -Xms1024m
2019-07-16 17:31:10,761 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     -Xmx1024m
2019-07-16 17:31:10,762 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     -Djobmanager.rpc.address=flink-job-cluster
2019-07-16 17:31:10,762 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     -Dlog4j.configuration=file:/opt/flink-1.8.1/conf/log4j-console.properties
2019-07-16 17:31:10,762 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     -Dlogback.configurationFile=file:/opt/flink-1.8.1/conf/logback-console.xml
2019-07-16 17:31:10,762 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Program Arguments:
2019-07-16 17:31:10,762 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     --configDir
2019-07-16 17:31:10,762 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     /opt/flink-1.8.1/conf
2019-07-16 17:31:10,762 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     --job-classname
2019-07-16 17:31:10,762 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     wikiedits.WikipediaAnalysis
2019-07-16 17:31:10,762 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     --host
2019-07-16 17:31:10,762 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     flink-job-cluster
2019-07-16 17:31:10,762 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     -Djobmanager.rpc.address=flink-job-cluster
2019-07-16 17:31:10,763 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     -Dparallelism.default=2
2019-07-16 17:31:10,763 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     -Dblob.server.port=6124
2019-07-16 17:31:10,763 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     -Dqueryable-state.server.ports=6125
2019-07-16 17:31:10,763 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Classpath: /opt/flink-1.8.1/lib/log4j-1.2.17.jar:/opt/flink-1.8.1/lib/slf4j-log4j12-1.7.15.jar:/opt/flink-1.8.1/lib/wiki-edits-0.1.jar:/opt/flink-1.8.1/lib/flink-dist_2.11-1.8.1.jar:::
2019-07-16 17:31:10,763 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - --------------------------------------------------------------------------------
2019-07-16 17:31:10,764 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Registered UNIX signal handlers for [TERM, HUP, INT]
2019-07-16 17:31:10,850 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.rpc.address, localhost
2019-07-16 17:31:10,851 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.rpc.port, 6123
2019-07-16 17:31:10,851 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.heap.size, 1024m
2019-07-16 17:31:10,851 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.heap.size, 1024m
2019-07-16 17:31:10,851 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2019-07-16 17:31:10,851 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: parallelism.default, 1

The log above shows rpc.address bound to localhost instead of flink-job-cluster.

I assume the task managers' messages are being dropped by Akka RPC because it is bound to localhost:6123:

2019-07-16 17:31:12,546 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager  - Request slot with profile ResourceProfile{cpuCores=-1.0, heapMemoryInMB=-1, directMemoryInMB=0, nativeMemoryInMB=0, networkMemoryInMB=0} for job 38190f2570cd5f0a0a47f65ddf7aae1f with allocation id 97af00eae7e3dfb31a79232077ea7ee6.
2019-07-16 17:31:14,043 ERROR akka.remote.EndpointWriter                                    - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@flink-job-cluster:6123/]] arriving at [akka.tcp://flink@flink-job-cluster:6123] inbound addresses are [akka.tcp://flink@localhost:6123]
2019-07-16 17:31:26,564 ERROR akka.remote.EndpointWriter                                    - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@flink-job-cluster:6123/]] arriving at [akka.tcp://flink@flink-job-cluster:6123] inbound addresses are [akka.tcp://flink@localhost:6123]

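I don't know why the job manager binds to localhost.

PS: The task manager pods can resolve the flink-job-cluster host. The hostname resolves to the service IP address.

The root cause of the problem turned out to be that the jobmanager.rpc.address arg value was not being applied. Somehow, the inline args were not set correctly in the Flink global configuration, whereas args passed as a multi-line list work fine.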
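To illustrate the difference between the two forms, here is a rough sketch of the container spec. It is not the actual kubefile: the argument values are copied from the Program Arguments section of the job manager log above (only a subset is shown), the container name and image are hypothetical placeholders, and the leading job-cluster argument is an assumption based on the "Starting the job-cluster" log line.

```yaml
# Sketch only; name and image are hypothetical placeholders.
containers:
  - name: flink-job-cluster          # hypothetical
    image: my-flink-job:latest       # hypothetical
    # Inline (flow-style) form, which reportedly did not take effect:
    # args: ["job-cluster", "--job-classname", "wikiedits.WikipediaAnalysis", "-Djobmanager.rpc.address=flink-job-cluster"]
    #
    # Multi-line (block-style) list, which is reported to work:
    args:
      - "job-cluster"                # assumed entrypoint command
      - "--job-classname"
      - "wikiedits.WikipediaAnalysis"
      - "-Djobmanager.rpc.address=flink-job-cluster"
      - "-Dparallelism.default=2"
      - "-Dblob.server.port=6124"
      - "-Dqueryable-state.server.ports=6125"
```

Why the inline form failed is not established here. After switching to the multi-line list, one way to confirm the fix is to check that the Akka "inbound addresses" in the job manager log no longer show flink@localhost:6123.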
