[英]Flink deployment in kubernetes failed to start jobs
我按照本指南在 kubernetes 中创建了一个 flink 集群: https ://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/native_kubernetes.html
作业管理器正在运行。 当一个作业提交给作业管理器时,它产生了一个任务管理器 pod,但任务管理器无法连接到作业管理器。
2020-10-29 13:22:51,069 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Connecting to ResourceManager akka.tcp://flink@detection-engine-dev.team-anti-cheat:6123/user/rpc/resourcemanager_*(00000000000000000000000000000000).
2020-10-29 13:22:51,176 WARN akka.remote.transport.netty.NettyTransport [] - Remote connection to [detection-engine-dev.team-anti-cheat/10.123.155.112:6123] failed with java.io.IOException: Connection reset by peer
2020-10-29 13:22:51,176 WARN akka.remote.transport.netty.NettyTransport [] - Remote connection to [detection-engine-dev.team-anti-cheat/10.123.155.112:6123] failed with org.apache.flink.shaded.akka.org.jboss.netty.handler.codec.frame.TooLongFrameException: Adjusted frame length exceeds 10485760: 352518404 - discarded
2020-10-29 13:22:51,180 WARN akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://flink@detection-engine-dev.team-anti-cheat:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@detection-engine-dev.team-anti-cheat:6123]] Caused by: [The remote system explicitly disassociated (reason unknown).]
2020-10-29 13:22:51,183 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not resolve ResourceManager address akka.tcp://flink@detection-engine-dev.team-anti-cheat:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@detection-engine-dev.team-anti-cheat:6123/user/rpc/resourcemanager_*.
2020-10-29 13:23:01,203 WARN akka.remote.transport.netty.NettyTransport [] - Remote connection to [detection-engine-dev.team-anti-cheat/10.123.155.112:6123] failed with java.io.IOException: Connection reset by peer
如果您的问题是 TM 找不到 JM,则可能您必须使用 Statefulset 类型的 pod。 所以你可以配置他们的主机名静态。 这是我来自https://github.com/felipegutierrez/explore-flink/tree/master/k8s 的示例
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: flink-taskmanager
namespace: kafka
spec:
replicas: 3
serviceName: flink-taskmanager
selector:
matchLabels:
app: flink-taskmanager # has to match .spec.template.metadata.labels
template:
metadata:
labels:
app: flink-taskmanager # has to match .spec.selector.matchLabels
spec:
hostname: flink-taskmanager
volumes:
- name: flink-config-volume
configMap:
name: flink-config
items:
- key: flink-conf.yaml
path: flink-conf.yaml
- key: log4j-console.properties
path: log4j-console.properties
- name: tpch-dbgen-data
persistentVolumeClaim:
claimName: tpch-dbgen-data-pvc
- name: tpch-dbgen-datarate
persistentVolumeClaim:
claimName: tpch-dbgen-datarate-pvc
containers:
- name: taskmanager
image: felipeogutierrez/explore-flink:1.11.2-scala_2.12
imagePullPolicy: Always # Always/IfNotPresent
env:
args: ["taskmanager"]
ports:
- containerPort: 6122
name: rpc
- containerPort: 6125
name: query-state
- containerPort: 9250
livenessProbe:
tcpSocket:
port: 6122
initialDelaySeconds: 30
periodSeconds: 60
volumeMounts:
- name: flink-config-volume
mountPath: /opt/flink/conf/
- name: tpch-dbgen-data
mountPath: /opt/tpch-dbgen/data
subPath: data
- mountPath: /tmp
name: tpch-dbgen-datarate
subPath: tmp
securityContext:
runAsUser: 9999 #
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.