Pods cannot communicate with each other

I have two Jobs that each run only once. One is called Master and one is called Slave. As the names imply, the Master pod needs some info from the Slave and then queries an API online. A simple scheme of how they communicate looks like this:

Slave --- port 6666 ---> Master ---- port 8888 ---> internet:www.example.com
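In pseudo-code, the slave side of `run.py` does something like the following (a minimal sketch; `send_info` and the payload are placeholders, not the actual code):

```python
import socket

def send_info(host: str, port: int, payload: bytes) -> bytes:
    """Open a TCP connection to the master, send one message,
    and wait for a short acknowledgement."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(payload)
        conn.shutdown(socket.SHUT_WR)  # signal end of message
        return conn.recv(1024)

# Inside the cluster the slave would call something like:
#   send_info("master-service", 6666, b"worker ready")
```

The point is that `host` is meant to be the Service name (`master-service`), resolved via cluster DNS, rather than a hard-coded pod IP.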

To achieve this I created five YAML files:

  1. A job-master.yaml for creating a Master pod:
apiVersion: batch/v1
kind: Job
metadata:
  name: master-job
  labels:
    app: master-job
    role: master-job
spec:
  template:
    metadata:
      name: master
    spec:
      containers:
      - name: master
        image: registry.gitlab.com/example
        command: ["python", "run.py", "-wait"]
        ports:
        - containerPort: 6666

      imagePullSecrets:
      - name: regcred
      restartPolicy: Never

  2. A Service (ClusterIP) that allows the Slave to send info to the Master on port 6666:
apiVersion: v1
kind: Service
metadata:
  name: master-service
  labels:
    app: master-job
    role: master-job
spec:
  selector:
    app: master-job
    role: master-job
  ports:
    - protocol: TCP
      port: 6666
      targetPort: 6666
  3. A Service (NodePort) that allows the Master to fetch info online:
apiVersion: v1
kind: Service
metadata:
  name: master-np-service
spec:
  type: NodePort
  selector:
    app: master-job
  ports:
    - protocol: TCP
      port: 8888
      targetPort: 8888
      nodePort: 31000
  4. A Job for the Slave pod:
apiVersion: batch/v1
kind: Job
metadata:
  name: slave-job
  labels:
    app: slave-job
spec:
  template:
    metadata:
      name: slave
    spec:
      containers:
      - name: slave
        image: registry.gitlab.com/example2
        ports:
        - containerPort: 6666
        #command: ["python", "run.py", "master-service.default.svc.cluster.local"]
        #command: ["python", "run.py", "10.106.146.155"]
        command: ["python", "run.py", "master-service"]
      imagePullSecrets:
      - name: regcred
      restartPolicy: Never
  5. And a Service (ClusterIP) that allows the Slave pod to send the info to the Master pod:
apiVersion: v1
kind: Service
metadata:
  name: slave-service
spec:
  selector:
    app: slave-job
  ports:
    - protocol: TCP
      port: 6666
      targetPort: 6666

But no matter what I do (as can be seen in the commented lines of the job_slave.yaml file), they cannot communicate with each other, except when I put the IP of the Master pod directly in the command section of the Slave. Also, the Master pod cannot communicate with the outside world, even though I created a ConfigMap with upstreamNameservers: | ["8.8.8.8"]. Everything is running in a minikube environment, but I cannot pinpoint what my problem is. Any help is appreciated.

Your Job spec has two parts: a description of the Job itself, and a description of the Pods it creates. (Using a Job here is a little odd and I'd probably pick a Deployment instead, but the same issue applies.) The Service object has a selector: that must match the labels: of the Pods.

In the YAML files you show, the Jobs have correct labels but the generated Pods don't. You need to add (potentially duplicate) labels to the pod-spec part:

apiVersion: batch/v1
kind: Job
metadata:
  name: master-job
  labels: {...}
spec:
  template:
    metadata:
      # name: will get ignored here
      labels:
        app: master-job
        role: master-job

You should be able to verify this with kubectl describe service master-service. At the end of its output will be a line that says Endpoints:. If the Service selector and the Pod labels don't match, this will say <none>; if they do match, you will see the Pod IP addresses.

(You don't need a NodePort Service unless you need to accept requests from outside the cluster; otherwise it could be the same as the Service you use to accept requests from within the cluster. You don't need to include objects' types in their names. Nothing you've shown has any obvious relevance to communication out of the cluster.)

Try with a headless service:

apiVersion: v1
kind: Service
metadata:
  name: master-service
  labels:
    app: master-job
    role: master-job
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: master-job
    role: master-job
  ports:
    - protocol: TCP
      port: 6666
      targetPort: 6666

and use command: ["python", "run.py", "master-service"] in your job_slave.yaml.
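One way to see what the headless service gives you is to resolve its name from inside a pod: a headless service resolves directly to the pod IPs behind it, instead of to a single cluster IP. A sketch (resolve_service is a hypothetical helper, shown here resolving a name through the standard resolver):

```python
import socket

def resolve_service(name: str, port: int) -> list:
    """Return the sorted, de-duplicated IP addresses a DNS name resolves to.
    In-cluster, a headless service name yields one A record per ready pod."""
    infos = socket.getaddrinfo(name, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})

# In-cluster you would call:
#   resolve_service("master-service", 6666)
# and expect the Master pod's IP(s) back, provided the selector matches.
```

If this returns nothing (or the lookup fails) from the slave pod, the labels/selector mismatch described above is the first thing to check.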

Make sure your master job is actually listening on port 6666 inside its container.
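In particular, it must bind to 0.0.0.0 rather than 127.0.0.1, or the port will not be reachable from other pods even when the Service and labels are correct. A minimal sketch of what the master side presumably does (serve_once is a placeholder, not the actual run.py):

```python
import socket

def serve_once(bind_addr: str = "0.0.0.0", port: int = 6666) -> bytes:
    """Accept a single TCP connection and return the bytes received.
    Binding to 0.0.0.0 (all interfaces) is what makes the port
    reachable from other pods through the Service."""
    with socket.socket() as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((bind_addr, port))
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            return conn.recv(4096)
```

You can double-check from inside the container with something like netstat or ss: a listener shown on 127.0.0.1:6666 will never be reachable from the slave.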
