
Kubernetes Python Client - Find pending jobs and schedule all pods at a time or how to schedule a pending job

I'm using Kubernetes 1.7 and Python Client 2.0. I have a "hello world" machine learning program (MNIST in TensorFlow) running on a K8s cluster. It has one worker and one parameter server, and it is deployed as kind: Job (in the manifest). A custom scheduler, written in Python, watches for pending pods using list_namespaced_pod and schedules them based on the availability of resources. Since the events come in as a stream, how can I make sure that all pending pods belonging to one job get scheduled? In other words, I don't want to schedule a job partially: either all pods of a pending job are scheduled, or none of them are.
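For reference, a minimal sketch of that polling step with the official Python client looks roughly like the following; the namespace comes from the manifest below, while the scheduler name and the filtering on spec.scheduler_name are assumptions for illustration, not part of the original program.

from kubernetes import client, config

config.load_kube_config()           # or config.load_incluster_config() inside the cluster
v1 = client.CoreV1Api()

NAMESPACE = "my-namespace"          # namespace used in the manifest below
SCHEDULER_NAME = "my-scheduler"     # hypothetical name of the custom scheduler

def pending_pods_for_scheduler():
    # Pods that are still Pending and not yet bound to any node.
    pods = v1.list_namespaced_pod(NAMESPACE, field_selector="status.phase=Pending")
    return [p for p in pods.items
            if p.spec.node_name is None
            and p.spec.scheduler_name == SCHEDULER_NAME]

for pod in pending_pods_for_scheduler():
    print(pod.metadata.name, (pod.metadata.labels or {}).get("jobName"))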

Also, is there a way in Kubernetes to catch/find/watch all events of the same job (i.e. deployed under one manifest file) at a time? I also tried list_namespaced_event, but it too reports events one after another. As a result, it can happen that one pod of the job gets scheduled while a later one cannot. A small version of the custom scheduler is available here.
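For context, narrowing the stream to a single job through its jobId label (set in the manifest below) looks roughly like this; the snippet is only a sketch, and it still delivers the events one at a time rather than as a batch.

from kubernetes import client, config, watch

config.load_kube_config()
v1 = client.CoreV1Api()

NAMESPACE = "my-namespace"
JOB_ID = "5b2a6cd25b02821468e41571"   # jobId label used in the manifest below

# Stream pod events only for the pods that belong to this job.
w = watch.Watch()
for event in w.stream(v1.list_namespaced_pod, NAMESPACE,
                      label_selector="jobId=" + JOB_ID):
    pod = event["object"]
    print(event["type"], pod.metadata.name, pod.status.phase)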

my-mnist.yml file (a smaller version)

---

apiVersion: batch/v1
kind: Job
metadata:
  name: my-ps 
  labels:
    name: my-ps 
    jobName: my-ps-mnist_dist
  namespace: my-namespace
spec:
  template:
    metadata:
      labels:
        name: my-ps 
        jobName: my-ps-mnist_dist
        jobId: 5b2a6cd25b02821468e41571
        manifestFile: my-mnist.yml
        jobTrainingType: distributed
        jobTaskName: "my-ps"
        jobTaskIndex: "0"
        jobWorkerInstances: "1"
      namespace: my-namespace
    spec:
      nodeSelector:
        gpu: "no"
        dlts: "yes"
      containers:
        - name: my-ps
          image: "123.456.789.10:1234/myimg/5b2a6cd25b02821468e41571"
          imagePullPolicy: Always
          tty: true
          stdin: true
          env:
            - name: JOB_TASK_NAME
              value: "ps"
            - name: JOB_ID
              value: "5b2a6cd25b02821468e41571"
            - name: JOB_LD_LIBRARY_PATH
              value: "/usr/local/cuda-9.0/lib64:/usr/lib64/nvidia:/usr/local/cuda-9.0/targets/x86_64-linux/lib"
            - name: JOB_PYTHON_VERSION
              value: "3"


---

apiVersion: batch/v1
kind: Job 
metadata:
  name: my-wkr
  labels:
    name: my-wkr
    jobName: wkr0-mnist_dist
  namespace: my-namespace
spec:
  template:
    metadata:
      labels:
        name: my-wkr
        jobName: wkr0-mnist_dist
        jobId: 5b2a6cd25b02821468e41571
        manifestFile: my-mnist.yml
        jobTrainingType: distributed
        jobTaskName: "worker"
        jobTaskIndex: "0"
        jobWorkerInstances: "1" 
      namespace: my-namespace
    spec:
      nodeSelector:
        gpu: "yes"
        dlts: "yes"
      containers:
        - name: my-wkr
          image: "123.456.789.10:1234/myimg/5b2a6cd25b02821468e41571" 
          imagePullPolicy: Always
          tty: true
          stdin: true
          resources:
            limits:
              alpha.kubernetes.io/nvidia-gpu: 2
          env:
            - name: JOB_TASK_NAME
              value: "worker"
            - name: JOB_TASK_INDEX
              value: "0"
            - name: JOB_ID
              value: "5b2a6cd25b02821468e41571"
            - name: JOB_LD_LIBRARY_PATH
              value: "/usr/local/cuda-9.0/lib64:/usr/lib64/nvidia:/usr/local/cuda-9.0/targets/x86_64-linux/lib"
            - name: JOB_PYTHON_VERSION
              value: "3"

Also, is there a way in Kubernetes to catch/find/watch all events of the same job (i.e. deployed under one manifest file) at a time?

The short answer is no. All pod events, in any case, arrive one after another.

One approach comes to mind:
Because pods that require the custom scheduler cannot be scheduled by any other scheduler, your custom scheduler can collect the list of pods belonging to the same job, schedule them one after another, and only then move on to the list for the next job. This way you ensure that resources intended for the pods of the first job are not allocated to a pod of another job before all pods of the first job have been scheduled to nodes.
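A rough sketch of that grouping, assuming the jobId label from the manifests above and a placeholder placement function (pick_nodes_for is invented here purely for illustration), could look like this:

from collections import defaultdict
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()
NAMESPACE = "my-namespace"

def pick_nodes_for(pods):
    # Placeholder placement logic: one node name per pod, or None when the
    # cluster cannot host the whole job right now (assumption for illustration).
    nodes = [n.metadata.name for n in v1.list_node().items]
    return nodes[:len(pods)] if len(nodes) >= len(pods) else None

def bind(pod, node_name):
    # Bind a single pod to a node via the binding subresource. Some client
    # versions fail to deserialize the empty response; _preload_content=False
    # works around that.
    body = client.V1Binding(
        metadata=client.V1ObjectMeta(name=pod.metadata.name),
        target=client.V1ObjectReference(api_version="v1", kind="Node", name=node_name))
    v1.create_namespaced_binding(namespace=NAMESPACE, body=body, _preload_content=False)

def schedule_job_by_job():
    pending = v1.list_namespaced_pod(NAMESPACE, field_selector="status.phase=Pending").items
    by_job = defaultdict(list)
    for pod in pending:
        by_job[(pod.metadata.labels or {}).get("jobId")].append(pod)

    for job_id, pods in by_job.items():
        nodes = pick_nodes_for(pods)
        if nodes is None:          # not enough room: skip the whole job, schedule nothing
            continue
        for pod, node in zip(pods, nodes):
            bind(pod, node)        # every pod of this job is bound before the next job

The important part is that the loop either binds a job's group completely or skips it entirely before it looks at the next job's pods.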

The event the scheduler receives carries annotations and labels. I didn't check the results of list_namespaced_pod or list_namespaced_event, but annotations and labels should be present there as well. You can put the job's configuration into the annotations, such as the number of pods in the job or the labels of each pod in the job (e.g. labels: {job_ID: 100, role: master, uid: xxx}, annotations: {job_ID: none, master: none, worker1: none, worker2: none}). When the scheduler sees the first pod carrying the annotations of a job it does not yet know, it creates a new list of pods for that job ([job_ID: 100, master: xxx, worker1: none, worker2: none]). As subsequent events arrive, the scheduler fills this list from the pod labels and schedules only the lists that are completely filled ([job_ID: 100, master: uid1, worker1: uid2, worker2: uid3]).
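A sketch of that bookkeeping, assuming a hypothetical jobTotalPods annotation on every pod that states how many pods the job consists of (the manifests above do not define it), might look like this:

from kubernetes import client, config, watch

config.load_kube_config()
v1 = client.CoreV1Api()
NAMESPACE = "my-namespace"

def schedule_whole_job(pods):
    # Stand-in for the gang-scheduling step sketched above.
    for pod in pods:
        print("would bind", pod.metadata.name)

jobs = {}   # job_id -> {"expected": int, "pods": {pod_name: pod}}

w = watch.Watch()
for event in w.stream(v1.list_namespaced_pod, NAMESPACE,
                      field_selector="status.phase=Pending"):
    pod = event["object"]
    labels = pod.metadata.labels or {}
    annotations = pod.metadata.annotations or {}
    job_id = labels.get("jobId")
    if job_id is None:
        continue

    # First pod of a job: create the bookkeeping entry from its annotations.
    entry = jobs.setdefault(job_id, {
        "expected": int(annotations.get("jobTotalPods", "1")),  # hypothetical annotation
        "pods": {},
    })
    entry["pods"][pod.metadata.name] = pod

    # Hand the job over to placement only once every expected pod has arrived.
    if len(entry["pods"]) == entry["expected"]:
        schedule_whole_job(entry["pods"].values())
        del jobs[job_id]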
