
Kubernetes CronJob: run multiple processes at the same time without creating multiple jobs

I have a Python process that I want to fire up every n minutes in a Kubernetes CronJob. Each run reads a number of messages (say 5) from a queue, then processes/converts some files and runs analysis on the results based on those queue messages. If the process is still running after n minutes, I don't want to start a new one. In total, I would like a number of these (say 3) to be able to run at the same time; however, there can never be more than 3 processes running at once. To try and implement this, I tried the following (simplified):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: some-job
  namespace: some-namespace
spec:
  schedule: "*/5 * * * *"
  concurrencyPolicy: "Forbid"
  jobTemplate:
    spec:
      parallelism: 3
      template:
        spec:
          containers:
          - name: job
            image: myimage:tag
            imagePullPolicy: Always
            command: ['python', 'src/run_job.py']
          restartPolicy: Never

Now what this amounts to is a maximum of three processes running at the same time, due to parallelism being 3 and concurrencyPolicy being "Forbid", even if the processes go over the 5-minute mark.

The problem I specifically have is that one pod (e.g. pod 1) can take longer than the other two to finish: pods 2 and 3 might finish after a minute, while pod 1 only finishes after 10 minutes because it is processing larger files from the queue.

Where I thought that parallelism: 3 would cause pods 2 and 3 to be deleted and replaced after finishing (when the next cron interval hits), they are not; the job has to wait for pod 1 to finish before three new pods are started at the next cron interval.

When I think about it, this behaviour makes sense given the specification and meaning of a CronJob. However, I would like to know whether these pods/processes can restart independently of one another, without having to define duplicate CronJobs that each run a single process.

Otherwise, I would like to know if it's possible to easily launch more duplicate CronJobs without copying them into multiple manifests.

Duplicate CronJobs seem to be the way to achieve what you are looking for: produce 3 duplicates, each running a single job at a time. You could template the job manifest and generate multiple copies, as in the following example. The example is not in your problem context, but you can get the idea. http://kubernetes.io/docs/tasks/job/parallel-processing-expansion
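A minimal sketch of that templating approach, in the spirit of the parallel-processing-expansion example from the Kubernetes docs: each rendered CronJob gets a unique name suffix, and parallelism is dropped so each duplicate runs a single pod at a time. The `-$index` naming scheme and `render_cronjobs` helper are illustrative, not part of the original question's setup.

```python
from string import Template

# Template for one CronJob; $index makes each duplicate's name unique.
# Fields mirror the manifest from the question, minus `parallelism`.
CRONJOB_TEMPLATE = Template("""\
apiVersion: batch/v1
kind: CronJob
metadata:
  name: some-job-$index
  namespace: some-namespace
spec:
  schedule: "*/5 * * * *"
  concurrencyPolicy: "Forbid"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: job
            image: myimage:tag
            imagePullPolicy: Always
            command: ['python', 'src/run_job.py']
          restartPolicy: Never
""")

def render_cronjobs(count: int) -> list[str]:
    """Render `count` CronJob manifests, each with a unique name."""
    return [CRONJOB_TEMPLATE.substitute(index=i) for i in range(1, count + 1)]
```

Writing each rendered manifest to its own file (e.g. `cronjob-1.yaml`, `cronjob-2.yaml`, ...) and running `kubectl apply` on the directory gives three independent CronJobs, so a slow pod in one of them no longer blocks the others from being rescheduled.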
