
Kubernetes Not Scheduling CronJob

I'm running an instance of microk8s and attempting to get a CronJob to run every 60 seconds, but it's simply not working. It's my understanding that CronJobs shouldn't need any manual intervention to kick them off, but this system has been up for over a month and I never saw a pod for the cron job (in any state), so I decided to try kicking it off manually with k create job --from=cronjob/health-status-cron health-status-cron. After that, the job completed successfully:

health-status-cron-2hh96                   0/1     Completed   0          17h

I was hoping Kubernetes would then start scheduling future jobs, but it didn't. Following is my manifest (some of it is templated with Helm, but that shouldn't matter):

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: health-status-cron
  namespace: {{ .Values.global.namespace }}
  labels:
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/release-name: {{ .Release.Name }}
    app.kubernetes.io/release-namespace: {{ .Release.Namespace }}
spec:
  schedule: "* * * * *"
  concurrencyPolicy: Replace
  successfulJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: health-status-cron
            image: busybox
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - /usr/bin/curl -k http://restfulservices/api/system-health
          restartPolicy: OnFailure

Also of note, according to the following, the job hasn't been scheduled in 35 days:

$ k -ntango get cronjobs
NAME                   SCHEDULE    SUSPEND   ACTIVE   LAST SCHEDULE   AGE
health-status-cron     * * * * *   False     0        35d             36d

At this point, I have absolutely no clue what I'm doing wrong or why this particular job isn't running. Any help is greatly appreciated.

Edit: I ended up blowing away the entire namespace and redeploying. Unfortunately, I still don't know the underlying cause, but everything seems to work now.

A couple of other things you can check:

  1. Do you have any cron pods in a "Failed" state? If so, check those pods to find out why they failed.
  2. Did it used to work and then suddenly stop?
  3. Does the cronjob resource have anything in its events? Check with kubectl describe cronjob health-status-cron -n tango.
  4. Does the code your cron runs take more than a minute to complete? If so, your schedule is too aggressive, and you might want to loosen it.
  5. The cronjob controller also has some limitations you may want to check, specifically the concept of "missed jobs": https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/#cron-job-limitations . If the cronjob controller "misses" scheduling 100 or more jobs, it will "freeze" the job and not schedule it anymore. Do you scale down the cluster, or something similar, when it is not in use?
  6. Do you have any custom/third-party webhooks or plugins installed in the cluster? These can interfere with pod creation.
  7. Do you have any jobs created in the namespace (kubectl get jobs -n tango)? If you find a ton of job objects, check them to see why they did not generate pods.
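Regarding point 5, the standard guard against the 100-missed-jobs freeze is the optional startingDeadlineSeconds field on the CronJob spec: when it is set, the controller only counts misses within that window, so the count can never reach 100 for a short deadline. A minimal sketch against the manifest above (the 200-second value is an arbitrary illustration, not from the original manifest):

```yaml
spec:
  schedule: "* * * * *"
  concurrencyPolicy: Replace
  # If a run cannot start within 200 seconds of its scheduled time,
  # count it as missed. The controller then only considers misses
  # within the last 200 seconds, so an every-minute schedule can
  # accumulate at most a handful of misses, never the 100 that
  # would freeze the CronJob entirely.
  startingDeadlineSeconds: 200
```

Note the documented caveat that values under about 10 seconds can cause runs to be skipped, since the controller only reconciles every 10 seconds.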

I encountered a somewhat similar issue in 2020 (the writeup has a link to the issue I raised in the Kubernetes project itself): https://blenderfox.com/2020/08/07/the-snowball-effect-in-kubernetes/
