
Kubernetes cronjob missed schedule

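There are around 50 cronjobs running in an EKS cluster. I want to find out why a CronJob missed a scheduled job, but checking the schedule, the concurrency policy, the active jobs, and startingDeadlineSeconds is a tedious process, and even after all of these checks the cause is sometimes still unclear. I could not find any useful information in the controller logs. Is there a straightforward way to find the reason for a missed schedule from the logs?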

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  creationTimestamp: "2021-03-02T20:19:23Z"
  name: <name>
  namespace: <namespace>
spec:
  concurrencyPolicy: Allow
  failedJobsHistoryLimit: 1
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      template:
        metadata:
          creationTimestamp: null
        spec:
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: <key>
                    operator: In
                    values:
                    - "true"
          containers:
          - image: <image-name>
            imagePullPolicy: Always
            name: solution-info
            resources:
              limits:
                cpu: 300m
                memory: 300Mi
              requests:
                cpu: 300m
                memory: 300Mi
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: OnFailure
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
          tolerations:
          - effect: NoSchedule
            key: assets
            operator: Equal
            value: "true"
  schedule: 0 */6 * * *
  startingDeadlineSeconds: 10
  successfulJobsHistoryLimit: 3
  suspend: false
status:
  lastScheduleTime: "2021-03-10T12:00:00Z"

I have done some digging, and there are a few points I would like to cover for this case:

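  1. Control plane components use the klog library for logging. When used with the --log-dir flag, kube-controller-manager can log each level to a separate file inside the given directory; when used with the --log-file flag, it logs everything to a single file. Bear in mind that the two flags are mutually exclusive, and make sure you are checking the correct logs.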

  2. The CronJob controller runs every 10 seconds:


 // Check things every 10 second. 
 go wait.Until(jm.syncAll, 10*time.Second, stopCh) 

If it is too late and the schedule has been missed, the controller logs it:


 scheduledTime := times[len(times)-1] 
 tooLate := false 
 if sj.Spec.StartingDeadlineSeconds != nil { 
    tooLate = scheduledTime.Add(time.Second * time.Duration(*sj.Spec.StartingDeadlineSeconds)).Before(now) 
 } 
 if tooLate { 
    glog.V(4).Infof("Missed starting window for %s", nameForLog) 
    recorder.Eventf(sj, v1.EventTypeWarning, "MissSchedule", "Missed scheduled time to start a job: %s", scheduledTime.Format(time.RFC1123Z)) 
    // TODO: Since we don't set LastScheduleTime when not scheduling, we are going to keep noticing 
    // the miss every cycle.  In order to avoid sending multiple events, and to avoid processing 
    // the sj again and again, we could set a Status.LastMissedTime when we notice a miss. 
    // Then, when we call getRecentUnmetScheduleTimes, we can take max(creationTimestamp, 
    // Status.LastScheduleTime, Status.LastMissedTime), and then so we won't generate 
    // and event the next time we process it, and also so the user looking at the status 
    // can see easily that there was a missed execution. 
    return 
 } 

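So searching the logs for "Missed starting window" or something similar should give the expected result. Note that the excerpt logs this message with glog.V(4), so it only appears when the controller runs at log verbosity 4 or higher; the MissSchedule warning event on the CronJob is recorded regardless of verbosity.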
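As a rough illustration of that log search, here is a minimal Go sketch that reads controller log lines from standard input and counts how often each CronJob is reported. The helper name missedCronJob is hypothetical, and the sketch assumes only that the message text matches the excerpt above; it makes no assumption about the klog line prefix.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// missedPrefix is the message text taken from the controller excerpt.
const missedPrefix = "Missed starting window for "

// missedCronJob (hypothetical helper) returns whatever follows the
// "Missed starting window for" message in a log line, i.e. the CronJob
// identifier the controller logged. ok is false for unrelated lines.
func missedCronJob(line string) (name string, ok bool) {
	i := strings.Index(line, missedPrefix)
	if i < 0 {
		return "", false
	}
	return strings.TrimSpace(line[i+len(missedPrefix):]), true
}

func main() {
	counts := map[string]int{}
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		if name, ok := missedCronJob(scanner.Text()); ok {
			counts[name]++
		}
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "reading log:", err)
		os.Exit(1)
	}
	// Map iteration order is random, so the report order is not stable.
	for name, n := range counts {
		fmt.Printf("%s missed %d time(s)\n", name, n)
	}
}
```

Pipe the kube-controller-manager log into the program to get a per-CronJob summary. If the controller does not run at a verbosity that emits this message, the MissSchedule events on the CronJob objects carry the same information.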

  3. It is also strongly recommended that you familiarize yourself with the CronJob limitations:

Note: If startingDeadlineSeconds is set to a value less than 10 seconds, the CronJob may not be scheduled. This is because the CronJob controller checks things every 10 seconds.

More details and possible reasons for a missed schedule can be found in the linked documentation.

