Kubernetes cronjob missed schedule
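There are around 50 cronjobs running in an EKS cluster. I want to find out why a CronJob missed a scheduled job; checking the schedule, concurrency policy, active jobs and startingDeadlineSeconds is a tedious process. Even after all of these checks, the cause is sometimes still unclear. I could not find any useful information in the controller logs. Is there a straightforward way to find the reason for a missed schedule from the logs?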
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  creationTimestamp: "2021-03-02T20:19:23Z"
  name: <name>
  namespace: <namespace>
spec:
  concurrencyPolicy: Allow
  failedJobsHistoryLimit: 1
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      template:
        metadata:
          creationTimestamp: null
        spec:
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: <key>
                    operator: In
                    values:
                    - "true"
          containers:
          - image: <image-name>
            imagePullPolicy: Always
            name: solution-info
            resources:
              limits:
                cpu: 300m
                memory: 300Mi
              requests:
                cpu: 300m
                memory: 300Mi
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: OnFailure
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
          tolerations:
          - effect: NoSchedule
            key: assets
            operator: Equal
            value: "true"
  schedule: 0 */6 * * *
  startingDeadlineSeconds: 10
  successfulJobsHistoryLimit: 3
  suspend: false
status:
  lastScheduleTime: "2021-03-10T12:00:00Z"
I have done some digging, and there are a few points I would like to cover for this case:
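Control plane components use the klog library for logging. When used with the --log-dir flag, kube-controller-manager can log each severity level to a separate file inside the given directory; with the --log-file flag, it logs everything to a single file. Keep in mind that the two flags are mutually exclusive, and make sure you are checking the right logs.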
The CronJob controller runs every 10 seconds:
// Check things every 10 second.
go wait.Until(jm.syncAll, 10*time.Second, stopCh)
If it is too late and the schedule has been missed, the controller logs it:
scheduledTime := times[len(times)-1]
tooLate := false
if sj.Spec.StartingDeadlineSeconds != nil {
	tooLate = scheduledTime.Add(time.Second * time.Duration(*sj.Spec.StartingDeadlineSeconds)).Before(now)
}
if tooLate {
	glog.V(4).Infof("Missed starting window for %s", nameForLog)
	recorder.Eventf(sj, v1.EventTypeWarning, "MissSchedule", "Missed scheduled time to start a job: %s", scheduledTime.Format(time.RFC1123Z))
	// TODO: Since we don't set LastScheduleTime when not scheduling, we are going to keep noticing
	// the miss every cycle. In order to avoid sending multiple events, and to avoid processing
	// the sj again and again, we could set a Status.LastMissedTime when we notice a miss.
	// Then, when we call getRecentUnmetScheduleTimes, we can take max(creationTimestamp,
	// Status.LastScheduleTime, Status.LastMissedTime), and then so we won't generate
	// and event the next time we process it, and also so the user looking at the status
	// can see easily that there was a missed execution.
	return
}
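Therefore, searching the logs for "Missed starting window" or similar should give the expected result. Note that the message is logged at verbosity level 4 (glog.V(4)), so it only appears if kube-controller-manager runs with -v=4 or higher; the MissSchedule warning event on the CronJob is recorded either way.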
Note: if startingDeadlineSeconds is set to a value less than 10 seconds, the CronJob may not be scheduled. This is because the CronJob controller checks things every 10 seconds.
More details and possible reasons for missed schedules can be found in the linked documentation.