Kubernetes pod failed with "Back-off restarting failed container"
I am trying to set up Prometheus logging. I am deploying the YAML files below, but the pod fails with "Back-off restarting failed container".
Name:           prometheus-75dd748df4-wrwlr
Namespace:      monitoring
Priority:       0
Node:           kbs-vm-02/172.16.1.8
Start Time:     Tue, 28 Apr 2020 06:13:22 +0000
Labels:         app=prometheus
                pod-template-hash=75dd748df4
Annotations:    <none>
Status:         Running
IP:             10.44.0.7
IPs:
  IP:           10.44.0.7
Controlled By:  ReplicaSet/prometheus-75dd748df4
Containers:
  prom:
    Container ID:  docker://50fb273836c5522bbbe01d8db36e18688e0f673bc54066f364290f0f6854a74f
    Image:         quay.io/prometheus/prometheus:v2.4.3
    Image ID:      docker-pullable://quay.io/prometheus/prometheus@sha256:8e0e85af45fc2bcc18bd7221b8c92fe4bb180f6bd5e30aa2b226f988029c2085
    Port:          9090/TCP
    Host Port:     0/TCP
    Args:
      --config.file=/prometheus-cfg/prometheus.yml
      --storage.tsdb.path=/data
      --storage.tsdb.retention=$(STORAGE_LOCAL_RETENTION)
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 28 Apr 2020 06:14:08 +0000
      Finished:     Tue, 28 Apr 2020 06:14:08 +0000
    Ready:          False
    Restart Count:  3
    Limits:
      memory:  1Gi
    Requests:
      cpu:     200m
      memory:  500Mi
    Environment Variables from:
      prometheus-config-flags  ConfigMap  Optional: false
    Environment:               <none>
    Mounts:
      /data from storage (rw)
      /prometheus-cfg from config-file (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from prometheus-token-bt7dw (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config-file:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      prometheus-config-file
    Optional:  false
  storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  prometheus-storage-claim
    ReadOnly:   false
  prometheus-token-bt7dw:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prometheus-token-bt7dw
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                From                Message
  ----     ------            ----               ----                -------
  Warning  FailedScheduling  76s (x3 over 78s)  default-scheduler   running "VolumeBinding" filter plugin for pod "prometheus-75dd748df4-wrwlr": pod has unbound immediate PersistentVolumeClaims
  Normal   Scheduled         73s                default-scheduler   Successfully assigned monitoring/prometheus-75dd748df4-wrwlr to kbs-vm-02
  Normal   Pulled            28s (x4 over 72s)  kubelet, kbs-vm-02  Container image "quay.io/prometheus/prometheus:v2.4.3" already present on machine
  Normal   Created           28s (x4 over 72s)  kubelet, kbs-vm-02  Created container prom
  Normal   Started           27s (x4 over 71s)  kubelet, kbs-vm-02  Started container prom
  Warning  BackOff           13s (x6 over 69s)  kubelet, kbs-vm-02  Back-off restarting failed container
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitoring
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      securityContext:
        fsGroup: 1000
      serviceAccountName: prometheus
      containers:
      - image: quay.io/prometheus/prometheus:v2.4.3
        name: prom
        args:
        - '--config.file=/prometheus-cfg/prometheus.yml'
        - '--storage.tsdb.path=/data'
        - '--storage.tsdb.retention=$(STORAGE_LOCAL_RETENTION)'
        envFrom:
        - configMapRef:
            name: prometheus-config-flags
        ports:
        - containerPort: 9090
          name: prom-port
        resources:
          limits:
            memory: 1Gi
          requests:
            cpu: 200m
            memory: 500Mi
        volumeMounts:
        - name: config-file
          mountPath: /prometheus-cfg
        - name: storage
          mountPath: /data
      volumes:
      - name: config-file
        configMap:
          name: prometheus-config-file
      - name: storage
        persistentVolumeClaim:
          claimName: prometheus-storage-claim
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-storage
  namespace: monitoring
  labels:
    app: prometheus
spec:
  capacity:
    storage: 12Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: "/data"
PVC YAML data:
[vidya@KBS-VM-01 7-1_prometheus]$ cat prometheus/prom-pvc.yml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-storage-claim
  namespace: monitoring
  labels:
    app: prometheus
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
Do you know what the issue is and how to fix it? Please also let me know if any more files need to be shared.
My guess is that something is wrong with the storage configuration, judging by this line in the event log:
Warning FailedScheduling 76s (x3 over 78s) default-scheduler running "VolumeBinding" filter plugin for pod "prometheus-75dd748df4-wrwlr": pod has unbound immediate PersistentVolumeClaims
I am using local storage.
[vidya@KBS-VM-01 7-1_prometheus]$ kubectl describe pvc prometheus-storage-claim -n monitoring
Name:          prometheus-storage-claim
Namespace:     monitoring
StorageClass:
Status:        Bound
Volume:        prometheus-storage
Labels:        app=prometheus
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      12Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Mounted By:    prometheus-75dd748df4-wrwlr
Events:
  Type    Reason         Age  From                         Message
  ----    ------         ---- ----                         -------
  Normal  FailedBinding  37m  persistentvolume-controller  no persistent volumes available for this claim and no storage class is set
[vidya@KBS-VM-01 7-1_prometheus]$ kubectl logs prometheus-75dd748df4-zlncv -n monitoring
level=info ts=2020-04-28T07:49:07.885529914Z caller=main.go:238 msg="Starting Prometheus" version="(version=2.4.3, branch=HEAD, revision=167a4b4e73a8eca8df648d2d2043e21bdb9a7449)"
level=info ts=2020-04-28T07:49:07.885635014Z caller=main.go:239 build_context="(go=go1.11.1, user=root@1e42b46043e9, date=20181004-08:42:02)"
level=info ts=2020-04-28T07:49:07.885812014Z caller=main.go:240 host_details="(Linux 3.10.0-1062.1.1.el7.x86_64 #1 SMP Fri Sep 13 22:55:44 UTC 2019 x86_64 prometheus-75dd748df4-zlncv (none))"
level=info ts=2020-04-28T07:49:07.885833214Z caller=main.go:241 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2020-04-28T07:49:07.885849614Z caller=main.go:242 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2020-04-28T07:49:07.888695413Z caller=main.go:554 msg="Starting TSDB ..."
level=info ts=2020-04-28T07:49:07.889017612Z caller=main.go:423 msg="Stopping scrape discovery manager..."
level=info ts=2020-04-28T07:49:07.889033512Z caller=main.go:437 msg="Stopping notify discovery manager..."
level=info ts=2020-04-28T07:49:07.889041112Z caller=main.go:459 msg="Stopping scrape manager..."
level=info ts=2020-04-28T07:49:07.889048812Z caller=main.go:433 msg="Notify discovery manager stopped"
level=info ts=2020-04-28T07:49:07.889071612Z caller=main.go:419 msg="Scrape discovery manager stopped"
level=info ts=2020-04-28T07:49:07.889083112Z caller=main.go:453 msg="Scrape manager stopped"
level=info ts=2020-04-28T07:49:07.889098012Z caller=manager.go:638 component="rule manager" msg="Stopping rule manager..."
level=info ts=2020-04-28T07:49:07.889109912Z caller=manager.go:644 component="rule manager" msg="Rule manager stopped"
level=info ts=2020-04-28T07:49:07.889124912Z caller=notifier.go:512 component=notifier msg="Stopping notification manager..."
level=info ts=2020-04-28T07:49:07.889137812Z caller=main.go:608 msg="Notifier manager stopped"
level=info ts=2020-04-28T07:49:07.889169012Z caller=web.go:397 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=error ts=2020-04-28T07:49:07.889653412Z caller=main.go:617 err="opening storage failed: lock DB directory: open /data/lock: permission denied"
The problem here is that the PVC is not bound to the PV, primarily because there is no storage class to link the PV with the PVC, and the capacity in the PV (12Gi) does not match the request in the PVC (10Gi). So in the end Kubernetes could not figure out which PV the PVC should be bound to.
Add
storageClassName: manual
to the spec of both the PV and the PVC.
PV
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-storage
  namespace: monitoring
  labels:
    app: prometheus
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: "/data"
PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-storage-claim
  namespace: monitoring
  labels:
    app: prometheus
spec:
  storageClassName: manual
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
Update:
Running the pod as root by adding runAsUser: 0 should solve the "open /data/lock: permission denied" error.
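The answer names the runAsUser field but not where to put it. A minimal sketch, assuming it is set at the pod level next to the fsGroup entry that the Deployment in the question already has (setting it on the container's own securityContext instead would be another option):

```yaml
# Fragment of the Deployment from the question; only the securityContext changes.
# Pod-level placement of runAsUser is an assumption, the answer does not say
# whether the pod or the container securityContext was meant.
spec:
  template:
    spec:
      securityContext:
        runAsUser: 0    # run the container processes as root (UID 0)
        fsGroup: 1000   # unchanged from the original Deployment
      serviceAccountName: prometheus
```

After merging this into the Deployment and re-applying it, the logs of the new pod should no longer show the lock error if directory ownership on the hostPath volume was the cause. Running as root widens what the container can do on the node, so it is worth treating as a workaround rather than a final setup.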