
Why does Redis in K8S restart all the time?

The Redis pod keeps restarting. How can I find out the cause of this behavior?

I figured out that the resource limits should be raised, but I don't know what the best CPU/RAM ratio is. And why are there no crash events or logs?

Here are the pods:

> kubectl get pods
    redis-master-5d9cfb54f8-8pbgq                     1/1     Running     33         3d16h

Here are the logs:

> kubectl logs --follow redis-master-5d9cfb54f8-8pbgq
[1] 08 Sep 07:02:12.152 # Server started, Redis version 2.8.19
[1] 08 Sep 07:02:12.153 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
[1] 08 Sep 07:02:12.153 * The server is now ready to accept connections on port 6379
[1] 08 Sep 07:03:13.085 * 10000 changes in 60 seconds. Saving...
[1] 08 Sep 07:03:13.085 * Background saving started by pid 8
[8] 08 Sep 07:03:13.101 * DB saved on disk
[8] 08 Sep 07:03:13.101 * RDB: 0 MB of memory used by copy-on-write
[1] 08 Sep 07:03:13.185 * Background saving terminated with success
[1] 08 Sep 07:04:14.018 * 10000 changes in 60 seconds. Saving...
[1] 08 Sep 07:04:14.018 * Background saving started by pid 9
...
[93] 08 Sep 08:38:30.160 * DB saved on disk
[93] 08 Sep 08:38:30.164 * RDB: 2 MB of memory used by copy-on-write
[1] 08 Sep 08:38:30.259 * Background saving terminated with success
[1] 08 Sep 08:39:31.072 * 10000 changes in 60 seconds. Saving...
[1] 08 Sep 08:39:31.074 * Background saving started by pid 94

Here are the previous logs of the same pod:

> kubectl logs --previous --follow redis-master-5d9cfb54f8-8pbgq
[1] 08 Sep 09:41:46.057 * Background saving terminated with success
[1] 08 Sep 09:42:47.073 * 10000 changes in 60 seconds. Saving...
[1] 08 Sep 09:42:47.076 * Background saving started by pid 140
[140] 08 Sep 09:43:14.398 * DB saved on disk
[140] 08 Sep 09:43:14.457 * RDB: 1 MB of memory used by copy-on-write
[1] 08 Sep 09:43:14.556 * Background saving terminated with success
[1] 08 Sep 09:44:15.073 * 10000 changes in 60 seconds. Saving...
[1] 08 Sep 09:44:15.077 * Background saving started by pid 141
[1 | signal handler] (1599558267) Received SIGTERM scheduling shutdown...
[1] 08 Sep 09:44:28.052 # User requested shutdown...
[1] 08 Sep 09:44:28.052 # There is a child saving an .rdb. Killing it!
[1] 08 Sep 09:44:28.052 * Saving the final RDB snapshot before exiting.
[1] 08 Sep 09:44:49.592 * DB saved on disk
[1] 08 Sep 09:44:49.592 # Redis is now ready to exit, bye bye...

Here is the description of the pod. As you can see, the memory limit is 250Mi, but I can't see the threshold after which the pod gets restarted.

> kubectl describe pod redis-master-5d9cfb54f8-8pbgq
Name:           redis-master-5d9cfb54f8-8pbgq
Namespace:      cryptoman
Priority:       0
Node:           gke-my-cluster-default-pool-818613a8-smmc/10.172.0.28
Start Time:     Fri, 04 Sep 2020 18:52:17 +0300
Labels:         app=redis
                pod-template-hash=5d9cfb54f8
                role=master
                tier=backend
Annotations:    <none>
Status:         Running
IP:             10.36.2.124
IPs:            <none>
Controlled By:  ReplicaSet/redis-master-5d9cfb54f8
Containers:
  master:
    Container ID:   docker://3479276666a41df502f1f9eb9bb2ff9cfa592f08a33e656e44179042b6233c6f
    Image:          k8s.gcr.io/redis:e2e
    Image ID:       docker-pullable://k8s.gcr.io/redis@sha256:f066bcf26497fbc55b9bf0769cb13a35c0afa2aa42e737cc46b7fb04b23a2f25
    Port:           6379/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Wed, 09 Sep 2020 10:27:56 +0300
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    0
      Started:      Wed, 09 Sep 2020 07:34:18 +0300
      Finished:     Wed, 09 Sep 2020 10:27:55 +0300
    Ready:          True
    Restart Count:  42
    Limits:
      cpu:     100m
      memory:  250Mi
    Requests:
      cpu:        100m
      memory:     250Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-5tds9 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  default-token-5tds9:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-5tds9
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason          Age                   From                                                Message
  ----    ------          ----                  ----                                                -------
  Normal  SandboxChanged  52m (x42 over 4d13h)  kubelet, gke-my-cluster-default-pool-818613a8-smmc  Pod sandbox changed, it will be killed and re-created.
  Normal  Killing         52m (x42 over 4d13h)  kubelet, gke-my-cluster-default-pool-818613a8-smmc  Stopping container master
  Normal  Created         52m (x43 over 4d16h)  kubelet, gke-my-cluster-default-pool-818613a8-smmc  Created container master
  Normal  Started         52m (x43 over 4d16h)  kubelet, gke-my-cluster-default-pool-818613a8-smmc  Started container master
  Normal  Pulled          52m (x42 over 4d13h)  kubelet, gke-my-cluster-default-pool-818613a8-smmc  Container image "k8s.gcr.io/redis:e2e" already present on machine

Here are the limits after it restarted. CPU is only throttled, but memory gets the container OOM-killed.

    Limits:
      cpu:     100m
      memory:  250Mi

Reason: OOMKilled
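
A quick way to confirm the last termination reason without scanning the full describe output, as a sketch using the pod name from the question:

    kubectl get pod redis-master-5d9cfb54f8-8pbgq -n cryptoman \
      -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}'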

  1. Remove the requests and limits.
  2. Run the pod and make sure it does not restart.
  3. If you already have Prometheus, run the VPA Recommender to check how much resources it actually needs. Or just use any monitoring stack (GKE Prometheus, prometheus-operator, DataDog, etc.) to check the actual resource consumption and adjust the limits accordingly. A minimal VPA manifest sketch follows this list.
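
A minimal sketch of a VerticalPodAutoscaler in recommendation-only mode, assuming the VPA components are installed in the cluster and that the pod is managed by a Deployment named redis-master (inferred from the pod name prefix; adjust to your setup):

    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: redis-master-vpa        # hypothetical name
      namespace: cryptoman
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: redis-master          # assumed Deployment name
      updatePolicy:
        updateMode: "Off"           # recommendation only; never restarts pods

After it has observed the workload for a while, kubectl describe vpa redis-master-vpa -n cryptoman shows the recommended CPU and memory requests.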

Max's answer is very complete. But if you don't have Prometheus installed, or don't want to install it, there is another simple way to check actual resource consumption: install the metrics-server project in the cluster. Once it is installed, you can use kubectl top node to check CPU and memory usage per node, and kubectl top pod to check consumption per pod. I use it and it is very useful.
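
A sketch of that route, assuming the standard manifest location from the metrics-server README (verify against the current release before applying):

    # Install metrics-server
    kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

    # Node-level CPU and memory usage
    kubectl top node

    # Per-pod usage in the namespace from the question
    kubectl top pod -n cryptoman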

Or you can simply increase the CPU and memory limits, but then you will not know how much the container really needs and will basically be wasting resources.

The main problem is that you did not put a limit on the redis application itself. So redis keeps growing its memory, and when it reaches the Pod's limits.memory of 250Mb it gets OOM-killed and restarted. Then, if you remove limits.memory, redis will keep eating memory until the node no longer has enough to run its other processes, and K8s will kill the pod and mark it as "Evicted".

Therefore, configure a memory limit inside the redis application itself via the redis.conf file, and set an LRU or LFU eviction policy according to your needs so that some keys get removed when the limit is reached (see the Redis documentation on maxmemory eviction policies):

maxmemory 256mb
maxmemory-policy allkeys-lfu

And set the Pod's memory limit to roughly twice the redis maxmemory, to leave some headroom for the rest of the process and the objects redis keeps around:

resources:
  limits:
    cpu:     100m
    memory:  512Mi
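
One way to wire both settings together, sketched under the assumption that the image's entrypoint runs redis-server (so extra args are passed through as config directives) instead of mounting a redis.conf; container name and image are taken from the pod description above. Note that allkeys-lfu requires Redis 4.0 or newer, while the logs show Redis 2.8.19, so this sketch uses allkeys-lru:

    containers:
      - name: master
        image: k8s.gcr.io/redis:e2e
        # Same effect as the redis.conf lines above, passed as command-line flags
        args: ["--maxmemory", "256mb", "--maxmemory-policy", "allkeys-lru"]
        ports:
          - containerPort: 6379
        resources:
          requests:
            cpu: 100m
            memory: 512Mi
          limits:
            cpu: 100m
            memory: 512Mi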

Now the pods get evicted. Can I find out the cause? (Some inspection commands follow the listing below.)

NAME                                              READY   STATUS             RESTARTS   AGE
redis-master-7d97765bbb-7kjwn                     0/1     Evicted            0          38h
redis-master-7d97765bbb-kmc9g                     1/1     Running            0          30m
redis-master-7d97765bbb-sf2ss                     0/1     Evicted            0          30m
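
A way to inspect the eviction reason, assuming the evicted Pod objects have not been deleted yet (kubectl keeps them around until they are cleaned up):

    # Full details, including the eviction message (e.g. node memory pressure)
    kubectl describe pod redis-master-7d97765bbb-7kjwn -n cryptoman

    # Just the reason and message fields
    kubectl get pod redis-master-7d97765bbb-7kjwn -n cryptoman \
      -o jsonpath='{.status.reason}{"\n"}{.status.message}{"\n"}'

    # Cluster events around that time
    kubectl get events -n cryptoman --sort-by=.lastTimestamp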
