
Kubectl rollout restart for statefulset

As per the kubectl docs, kubectl rollout restart is applicable to deployments, daemonsets and statefulsets. It works as expected for deployments. But for statefulsets, it restarts only one of the 2 pods.

✗ k rollout restart statefulset alertmanager-main                       (playground-fdp/monitoring)
statefulset.apps/alertmanager-main restarted

✗ k rollout status statefulset alertmanager-main                        (playground-fdp/monitoring)
Waiting for 1 pods to be ready...
Waiting for 1 pods to be ready...
statefulset rolling update complete 2 pods at revision alertmanager-main-59d7ccf598...

✗ kgp -l app=alertmanager                                               (playground-fdp/monitoring)
NAME                  READY   STATUS    RESTARTS   AGE
alertmanager-main-0   2/2     Running   0          21h
alertmanager-main-1   2/2     Running   0          20s

As you can see, the pod alertmanager-main-1 has been restarted and its age is 20s, whereas the other pod in the statefulset, alertmanager-main-0, has not been restarted and its age is 21h. Any idea how we can restart a statefulset after a configmap used by it has been updated?

[Update 1] Here is the statefulset configuration. As you can see, .spec.updateStrategy.rollingUpdate.partition is not set.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"monitoring.coreos.com/v1","kind":"Alertmanager","metadata":{"annotations":{},"labels":{"alertmanager":"main"},"name":"main","namespace":"monitoring"},"spec":{"baseImage":"10.47.2.76:80/alm/alertmanager","nodeSelector":{"kubernetes.io/os":"linux"},"replicas":2,"securityContext":{"fsGroup":2000,"runAsNonRoot":true,"runAsUser":1000},"serviceAccountName":"alertmanager-main","version":"v0.19.0"}}
  creationTimestamp: "2019-12-02T07:17:49Z"
  generation: 4
  labels:
    alertmanager: main
  name: alertmanager-main
  namespace: monitoring
  ownerReferences:
  - apiVersion: monitoring.coreos.com/v1
    blockOwnerDeletion: true
    controller: true
    kind: Alertmanager
    name: main
    uid: 3e3bd062-6077-468e-ac51-909b0bce1c32
  resourceVersion: "521307"
  selfLink: /apis/apps/v1/namespaces/monitoring/statefulsets/alertmanager-main
  uid: ed4765bf-395f-4d91-8ec0-4ae23c812a42
spec:
  podManagementPolicy: Parallel
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      alertmanager: main
      app: alertmanager
  serviceName: alertmanager-operated
  template:
    metadata:
      creationTimestamp: null
      labels:
        alertmanager: main
        app: alertmanager
    spec:
      containers:
      - args:
        - --config.file=/etc/alertmanager/config/alertmanager.yaml
        - --cluster.listen-address=[$(POD_IP)]:9094
        - --storage.path=/alertmanager
        - --data.retention=120h
        - --web.listen-address=:9093
        - --web.external-url=http://10.47.0.234
        - --web.route-prefix=/
        - --cluster.peer=alertmanager-main-0.alertmanager-operated.monitoring.svc:9094
        - --cluster.peer=alertmanager-main-1.alertmanager-operated.monitoring.svc:9094
        env:
        - name: POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        image: 10.47.2.76:80/alm/alertmanager:v0.19.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 10
          httpGet:
            path: /-/healthy
            port: web
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 3
        name: alertmanager
        ports:
        - containerPort: 9093
          name: web
          protocol: TCP
        - containerPort: 9094
          name: mesh-tcp
          protocol: TCP
        - containerPort: 9094
          name: mesh-udp
          protocol: UDP
        readinessProbe:
          failureThreshold: 10
          httpGet:
            path: /-/ready
            port: web
            scheme: HTTP
          initialDelaySeconds: 3
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 3
        resources:
          requests:
            memory: 200Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/alertmanager/config
          name: config-volume
        - mountPath: /alertmanager
          name: alertmanager-main-db
      - args:
        - -webhook-url=http://localhost:9093/-/reload
        - -volume-dir=/etc/alertmanager/config
        image: 10.47.2.76:80/alm/configmap-reload:v0.0.1
        imagePullPolicy: IfNotPresent
        name: config-reloader
        resources:
          limits:
            cpu: 100m
            memory: 25Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/alertmanager/config
          name: config-volume
          readOnly: true
      dnsPolicy: ClusterFirst
      nodeSelector:
        kubernetes.io/os: linux
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 2000
        runAsNonRoot: true
        runAsUser: 1000
      serviceAccount: alertmanager-main
      serviceAccountName: alertmanager-main
      terminationGracePeriodSeconds: 120
      volumes:
      - name: config-volume
        secret:
          defaultMode: 420
          secretName: alertmanager-main
      - emptyDir: {}
        name: alertmanager-main-db
  updateStrategy:
    type: RollingUpdate
status:
  collisionCount: 0
  currentReplicas: 2
  currentRevision: alertmanager-main-59d7ccf598
  observedGeneration: 4
  readyReplicas: 2
  replicas: 2
  updateRevision: alertmanager-main-59d7ccf598
  updatedReplicas: 2

You did not provide the whole scenario. It might depend on the Readiness Probe or the Update Strategy.

A StatefulSet rolling update restarts pods in reverse ordinal order, from n-1 down to 0. Details can be found here.

Reason 1

A StatefulSet has 4 update strategies:

  • On Delete
  • Rolling Updates
  • Partitions
  • Forced Rollback

In the Partition update section you can find this information:

If a partition is specified, all Pods with an ordinal that is greater than or equal to the partition will be updated when the StatefulSet's .spec.template is updated. All Pods with an ordinal that is less than the partition will not be updated and, even if they are deleted, they will be recreated at the previous version. If a StatefulSet's .spec.updateStrategy.rollingUpdate.partition is greater than its .spec.replicas, updates to its .spec.template will not be propagated to its Pods. In most cases you will not need to use a partition, but they are useful if you want to stage an update, roll out a canary, or perform a phased roll out.

So if somewhere in the StatefulSet you have set updateStrategy.rollingUpdate.partition: 1, it will restart only the pods with index 1 or higher.
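
For reference, here is a minimal sketch of where that field sits in a StatefulSet spec (the name web, the image and the replica count are illustrative and not taken from the question; partition: 3 is chosen to match the example output below):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web                     # illustrative name
spec:
  replicas: 6
  serviceName: web
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.21       # illustrative image
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 3              # pods with ordinal >= 3 (web-3..web-5) get the new template; web-0..web-2 keep the old revision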

Example with partition: 3:

NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          30m
web-1   1/1     Running   0          30m
web-2   1/1     Running   0          31m
web-3   1/1     Running   0          2m45s
web-4   1/1     Running   0          3m
web-5   1/1     Running   0          3m13s

Reason 2

Configuration of the Readiness probe.

If your values of initialDelaySeconds and periodSeconds are high, it might take a while before the next pod is restarted. Details about those parameters can be found here.

In the example below, the pod will wait 10 seconds after it starts running before the readiness probe is first checked, and the probe is then checked every 2 seconds. Depending on these values, this might be the cause of the behavior.

    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /
        port: 80
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 2
      successThreshold: 1
      timeoutSeconds: 1

Reason 3

I saw that you have 2 containers in each pod.

NAME                  READY   STATUS    RESTARTS   AGE
alertmanager-main-0   2/2     Running   0          21h
alertmanager-main-1   2/2     Running   0          20s

As described in the docs:

Running - The Pod has been bound to a node, and all of the containers have been created. At least one container is still running, or is in the process of starting or restarting.

It would be good to check if everything is OK with both containers (readinessProbe/livenessProbe, restarts, etc.).

You would need to delete it. StatefulSet pods are removed following their ordinal index, with the highest ordinal index first.

Also, you do not need to restart the pod to re-read an updated config map. This happens automatically (after some period of time).
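
Note that this automatic refresh only applies to ConfigMaps or Secrets that are mounted as volumes, as the alertmanager-main secret is above; values consumed via environment variables or subPath mounts are only picked up when the pod is recreated. A minimal sketch of a volume-mounted ConfigMap, with illustrative names:

apiVersion: v1
kind: Pod
metadata:
  name: config-demo              # illustrative pod name
spec:
  containers:
  - name: app
    image: nginx:1.21            # illustrative image
    volumeMounts:
    - name: config-volume
      mountPath: /etc/app/config # files here are refreshed by the kubelet some time after the ConfigMap changes
  volumes:
  - name: config-volume
    configMap:
      name: app-config           # illustrative ConfigMap name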

This might be related to your ownerReferences definition. You can try it without any owner and do the rollout again.
