Kubectl rollout restart for statefulset
As per the kubectl docs, kubectl rollout restart is applicable to deployments, daemonsets, and statefulsets. It works as expected for deployments, but for statefulsets it restarts only one of the 2 pods.
✗ k rollout restart statefulset alertmanager-main (playground-fdp/monitoring)
statefulset.apps/alertmanager-main restarted
✗ k rollout status statefulset alertmanager-main (playground-fdp/monitoring)
Waiting for 1 pods to be ready...
Waiting for 1 pods to be ready...
statefulset rolling update complete 2 pods at revision alertmanager-main-59d7ccf598...
✗ kgp -l app=alertmanager (playground-fdp/monitoring)
NAME                  READY   STATUS    RESTARTS   AGE
alertmanager-main-0   2/2     Running   0          21h
alertmanager-main-1   2/2     Running   0          20s
As you can see, the pod alertmanager-main-1 has been restarted and its age is 20s, whereas the other pod in the statefulset, alertmanager-main-0, has not been restarted and its age is 21h. Any idea how we can restart a statefulset after a configmap used by it has been updated?
[Update 1] Here is the statefulset configuration. As you can see, .spec.updateStrategy.rollingUpdate.partition is not set.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"monitoring.coreos.com/v1","kind":"Alertmanager","metadata":{"annotations":{},"labels":{"alertmanager":"main"},"name":"main","namespace":"monitoring"},"spec":{"baseImage":"10.47.2.76:80/alm/alertmanager","nodeSelector":{"kubernetes.io/os":"linux"},"replicas":2,"securityContext":{"fsGroup":2000,"runAsNonRoot":true,"runAsUser":1000},"serviceAccountName":"alertmanager-main","version":"v0.19.0"}}
  creationTimestamp: "2019-12-02T07:17:49Z"
  generation: 4
  labels:
    alertmanager: main
  name: alertmanager-main
  namespace: monitoring
  ownerReferences:
  - apiVersion: monitoring.coreos.com/v1
    blockOwnerDeletion: true
    controller: true
    kind: Alertmanager
    name: main
    uid: 3e3bd062-6077-468e-ac51-909b0bce1c32
  resourceVersion: "521307"
  selfLink: /apis/apps/v1/namespaces/monitoring/statefulsets/alertmanager-main
  uid: ed4765bf-395f-4d91-8ec0-4ae23c812a42
spec:
  podManagementPolicy: Parallel
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      alertmanager: main
      app: alertmanager
  serviceName: alertmanager-operated
  template:
    metadata:
      creationTimestamp: null
      labels:
        alertmanager: main
        app: alertmanager
    spec:
      containers:
      - args:
        - --config.file=/etc/alertmanager/config/alertmanager.yaml
        - --cluster.listen-address=[$(POD_IP)]:9094
        - --storage.path=/alertmanager
        - --data.retention=120h
        - --web.listen-address=:9093
        - --web.external-url=http://10.47.0.234
        - --web.route-prefix=/
        - --cluster.peer=alertmanager-main-0.alertmanager-operated.monitoring.svc:9094
        - --cluster.peer=alertmanager-main-1.alertmanager-operated.monitoring.svc:9094
        env:
        - name: POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        image: 10.47.2.76:80/alm/alertmanager:v0.19.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 10
          httpGet:
            path: /-/healthy
            port: web
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 3
        name: alertmanager
        ports:
        - containerPort: 9093
          name: web
          protocol: TCP
        - containerPort: 9094
          name: mesh-tcp
          protocol: TCP
        - containerPort: 9094
          name: mesh-udp
          protocol: UDP
        readinessProbe:
          failureThreshold: 10
          httpGet:
            path: /-/ready
            port: web
            scheme: HTTP
          initialDelaySeconds: 3
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 3
        resources:
          requests:
            memory: 200Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/alertmanager/config
          name: config-volume
        - mountPath: /alertmanager
          name: alertmanager-main-db
      - args:
        - -webhook-url=http://localhost:9093/-/reload
        - -volume-dir=/etc/alertmanager/config
        image: 10.47.2.76:80/alm/configmap-reload:v0.0.1
        imagePullPolicy: IfNotPresent
        name: config-reloader
        resources:
          limits:
            cpu: 100m
            memory: 25Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/alertmanager/config
          name: config-volume
          readOnly: true
      dnsPolicy: ClusterFirst
      nodeSelector:
        kubernetes.io/os: linux
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 2000
        runAsNonRoot: true
        runAsUser: 1000
      serviceAccount: alertmanager-main
      serviceAccountName: alertmanager-main
      terminationGracePeriodSeconds: 120
      volumes:
      - name: config-volume
        secret:
          defaultMode: 420
          secretName: alertmanager-main
      - emptyDir: {}
        name: alertmanager-main-db
  updateStrategy:
    type: RollingUpdate
status:
  collisionCount: 0
  currentReplicas: 2
  currentRevision: alertmanager-main-59d7ccf598
  observedGeneration: 4
  readyReplicas: 2
  replicas: 2
  updateRevision: alertmanager-main-59d7ccf598
  updatedReplicas: 2
You did not provide the whole scenario. It might depend on the readiness probe or on the update strategy.
A StatefulSet restarts its pods in reverse ordinal order, from index n-1 down to 0. Details can be found here.
Reason 1

A StatefulSet has 4 update strategies. In the Partition update section you can find the following information:
If a partition is specified, all Pods with an ordinal that is greater than or equal to the partition will be updated when the StatefulSet's .spec.template is updated. All Pods with an ordinal that is less than the partition will not be updated and, even if they are deleted, they will be recreated at the previous version. If a StatefulSet's .spec.updateStrategy.rollingUpdate.partition is greater than its .spec.replicas, updates to its .spec.template will not be propagated to its Pods. In most cases you will not need to use a partition, but they are useful if you want to stage an update, roll out a canary, or perform a phased roll out.
So if somewhere in the StatefulSet you have set updateStrategy.rollingUpdate.partition: 1, it will restart only the pods with ordinal 1 or higher.
Example with partition: 3:
NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          30m
web-1   1/1     Running   0          30m
web-2   1/1     Running   0          31m
web-3   1/1     Running   0          2m45s
web-4   1/1     Running   0          3m
web-5   1/1     Running   0          3m13s
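The partition rule quoted above can be sketched as a small helper (hypothetical, purely for illustration):

```python
def ordinals_updated(replicas: int, partition: int = 0) -> list[int]:
    """Return the pod ordinals that receive the new revision when
    .spec.template changes, per the StatefulSet partition rule."""
    # Pods with ordinal >= partition are updated; if partition > replicas,
    # the update is not propagated to any pod.
    return [i for i in range(replicas) if i >= partition]

# With 6 replicas and partition: 3, only web-3, web-4 and web-5 are updated,
# which matches the AGE column in the listing above.
print(ordinals_updated(replicas=6, partition=3))  # [3, 4, 5]
```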
Reason 2

Configuration of the readiness probe.
If your values of initialDelaySeconds and periodSeconds are high, it might take a while before the next pod is restarted. Details about those parameters can be found here.
In the example below, the pod waits 10 seconds before the first probe, and the readiness probe then checks every 2 seconds. Depending on the values, this might be the cause of the behavior you are seeing.
readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /
    port: 80
    scheme: HTTP
  initialDelaySeconds: 10
  periodSeconds: 2
  successThreshold: 1
  timeoutSeconds: 1
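As a rough illustration (my own sketch, not from the docs), the earliest a pod can pass its readiness probe, and the longest a running pod can take before it is marked unready, follow from these fields:

```python
def min_seconds_until_ready(initial_delay: int, period: int,
                            success_threshold: int) -> int:
    # The first probe fires after initial_delay; each additional required
    # success adds one period.
    return initial_delay + (success_threshold - 1) * period

def max_seconds_until_unready(period: int, failure_threshold: int) -> int:
    # A running pod is marked unready only after failure_threshold
    # consecutive failed probes.
    return period * failure_threshold

# With the probe above (initialDelaySeconds: 10, periodSeconds: 2,
# successThreshold: 1, failureThreshold: 3):
print(min_seconds_until_ready(10, 2, 1))   # 10
print(max_seconds_until_unready(2, 3))     # 6
```

So each pod in the rollout adds at least ~10 seconds before the controller moves on to the next one.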
Reason 3

I saw that you have 2 containers in each pod.
NAME                  READY   STATUS    RESTARTS   AGE
alertmanager-main-0   2/2     Running   0          21h
alertmanager-main-1   2/2     Running   0          20s
Running - the Pod has been bound to a node, and all of its containers have been created. At least one container is still running, or is in the process of starting or restarting.
It would be good to check that everything is ok with both containers (readinessProbe/livenessProbe, restarts, etc.).
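One way to check both containers at once is to read .status.containerStatuses from the pod. A sketch (the JSON below is hypothetical, abbreviated output of kubectl get pod alertmanager-main-0 -n monitoring -o json):

```python
import json

# Hypothetical, abbreviated pod status as returned by:
#   kubectl get pod alertmanager-main-0 -n monitoring -o json
pod_json = """
{"status": {"containerStatuses": [
  {"name": "alertmanager",    "ready": true,  "restartCount": 0},
  {"name": "config-reloader", "ready": false, "restartCount": 3}
]}}
"""

pod = json.loads(pod_json)
for cs in pod["status"]["containerStatuses"]:
    print(f'{cs["name"]}: ready={cs["ready"]}, restarts={cs["restartCount"]}')
```

A pod showing 2/2 READY only tells you both containers are ready right now; the per-container restart counts and probe results are where a stuck rollout usually shows up.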
You would need to delete the pods. StatefulSet pods are removed in order of their ordinal index, with the highest ordinal index first.

Also, you do not need to restart the pods to re-read an updated config map. This happens automatically (after some period of time).
This might be related to your ownerReferences definition. You can try it without any owner and do the rollout again.