I'm not sure whether this question has been asked before; pardon me if I couldn't find it.
I have a cluster with 3 nodes. My application consists of a frontend and a backend, each running 2 replicas:

- FE pods run on node1 and node2 and are served behind frontend-service
- BE pods run on node1 and node2 and are served behind be-service
When I shut down node2, the application stopped and I could see application errors in my UI.
I checked the logs and found that my application attempted to reach the backend Service and got no response, since be2 wasn't running and the control plane had not yet terminated the existing pod.
Only when the node was terminated and removed from the cluster were the pods rescheduled to the third node, and the application came back online.
I know a service mesh could help by removing unresponsive pods from the traffic, but I don't want to implement one yet. I'm trying to understand the best fast and simple way to route traffic only to the healthy pods; 5 minutes of downtime is a lot of time.
Here's my backend deployment spec:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: backend
  name: backend
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: backend
  strategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: backend
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-Application
                operator: In
                values:
                - "true"
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - backend
            topologyKey: kubernetes.io/hostname
      containers:
      - env:
        - name: SSL_ENABLED
          value: "false"
        image: quay.io/something:latest
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /liveness
            port: 16006
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 20
          successThreshold: 1
          timeoutSeconds: 10
        name: backend
        ports:
        - containerPort: 16006
          protocol: TCP
        - containerPort: 8457
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readiness
            port: 16006
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        resources:
          limits:
            cpu: 1500m
            memory: 8500Mi
          requests:
            cpu: 6m
            memory: 120Mi
      dnsPolicy: ClusterFirst
```
Here's my backend service:
```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: identity
  name: backend
  namespace: default
spec:
  clusterIP: 10.233.34.115
  ports:
  - name: tcp
    port: 16006
    protocol: TCP
    targetPort: 16006
  - name: internal-http-rpc
    port: 8457
    protocol: TCP
    targetPort: 8457
  selector:
    app: backend
  sessionAffinity: None
  type: ClusterIP
```
This is a community wiki answer. Feel free to expand it.
As already mentioned by @TomerLeibovich, the main issue here was the probes configuration:
Probes have a number of fields that you can use to more precisely control the behavior of liveness and readiness checks:

- `initialDelaySeconds`: Number of seconds after the container has started before liveness or readiness probes are initiated. Defaults to 0 seconds. Minimum value is 0.
- `periodSeconds`: How often (in seconds) to perform the probe. Defaults to 10 seconds. Minimum value is 1.
- `timeoutSeconds`: Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1.
- `successThreshold`: Minimum consecutive successes for the probe to be considered successful after having failed. Defaults to 1. Must be 1 for liveness and startup probes. Minimum value is 1.
- `failureThreshold`: When a probe fails, Kubernetes will try `failureThreshold` times before giving up. Giving up in case of a liveness probe means restarting the container. In case of a readiness probe, the Pod will be marked Unready. Defaults to 3. Minimum value is 1.
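For example, a tighter readiness probe that pulls the pod out of the Service's endpoints after a single failed check could look like this (a sketch based on the spec above; the exact values are a trade-off against flapping on transient slowness):

```yaml
readinessProbe:
  httpGet:
    path: /readiness
    port: 16006
    scheme: HTTP
  initialDelaySeconds: 10
  periodSeconds: 5      # probe every 5s instead of 10s
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 1   # mark the pod Unready after the first failed probe
```

With these values, a pod stops receiving Service traffic at most roughly `periodSeconds + timeoutSeconds` after it becomes unresponsive, rather than after three consecutive failures.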
Plus the proper pod eviction configuration:

The kubelet needs to preserve node stability when available compute resources are low. This is especially important when dealing with incompressible compute resources, such as memory or disk space. If such resources are exhausted, nodes become unstable.
Changing the failure threshold from 3 to 1 and reducing the pod eviction timeout solved the issue, as the Pod is now evicted sooner.
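For reference, the node-failure timeline is governed by the controller manager: the node is marked NotReady after `--node-monitor-grace-period` (40s by default), and pods on it are evicted after the eviction timeout. A sketch of the relevant flags (note that on clusters with taint-based evictions enabled, the default since v1.13, the per-pod `tolerationSeconds` takes precedence and the legacy `--pod-eviction-timeout` flag has no effect):

```
# Legacy kube-controller-manager flags; defaults are 40s and 5m0s respectively
kube-controller-manager \
  --node-monitor-grace-period=40s \
  --pod-eviction-timeout=1m
```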
EDIT:
The other possible solution in this scenario is to label the other nodes for the backend so that each backend pod can be scheduled on a different node. In your current situation, the one pod deployed on the shut-down node was removed from the endpoints and the application became unresponsive.
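Since the Deployment's nodeAffinity only allows nodes carrying `node-Application=true`, labeling the third node (hypothetical name `node3` here) would make it a valid scheduling target for a backend replica:

```
kubectl label node node3 node-Application=true
```

With the existing podAntiAffinity rule, the rescheduled backend pod would then land on `node3` rather than being blocked from co-locating with the surviving replica.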
Also, the workaround for triggering pod eviction from the unhealthy node sooner is to add tolerations to `deployment.spec.template.spec`:

```yaml
tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 60
```

instead of using the default value `tolerationSeconds: 300`.
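In context, the toleration belongs under the pod template of the Deployment shown earlier (only the relevant fields sketched):

```yaml
spec:
  template:
    spec:
      tolerations:
      - key: "node.kubernetes.io/unreachable"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 60   # evict after 60s on an unreachable node, not the default 300s
```

A matching toleration for `node.kubernetes.io/not-ready` is usually added alongside it, since the DefaultTolerationSeconds admission plugin otherwise injects 300-second tolerations for both taints.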
You can find more information in this documentation.