Subject: Stopping the kubelet service on a worker node (or a node hang caused by memory exhaustion) prevents MySQL from terminating properly on a Kubernetes worker node.
Storage used: rook-ceph
Scenario 1: Stopping the kubelet service on the worker node where the MySQL pod is running.
Initial status:
root@master01:~/apps/mysql# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master01 Ready master 9d v1.18.5
master02 Ready master 9d v1.18.5
master03 Ready master 9d v1.18.5
worker01 Ready <none> 9d v1.18.5
worker02 Ready <none> 9d v1.18.5
worker03 Ready <none> 9d v1.18.5
worker04 Ready <none> 9d v1.18.5
root@master01:~/apps/mysql# kubectl get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
mysql-747d4cd75c-zk7mr 1/1 Running 0 16s 10.0.5.62 worker01 <none> <none>
root@master01:~/apps/mysql# kubectl get deployment
NAME READY UP-TO-DATE AVAILABLE AGE
mysql 1/1 1 1 2m2s
Test: Bring the kubelet service down on worker01 with "systemctl stop kubelet".
root@master01:~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master01 Ready master 9d v1.18.5
master02 Ready master 9d v1.18.5
master03 Ready master 9d v1.18.5
worker01 NotReady <none> 9d v1.18.5
worker02 Ready <none> 9d v1.18.5
worker03 Ready <none> 9d v1.18.5
worker04 Ready <none> 9d v1.18.5
Expectation: The pod should be rescheduled to worker03 as per the node affinity configuration.
Result: The pod was rescheduled to worker03 successfully, but because the old container inside the mysql pod on worker01 was not terminated properly, the new pod showed error logs even though it was Running.
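The Deployment's node affinity is not shown in the post; a hypothetical configuration consistent with the observed behavior (pod schedulable on worker01 and worker03) might look like:

```yaml
# Hypothetical nodeAffinity sketch; the actual Deployment manifest is
# not included in the original post, so these values are illustrative.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values: ["worker01", "worker03"]
```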
root@worker01:~# docker ps | grep mysql
2d063f8d04e4 6e17b5012353 "/usr/local/bin/dock…" 9 minutes ago Up 9 minutes k8s_mysql_mysql-747d4cd75c-zk7mr_251d7d3d-1201-4757-993d-a3c7d65f87b9_0
b6d2cab4ba2b k8s.gcr.io/pause:3.2 "/pause" 9 minutes ago Up 9 minutes k8s_POD_mysql-747d4cd75c-zk7mr_251d7d3d-1201-4757-993d-a3c7d65f87b9_0
root@master01:~/apps/mysql# kubectl get po -o wide --watch
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
mysql-747d4cd75c-hznns 0/1 ContainerCreating 0 3s <none> worker03 <none> <none>
mysql-747d4cd75c-zk7mr 1/1 Terminating 0 4m22s 10.0.5.62 worker01 <none> <none>
mysql-747d4cd75c-hznns 0/1 ContainerCreating 0 9s <none> worker03 <none> <none>
mysql-747d4cd75c-hznns 1/1 Running 0 10s 10.0.19.93 worker03 <none> <none>
root@master01:~/apps/mysql# kubectl logs -f mysql-747d4cd75c-hznns
2020-07-16 09:35:33+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 5.7.29-1debian10 started.
2020-07-16 09:35:34+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
2020-07-16 09:35:34+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 5.7.29-1debian10 started.
2020-07-16T09:35:35.271995Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2020-07-16T09:35:35.307021Z 0 [Note] mysqld (mysqld 5.7.29) starting as process 1 ...
2020-07-16T09:35:35.464486Z 0 [Note] InnoDB: PUNCH HOLE support available
2020-07-16T09:35:35.464514Z 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2020-07-16T09:35:35.464521Z 0 [Note] InnoDB: Uses event mutexes
2020-07-16T09:35:35.464526Z 0 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2020-07-16T09:35:35.464531Z 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2020-07-16T09:35:35.464535Z 0 [Note] InnoDB: Using Linux native AIO
2020-07-16T09:35:35.464897Z 0 [Note] InnoDB: Number of pools: 1
2020-07-16T09:35:35.465055Z 0 [Note] InnoDB: Using CPU crc32 instructions
2020-07-16T09:35:35.466999Z 0 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
2020-07-16T09:35:35.476381Z 0 [Note] InnoDB: Completed initialization of buffer pool
2020-07-16T09:35:35.479090Z 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2020-07-16T09:35:35.511655Z 0 [ERROR] InnoDB: Unable to lock ./ibdata1 error: 11
2020-07-16T09:35:35.511714Z 0 [Note] InnoDB: Check that you do not already have another mysqld process using the same InnoDB data or log files.
2020-07-16T09:35:35.511723Z 0 [Note] InnoDB: Retrying to lock the first data file
2020-07-16T09:35:36.514159Z 0 [ERROR] InnoDB: Unable to lock ./ibdata1 error: 11
2020-07-16T09:35:36.514213Z 0 [Note] InnoDB: Check that you do not already have another mysqld process using the same InnoDB data or log files.
2020-07-16T09:35:37.518876Z 0 [ERROR] InnoDB: Unable to lock ./ibdata1 error: 11
2020-07-16T09:35:37.518916Z 0 [Note] InnoDB: Check that you do not already have another mysqld process using the same InnoDB data or log files.
2020-07-16T09:35:38.523434Z 0 [ERROR] InnoDB: Unable to lock ./ibdata1 error: 11
2020-07-16T09:35:38.523486Z 0 [Note] InnoDB: Check that you do not already have another mysqld process using the same InnoDB data or log files.
2020-07-16T09:35:39.526138Z 0 [ERROR] InnoDB: Unable to lock ./ibdata1 error: 11
2020-07-16T09:35:39.526191Z 0 [Note] InnoDB: Check that you do not already have another mysqld process using the same InnoDB data or log files.
2020-07-16T09:35:40.530406Z 0 [ERROR] InnoDB: Unable to lock ./ibdata1 error: 11
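Error 11 here is EAGAIN: the old mysqld on worker01 still holds an advisory lock on ibdata1 on the shared rook-ceph volume, so the new mysqld on worker03 cannot acquire it. The same conflict can be sketched locally with flock(1) on a scratch file (the path below is hypothetical, standing in for ibdata1):

```shell
# Sketch of the InnoDB lock conflict using flock(1) on a scratch file.
LOCKFILE=/tmp/demo-ibdata1   # stand-in for ibdata1 on the shared PV
touch "$LOCKFILE"

# "Old" mysqld: grab an exclusive lock and hold it in the background.
flock "$LOCKFILE" sleep 3 &
HOLDER=$!
sleep 1

# "New" mysqld: a non-blocking attempt fails while the holder is alive,
# which is the same EAGAIN the MySQL log reports as "error: 11".
if flock -n "$LOCKFILE" true; then
  echo "lock acquired"
else
  echo "Unable to lock: resource temporarily unavailable (EAGAIN)"
fi
wait "$HOLDER"
```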
Workaround tried:
When the kubelet service came back up on worker01, the improperly terminated mysql container was removed and the new pod on worker03 was then able to lock the data file.
In the memory-exhaustion case, the issue on worker01 resolved only after the node's memory was reclaimed.
The reason the pods stay in the Terminating state is that the kubelet, which is responsible for gracefully terminating pods, is stopped. After stopping the kubelet service on a worker node, you should delete that worker node from the API server to clean up the Terminating pods that were running on it. This makes the pods schedulable onto the other available worker nodes:
kubectl delete node <node-name>
Once the kubelet is running again on that worker node, it will automatically re-register the node with the API server.
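Against a live cluster, the cleanup described above might look like the following (node name taken from the outputs earlier; this assumes a working kubeconfig and is a sketch, not output from the original session):

```shell
# Confirm the node is NotReady after kubelet was stopped.
kubectl get nodes

# Remove the dead node so the API server cleans up its Terminating pods.
kubectl delete node worker01

# Watch the replacement pod come up cleanly on another worker.
kubectl get po -o wide --watch

# When kubelet is started again on worker01, the node re-registers itself;
# no manual re-add is needed.
```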
As a best practice, pods should use taint-based eviction so that a pod is evicted after tolerationSeconds when node conditions such as node.kubernetes.io/not-ready are present (this taint corresponds to the NodeCondition Ready being "False").
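A sketch of such a toleration in a pod spec (the timeout value is illustrative, not from the original Deployment):

```yaml
# Illustrative toleration: evict this pod 60 seconds after its node
# is tainted node.kubernetes.io/not-ready (value chosen as an example).
tolerations:
- key: node.kubernetes.io/not-ready
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 60
```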