
Stopping the kubelet service on a worker node (or the node hanging due to memory exhaustion) prevents MySQL from terminating properly on a Kubernetes worker node


Storage used: rook-ceph

Scenario 1: Stopping the kubelet service on the worker node where the MySQL pod was running.

Initial status:

root@master01:~/apps/mysql# kubectl get nodes
NAME       STATUS   ROLES    AGE   VERSION
master01   Ready    master   9d    v1.18.5
master02   Ready    master   9d    v1.18.5
master03   Ready    master   9d    v1.18.5
worker01   Ready    <none>   9d    v1.18.5
worker02   Ready    <none>   9d    v1.18.5
worker03   Ready    <none>   9d    v1.18.5
worker04   Ready    <none>   9d    v1.18.5

root@master01:~/apps/mysql# kubectl get po -o wide 
NAME                     READY   STATUS    RESTARTS   AGE   IP          NODE       NOMINATED NODE   READINESS GATES
mysql-747d4cd75c-zk7mr   1/1     Running   0          16s   10.0.5.62   worker01   <none>           <none>

root@master01:~/apps/mysql# kubectl get deployment
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
mysql   1/1     1            1           2m2s

Test: Bring the kubelet service down on worker01 using "systemctl stop kubelet".

root@master01:~# kubectl get nodes 
NAME       STATUS     ROLES    AGE   VERSION
master01   Ready      master   9d    v1.18.5
master02   Ready      master   9d    v1.18.5
master03   Ready      master   9d    v1.18.5
worker01   NotReady   <none>   9d    v1.18.5
worker02   Ready      <none>   9d    v1.18.5
worker03   Ready      <none>   9d    v1.18.5
worker04   Ready      <none>   9d    v1.18.5

Expectation: The pod gets rescheduled to worker03 as per the node affinity configuration (a sketch of such a configuration is shown below).
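For reference, a node affinity block of this kind in the mysql Deployment would restrict scheduling to worker01 and worker03. The actual manifest is not shown in the question, so the hostname values, Secret name, and PVC name below are illustrative assumptions:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                - worker01
                - worker03
      containers:
      - name: mysql
        image: mysql:5.7.29
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret      # assumed Secret holding the root password
              key: password
        volumeMounts:
        - name: mysql-data
          mountPath: /var/lib/mysql
      volumes:
      - name: mysql-data
        persistentVolumeClaim:
          claimName: mysql-pvc        # assumed PVC name, backed by rook-ceph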

Result: The pod is rescheduled on worker03 successfully, but since the old container inside the mysql pod on worker01 was not terminated properly, the new pod keeps showing error logs even though it is Running.

root@worker01:~# docker ps | grep mysql
2d063f8d04e4        6e17b5012353           "/usr/local/bin/dock…"   9 minutes ago       Up 9 minutes                            k8s_mysql_mysql-747d4cd75c-zk7mr_251d7d3d-1201-4757-993d-a3c7d65f87b9_0
b6d2cab4ba2b        k8s.gcr.io/pause:3.2   "/pause"                 9 minutes ago       Up 9 minutes                            k8s_POD_mysql-747d4cd75c-zk7mr_251d7d3d-1201-4757-993d-a3c7d65f87b9_0

root@master01:~/apps/mysql# kubectl get po -o wide --watch
NAME                     READY   STATUS              RESTARTS   AGE     IP          NODE       NOMINATED NODE   READINESS GATES
mysql-747d4cd75c-hznns   0/1     ContainerCreating   0          3s      <none>      worker03   <none>           <none>
mysql-747d4cd75c-zk7mr   1/1     Terminating         0          4m22s   10.0.5.62   worker01   <none>           <none>
mysql-747d4cd75c-hznns   0/1     ContainerCreating   0          9s      <none>      worker03   <none>           <none>
mysql-747d4cd75c-hznns   1/1     Running             0          10s     10.0.19.93   worker03   <none>           <none>

root@master01:~/apps/mysql# kubectl logs -f mysql-747d4cd75c-hznns 
2020-07-16 09:35:33+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 5.7.29-1debian10 started.
2020-07-16 09:35:34+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
2020-07-16 09:35:34+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 5.7.29-1debian10 started.
2020-07-16T09:35:35.271995Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2020-07-16T09:35:35.307021Z 0 [Note] mysqld (mysqld 5.7.29) starting as process 1 ...
2020-07-16T09:35:35.464486Z 0 [Note] InnoDB: PUNCH HOLE support available
2020-07-16T09:35:35.464514Z 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2020-07-16T09:35:35.464521Z 0 [Note] InnoDB: Uses event mutexes
2020-07-16T09:35:35.464526Z 0 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2020-07-16T09:35:35.464531Z 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2020-07-16T09:35:35.464535Z 0 [Note] InnoDB: Using Linux native AIO
2020-07-16T09:35:35.464897Z 0 [Note] InnoDB: Number of pools: 1
2020-07-16T09:35:35.465055Z 0 [Note] InnoDB: Using CPU crc32 instructions
2020-07-16T09:35:35.466999Z 0 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
2020-07-16T09:35:35.476381Z 0 [Note] InnoDB: Completed initialization of buffer pool
2020-07-16T09:35:35.479090Z 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2020-07-16T09:35:35.511655Z 0 [ERROR] InnoDB: Unable to lock ./ibdata1 error: 11
2020-07-16T09:35:35.511714Z 0 [Note] InnoDB: Check that you do not already have another mysqld process using the same InnoDB data or log files.
2020-07-16T09:35:35.511723Z 0 [Note] InnoDB: Retrying to lock the first data file
2020-07-16T09:35:36.514159Z 0 [ERROR] InnoDB: Unable to lock ./ibdata1 error: 11
2020-07-16T09:35:36.514213Z 0 [Note] InnoDB: Check that you do not already have another mysqld process using the same InnoDB data or log files.
2020-07-16T09:35:37.518876Z 0 [ERROR] InnoDB: Unable to lock ./ibdata1 error: 11
2020-07-16T09:35:37.518916Z 0 [Note] InnoDB: Check that you do not already have another mysqld process using the same InnoDB data or log files.
2020-07-16T09:35:38.523434Z 0 [ERROR] InnoDB: Unable to lock ./ibdata1 error: 11
2020-07-16T09:35:38.523486Z 0 [Note] InnoDB: Check that you do not already have another mysqld process using the same InnoDB data or log files.
2020-07-16T09:35:39.526138Z 0 [ERROR] InnoDB: Unable to lock ./ibdata1 error: 11
2020-07-16T09:35:39.526191Z 0 [Note] InnoDB: Check that you do not already have another mysqld process using the same InnoDB data or log files.
2020-07-16T09:35:40.530406Z 0 [ERROR] InnoDB: Unable to lock ./ibdata1 error: 11
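Error 11 here is EAGAIN: the new mysqld cannot acquire the lock on ibdata1 because the old mysqld on worker01 is still running against the same rook-ceph-backed data directory. A minimal check on worker01, reusing the container ID from the docker ps output above:

root@worker01:~# docker top 2d063f8d04e4

If the old mysqld process still appears in this output, it is the process holding the lock; once that container is removed (see workaround 1 below), the new pod can start cleanly.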

Workarounds tried:

  1. When the kubelet service came back up on worker01, the improperly terminated mysql container was removed and the new pod on worker03 was then able to acquire the lock on that file.

  2. When the issue is caused by worker01 running out of memory, it is resolved only after the node's memory is recovered.

The reason the pod is stuck in the Terminating state is that the kubelet, which is responsible for gracefully terminating pods, has been stopped.

After stopping the kubelet service on a worker node, you should delete that worker node from the API server to clean up the terminating pods that were running on it. This makes those pods schedulable on the other available worker nodes.

kubectl delete node nodename

Once the kubelet is running again on that worker node, it will automatically re-register the node with the API server.
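Putting the two steps together, a minimal recovery sequence for this scenario (assuming worker01 is the affected node) would look like:

root@master01:~# kubectl delete node worker01     # removes the node object; its terminating pods are cleaned up
root@worker01:~# systemctl start kubelet          # kubelet re-registers worker01 with the API server
root@master01:~# kubectl get nodes                # worker01 should come back as Ready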

As a best practice, pods should use taint-based evictions so that the pod gets evicted after tolerationSeconds when the node reports conditions such as:

node.kubernetes.io/not-ready: the node is not ready. This corresponds to the NodeCondition Ready being "False".
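For example, the pod spec could carry a toleration like the following sketch (the 300-second value is illustrative; node.kubernetes.io/unreachable can be handled the same way), so the pod is evicted 300 seconds after the node goes NotReady instead of waiting indefinitely:

      tolerations:
      - key: node.kubernetes.io/not-ready
        operator: Exists
        effect: NoExecute
        tolerationSeconds: 300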
