OpenEBS target pod cannot communicate with its replicas after one of the worker nodes was deleted from the cluster

I'm having a problem with an OpenEBS data store. The setup uses 3 OpenEBS storage replicas on 3 different VMs. Initially the workload pod (PostgreSQL) went into read-only mode, so I first deleted the worker node and then, after it didn't recover, the OpenEBS ctrl pod. Now the ctrl pod cannot seem to reconnect with all 3 replicas and keeps showing this message:

level=warning msg="No of yet to be registered replicas are less than 3 , No of registered replicas: 1"

The replica that does seem to have connected keeps logging the following repeatedly:

time="2019-01-22T08:04:12Z" level=info msg="Get Volume info from controller"
time="2019-01-22T08:04:12Z" level=info msg="Register replica at controller"

Target pod logs:

"2019-01-22T06:55:46.064Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:46Z"" level=error msg=""Mode: ReadOnly"""
"2019-01-22T06:55:46.065Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:46Z"" level=error msg=""Mode: ReadOnly"""
"2019-01-22T06:55:48.076Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:48Z"" level=error msg=""Mode: ReadOnly"""
"2019-01-22T06:55:48.075Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:48Z"" level=error msg=""Mode: ReadOnly"""
"2019-01-22T06:55:50.085Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:50Z"" level=error msg=""Mode: ReadOnly"""
"2019-01-22T06:55:49.083Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:49Z"" level=error msg=""Mode: ReadOnly"""
"2019-01-22T06:55:50.086Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:50Z"" level=warning msg=busy"
"2019-01-22T06:55:50.085Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:50Z"" level=error msg=""Mode: ReadOnly"""
"2019-01-22T06:55:49.084Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:49Z"" level=error msg=""Mode: ReadOnly"""
"2019-01-22T06:55:53.105Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:53Z"" level=warning msg=busy"
"2019-01-22T06:55:53.104Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:53Z"" level=error msg=""Mode: ReadOnly"""
"2019-01-22T06:55:55.117Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:55Z"" level=error msg=""Mode: ReadOnly"""
"2019-01-22T06:55:54.107Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:54Z"" level=error msg=""Mode: ReadOnly"""
"2019-01-22T06:55:54.107Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:54Z"" level=error msg=""Mode: ReadOnly"""
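The exported entries above are not in chronological order (for example, 06:55:50 appears before 06:55:49), which makes it harder to see how often the target reports read-only mode versus `busy`. A minimal sketch, assuming the export is CSV with three columns (timestamp, pod name, raw log line) and doubled inner quotes exactly as shown; `parse_export`, the regexes, and the three sample rows are hypothetical helpers for illustration, not part of OpenEBS:

```python
import csv
import io
import re
from collections import Counter

# Hypothetical sample: three rows copied from the target-pod export above.
# Each row is CSV: timestamp, pod name, raw log line (inner quotes doubled).
SAMPLE = '''\
"2019-01-22T06:55:46.064Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:46Z"" level=error msg=""Mode: ReadOnly"""
"2019-01-22T06:55:50.086Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:50Z"" level=warning msg=busy"
"2019-01-22T06:55:49.083Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-ctrl-6658c7df95-m2hnf","time=""2019-01-22T06:55:49Z"" level=error msg=""Mode: ReadOnly"""
'''

LEVEL_RE = re.compile(r"level=(\w+)")
# msg is either quoted (msg="Mode: ReadOnly") or a bare word (msg=busy).
MSG_RE = re.compile(r'msg=(?:"([^"]*)"|(\S+))')


def parse_export(text):
    """Return (timestamp, pod, level, msg) tuples sorted by timestamp."""
    rows = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        ts, pod, line = row
        level = LEVEL_RE.search(line)
        m = MSG_RE.search(line)
        if m is None:
            msg = ""
        else:
            msg = m.group(1) if m.group(1) is not None else m.group(2)
        rows.append((ts, pod, level.group(1) if level else "", msg))
    # Fixed-width ISO-8601 UTC timestamps sort correctly as plain strings.
    rows.sort(key=lambda r: r[0])
    return rows


rows = parse_export(SAMPLE)
counts = Counter((level, msg) for _, _, level, msg in rows)
for ts, _, level, msg in rows:
    print(ts, level, msg)
print(counts.most_common())
```

Sorting on the export timestamp rather than the `time=` field keeps millisecond precision, which the in-line `time=` value drops.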

Logs from the replica pod that has not yet connected:

"2019-01-22T06:56:24.117Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:24Z"" level=info msg=""Done running ssync [ssync -host 10.233.91.202 -timeout 7 -port 9700 volume-head-010.img.meta]"""
"2019-01-22T06:56:24.866Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:24Z"" level=info msg=""source file size: 12884901888, setting up directIo: true"""
"2019-01-22T06:56:11.390Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:11Z"" level=info msg=""Get Volume Usage"""
"2019-01-22T06:56:23.881Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:23Z"" level=info msg=""Snapshotting [d82c79af-06fd-4bc4-bd67-c54fa636e596] volume, user created false, created time 2019-01-22T06:56:23Z"""
"2019-01-22T06:56:23.924Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","10.233.96.147 - - [22/Jan/2019:06:56:23 +0000] ""POST /v1/replicas/1?action=snapshot HTTP/1.1"" 200 14804"
"2019-01-22T06:56:24.049Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:24Z"" level=info msg=""Running ssync [ssync -host 10.233.91.202 -timeout 7 -port 9700 volume-head-010.img.meta]"""
"2019-01-22T06:56:24.828Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:24Z"" level=info msg=""Running ssync [ssync -host 10.233.91.202 -timeout 7 -port 9701 volume-snap-6b38fe32-98ab-4f95-8b2d-05ba9aebfe0e.img]"""
"2019-01-22T06:56:24.885Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:24Z"" level=info msg=""The file is a hole: [      0: 3145728](3145728)"""
"2019-01-22T06:56:24.886Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:24Z"" level=info msg=""Ssync client: exit code 0"""
"2019-01-22T06:56:23.872Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:23Z"" level=info msg=""GetReplica for id 1"""
"2019-01-22T06:56:24.019Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:24Z"" level=info msg=GetReplica"
"2019-01-22T06:56:24.019Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:24Z"" level=info msg=""GetReplica for id 1"""
"2019-01-22T06:56:24.886Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:24Z"" level=info msg=""Done running ssync [ssync -host 10.233.91.202 -timeout 7 -port 9701 volume-snap-6b38fe32-98ab-4f95-8b2d-05ba9aebfe0e.img]"""
"2019-01-22T06:56:25.607Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:25Z"" level=info msg=""source file size: 112, setting up directIo: false"""
"2019-01-22T06:56:25.614Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:25Z"" level=warning msg=""Failed to open server: 10.233.91.202:9702, Retrying..."""
"2019-01-22T06:56:26.628Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:26Z"" level=info msg=""Ssync client: exit code 0"""
"2019-01-22T06:56:28.353Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:28Z"" level=info msg=""Running ssync [ssync -host 10.233.91.202 -timeout 7 -port 9703 volume-snap-a03e749d-31ac-4375-9559-14fb141fc3d7.img]"""
"2019-01-22T06:56:28.419Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:28Z"" level=info msg=""source file size: 12884901888, setting up directIo: true"""
"2019-01-22T06:56:28.428Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:28Z"" level=info msg=""The file is a hole: [      0: 3145728](3145728)"""
"2019-01-22T06:56:28.431Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:28Z"" level=info msg=""Ssync client: exit code 0"""
"2019-01-22T06:56:29.121Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:29Z"" level=info msg=""Running ssync [ssync -host 10.233.91.202 -timeout 7 -port 9704 volume-snap-a03e749d-31ac-4375-9559-14fb141fc3d7.img.meta]"""
"2019-01-22T06:56:29.900Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:29Z"" level=info msg=""Syncing volume-snap-f8771212-06d3-400b-ad12-c063ef8ed827.img to 10.233.91.202:9705...\n"""
"2019-01-22T06:56:29.900Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:29Z"" level=info msg=""source file size: 12884901888, setting up directIo: true"""
"2019-01-22T06:56:29.904Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:29Z"" level=warning msg=""Failed to open server: 10.233.91.202:9705, Retrying..."""
"2019-01-22T06:56:25.607Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:25Z"" level=info msg=""Syncing volume-snap-6b38fe32-98ab-4f95-8b2d-05ba9aebfe0e.img.meta to 10.233.91.202:9702...\n"""
"2019-01-22T06:56:25.584Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:25Z"" level=info msg=""Running ssync [ssync -host 10.233.91.202 -timeout 7 -port 9702 volume-snap-6b38fe32-98ab-4f95-8b2d-05ba9aebfe0e.img.meta]"""
"2019-01-22T06:56:28.419Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:28Z"" level=info msg=""Syncing volume-snap-a03e749d-31ac-4375-9559-14fb141fc3d7.img to 10.233.91.202:9703...\n"""
"2019-01-22T06:56:29.215Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:29Z"" level=info msg=""Done running ssync [ssync -host 10.233.91.202 -timeout 7 -port 9704 volume-snap-a03e749d-31ac-4375-9559-14fb141fc3d7.img.meta]"""
"2019-01-22T06:56:29.880Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:29Z"" level=info msg=""Running ssync [ssync -host 10.233.91.202 -timeout 7 -port 9705 volume-snap-f8771212-06d3-400b-ad12-c063ef8ed827.img]"""
"2019-01-22T06:56:28.434Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:28Z"" level=info msg=""Done running ssync [ssync -host 10.233.91.202 -timeout 7 -port 9703 volume-snap-a03e749d-31ac-4375-9559-14fb141fc3d7.img]"""
"2019-01-22T06:56:29.211Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:29Z"" level=info msg=""Ssync client: exit code 0"""
"2019-01-22T06:56:29.183Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:29Z"" level=info msg=""source file size: 164, setting up directIo: false"""
"2019-01-22T06:56:29.905Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:29Z"" level=warning msg=""Failed to open server: 10.233.91.202:9705, Retrying..."""
"2019-01-22T06:56:41.391Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:41Z"" level=info msg=GetUsage"
"2019-01-22T06:56:41.392Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","10.233.96.147 - - [22/Jan/2019:06:56:41 +0000] ""GET /v1/replicas/1/volusage HTTP/1.1"" 200 200"
"2019-01-22T06:56:41.390Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:56:41Z"" level=info msg=""Get Volume Usage"""
"2019-01-22T06:59:11.392Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:59:11Z"" level=info msg=GetUsage"
"2019-01-22T06:59:11.392Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T06:59:11Z"" level=info msg=""Get Volume Usage"""
"2019-01-22T07:00:38.050Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T07:00:38Z"" level=error msg=""Received EOF: EOF"""
"2019-01-22T07:00:38.050Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T07:00:38Z"" level=info msg=""Restart AutoConfigure Process"""
"2019-01-22T07:00:43.232Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","10.233.91.234 - - [22/Jan/2019:07:00:43 +0000] ""POST /v1/replicas/1?action=start HTTP/1.1"" 200 1091"
"2019-01-22T07:00:43.238Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T07:00:43Z"" level=info msg=""GetReplica for id 1"""
"2019-01-22T07:00:43.409Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T07:00:43Z"" level=info msg=""GetReplica for id 1"""
"2019-01-22T07:00:43.465Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T07:00:43Z"" level=info msg=GetReplica"
"2019-01-22T07:00:43.239Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T07:00:43Z"" level=info msg=""Got signal: 'open', proceed to open replica"""
"2019-01-22T07:00:43.585Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","10.233.91.234 - - [22/Jan/2019:07:00:43 +0000] ""POST /v1/replicas/1?action=snapshot HTTP/1.1"" 200 15190"
"2019-01-22T07:00:43.666Z","pvc-a2e1d1bf-db64-11e8-9384-fee6a1e98ebe-rep-7bf8ffb665-v5777","time=""2019-01-22T07:00:43Z"" level=info msg=GetReplica"

After going through the logs, I can see that the replicas were registered with the controller, but one of the replicas is being synced with another healthy replica, which can take some time.

After some time, I can see that the following warning from the target pod no longer shows up:

level=warning msg="No of yet to be registered replicas are less than 3 , No of registered replicas: 1"

I think it is recovering right now. I have 12GiB of data.
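To follow a replica sync like the one described here, the `ssync` messages in the replica log can be grouped by target port, showing which snapshot files have a "Done running" line and which transfers are still retrying. A minimal sketch, assuming the `msg` text has already been extracted from each log line; the message strings are copied from the replica log excerpt in the question (in timestamp order, with the "Syncing ..." and "Ssync client: exit code 0" lines left out), while `summarize` and `Transfer` are hypothetical names:

```python
import re
from dataclasses import dataclass

# Message texts copied from the replica log excerpt, in timestamp order.
H = "ssync -host 10.233.91.202 -timeout 7"
MESSAGES = [
    f"Running ssync [{H} -port 9700 volume-head-010.img.meta]",
    f"Done running ssync [{H} -port 9700 volume-head-010.img.meta]",
    f"Running ssync [{H} -port 9701 volume-snap-6b38fe32-98ab-4f95-8b2d-05ba9aebfe0e.img]",
    f"Done running ssync [{H} -port 9701 volume-snap-6b38fe32-98ab-4f95-8b2d-05ba9aebfe0e.img]",
    f"Running ssync [{H} -port 9702 volume-snap-6b38fe32-98ab-4f95-8b2d-05ba9aebfe0e.img.meta]",
    "Failed to open server: 10.233.91.202:9702, Retrying...",
    f"Running ssync [{H} -port 9703 volume-snap-a03e749d-31ac-4375-9559-14fb141fc3d7.img]",
    f"Done running ssync [{H} -port 9703 volume-snap-a03e749d-31ac-4375-9559-14fb141fc3d7.img]",
    f"Running ssync [{H} -port 9704 volume-snap-a03e749d-31ac-4375-9559-14fb141fc3d7.img.meta]",
    f"Done running ssync [{H} -port 9704 volume-snap-a03e749d-31ac-4375-9559-14fb141fc3d7.img.meta]",
    f"Running ssync [{H} -port 9705 volume-snap-f8771212-06d3-400b-ad12-c063ef8ed827.img]",
    "Failed to open server: 10.233.91.202:9705, Retrying...",
    "Failed to open server: 10.233.91.202:9705, Retrying...",
]

RUN_RE = re.compile(
    r"^(Running|Done running) ssync \[ssync -host (\S+) -timeout \d+ -port (\d+) (\S+)\]$"
)
FAIL_RE = re.compile(r"^Failed to open server: (\S+):(\d+), Retrying\.\.\.$")


@dataclass
class Transfer:
    file: str
    done: bool = False
    retries: int = 0


def summarize(messages):
    """Group ssync start/finish/retry messages by target port."""
    transfers = {}
    for text in messages:
        run = RUN_RE.match(text)
        if run:
            kind, _host, port, name = run.groups()
            t = transfers.setdefault(int(port), Transfer(name))
            if kind == "Done running":
                t.done = True
            continue
        fail = FAIL_RE.match(text)
        if fail:
            port = int(fail.group(2))
            transfers.setdefault(port, Transfer("(unknown)")).retries += 1
    return transfers


transfers = summarize(MESSAGES)
for port, t in sorted(transfers.items()):
    state = "done" if t.done else "no 'Done running' line yet"
    print(port, state, f"retries={t.retries}", t.file)
pending = sorted(p for p, t in transfers.items() if not t.done)
```

In this excerpt, ports 9702 and 9705 have a "Running" line and "Failed to open server" retries but no matching "Done running" line. That does not by itself mean those transfers failed; it only shows the excerpt does not record them finishing.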
