Can't get a node to connect to the replica set

Question

I'm attempting to set up a MongoDB test replica set. The problem is that I can't find any way to get an error message, and one of the nodes remains permanently in DOWN or UNKNOWN status.

Here is my rs.status from the primary:

    {
            "set" : "rs0",
            "date" : ISODate("2014-05-08T00:41:11Z"),
            "myState" : 1,
            "members" : [
                    {
                            "_id" : 0,
                            "name" : "mongo1:27017",
                            "health" : 1,
                            "state" : 1,
                            "stateStr" : "PRIMARY",
                            "uptime" : 3319,
                            "optime" : Timestamp(1399509356, 1),
                            "optimeDate" : ISODate("2014-05-08T00:35:56Z"),
                            "electionTime" : Timestamp(1399506359, 1),
                            "electionDate" : ISODate("2014-05-07T23:45:59Z"),
                            "self" : true
                    },
                    {
                            "_id" : 2,
                            "name" : "mongo3:30000",
                            "health" : 1,
                            "state" : 2,
                            "stateStr" : "SECONDARY",
                            "uptime" : 319,
                            "lastHeartbeat" : ISODate("2014-05-08T00:41:11Z"),
                            "lastHeartbeatRecv" : ISODate("2014-05-08T00:41:11Z"),
                            "pingMs" : 2,
                            "syncingTo" : "mongo1:27017"
                    },
                    {
                            "_id" : 3,
                            "name" : "mongo2:27018",
                            "health" : 1,
                            "state" : 6,
                            "stateStr" : "UNKNOWN",
                            "uptime" : 315,
                            "optime" : Timestamp(0, 0),
                            "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
                            "lastHeartbeat" : ISODate("2014-05-08T00:41:11Z"),
                            "lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
                            "pingMs" : 2,
                            "lastHeartbeatMessage" : "still initializing"
                    }
            ],
            "ok" : 1
    }

Here is the rs.conf from the primary:

    {
            "_id" : "rs0",
            "version" : 12,
            "members" : [
                    {
                            "_id" : 0,
                            "host" : "mongo1:27017"
                    },
                    {
                            "_id" : 2,
                            "host" : "mongo3:30000",
                            "arbiterOnly" : true
                    },
                    {
                            "_id" : 3,
                            "host" : "mongo2:27018"
                    }
            ]
    }

The issue is mongo2:27018. I've tried adding and removing it. I've tried wiping the entire box and re-installing CentOS + Mongo. From any of the 3 boxes, I can mongo to the other 2. So from mongo1:27017 I can type mongo mongo2:27018 and it has no problems. All 3 boxes have the same configuration in their /etc/hosts, which I've double, triple, and quadruple checked.

The only debugging information I can find anywhere is the following block in the log on the problematic node:

    2014-05-08T02:45:51.763+0200 [initandlisten] connection accepted from 10.0.2.2:48720 #50 (2 connections now open)
    2014-05-08T02:46:00.593+0200 [rsStart] trying to contact mongo1:27017
    2014-05-08T02:46:00.602+0200 [rsStart] trying to contact mongo3:30000
    2014-05-08T02:46:00.605+0200 [rsStart] replSet info Couldn't load config yet. Sleeping 20sec and will try again.

Any guidance is appreciated; I've been struggling with this for 5 hours now.

Answer 1

The issue we eventually discovered is that the hostname for each replica node needs to be valid not only between the nodes but also from a node to itself!

For example, due to some port forwarding going on, mongo1 could successfully communicate with mongo2 at mongo2:27018, and mongo3 could successfully communicate with mongo2 at mongo2:27018, but mongo2 could not communicate with itself at mongo2:27018 (since it was actually listening on 27017). The reason it worked for the other boxes was that mongo1 and mongo3 had an alias for mongo2 that port-forwarded 27018 to 27017.

So basically, unless each node can connect to itself AND to the other nodes using the hostname and port from the config, it will not work!
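One way to catch this kind of problem is to run the same connectivity check on every box, including against the box's own entry. The sketch below is not part of the original setup: the helper names (split_host_port, can_connect, check_members) are made up for illustration, and the member list is simply copied from the rs.conf in the question. It only tests whether a TCP connection to each host:port succeeds from the machine it runs on; it does not verify that the listener is actually mongod.

```python
import socket

# Member addresses copied from the rs.conf shown in the question.
MEMBERS = ["mongo1:27017", "mongo3:30000", "mongo2:27018"]


def split_host_port(member):
    """Split a 'host:port' string into (host, port)."""
    host, _, port = member.rpartition(":")
    return host, int(port)


def can_connect(member, timeout=3.0):
    """Return True if a TCP connection to 'host:port' succeeds from this machine."""
    host, port = split_host_port(member)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_members(members, timeout=3.0):
    """Map each member address to whether it is reachable from this machine."""
    return {member: can_connect(member, timeout) for member in members}
```

Calling check_members(MEMBERS) on each of the three boxes and comparing the results should show the asymmetry described above: in that situation, mongo2:27018 would presumably come back True on mongo1 and mongo3 but False on mongo2 itself.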
