Can't get node to connect to replica set

I'm attempting to set up a MongoDB test replica set. The problem is that I can't find any way to get an error message, and one of the nodes remains permanently in DOWN or UNKNOWN status.

Here is my rs.status from the primary

    {
            "set" : "rs0",
            "date" : ISODate("2014-05-08T00:41:11Z"),
            "myState" : 1,
            "members" : [
                    {
                            "_id" : 0,
                            "name" : "mongo1:27017",
                            "health" : 1,
                            "state" : 1,
                            "stateStr" : "PRIMARY",
                            "uptime" : 3319,
                            "optime" : Timestamp(1399509356, 1),
                            "optimeDate" : ISODate("2014-05-08T00:35:56Z"),
                            "electionTime" : Timestamp(1399506359, 1),
                            "electionDate" : ISODate("2014-05-07T23:45:59Z"),
                            "self" : true
                    },
                    {
                            "_id" : 2,
                            "name" : "mongo3:30000",
                            "health" : 1,
                            "state" : 2,
                            "stateStr" : "SECONDARY",
                            "uptime" : 319,
                            "lastHeartbeat" : ISODate("2014-05-08T00:41:11Z"),
                            "lastHeartbeatRecv" : ISODate("2014-05-08T00:41:11Z"),
                            "pingMs" : 2,
                            "syncingTo" : "mongo1:27017"
                    },
                    {
                            "_id" : 3,
                            "name" : "mongo2:27018",
                            "health" : 1,
                            "state" : 6,
                            "stateStr" : "UNKNOWN",
                            "uptime" : 315,
                            "optime" : Timestamp(0, 0),
                            "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
                            "lastHeartbeat" : ISODate("2014-05-08T00:41:11Z"),
                            "lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
                            "pingMs" : 2,
                            "lastHeartbeatMessage" : "still initializing"
                    }
            ],
            "ok" : 1
    }

Here is the rs.conf from primary

    {
            "_id" : "rs0",
            "version" : 12,
            "members" : [
                    {
                            "_id" : 0,
                            "host" : "mongo1:27017"
                    },
                    {
                            "_id" : 2,
                            "host" : "mongo3:30000",
                            "arbiterOnly" : true
                    },
                    {
                            "_id" : 3,
                            "host" : "mongo2:27018"
                    }
            ]
    }

The issue is mongo2:27018. I've tried adding and removing it. I've tried wiping the entire box and re-installing CentOS + Mongo. From any of the 3 boxes, I can mongo to the other 2. So from mongo1:27017 I can type mongo mongo2:27018 and it has no problems. All 3 boxes have the same configuration in their /etc/hosts, which I've double, triple, and quadruple checked.

The only debugging information I can find anywhere is the following block on the problematic node:

    2014-05-08T02:45:51.763+0200 [initandlisten] connection accepted from 10.0.2.2:48720 #50 (2 connections now open)
    2014-05-08T02:46:00.593+0200 [rsStart] trying to contact mongo1:27017
    2014-05-08T02:46:00.602+0200 [rsStart] trying to contact mongo3:30000
    2014-05-08T02:46:00.605+0200 [rsStart] replSet info Couldn't load config yet. Sleeping 20sec and will try again.

Any guidance is appreciated, been struggling at this for 5 hours now.

The eventual issue we discovered is that the hostname for each replica node not only needs to be valid between the nodes but also from a node to itself!

For example, due to some port forwarding going on, mongo1 could successfully communicate with mongo2 at mongo2:27018, and so could mongo3, but mongo2 could not communicate with itself at mongo2:27018 (since it was actually listening on 27017). It worked from the other boxes because mongo1 and mongo3 had an alias for mongo2 which forwarded port 27018 to 27017.

So basically, unless each node can connect to itself AND to the other nodes using the hostname and port from the config, it will not work!
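
To check this quickly, something like the following could be run on every node, including the problematic one. It is only a sketch: the member list is copied from the rs.conf above, the helper names (`split_host`, `can_connect`, `check_members`) are made up for illustration, and it only tests that a TCP connection can be opened, not that a mongod is actually answering on the other end.

```python
import socket

# Member addresses copied from the rs.conf output in the question.
MEMBERS = ["mongo1:27017", "mongo3:30000", "mongo2:27018"]


def split_host(member, default_port=27017):
    """Split a "host:port" string into (host, port).

    Falls back to default_port (MongoDB's default, 27017) when no port is given.
    """
    host, sep, port = member.rpartition(":")
    if not sep:
        return member, default_port
    return host, int(port)


def can_connect(member, timeout=2.0):
    """Return True if a TCP connection to the member can be opened from this machine."""
    host, port = split_host(member)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_members(members, timeout=2.0):
    """Map each configured member address to whether it is reachable from here."""
    return {member: can_connect(member, timeout) for member in members}
```

Calling `check_members(MEMBERS)` on each box and looking for any `False` entry should expose a case like the one above, where mongo2 would presumably report its own address, mongo2:27018, as unreachable even though the other two boxes can reach it.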
