
MongoDB replica set failed

I have a MongoDB replica set consisting of three nodes: one primary, one secondary, and one arbiter.

While I was performing the initial resync of the secondary node from the primary, the primary node terminated. When I checked the primary node's logs, the exception shown was:

SEVERE: Invalid access at address: 0x7fcde1e00ff0

SEVERE: Got signal: 7 (Bus error)

Since then, the primary node fails to start because of this exception, and the secondary node is stuck in the STARTUP2 state.

I can start the primary node on a different port as a standalone node (or in maintenance mode) and read its data. But whenever I run it as part of the replica set, it terminates with the above exception.

The primary and secondary both use RAID 0 for storage. The data size is around 550 GB.

I copied all the data from the primary node (currently down) to the secondary node (in the STARTUP2 state) and then restarted the secondary node, but that didn't work either. The secondary node gets elected primary on restart, but it also terminates within a second of the election with the exception below:

SEVERE: Fatal DBException in logOp(): 10334 BSONObj size: 50359410 (0x3006C72) is invalid. Size must be between 0 and 16793600(16MB) First element: 2: ?type=111
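To make the numbers in this log line concrete, here is a minimal Python sketch that parses the message and compares the reported size with the limit. The function name `parse_bson_size_error` and the constant name are made up for illustration; the limit of 16793600 bytes is taken from the log line itself and equals 16 MiB plus 16 KiB. A size this far over the limit is consistent with a damaged record being read, but the sketch only checks the arithmetic and does not prove the cause.

```python
import re

# Limit quoted in the log line: 16 MiB plus 16 KiB of headroom = 16793600 bytes.
BSON_MAX_INTERNAL_SIZE = 16 * 1024 * 1024 + 16 * 1024

# Hypothetical helper, not part of MongoDB or any driver.
def parse_bson_size_error(line):
    """Return (reported_size, limit) from a 'BSONObj size ... is invalid' line, or None."""
    match = re.search(
        r"BSONObj size: (\d+) \(0x([0-9A-Fa-f]+)\) is invalid\. "
        r"Size must be between 0 and (\d+)",
        line,
    )
    if match is None:
        return None
    size = int(match.group(1))
    hex_size = int(match.group(2), 16)
    limit = int(match.group(3))
    # The decimal and hexadecimal forms in the message should agree.
    if size != hex_size:
        raise ValueError("decimal and hex sizes in the log line disagree")
    return size, limit

LOG_LINE = (
    "SEVERE: Fatal DBException in logOp(): 10334 BSONObj size: 50359410 "
    "(0x3006C72) is invalid. Size must be between 0 and 16793600(16MB) "
    "First element: 2: ?type=111"
)

size, limit = parse_bson_size_error(LOG_LINE)
print("reported size:", size, "limit:", limit, "over limit:", size > limit)
```

The object that logOp() tried to handle claims to be roughly three times the maximum allowed size, which is why the server refuses it and terminates.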

SEVERE: terminate() called, printing stack (if implemented for platform): 0x11fd1b1 0x11fc438 0x7ff56dc01846 0x7ff56dc01873 0xe54c9e 0xc4de1b 0xc58f46 0xa0bac1 0xa0c250 0xa0f1bf 0xa0fcc1 0xa1323e 0xa2949a 0xa2af32 0xa2cd36 0xd61654 0xba21a2 0xba3780 0x7724a9 0x11b2fde

How can I recover and restore the replica set in this case?

I also have a backup of this data. Can I drop this replica set and recreate it from the backup?

There is another replica set in this MongoDB cluster which is working fine.

Your secondary server is not eligible to become primary because of replication lag. Can you post the output of rs.status()?

Your secondary server probably has a "could not find member to sync from" infoMessage.
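As a rough illustration of what to look for in that output, the Python sketch below scans a status document for members that are stuck in STARTUP2 or that report the message mentioned above. The function name `members_unable_to_sync`, the host names, and the sample document are all hypothetical; only the field names `members`, `name`, `stateStr`, and `infoMessage` are meant to mirror the rs.status() output, and older server versions may report the message under a different field.

```python
SYNC_MESSAGE = "could not find member to sync from"

# Hypothetical helper: works on a plain dict shaped like rs.status() output.
def members_unable_to_sync(status):
    """Return (name, state, message) for members that look unable to sync."""
    flagged = []
    for member in status.get("members", []):
        state = member.get("stateStr")
        message = member.get("infoMessage", "")
        if state == "STARTUP2" or SYNC_MESSAGE in message:
            flagged.append((member.get("name"), state, message))
    return flagged

# Made-up sample shaped like the situation in the question.
sample_status = {
    "set": "rs0",
    "members": [
        {"_id": 0, "name": "db1.example.net:27017", "stateStr": "(not reachable/healthy)"},
        {"_id": 1, "name": "db2.example.net:27017", "stateStr": "STARTUP2",
         "infoMessage": "could not find member to sync from"},
        {"_id": 2, "name": "arb.example.net:27017", "stateStr": "ARBITER"},
    ],
}

for name, state, message in members_unable_to_sync(sample_status):
    print(name, state, message)
```

With a driver such as PyMongo, the status document could presumably be fetched by running the replSetGetStatus command against the admin database and passed to this function unchanged.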

I ran into something similar before, caused by bad RAM. The cause could be anything.

To fix it, copy the primary server's data into another folder on the secondary, start a new instance from it on some other port, and then add that instance to the replica set (with the { force: true } option) so that the secondary server has somewhere to sync from.

You can also destroy the replica set and create it again, but be careful not to lose your replica set's oplog.
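The reconfiguration step can be sketched in Python as a pure function that takes the current replica set configuration document and returns a copy with the extra instance added. The function name `config_with_extra_member`, the set name, the host names, and the port 27018 are assumptions for illustration only; the sketch does not talk to a server.

```python
import copy

# Hypothetical helper: builds the new config document, nothing more.
def config_with_extra_member(config, host):
    """Return a copy of a replica set config with one more member appended."""
    new_config = copy.deepcopy(config)  # leave the caller's document untouched
    next_id = max(member["_id"] for member in new_config["members"]) + 1
    new_config["members"].append({"_id": next_id, "host": host})
    new_config["version"] = new_config["version"] + 1  # a reconfig needs a newer version
    return new_config

# Made-up current configuration: primary, secondary, arbiter.
current_config = {
    "_id": "rs0",
    "version": 3,
    "members": [
        {"_id": 0, "host": "db1.example.net:27017"},
        {"_id": 1, "host": "db2.example.net:27017"},
        {"_id": 2, "host": "arb.example.net:27017", "arbiterOnly": True},
    ],
}

# Temporary instance started from the copied data on another port (assumed 27018).
new_config = config_with_extra_member(current_config, "db2.example.net:27018")
print(new_config["version"], [m["host"] for m in new_config["members"]])
```

The resulting document would then be submitted with the replSetReconfig command together with force: true, for example from the mongo shell via rs.reconfig(cfg, { force: true }). A forced reconfiguration can be run on a member that is not primary, which is what makes it usable when no primary is available.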
