MongoDB SDK failover does not work properly

Question

I have set up a replica set on three machines (192.168.122.21, 192.168.122.147 and 192.168.122.148) and am using the Java SDK to interact with the MongoDB cluster:

    // Seed list: all three replica set members
    ArrayList<ServerAddress> addrs = new ArrayList<ServerAddress>();
    addrs.add(new ServerAddress("192.168.122.21", 27017));
    addrs.add(new ServerAddress("192.168.122.147", 27017));
    addrs.add(new ServerAddress("192.168.122.148", 27017));
    // Connect with the driver's default options (no explicit timeouts set)
    this.mongoClient = new MongoClient(addrs);
    this.db = this.mongoClient.getDB(this.db_name);
    this.collection = this.db.getCollection(this.collection_name);

Once the connection is established, I insert a simple test document many times:

    for (int i = 0; i < this.inserts; i++) {
        try {
            this.collection.insert(new BasicDBObject(String.valueOf(i), "test"));
        } catch (Exception e) {
            System.out.println("Error on inserting element: " + i);
            e.printStackTrace();
        }
    }

When I simulate a crash of the primary node (by powering it off), the MongoDB cluster fails over successfully:

       19:08:03.907+0100 [rsHealthPoll] replSet info 192.168.122.21:27017 is down (or slow to respond): 
       19:08:03.907+0100 [rsHealthPoll] replSet member 192.168.122.21:27017 is now in state DOWN
       19:08:04.153+0100 [rsMgr] replSet info electSelf 1
       19:08:04.154+0100 [rsMgr] replSet couldn't elect self, only received -9999 votes
       19:08:05.648+0100 [conn15] replSet info voting yea for 192.168.122.148:27017 (2)
       19:08:10.681+0100 [rsMgr] replSet not trying to elect self as responded yea to someone else recently
       19:08:10.910+0100 [rsHealthPoll] replset info 192.168.122.21:27017 heartbeat failed, retrying
       19:08:16.394+0100 [rsMgr] replSet not trying to elect self as responded yea to someone else recently
       19:08:22.876+.
       19:08:22.912+0100 [rsHealthPoll] replset info 192.168.122.21:27017 heartbeat failed, retrying
       19:08:23.623+0100 [SyncSourceFeedbackThread] replset setting syncSourceFeedback to 192.168.122.148:27017
       19:08:23.917+0100 [rsHealthPoll] replSet member 192.168.122.148:27017 is now in state PRIMARY
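The election in this log succeeds because two of the three members are still reachable. As a minimal sketch, assuming all three members are voting members with one vote each, the arithmetic is: a candidate needs a strict majority of the configured voting members, and the powered-off member still counts toward that total. The class and method names are illustrative only.

```java
// Sketch: votes needed to elect a primary, assuming every replica set member
// is a voting member with one vote (the simple three-node case above).
final class ElectionMajority {

    // A strict majority of all configured voting members,
    // including members that are currently down.
    static int majority(int votingMembers) {
        return votingMembers / 2 + 1;
    }

    // True if the members that are still reachable can elect a primary.
    static boolean canElectPrimary(int votingMembers, int reachableMembers) {
        return reachableMembers >= majority(votingMembers);
    }

    public static void main(String[] args) {
        int members = 3;             // 192.168.122.21, .147 and .148
        int reachable = members - 1; // .21 has been powered off
        System.out.println("votes needed: " + majority(members));
        System.out.println("can elect primary: " + canElectPrimary(members, reachable));
    }
}
```

With one of three members down, the remaining two still reach the majority of 2. If two members were lost, the survivor could not elect itself and the set would be left without a primary.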

The MongoDB driver on the client side also detects this:

       Dec 01, 2014 7:08:16 PM com.mongodb.ConnectionStatus$UpdatableNode update
       WARNING: Server seen down: /192.168.122.21:27017 - java.io.IOException - message: Read timed out
       WARNING: Server seen down: /192.168.122.21:27017 - java.io.IOException - message: couldn't connect to [/192.168.122.21:27017]  bc:java.net.SocketTimeoutException: connect timed out
       Dec 01, 2014 7:08:36 PM com.mongodb.DBTCPConnector setMasterAddress
       WARNING: Primary switching from /192.168.122.21:27017 to /192.168.122.148:27017

But it still keeps trying, indefinitely, to connect to the old node:

       Dec 01, 2014 7:08:50 PM com.mongodb.ConnectionStatus$UpdatableNode update
       WARNING: Server seen down: /192.168.122.21:27017 - java.io.IOException - message: couldn't connect to [/192.168.122.21:27017] bc:java.net.NoRouteToHostException: No route to host
       .....
       Dec 01, 2014 7:10:43 PM com.mongodb.ConnectionStatus$UpdatableNode update
       WARNING: Server seen down: /192.168.122.21:27017 - java.io.IOException -message: couldn't connect to [/192.168.122.21:27017] bc:java.net.NoRouteToHostException: No route to host

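From the moment the primary failed and the secondary became primary, the document count in the database stays the same. This is the output from the same node during the process: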

    rs0:SECONDARY> db.test_collection.find().count()
    12260161

    rs0:PRIMARY> db.test_collection.find().count()
    12260161

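Update: With an unacknowledged WriteConcern, it works as designed. The inserts are also performed on the new primary, and all operations issued during the election are lost.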

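With WriteConcern Acknowledged, it looks as if the operation waits indefinitely for an ACK from the crashed primary. That would explain why the program only continues once the crashed server has been started again and has rejoined the cluster. In my case, however, I don't want the driver to wait forever; it should raise an error after a certain amount of time.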
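One way to get the behaviour asked for here, independent of any driver setting, is a client-side deadline around each blocking write. The sketch below uses only `java.util.concurrent`; `DeadlineInsert`, `withDeadline` and the hanging task are hypothetical stand-ins for an acknowledged `collection.insert(...)` call, not MongoDB APIs, and the 500 ms value is illustrative.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch: a client-side deadline around a blocking, acknowledged write.
// The Callable passed in is a hypothetical stand-in for collection.insert(...).
final class DeadlineInsert {

    // Cached pool of daemon threads: a write that never returns occupies its own
    // thread instead of blocking later writes, and does not keep the JVM alive.
    private static final ExecutorService WORKERS = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "insert-worker");
        t.setDaemon(true);
        return t;
    });

    // Runs the write and waits at most timeoutMs for it to complete.
    // Throws TimeoutException if no result (acknowledgement) arrives in time.
    static <T> T withDeadline(Callable<T> write, long timeoutMs) throws Exception {
        Future<T> result = WORKERS.submit(write);
        try {
            return result.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            result.cancel(true); // only interrupts; a blocked socket read may ignore it
            throw e;
        } catch (ExecutionException e) {
            // Rethrow the write's own failure rather than the wrapper.
            Throwable cause = e.getCause();
            throw (cause instanceof Exception) ? (Exception) cause : e;
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulates an acknowledgement that never arrives (powered-off primary).
        Callable<String> hangingInsert = () -> {
            Thread.sleep(Long.MAX_VALUE);
            return "ack";
        };
        try {
            withDeadline(hangingInsert, 500);
        } catch (TimeoutException e) {
            System.out.println("insert timed out after 500 ms; handle or retry");
        }
        System.out.println(withDeadline(() -> "ack", 500));
    }
}
```

A cached pool is used rather than a single worker because `Future.cancel(true)` only interrupts the thread, and a thread stuck in a blocking socket read typically ignores that interrupt; with one worker, every later write would queue behind it. The deadline bounds how long the caller waits, but it does not free the stuck connection, so a read timeout on the socket itself is the more direct fix.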

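Update: When I kill the mongod process on the primary instead, the acknowledged WriteConcern also works as expected. In that case failover takes only about 3 seconds; no inserts are performed during that time, and they resume once the new primary has been elected.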

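So the problem only occurs when simulating a node failure (power loss or network shutdown). In that case the operation hangs until the failed node comes back up.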
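A plausible explanation for the difference (an assumption, not confirmed in this thread): when mongod is killed, the operating system on that host closes its TCP connections, so the client sees an error immediately. When the machine loses power, nothing is sent back at all, and a read on a socket with no read timeout can wait indefinitely. The stdlib-only sketch below reproduces that situation locally with a peer that accepts the connection but never answers; `SilentPeerDemo` and `readTimesOut` are hypothetical names, and the timeout values are illustrative.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Sketch: a peer that completes the TCP handshake but never sends anything,
// roughly what a client sees after the primary's power is cut.
final class SilentPeerDemo {

    // Returns true if read() gave up after readTimeoutMs instead of hanging.
    static boolean readTimesOut(int readTimeoutMs) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // The kernel completes the handshake via the backlog; accept() is never called.
        try (ServerSocket silent = new ServerSocket(0, 50, loopback);
             Socket client = new Socket()) {
            client.connect(new InetSocketAddress(loopback, silent.getLocalPort()), 1_000);
            client.setSoTimeout(readTimeoutMs); // 0 would mean "wait forever"
            InputStream in = client.getInputStream();
            try {
                in.read(); // the silent peer never writes anything
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read timed out: " + readTimesOut(300));
    }
}
```

With `setSoTimeout(0)`, the default for a plain `java.net.Socket`, the same `read()` call would keep blocking until data arrives or the connection fails.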

Answer 1

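Is your application still usable? Since that server is still in your seed list, the driver will, as far as I know, keep trying to connect to it. As long as any other server in your seed list can become primary, your application should keep working.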

Answer 2

Explicitly specifying a connection timeout value resolves the error. See also: http://api.mongodb.org/java/2.7.0/com/mongodb/MongoOptions.html
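Both answers come down to bounded network waits: the driver keeps probing every seed, including the dead one (Answer 1), and a connect timeout (Answer 2) caps how long each attempt against the powered-off host can take. The stdlib sketch below illustrates only the connect-timeout part with plain `java.net.Socket.connect(address, timeout)`; `SeedProbe` and the 2-second value are hypothetical and are not the driver's implementation. In the 2.x Java driver, the options class linked above is where such timeouts are configured.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: probe each seed address with a bounded connect timeout, similar in
// spirit to a background monitor that keeps checking every seed.
final class SeedProbe {

    // True if a TCP connection to addr succeeds within connectTimeoutMs.
    static boolean reachable(InetSocketAddress addr, int connectTimeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(addr, connectTimeoutMs); // gives up after connectTimeoutMs
            return true;
        } catch (IOException e) {
            // e.g. SocketTimeoutException, NoRouteToHostException, ConnectException
            return false;
        }
    }

    // Returns the seeds that could not be reached in time.
    static List<InetSocketAddress> unreachable(List<InetSocketAddress> seeds, int connectTimeoutMs) {
        List<InetSocketAddress> down = new ArrayList<>();
        for (InetSocketAddress seed : seeds) {
            if (!reachable(seed, connectTimeoutMs)) {
                down.add(seed);
            }
        }
        return down;
    }

    public static void main(String[] args) {
        List<InetSocketAddress> seeds = Arrays.asList(
            new InetSocketAddress("192.168.122.21", 27017),
            new InetSocketAddress("192.168.122.147", 27017),
            new InetSocketAddress("192.168.122.148", 27017));
        System.out.println("unreachable seeds: " + unreachable(seeds, 2_000));
    }
}
```

Each probe of the powered-off host costs at most the connect timeout, so the monitoring loop keeps cycling instead of stalling on that seed.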

Answer 1
0 2014-12-01 19:54:13

Answer 2
0 2015-03-13 16:14:57