
MongoDB SDK Failover not working

I have set up a replica set on three machines (192.168.122.21, 192.168.122.147 and 192.168.122.148) and I am using the Java SDK to interact with the MongoDB cluster:

// Seed list: all three replica set members
List<ServerAddress> addrs = new ArrayList<ServerAddress>();
addrs.add(new ServerAddress("192.168.122.21", 27017));
addrs.add(new ServerAddress("192.168.122.147", 27017));
addrs.add(new ServerAddress("192.168.122.148", 27017));
// No options are passed, so the driver's default timeouts apply
this.mongoClient = new MongoClient(addrs);
this.db = this.mongoClient.getDB(this.db_name);
this.collection = this.db.getCollection(this.collection_name);

After connecting, I insert a simple test document many times:

    for (int i = 0; i < this.inserts; i++) {
        try {
            this.collection.insert(new BasicDBObject(String.valueOf(i), "test"));
        } catch (Exception e) {
            System.out.println("Error on inserting element: " + i);
            e.printStackTrace();
        }
    }

When I simulate a crash of the primary's node (by powering it off), the MongoDB cluster fails over successfully:

       19:08:03.907+0100 [rsHealthPoll] replSet info 192.168.122.21:27017 is down (or slow to respond): 
       19:08:03.907+0100 [rsHealthPoll] replSet member 192.168.122.21:27017 is now in state DOWN
       19:08:04.153+0100 [rsMgr] replSet info electSelf 1
       19:08:04.154+0100 [rsMgr] replSet couldn't elect self, only received -9999 votes
       19:08:05.648+0100 [conn15] replSet info voting yea for 192.168.122.148:27017 (2)
       19:08:10.681+0100 [rsMgr] replSet not trying to elect self as responded yea to someone else recently
       19:08:10.910+0100 [rsHealthPoll] replset info 192.168.122.21:27017 heartbeat failed, retrying
       19:08:16.394+0100 [rsMgr] replSet not trying to elect self as responded yea to someone else recently
       19:08:22.876+.
       19:08:22.912+0100 [rsHealthPoll] replset info 192.168.122.21:27017 heartbeat failed, retrying
       19:08:23.623+0100 [SyncSourceFeedbackThread] replset setting syncSourceFeedback to 192.168.122.148:27017
       19:08:23.917+0100 [rsHealthPoll] replSet member 192.168.122.148:27017 is now in state PRIMARY

The MongoDB driver on the client side also notices this:

       Dec 01, 2014 7:08:16 PM com.mongodb.ConnectionStatus$UpdatableNode update
       WARNING: Server seen down: /192.168.122.21:27017 - java.io.IOException - message: Read timed out
       WARNING: Server seen down: /192.168.122.21:27017 - java.io.IOException - message: couldn't connect to [/192.168.122.21:27017]  bc:java.net.SocketTimeoutException: connect timed out
       Dec 01, 2014 7:08:36 PM com.mongodb.DBTCPConnector setMasterAddress
       WARNING: Primary switching from /192.168.122.21:27017 to /192.168.122.148:27017

But it still keeps trying (forever) to connect to the old node:

       Dec 01, 2014 7:08:50 PM com.mongodb.ConnectionStatus$UpdatableNode update
       WARNING: Server seen down: /192.168.122.21:27017 - java.io.IOException - message: couldn't connect to [/192.168.122.21:27017] bc:java.net.NoRouteToHostException: No route to host
       .....
       Dec 01, 2014 7:10:43 PM com.mongodb.ConnectionStatus$UpdatableNode update
       WARNING: Server seen down: /192.168.122.21:27017 - java.io.IOException -message: couldn't connect to [/192.168.122.21:27017] bc:java.net.NoRouteToHostException: No route to host

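From the moment the primary fails and the secondary becomes primary, the document count in the database stays the same. This is the output from the same node during that process: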

    "rs0":SECONDARY> db.test_collection.find().count()
    12260161

    "rs0":PRIMARY> db.test_collection.find().count()
    12260161

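Update: With WriteConcern Unacknowledged it works as designed. Insert operations are also performed on the new primary, and all operations issued during the election are lost.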

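With WriteConcern Acknowledged, it seems the operation waits indefinitely for an ACK from the crashed primary. That would explain why the program continues once the crashed server boots up and rejoins the cluster. In my case, though, I don't want the driver to wait forever; it should raise an error after a certain time.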
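The "raise an error after a certain time" behaviour asked for above can be approximated at the application level. The following is only a minimal sketch using the JDK's `java.util.concurrent` classes; `BoundedCall` and `callWithTimeout` are hypothetical names, and the sleeping task is a stand-in for `collection.insert(...)`. It bounds how long the caller waits, but it does not necessarily free a driver connection that is stuck in a blocking socket read.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedCall {
    // Runs the task on a worker thread and stops waiting after timeoutMillis.
    static <T> T callWithTimeout(Callable<T> task, long timeoutMillis)
            throws TimeoutException, ExecutionException, InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<T> future = executor.submit(task);
            try {
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // Interrupts the worker; a blocking socket read may ignore the interrupt.
                future.cancel(true);
                throw e;
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        try {
            // Stand-in for an insert that hangs while waiting for an acknowledgement.
            callWithTimeout(() -> { Thread.sleep(60_000); return "inserted"; }, 500);
        } catch (TimeoutException e) {
            System.out.println("Insert did not complete within 500 ms");
        }
    }
}
```

A wrapper like this only protects the calling thread. Timeouts configured on the driver itself are usually the cleaner fix, because they also release the underlying connection.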

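Update: WriteConcern Acknowledged also works as expected when I kill the mongod process on the primary. In that case the failover takes only about 3 seconds. No inserts are performed during that time, and once the new primary is elected the inserts continue.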

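So the problem occurs only when simulating a node failure (power off / network down). In that case the operation hangs until the failed node is up again.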
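One plausible reading of this difference (an assumption, not something confirmed in the thread) is that a killed process lets the operating system close or refuse the connection right away, while a powered-off host sends nothing back at all, so a blocking read without a timeout has no reason to return. The sketch below uses only JDK sockets to show the second case in miniature: a local server that never answers, and a client read that ends only because a read timeout was set. `SilentPeerDemo` and `readTimesOut` are hypothetical names.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class SilentPeerDemo {
    // Returns true if reading from a peer that never answers is cut short by the read timeout.
    static boolean readTimesOut(int readTimeoutMillis) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket silentServer = new ServerSocket(0, 1, loopback); // any free port
             Socket client = new Socket()) {
            // Connect timeout: bounds how long the connection attempt may take.
            client.connect(new InetSocketAddress(loopback, silentServer.getLocalPort()), 1_000);
            // Read timeout: with 0 (the default) the read below would block indefinitely.
            client.setSoTimeout(readTimeoutMillis);
            try {
                client.getInputStream().read(); // the server never writes anything
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Read timed out: " + readTimesOut(500));
    }
}
```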

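Is your app still usable? Since that server is still in your seed list, the driver will keep trying to connect to it, as far as I know. As long as any other server in your seed list can become primary, your application should still work.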

Explicitly specifying the connection timeout values resolves the error. See also: http://api.mongodb.org/java/2.7.0/com/mongodb/MongoOptions.html
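As a rough illustration of that answer, the sketch below sets explicit timeouts through `MongoClientOptions`, the options class that pairs with the `MongoClient` used in the question. The linked page documents the older `MongoOptions` class instead, so treating the two as equivalent here is an assumption. The timeout values and the class name `TimeoutConfiguredClient` are arbitrary examples, and the code needs the MongoDB Java driver on the classpath.

```java
import com.mongodb.MongoClient;
import com.mongodb.MongoClientOptions;
import com.mongodb.ServerAddress;

import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.List;

class TimeoutConfiguredClient {
    static MongoClient connect() throws UnknownHostException {
        // Same seed list as in the question.
        List<ServerAddress> seeds = Arrays.asList(
                new ServerAddress("192.168.122.21", 27017),
                new ServerAddress("192.168.122.147", 27017),
                new ServerAddress("192.168.122.148", 27017));

        MongoClientOptions options = MongoClientOptions.builder()
                .connectTimeout(5000)   // ms to wait when opening a connection (example value)
                .socketTimeout(10000)   // ms to wait on a socket read or write (example value)
                .build();

        return new MongoClient(seeds, options);
    }
}
```

With a socket timeout in place, an insert that is waiting on a powered-off primary should fail with an exception instead of blocking, and the `catch` block in the question's insert loop can then decide whether to retry.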
