
PostgreSQL 9.4 cascading replication failover

Environment:

Ubuntu 14.04 + PostgreSQL 9.4.

Here is my setup ('->' denotes physical streaming replication):

Master1 -> Slave1 (primary) -> Slave2

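This works correctly: changes on Master1 are reflected in Slave1, and then in Slave2.

If I disable Master1 and promote Slave1 to master using trigger_file, Slave1 is promoted successfully: I can write to Slave1.

However, replication between the newly promoted Slave1 and Slave2 stops.

Is this expected behavior? I expected replication to continue like this:

Slave1 -> Slave2

so that writes to Slave1 are reflected in Slave2.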

Update

Logs:

Slave1 promotion:

2017-10-03 16:43:20 BST  @ LOCATION:  libpqrcv_connect, libpqwalreceiver.c:107
2017-10-03 16:43:25 BST  @ FATAL:  XX000: could not connect to the primary server: could not connect to server: Connection refused
        Is the server running on host "192.168.20.55" and accepting
        TCP/IP connections on port 5432?

2017-10-03 16:43:25 BST  @ LOCATION:  libpqrcv_connect, libpqwalreceiver.c:107
2017-10-03 16:43:30 BST  @ LOG:  00000: trigger file found: /var/lib/postgresql/9.4/main/failover_trigger.5432
2017-10-03 16:43:30 BST  @ LOCATION:  CheckForStandbyTrigger, xlog.c:11440
2017-10-03 16:43:30 BST  @ LOG:  00000: redo done at 0/19000740
2017-10-03 16:43:30 BST  @ LOCATION:  StartupXLOG, xlog.c:7032
2017-10-03 16:43:30 BST  @ LOG:  00000: last completed transaction was at log time 2017-10-03 16:41:23.430752+01
2017-10-03 16:43:30 BST  @ LOCATION:  StartupXLOG, xlog.c:7037
2017-10-03 16:43:30 BST  @ LOG:  00000: selected new timeline ID: 2
2017-10-03 16:43:30 BST  @ LOCATION:  StartupXLOG, xlog.c:7153
2017-10-03 16:43:30 BST  @ LOG:  00000: archive recovery complete
2017-10-03 16:43:30 BST  @ LOCATION:  exitArchiveRecovery, xlog.c:5459
2017-10-03 16:43:30 BST  @ LOG:  00000: MultiXact member wraparound protections are now enabled
2017-10-03 16:43:30 BST  @ LOCATION:  DetermineSafeOldestOffset, multixact.c:2619
2017-10-03 16:43:30 BST  @ LOG:  00000: database system is ready to accept connections
2017-10-03 16:43:30 BST  @ LOCATION:  reaper, postmaster.c:2795
2017-10-03 16:43:30 BST  @ LOG:  00000: autovacuum launcher started
2017-10-03 16:43:30 BST  @ LOCATION:  AutoVacLauncherMain, autovacuum.c:431

Slave2:

2017-10-03 16:43:30 BST  @ LOG:  00000: replication terminated by primary server
2017-10-03 16:43:30 BST  @ DETAIL:  End of WAL reached on timeline 1 at 0/190007A8.
2017-10-03 16:43:30 BST  @ LOCATION:  WalReceiverMain, walreceiver.c:446
2017-10-03 16:43:30 BST  @ LOG:  00000: fetching timeline history file for timeline 2 from primary server
2017-10-03 16:43:30 BST  @ LOCATION:  WalRcvFetchTimeLineHistoryFiles, walreceiver.c:669
2017-10-03 16:43:30 BST  @ LOG:  00000: record with zero length at 0/190007A8
2017-10-03 16:43:30 BST  @ LOCATION:  ReadRecord, xlog.c:4184
2017-10-03 16:43:30 BST  @ LOG:  00000: restarted WAL streaming at 0/19000000 on timeline 1
2017-10-03 16:43:30 BST  @ LOCATION:  WalReceiverMain, walreceiver.c:374
2017-10-03 16:43:30 BST  @ LOG:  00000: replication terminated by primary server
2017-10-03 16:43:30 BST  @ DETAIL:  End of WAL reached on timeline 1 at 0/190007A8.

Slave1 IP:

192.168.20.56

Slave2 IP:

192.168.20.53

pg_hba.conf allows Slave2 to connect to Slave1 for replication:

Slave1 pg_hba.conf entry:

host    replication     replication     192.168.20.53/32        trust 

Slave1 recovery.done:

standby_mode = 'on'
primary_conninfo = 'user=replication host=192.168.20.55 port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres'
trigger_file = '/var/lib/postgresql/9.4/main/failover_trigger.5432'

Slave2 recovery.conf:

standby_mode = 'on'
primary_conninfo = 'user=replication host=192.168.20.56 port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres'

Any help is greatly appreciated.

Update and solution

Thanks to @Vao Tsun's answer: adding recovery_target_timeline set to 'latest' to Slave2's recovery.conf and restarting the Slave2 PostgreSQL server (a reload is not enough) allowed replication to resume:

standby_mode = 'on'
primary_conninfo = 'user=replication host=192.168.20.56 port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres'
recovery_target_timeline = 'latest'
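With several cascading standbys it is easy to forget this line on one of them, so generating recovery.conf from a script can help. Below is a minimal Python sketch; the function name `render_recovery_conf` and its parameters are hypothetical, it emits only the parameters discussed in this thread (the SSL/Kerberos options from the original primary_conninfo are omitted), and it assumes no value contains a single quote.

```python
from typing import Optional


def render_recovery_conf(primary_host: str,
                         user: str = "replication",
                         port: int = 5432,
                         follow_latest_timeline: bool = True,
                         trigger_file: Optional[str] = None) -> str:
    """Return the text of a 9.4-style recovery.conf for a streaming standby.

    Hypothetical helper: assumes no value contains a single quote,
    because no escaping is done.
    """
    conninfo = f"user={user} host={primary_host} port={port}"
    lines = [
        "standby_mode = 'on'",
        f"primary_conninfo = '{conninfo}'",
    ]
    if follow_latest_timeline:
        # Without this, the standby keeps recovering along the timeline it was
        # created from and does not follow its upstream after a promotion.
        lines.append("recovery_target_timeline = 'latest'")
    if trigger_file is not None:
        lines.append(f"trigger_file = '{trigger_file}'")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Slave2 streams from Slave1 (192.168.20.56), as in the question.
    print(render_recovery_conf("192.168.20.56"), end="")
```

Remember that in 9.4 recovery.conf is only read at server start, so the standby must be restarted after the file is rewritten.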

In the slave1 log you can see:

2017-10-03 16:43:30 BST  @ LOG:  00000: selected new timeline ID: 2

And in slave2:

2017-10-03 16:43:30 BST  @ DETAIL:  End of WAL reached on timeline 1 at 0/190007A8.

So slave2 did not switch to timeline 2 after the promotion; it is still on timeline 1.
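The same comparison can be scripted: take the timeline the promoted server selected and check whether the downstream standby is still streaming on an older one. This is a minimal Python sketch that assumes log lines in the format shown in the question; the helper names are hypothetical, and the regular expressions cover only the messages quoted in this thread.

```python
import re
from typing import Iterable, Optional

# Message on the promoted server (Slave1) when it starts a new timeline.
PROMOTED_TLI = re.compile(r"selected new timeline ID: (\d+)")
# Messages on the downstream standby (Slave2) that name the timeline it streams.
STANDBY_TLI = re.compile(
    r"(?:End of WAL reached|restarted WAL streaming at \S+) on timeline (\d+)")


def last_timeline(lines: Iterable[str], pattern: "re.Pattern[str]") -> Optional[int]:
    """Return the timeline ID from the last line matching pattern, or None."""
    tli = None
    for line in lines:
        match = pattern.search(line)
        if match:
            tli = int(match.group(1))
    return tli


def standby_is_behind(upstream_log: Iterable[str], standby_log: Iterable[str]) -> bool:
    """True if the standby is still on an older timeline than its promoted upstream."""
    new_tli = last_timeline(upstream_log, PROMOTED_TLI)
    standby_tli = last_timeline(standby_log, STANDBY_TLI)
    return new_tli is not None and standby_tli is not None and standby_tli < new_tli


slave1_log = ["2017-10-03 16:43:30 BST  @ LOG:  00000: selected new timeline ID: 2"]
slave2_log = [
    "2017-10-03 16:43:30 BST  @ LOG:  00000: restarted WAL streaming at 0/19000000 on timeline 1",
    "2017-10-03 16:43:30 BST  @ DETAIL:  End of WAL reached on timeline 1 at 0/190007A8.",
]
print(standby_is_behind(slave1_log, slave2_log))  # True
```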

As I said in the comments, you need recovery_target_timeline='latest' in slave2's recovery.conf:

https://www.postgresql.org/docs/current/static/recovery-target-settings.html

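recovery_target_timeline (string): Specifies recovering into a particular timeline. The default is to recover along the same timeline that was current when the base backup was taken. Setting this to latest recovers to the latest timeline found in the archive, which is useful in a standby server. Other than that you only need to set this parameter in complex re-recovery situations, where you need to return to a state that itself was reached after a point-in-time recovery. See Section 25.3.5 for discussion.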
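The three cases in that description (default, latest, explicit ID) can be summarized as a small decision function. This is a simplified Python model written for illustration, not PostgreSQL's actual implementation; `resolve_target_timeline` and its arguments are hypothetical, and "available" stands in for the timeline history the standby can find in the archive or fetch from its upstream.

```python
from typing import Iterable, Optional


def resolve_target_timeline(setting: Optional[str],
                            backup_tli: int,
                            available_tlis: Iterable[int]) -> int:
    """Simplified model of how recovery_target_timeline picks a timeline.

    setting        -- None or '' (default), 'latest', or a numeric timeline ID
    backup_tli     -- timeline that was current when the base backup was taken
    available_tlis -- timeline IDs whose history is known to the standby
    """
    if not setting:
        # Default: keep recovering along the base backup's timeline.
        return backup_tli
    if setting == "latest":
        # Follow the newest timeline that is known.
        return max([backup_tli, *available_tlis])
    # Explicit timeline ID.
    return int(setting)


# Slave2 was cloned on timeline 1; Slave1's promotion created timeline 2.
print(resolve_target_timeline(None, 1, [1, 2]))      # 1: stuck on the old timeline
print(resolve_target_timeline("latest", 1, [1, 2]))  # 2: follows the promotion
```

This is why Slave2 kept reporting "End of WAL reached on timeline 1" even after it had fetched the history file for timeline 2: with the default setting, timeline 2 was never its recovery target.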
