
Streaming replication is failing with “WAL segment has already been removed”

I am trying to set up Master/Slave streaming replication on Postgres 11.5. I ran the following steps:

On Master

select pg_start_backup('replication-setup',true);

On Slave, I stopped the Postgres 11 database and ran

rsync -aHAXxv --numeric-ids --progress -e "ssh -T -o Compression=no -x" --exclude pg_wal --exclude postgresql.pid --exclude pg_log MASTER:/var/lib/postgresql/11/main/* /var/lib/postgresql/11/main

On Master

select pg_stop_backup();

On Slave

rsync -aHAXxv --numeric-ids --progress -e "ssh -T -o Compression=no -x"  MASTER:/var/lib/postgresql/11/main/pg_wal/* /var/lib/postgresql/11/main/pg_wal

I created the recovery.conf file in the ~/11/main folder on the Slave:

standby_mode = 'on'
primary_conninfo = 'user=postgres host=MASTER port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres'
primary_slot_name='my_repl_slot'

When I start Postgres on the Slave, I get this error in both the MASTER and SLAVE logs:

2019-11-08 09:03:51.205 CST [27633] LOG:  00000: database system was interrupted; last known up at 2019-11-08 02:53:04 CST
2019-11-08 09:03:51.205 CST [27633] LOCATION:  StartupXLOG, xlog.c:6388
2019-11-08 09:03:51.252 CST [27633] LOG:  00000: entering standby mode
2019-11-08 09:03:51.252 CST [27633] LOCATION:  StartupXLOG, xlog.c:6443
2019-11-08 09:03:51.384 CST [27634] LOG:  00000: started streaming WAL from primary at 12DB/C000000 on timeline 1
2019-11-08 09:03:51.384 CST [27634] LOCATION:  WalReceiverMain, walreceiver.c:383
2019-11-08 09:03:51.384 CST [27634] FATAL:  XX000: could not receive data from WAL stream: ERROR:  requested WAL segment 00000001000012DB0000000C has already been removed
2019-11-08 09:03:51.384 CST [27634] LOCATION:  libpqrcv_receive, libpqwalreceiver.c:772
2019-11-08 09:03:51.408 CST [27635] LOG:  00000: started streaming WAL from primary at 12DB/C000000 on timeline 1
2019-11-08 09:03:51.408 CST [27635] LOCATION:  WalReceiverMain, walreceiver.c:383

The problem is that the START WAL segment, 00000001000012DB0000000C, is available right up until I run pg_stop_backup(). Once pg_stop_backup() is executed, the segment is archived and is no longer available. So this is not a case of the WAL being removed because of a low WAL_KEEP_SEGMENTS.

postgres@SLAVE:~/11/main/pg_wal$ cat 00000001000012DB0000000C.00000718.backup
START WAL LOCATION: 12DB/C000718 (file 00000001000012DB0000000C)
STOP WAL LOCATION: 12DB/F4C30720 (file 00000001000012DB000000F4)
CHECKPOINT LOCATION: 12DB/C000750
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
START TIME: 2019-11-07 15:47:26 CST
LABEL: replication-setup-mdurbha
START TIMELINE: 1
STOP TIME: 2019-11-08 08:48:35 CST
STOP TIMELINE: 1
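
For reference, the file names in the backup label follow from the WAL locations by simple arithmetic. The sketch below assumes the default 16 MB WAL segment size; wal_segment_name is a hypothetical helper written for illustration, not a PostgreSQL tool.

```shell
#!/bin/sh
# Map a timeline and a WAL location (LSN, e.g. 12DB/C000718) to the name
# of the WAL segment file that contains it.
# Assumption: default 16 MB segments. wal_segment_name is a hypothetical helper.
wal_segment_name() {
  wsn_tli=$1
  wsn_hi=${2%%/*}    # part before the slash: high 32 bits, in hex
  wsn_lo=${2##*/}    # part after the slash: low 32 bits, in hex
  wsn_seg_size=$((16 * 1024 * 1024))
  # File name = timeline, high 32 bits, segment index within those 4 GB.
  printf '%08X%08X%08X\n' "$wsn_tli" "$((0x$wsn_hi))" "$((0x$wsn_lo / wsn_seg_size))"
}

wal_segment_name 1 12DB/C000718     # START WAL LOCATION from the label
wal_segment_name 1 12DB/F4C30720    # STOP WAL LOCATION from the label
```

With the values from the label, this gives 00000001000012DB0000000C and 00000001000012DB000000F4, the same file names the label itself reports.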

My MASTER has archive_command set, and I have the missing WALs available. I copied them into a restore directory on the SLAVE and tried the recovery.conf below, but it still fails, with the MASTER reporting the same "WAL segment has already been removed" error.
Any idea how I can address this issue? I have used rsync to set up replication without any issues in the past on Postgres 9.6, but have been experiencing this issue on Postgres 11.

standby_mode = 'on'
primary_conninfo = 'user=postgres host=MASTER port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres'
restore_command='cp /var/lib/postgresql/restore/%f %p'

Put a restore_command into recovery.conf that can restore archived WAL files and you are fine.
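
Before relying on restore_command, it may help to confirm that every segment between the START and STOP files named in the backup label is actually present in the directory the command reads from. The following is only a sketch: missing_wal_segments is a hypothetical helper, and it assumes 16 MB segments (256 per log file), a single timeline, and uncompressed files stored under their original names.

```shell
#!/bin/sh
# Print the WAL segment names from a start file to a stop file (inclusive)
# that are NOT present in a directory.
# Assumptions: 16 MB segments (256 per log file), one timeline, files kept
# under their original names. missing_wal_segments is a hypothetical helper.
missing_wal_segments() {
  mws_dir=$1
  mws_tli=$(printf '%s' "$2" | cut -c1-8)
  # Characters 9-16 are the log number, 17-24 the segment within that log.
  mws_log_a=$(printf '%s' "$2" | cut -c9-16)
  mws_seg_a=$(printf '%s' "$2" | cut -c17-24)
  mws_log_b=$(printf '%s' "$3" | cut -c9-16)
  mws_seg_b=$(printf '%s' "$3" | cut -c17-24)
  # Flatten (log, seg) into one running segment counter.
  mws_n=$((0x$mws_log_a * 256 + 0x$mws_seg_a))
  mws_last=$((0x$mws_log_b * 256 + 0x$mws_seg_b))
  while [ "$mws_n" -le "$mws_last" ]; do
    mws_name=$(printf '%s%08X%08X' "$mws_tli" "$((mws_n / 256))" "$((mws_n % 256))")
    [ -e "$mws_dir/$mws_name" ] || printf '%s\n' "$mws_name"
    mws_n=$((mws_n + 1))
  done
}

# Example call, using the directory and file names from the question:
# missing_wal_segments /var/lib/postgresql/restore \
#   00000001000012DB0000000C 00000001000012DB000000F4
```

No output means that every segment in the range was found in the directory.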
