pg_wal folder on standby node not removing files (postgresql-11)

I have master-slave (primary-standby) streaming replication set up on 2 physical nodes. Although the replication is working correctly and walsender and walreceiver both work fine, the files in the pg_wal folder on the slave node are not getting removed. This is a problem I have been facing every time I try to bring the slave node back after a crash. Here are the details of the problem:

postgresql.conf on master and slave/standby node

# Connection settings
# -------------------
listen_addresses = '*'
port = 5432
max_connections = 400

tcp_keepalives_idle = 0
tcp_keepalives_interval = 0
tcp_keepalives_count = 0

# Memory-related settings
# -----------------------
shared_buffers = 32GB           # Physical memory 1/4
##DEBUG:  mmap(1652555776) with MAP_HUGETLB failed, huge pages disabled: Cannot allocate memory
#huge_pages = try              # on, off, or try
#temp_buffers = 16MB            # depends on DB checklist
work_mem = 8MB                 # Need tuning
effective_cache_size = 64GB      # Physical memory 1/2
maintenance_work_mem = 512MB
wal_buffers = 64MB

# WAL/Replication/HA   settings
# --------------------
wal_level = logical
synchronous_commit = remote_write
archive_mode = on
archive_command = 'rsync -a %p /TPINFO01/wal_archive/%f'
#archive_command = ':'
max_wal_senders=5
hot_standby = on
restart_after_crash = off
wal_sender_timeout = 5000
wal_receiver_status_interval = 2
max_standby_streaming_delay = -1
max_standby_archive_delay = -1
hot_standby_feedback = on
random_page_cost = 1.5

max_wal_size = 5GB
min_wal_size = 200MB
checkpoint_completion_target = 0.9
checkpoint_timeout = 30min

# Logging settings
# ----------------
log_destination = 'csvlog,syslog'
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql_%Y%m%d.log'
log_truncate_on_rotation = off
log_rotation_age = 1h
log_rotation_size = 0

log_timezone = 'Japan'
log_line_prefix = '%t [%p]: [%l-1] %h:%u@%d:[PG]:CODE:%e '

log_statement = all
log_min_messages = info         # DEBUG5
log_min_error_statement = info  # DEBUG5
log_error_verbosity = default
log_checkpoints = on
log_lock_waits = on
log_temp_files = 0
log_connections = on
log_disconnections = on
log_duration = off
log_min_duration_statement = 1000
log_autovacuum_min_duration = 3000ms

track_functions = pl
track_activity_query_size = 8192

# Locale/display settings
# -----------------------
lc_messages = 'C'
lc_monetary = 'en_US.UTF-8'  # ja_JP.eucJP
lc_numeric  = 'en_US.UTF-8'  # ja_JP.eucJP
lc_time     = 'en_US.UTF-8'  # ja_JP.eucJP
timezone = 'Asia/Tokyo'
bytea_output = 'escape'


# Auto vacuum settings
# -----------------------
autovacuum = on
autovacuum_max_workers = 3
autovacuum_vacuum_cost_limit = 200

auto_explain.log_min_duration = 10000
auto_explain.log_analyze = on
include '/var/lib/pgsql/tmp/rep_mode.conf' # added by pgsql RA

recovery.conf

primary_conninfo = 'host=xxx.xx.xx.xx port=5432 user=replica application_name=xxxxx keepalives_idle=60 keepalives_interval=5 keepalives_count=5'
restore_command = 'rsync -a /TPINFO01/wal_archive/%f %p'
recovery_target_timeline = 'latest'
standby_mode = 'on'

Result of pg_stat_replication on master/primary

select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 8868
usesysid         | 16420
usename          | xxxxxxx
application_name | sub_xxxxxxx
client_addr      | xx.xx.xxx.xxx
client_hostname  |
client_port      | 21110
backend_start    | 2021-06-10 10:55:37.61795+09
backend_xmin     |
state            | streaming
sent_lsn         | 97AC/589D93B8
write_lsn        | 97AC/589D93B8
flush_lsn        | 97AC/589D93B8
replay_lsn       | 97AC/589D93B8
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 0
sync_state       | async
-[ RECORD 2 ]----+------------------------------
pid              | 221533
usesysid         | 3541624258
usename          | replica
application_name | xxxxx
client_addr      | xxx.xx.xx.xx
client_hostname  |
client_port      | 55338
backend_start    | 2021-06-12 21:26:40.192443+09
backend_xmin     | 72866358
state            | streaming
sent_lsn         | 97AC/589D93B8
write_lsn        | 97AC/589D93B8
flush_lsn        | 97AC/589D93B8
replay_lsn       | 97AC/589D93B8
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 1
sync_state       | sync

Steps I followed to bring the standby node back after a crash

  • On the master, run select pg_start_backup('backup');
  • rsync the data folder and the wal_archive folder from master/primary to slave/standby.
  • On the master, run select pg_stop_backup();
  • Restart postgres on the slave/standby node.
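
For reference, the rsync step above could be scripted roughly as follows. This is only a sketch: the function name build_base_backup_rsync, the exclude list, and the paths in the usage note are assumptions for illustration, not the commands that were actually run. The excludes follow the PostgreSQL documentation's advice for low-level base backups, which suggests leaving out postmaster.pid, postmaster.opts and the files inside pg_replslot.

```python
def build_base_backup_rsync(src_data_dir, dest):
    """Return an rsync argument list that copies a data directory while
    skipping files that should not be cloned onto a standby.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    excludes = [
        "/postmaster.pid",   # lock file of the running primary
        "/postmaster.opts",  # start-up options of the primary
        "/pg_replslot/*",    # keep the directory, skip the slot state inside it
    ]
    cmd = ["rsync", "-a"]
    for pattern in excludes:
        cmd.append("--exclude=" + pattern)
    # Trailing slash: copy the directory's contents, so the leading "/" in
    # each exclude pattern is anchored at the data directory itself.
    cmd.append(src_data_dir.rstrip("/") + "/")
    cmd.append(dest)
    return cmd
```

With placeholder paths, build_base_backup_rsync("/path/to/data", "standby:/path/to/data/") returns a list that can be passed to subprocess.run between pg_start_backup and pg_stop_backup.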

After this, the slave/standby node was in sync with the master, and it has been working fine since then.

On the primary/master node, files in the pg_wal folder are removed after nearly 2 hours, but on the slave/standby node they are never removed. On the standby, almost all of the files also have a matching <filename>.done entry in the archive_status folder inside pg_wal. I guess the problem might go away if I perform a switchover, but I still want to understand why it is happening.
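
To put numbers on this symptom, a small script along the following lines can count the WAL segments and the archive_status markers on each node. It is a minimal sketch: the function name summarize_pg_wal is made up for illustration, and it assumes the usual layout where segment file names are 24 hexadecimal characters and status files end in .done or .ready.

```python
import re
from pathlib import Path

# WAL segment file names consist of 24 hexadecimal characters.
SEGMENT_RE = re.compile(r"^[0-9A-F]{24}$")


def summarize_pg_wal(pg_wal_dir):
    """Count WAL segments and archive_status markers under pg_wal_dir."""
    pg_wal = Path(pg_wal_dir)
    segments = sorted(
        p.name for p in pg_wal.iterdir()
        if p.is_file() and SEGMENT_RE.match(p.name)
    )
    done = ready = 0
    status_dir = pg_wal / "archive_status"
    if status_dir.is_dir():
        for p in status_dir.iterdir():
            if p.name.endswith(".done"):
                done += 1
            elif p.name.endswith(".ready"):
                ready += 1
    return {
        "segments": len(segments),
        "oldest": segments[0] if segments else None,
        "newest": segments[-1] if segments else None,
        "done": done,
        "ready": ready,
    }
```

Running it against pg_wal on both nodes at intervals shows whether the oldest segment on the standby ever advances.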

I am also trying to find answers to some of the following questions:

You didn't mention omitting pg_replslot during your rsync, as the docs recommend. If you didn't omit it, your replica now has a replication slot that is a clone of the one on the master. If nothing ever connects to that slot on the replica and advances its cutoff, the WAL is never released for recycling. To fix this, shut down the replica, remove the copied slot from that directory (leaving pg_replslot itself in place), restart the replica, and wait for the next restart point to finish.
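
Before removing anything, it may help to confirm that the standby really carries a copied slot. The following is a sketch under the assumption that each replication slot is stored as a subdirectory of pg_replslot in the data directory; the function name find_slot_dirs is hypothetical, and the code only lists directories without deleting them.

```python
from pathlib import Path


def find_slot_dirs(data_dir):
    """Return the names of the slot subdirectories under data_dir/pg_replslot.

    Read-only check: nothing is modified or removed.
    """
    slot_root = Path(data_dir) / "pg_replslot"
    if not slot_root.is_dir():
        return []
    return sorted(p.name for p in slot_root.iterdir() if p.is_dir())
```

A non-empty result on a standby that was never given slots of its own would be consistent with the explanation above; an empty result suggests looking elsewhere.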

Do they need to go to wal_archive folder on the disk just like they go to wal_archive folder on the master node?

No, that is optional, not required. If you want the standby to archive WAL as well, set archive_mode = always.
