PG_WAL is very large

Question

I have a Postgres cluster with 3 nodes: ETCD + Patroni + Postgres 13.

The pg_wal folder is now growing constantly and currently contains 5127 files. After searching the internet, I found an article advising to check the following database parameters (the values at the time of writing are shown):

archive_mode off;
wal_level replica;
max_wal_size 1G;


postgres=# SELECT * FROM pg_replication_slots;
-[ RECORD 1 ]-------+------------
slot_name           | db2
plugin              |
slot_type           | physical
datoid              |
database            |
temporary           | f
active              | t
active_pid          | 2247228
xmin                |
catalog_xmin        |
restart_lsn         | 2D/D0ADC308
confirmed_flush_lsn |
wal_status          | reserved
safe_wal_size       |
-[ RECORD 2 ]-------+------------
slot_name           | db1
plugin              |
slot_type           | physical
datoid              |
database            |
temporary           | f
active              | t
active_pid          | 2247227
xmin                |
catalog_xmin        |
restart_lsn         | 2D/D0ADC308
confirmed_flush_lsn |
wal_status          | reserved
safe_wal_size       |

All other functionality of the Patroni cluster works (switchover, reinit, replication):

root@srvdb3:~# patronictl -c /etc/patroni/patroni.yml list
+ Cluster: mobile (7173650272103321745) --+----+-----------+
| Member | Host       | Role    | State   | TL | Lag in MB |
+--------+------------+---------+---------+----+-----------+
| db1    | 10.01.1.01 | Replica | running | 17 |         0 |
| db2    | 10.01.1.02 | Replica | running | 17 |         0 |
| db3    | 10.01.1.03 | Leader  | running | 17 |           |
+--------+------------+---------+---------+----+-----------+

Patroni configuration (patroni-edit):

loop_wait: 10
maximum_lag_on_failover: 1048576
postgresql:
  parameters:
    checkpoint_timeout: 30
    hot_standby: 'on'
    max_connections: '1100'
    max_replication_slots: 5
    max_wal_senders: 5
    shared_buffers: 2048MB
    wal_keep_segments: 5120
    wal_level: replica
  use_pg_rewind: true
  use_slots: true
retry_timeout: 10
ttl: 100

Please help: what could be causing this?

This is what I see in pg_stat_archiver:

postgres=# select * from pg_stat_archiver;
-[ RECORD 1 ]------+------------------------------
archived_count     | 0
last_archived_wal  |
last_archived_time |
failed_count       | 0
last_failed_wal    |
last_failed_time   |
stats_reset        | 2023-01-06 10:21:45.615312+00

Answer 1

With wal_keep_segments set to 5120, having 5127 WAL segments in pg_wal is completely normal, because PostgreSQL always retains at least 5120 old WAL segments. If that is too many for you, reduce the parameter. Since you are using replication slots, the only disadvantage of a lower value is that pg_rewind may only work shortly after a failover.
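To put the setting into disk-space terms, here is a rough sketch of the arithmetic. It assumes the default WAL segment size of 16 MiB (the question does not state the segment size, so check yours), and the function names are made up for illustration. This retention floor is independent of max_wal_size, which is why the 1G limit from the question does not cap the folder.

```python
# Rough estimate of how much old WAL wal_keep_segments keeps in pg_wal.
# Assumption: default WAL segment size of 16 MiB (chosen at initdb time).
WAL_SEGMENT_BYTES = 16 * 1024 * 1024


def retained_wal_bytes(keep_segments: int, segment_bytes: int = WAL_SEGMENT_BYTES) -> int:
    """Minimum number of bytes of old WAL retained for a given wal_keep_segments."""
    return keep_segments * segment_bytes


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return n_bytes / 1024 ** 3


if __name__ == "__main__":
    # 5120 is the value from the question; the others are arbitrary smaller candidates.
    for keep in (5120, 512, 64):
        size = to_gib(retained_wal_bytes(keep))
        print(f"wal_keep_segments={keep}: about {size:.1f} GiB retained")
```

Under that assumption, 5120 segments correspond to about 80 GiB, so lowering the value in the Patroni configuration shrinks pg_wal proportionally once old segments are recycled.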

Answer 1: accepted, 2023-01-08 13:20:17
