简体   繁体   English

PostgreSQL BDR 与复制槽有关的问题

[英]PostgreSQL BDR issues with replication slots

If I may, I would like to write a brief summary of the issue before I add more details to make my question cleaner and clearer:如果可以的话,我想在添加更多细节之前写一个问题的简短摘要,以使我的问题更清晰:

I have a BDR group running in production, replication works fine in all nodes, but I'm getting errors regarding a replication slot in one of the nodes that I would like to fix without breaking anything else...我有一个 BDR 组在生产中运行,复制在所有节点中都可以正常工作,但是我在其中一个节点中遇到了关于复制槽的错误,我想在不破坏其他任何东西的情况下修复这些错误......

More details:更多细节:

I have a three-node BDR group running on PostgreSQL 9.4, 2 nodes using RedHat7, another using Centos7.我有一个三节点 BDR 组在 PostgreSQL 9.4 上运行,2 个节点使用 RedHat7,另一个使用 Centos7。 The database name is "MCT".数据库名称是“MCT”。

node_sysid         |node_timeline|node_dboid|node_status|node_name |node_local_dsn |node_init_from_dsn |node_read_only|
-------------------|-------------|----------|-----------|----------|---------------|-------------------|--------------|
6363474972929529353|            4|    263746|r          |REDHAT_1  |172.16.5.180   |                   |false         |
6182181726884158214|            2|    623638|r          |REDHAT_2  |192.168.251.30 | 172.16.5.180      |false         |
6197060214704992943|            2|    224093|r          |CENTOS_1  |172.16.5.184   | 172.16.5.180      |false         |

The replication is working perfectly from any node to the rest, but...从任何节点到 rest 的复制工作正常,但是......

All nodes are showing issues with replication slots in their logs:所有节点都在其日志中显示复制槽问题:

REDHAT_1: REDHAT_1:

2020-11-25 13:26:50.051 CET,"postgres","MCT",6607,"192.168.251.30:60463",5fbe4d8a.19cf,1,"idle",2020-11-25 13:26:50 CET,23/0,0,ERROR,55006,"replication slot ""bdr_263746_6182181726884158214_2_16389__"" is already active for pid 6097",,,,,,,,,"bdr (6182181726884158214,2,16389,):receive"

REDHAT_2: REDHAT_2:

2020-11-25 13:26:36.153 CET,,,2142,,5fbe48a5.85e,1015,,2020-11-25 13:05:57 CET,,0,LOG,00000,"proceso «background worker»: bdr (6182181726884158214,2,16389,)->bdr (6197060214704992943,2, (PID 12152) terminó con código de salida 1",,,,,,,,,""
2020-11-25 13:26:39.375 CET,,,2142,,5fbe48a5.85e,1016,,2020-11-25 13:05:57 CET,,0,LOG,00000,"iniciando el proceso «background worker» «bdr (6182181726884158214,2,16389,)->bdr (6363474972929529353,4,»",,,,,,,,,""
2020-11-25 13:26:39.389 CET,,,12154,,5fbe4d7f.2f7a,1,,2020-11-25 13:26:39 CET,14/0,0,FATAL,XX000,"could not send replication command ""START_REPLICATION SLOT ""bdr_263746_6182181726884158214_2_16389__"" LOGICAL CE4/BD6AF6A8 (pg_version '90410', pg_catversion '201409291', bdr_version '10002', bdr_variant 'BDR', min_bdr_version '700', sizeof_int '4', sizeof_long '8', sizeof_datum '8', maxalign '8', float4_byval '1', float8_byval '1', integer_datetimes '1', bigendian '0', db_encoding 'UTF8', replication_sets '""default""', db_encoding 'UTF8')"": ERROR:  replication slot ""bdr_263746_6182181726884158214_2_16389__"" is already active for pid 6097, sqlstate: 55006",,,,,,,,,"bdr (6182181726884158214,2,16389,): apply"
2020-11-25 13:26:39.390 CET,,,2142,,5fbe48a5.85e,1017,,2020-11-25 13:05:57 CET,,0,LOG,00000,"proceso «background worker»: bdr (6182181726884158214,2,16389,)->bdr (6363474972929529353,4, (PID 12154) terminó con código de salida 1",,,,,,,,,""
2020-11-25 13:26:41.155 CET,,,2142,,5fbe48a5.85e,1018,,2020-11-25 13:05:57 CET,,0,LOG,00000,"iniciando el proceso «background worker» «bdr (6182181726884158214,2,16389,)->bdr (6197060214704992943,2,»",,,,,,,,,""
2020-11-25 13:26:41.165 CET,,,12155,,5fbe4d81.2f7b,1,,2020-11-25 13:26:41 CET,14/0,0,FATAL,XX000,"could not send replication command ""START_REPLICATION SLOT ""bdr_224093_6182181726884158214_2_16389__"" LOGICAL 9B4/F020E568 (pg_version '90410', pg_catversion '201409291', bdr_version '10002', bdr_variant 'BDR', min_bdr_version '700', sizeof_int '4', sizeof_long '8', sizeof_datum '8', maxalign '8', float4_byval '1', float8_byval '1', integer_datetimes '1', bigendian '0', db_encoding 'UTF8', replication_sets '""default""', db_encoding 'UTF8')"": ERROR:  replication slot ""bdr_224093_6182181726884158214_2_16389__"" is already active for pid 2192

CENTOS_1: CENTOS_1:

2020-11-25 13:26:56.208 CET,"postgres","MCT",2668,"192.168.251.30:55270",5fbe4d90.a6c,1,"idle",2020-11-25 13:26:56 CET,10/0,0,ERROR,55006,"replication slot ""bdr_224093_6182181726884158214_2_16389__"" is already active for pid 2192",,,,,,,,,"bdr (6182181726884158214,2,16389,):receive"

If I run this query:如果我运行此查询:

select
    pg_catalog.inet_server_addr() as server_addr,
    pg_xlog_location_diff(pg_current_xlog_insert_location(),
    flush_location)::int8 as lag_bytes,
    pid,
    application_name,
    client_addr,
    state
from
    pg_stat_replication;

I get this results:我得到这个结果:

REDHAT_1: REDHAT_1:

server_addr |lag_bytes|pid |application_name                           |client_addr   |state    |
------------|---------|----|-------------------------------------------|--------------|---------|
172.16.5.180|    21088|6131|bdr (6197060214704992943,2,224093,):receive|172.16.5.184  |streaming|
172.16.5.180|    14440|6097|bdr (6182181726884158214,2,16389,):receive |192.168.251.30|streaming|
172.16.5.180|     3616|6099|bdr (6182181726884158214,2,623638,):receive|192.168.251.30|streaming|

REDHAT_2: REDHAT_2:

server_addr |lag_bytes|pid |application_name                           |client_addr   |state    |
------------|---------|----|-------------------------------------------|--------------|---------|
172.16.5.184|        0|2191|bdr (6182181726884158214,2,623638,):receive|192.168.251.30|streaming|
172.16.5.184|        0|2192|bdr (6182181726884158214,2,16389,):receive |192.168.251.30|streaming|
172.16.5.184|        0|2203|bdr (6363474972929529353,4,263746,):receive|172.16.5.180  |streaming|

CENTOS_1: CENTOS_1:

server_addr   |lag_bytes|pid |application_name                           |client_addr   |state    |
--------------|---------|----|-------------------------------------------|--------------|---------|
192.168.251.30|        0|4298|bdr (6182181726884158214,2,16389,):receive |192.168.251.30|streaming|
192.168.251.30|        0|5123|bdr (6197060214704992943,2,224093,):receive|172.16.5.184  |streaming|
192.168.251.30|        0|4313|bdr (6363474972929529353,4,263746,):receive|172.16.5.180  |streaming|

And this query:这个查询:

select
    slot_name,
    plugin,
    slot_type,
    datoid,
    "database",
    active,
    xmin,
    catalog_xmin,
    restart_lsn
from
    pg_catalog.pg_replication_slots;

REDHAT_1: REDHAT_1:

slot_name                                |plugin|slot_type|datoid|database|active|xmin|catalog_xmin|restart_lsn |
-----------------------------------------|------|---------|------|--------|------|----|------------|------------|
bdr_263746_6182181726884158214_2_16389__ |bdr   |logical  |263746|MCT     |true  |    |2466301995  |CE4/CDA614E8|
bdr_263746_6182181726884158214_2_623638__|bdr   |logical  |263746|MCT     |true  |    |2466301995  |CE4/CDA614E8|
bdr_263746_6197060214704992943_2_224093__|bdr   |logical  |263746|MCT     |true  |    |2466301995  |CE4/CDA614E8|

REDHAT_2: REDHAT_2:

slot_name                                |plugin|slot_type|datoid|database|active|xmin|catalog_xmin|restart_lsn |
-----------------------------------------|------|---------|------|--------|------|----|------------|------------|
bdr_224093_6182181726884158214_2_16389__ |bdr   |logical  |224093|MCT     |true  |    |1464775472  |9B4/FED0BCA8|
bdr_224093_6182181726884158214_2_623638__|bdr   |logical  |224093|MCT     |true  |    |1464775472  |9B4/FED0BCA8|
bdr_224093_6363474972929529353_4_263746__|bdr   |logical  |224093|MCT     |true  |    |1464775472  |9B4/FED0BCA8|

CENTOS_1: CENTOS_1:

    slot_name                                |plugin|slot_type|datoid|database|active|xmin|catalog_xmin|restart_lsn |
    -----------------------------------------|------|---------|------|--------|------|----|------------|------------|
    bdr_623638_6363474972929529353_4_263746__|bdr   |logical  |623638|MCT     |true  |    |546376380   |85C/647FE110|
    bdr_623638_6182181726884158214_2_16389__ |bdr   |logical  |623638|MCT     |true  |    |546376380   |85C/647FE110|
    bdr_623638_6197060214704992943_2_224093__|bdr   |logical  |623638|MCT     |true  |    |546376380   |85C/647FE110|

So, I wonder how I could get rid of that error without breaking anything in the BDR that is currently working on production.因此,我想知道如何在不破坏当前正在生产的 BDR 中的任何内容的情况下摆脱该错误。

Any thoughts?有什么想法吗?

Thank you in advance!先感谢您!

The underlying issue was a second DB named "OLD" in REDHAT_2 node that was trying to sync with "MCT" in all nodes and was taking the ownership of replication slots so the real "MCT" nodes can't really open a connection.根本问题是REDHAT_2节点中名为“OLD”的第二个数据库试图与所有节点中的“MCT”同步,并且正在获取复制槽的所有权,因此真正的“MCT”节点无法真正打开连接。

Taking BDR down in the "OLD" Database and restarting PSQL solved the issue.在“旧”数据库中删除 BDR 并重新启动 PSQL 解决了该问题。 Thanks谢谢

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

相关问题 PostgreSQL BDR复制问题 - Postgresql BDR Replication issue PostgreSQL + BDR中的复制确认 - Replication acknowledgement in PostgreSQL + BDR PostgreSQL 9.4和BDR SELECT bdr.bdr_group_join错误 - Postgresql 9.4 and BDR SELECT bdr.bdr_group_join ERROR Postgres BDR复制已停止 - 复制槽未激活 - Postgres BDR replication stopped - replication slot inactive PostgreSQL | 剩余的连接槽保留用于非复制超级用户连接 - Postgresql | remaining connection slots are reserved for non-replication superuser connections Postgresql 9.6 级联 stream 复制问题:物理复制槽未从主服务器同步到从服务器 - Postgresql 9.6 cascading stream replication issue: physical replication slots not synced from master to slave 引起:org.postgresql.util.PSQLException:致命:剩余的连接槽是为非复制超级用户连接保留的 - Caused by: org.postgresql.util.PSQLException: FATAL: remaining connection slots are reserved for non-replication superuser connections AWS RDS PostgreSQL 错误“剩余连接槽为非复制超级用户连接保留” - AWS RDS PostgreSQL error “remaining connection slots are reserved for non-replication superuser connections” Postgres-BDR复制-auto_increment主键问题 - Postgres - BDR replication - problem with auto_increment primary key PostgreSQL和BDR:BDR确实是多主设备吗?它是2019年的1.x版开源和EOL吗? - PostgreSQL & BDR: Is BDR truly multi-master, is it Open Source and EOL for 1.x in 2019?
 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM