
Bootstrap bucardo replication after pg_restore

Asked 2020-07-30 13:22:42. Tags: postgresql, replication, pg-dump, pg-restore, bucardo

Currently I am setting up master/master replication with bucardo between 5 nodes in different locations (it should provide location transparency). The database holds ~500 tables which should be replicated. I grouped them into smaller replication herds of at most 50 tables each, based on their dependencies on each other. All tables have primary keys defined, and the sequences on each node are set up to provide system-wide unique identities (based on residue classes).
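For readers unfamiliar with the residue-class idea, here is a minimal sketch of how such sequences could be configured. The sequence name and the helper function are hypothetical and not taken from the setup above. Each of N nodes increments by N and starts at its own node index, so generated ids never collide. With existing data, the restart value would additionally have to lie above the current maximum while keeping the same remainder modulo N.

```shell
#!/bin/sh
# Hypothetical helper: print the ALTER SEQUENCE statement for one node.
# $1 = sequence name, $2 = node index (1..N), $3 = number of nodes (N)
sequence_sql() {
  printf 'ALTER SEQUENCE %s INCREMENT BY %d RESTART WITH %d;\n' "$1" "$3" "$2"
}

# Print the statements for 5 nodes. Each statement is meant to be run
# on the matching node only (for example by piping it into psql there).
for node in 1 2 3 4 5; do
  sequence_sql myschema.mytable_id_seq "$node" 5
done
```

Node 3 of 5 would then hand out 3, 8, 13, and so on.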

To get an initial database on each node, I made a --data-only custom-format pg_dump into a file and restored it on each node via pg_restore. The bucardo sync is set up with the bucardo_latest strategy to resolve conflicts. Now, when I start syncing, bucardo first deletes all datasets in the origin database and then inserts them again from one of the restored nodes, because all restored datasets have a "later timestamp" (the point in time when I called pg_restore). This ultimately prevents the initial startup: bucardo needs a very long time and also fails, as there are lots of datasets to resolve and the timeouts are often too short.
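The dump and restore described above can be sketched roughly as follows. Host names, the database name and the dump path are placeholders, and the script only prints the commands unless RUN=1 is set, so it is an illustration rather than the exact commands that were used.

```shell
#!/bin/sh
# Dry run by default: commands are printed (without shell quoting),
# not executed. Set RUN=1 to execute them.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Placeholder names, not taken from the original setup.
SRC_HOST=node1.example.com
TARGETS="node2.example.com node3.example.com"
DB=mydb
DUMP=/tmp/mydb_data.dump

dump_and_restore() {
  # Data only, custom format, written to a single archive file.
  run pg_dump --host="$SRC_HOST" --data-only --format=custom --file="$DUMP" "$DB"
  for host in $TARGETS; do
    # Load the same archive into each remaining node.
    run pg_restore --host="$host" --dbname="$DB" --data-only "$DUMP"
  done
}

dump_and_restore
```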

I also have 'last_modified' timestamps on each table which are managed by UPDATE triggers, but as I understand it, the dump is loaded via COPY, and therefore these triggers don't get fired.

Which timestamp does bucardo use to find out who is bucardo_latest?
Do I have to call pg_dump with something like set SESSION_REPLICATION_ROLE = 'replica';?

I just want bucardo to keep track of every new change, not to execute pseudo changes caused by the restore.

EDIT: pg_restore has definitely fired several triggers at restore time. As said, I keep track of the user and last modification date in each table, and those values are now set to the user and timestamp of the restore. I am aware that I can set SESSION_REPLICATION_ROLE for a plain-text-format restore via psql. Is this also possible for pg_restore somehow?

1 Answer

The common approach is to do the dump/restore before configuring the replication.

So one option would be:

drop the bucardo schema in each database
do a bucardo remove for each object (most of them accept all, as in bucardo remove table all)
dump/restore your data
configure the replication again. Just make sure that when adding the sync, you set the option onetimecopy=0. It's the default, but I feel safer making it explicit.
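The steps above can be sketched as a dry-run script. All names here (hosts, database, sync, relgroup) are hypothetical, and the exact bucardo arguments should be checked against `bucardo help` for the installed version. The sketch also assumes that databases, tables and the relgroup are registered again before the sync is added, which is omitted here.

```shell
#!/bin/sh
# Dry run by default: commands are printed (without shell quoting),
# not executed. Set RUN=1 to execute them.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Hypothetical names for illustration only.
HOSTS="node1.example.com node2.example.com"
DB=mydb
SYNC=mysync
RELGROUP=myherd
SYNC_DBS="node1:source,node2:source"

reset_replication() {
  # 1. Drop the bucardo schema (delta tables, trigger functions) per database.
  for host in $HOSTS; do
    run psql --host="$host" --dbname="$DB" --command="DROP SCHEMA IF EXISTS bucardo CASCADE"
  done
  # 2. Remove the registered objects from the bucardo configuration.
  run bucardo remove sync "$SYNC"
  run bucardo remove table all
  # 3. Dump and restore the data here, before any sync exists.
  # 4. Re-add dbs, tables and relgroup (omitted), then the sync itself.
  run bucardo add sync "$SYNC" relgroup="$RELGROUP" dbs="$SYNC_DBS" onetimecopy=0
}

reset_replication
```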

Which timestamp does bucardo use to find out who is bucardo_latest?

bucardo handles its own timestamp value. Each table should have a trigger named like bucardo.delta_myschema_mytable that performs an insert into a table named like bucardo.delta_myschema_mytable. This table has a column txntime timestamp with time zone not null default now(), and this is the timestamp used.
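To see what the restore left behind, the delta table can be inspected directly. This is a small sketch that assumes the table naming described above. The schema and table names, the column aliases and the helper function are placeholders.

```shell
#!/bin/sh
# Build the inspection query for one replicated table.
# $1 = schema, $2 = table (hypothetical names in the call below)
delta_query() {
  printf 'SELECT count(*) AS delta_rows, max(txntime) AS newest_txntime FROM bucardo.delta_%s_%s;\n' "$1" "$2"
}

# Print the query. To run it, pass it to psql, for example:
#   psql --dbname=mydb --command="$(delta_query myschema mytable)"
delta_query myschema mytable
```

If the restore fired the bucardo triggers, newest_txntime should be close to the time pg_restore ran.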

Do I have to call pg_dump with something like set SESSION_REPLICATION_ROLE = 'replica';?

AFAIK, if the bucardo triggers are already set on the tables, the --disable-triggers option of pg_restore should do the trick.
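A minimal sketch of such a restore, with placeholder host, database and dump path. Note that --disable-triggers only applies to data-only restores, typically has to be run as a superuser, and disables all triggers on the restored tables for the duration of the load, not just the bucardo ones.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed. Set RUN=1 to execute.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Placeholder names for illustration only.
DB=mydb
DUMP=/tmp/mydb_data.dump

# $1 = target host
restore_without_triggers() {
  run pg_restore --host="$1" --dbname="$DB" --data-only --disable-triggers "$DUMP"
}

restore_without_triggers node2.example.com
```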

You can also check these articles about working with large databases and the use of session_replication_role.
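As for using session_replication_role with pg_restore directly: one possibility, sketched here under assumptions, is the PGOPTIONS environment variable, which libpq-based clients such as pg_restore pass to the server at connection time. Host, database and dump path are placeholders, and changing this setting typically requires superuser rights. With the role set to replica, ordinary triggers (including the bucardo delta triggers) do not fire during the load.

```shell
#!/bin/sh
# Placeholder names for illustration only.
DB=mydb
DUMP=/tmp/mydb_data.dump

# $1 = target host. Prints the command unless RUN=1 is set.
restore_as_replica() {
  opts='-c session_replication_role=replica'
  if [ "${RUN:-0}" = "1" ]; then
    PGOPTIONS="$opts" pg_restore --host="$1" --dbname="$DB" --data-only "$DUMP"
  else
    printf "PGOPTIONS='%s' pg_restore --host=%s --dbname=%s --data-only %s\n" \
      "$opts" "$1" "$DB" "$DUMP"
  fi
}

restore_as_replica node2.example.com
```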