
Replication acknowledgement in PostgreSQL + BDR

Original post: 2016-02-29 11:47:20. Tags: multithreading, postgresql, replicaset, postgresql-bdr

I'm using the libpq C library to test a PG + BDR replica set. I'd like to get acknowledgement that my CRUD operations have been replicated. My goal is to keep my own log of the replication time in milliseconds or, if possible, in microseconds.

The program:
Starts 10-20 threads with separate connections; each thread runs 1000-5000 cycles of basic CRUD operations on three tables.

Which would be the best way?
Parsing some high-verbosity logs, if they contain the proper data with timestamps? Or, in my C API, starting N threads (N = {number of nodes} - {the master I'm connected to}) after every CRUD operation and querying the nodes for the data?

1 Answer

You can't easily get replay confirmation for individual xacts (transactions). The system keeps track of the log sequence number (LSN) replayed by peer nodes, but not which transaction IDs those LSNs correspond to, since it doesn't care.

What you seem to want is near-synchronous or semi-synchronous replication. There's some work coming there for 9.6 that will hopefully benefit BDR in time, but that's well in the future.

In the meantime you can see the log sequence number as restart_lsn in pg_replication_slots. This is not the position the replica has replayed to; it's the oldest point it might have to restart replay from after a crash.

You can see the other LSN fields, like replay_location, in pg_stat_replication, but only while a replica is connected. Unfortunately, in 9.4 there's no easy way to see which slot in pg_replication_slots is associated with which active connection in pg_stat_replication (fixed in 9.5, but BDR is still based on 9.4). So you have to use the application_name set by BDR if you want to pick out individual nodes, and it's ... "interesting" to parse. It's also often truncated.

You can get the current LSN of the server you committed an xact on, right after committing it, by calling SELECT pg_current_xlog_location(); which will return a value like 0/19E0F060 or whatever. You can then look that value up in the pg_stat_replication of peer nodes until you see that the replay_location for the node you committed on has reached or passed the LSN you captured immediately after commit.
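To make the "reached or passed" test concrete, here is a minimal sketch in C. It assumes you already hold both LSNs as text, for example as returned by PQgetvalue() on the results of the queries above. The helper names parse_lsn and lsn_reached are made up for illustration and are not part of libpq or BDR. LSNs print as two hexadecimal halves, so comparing them as strings gives wrong answers (0/9 would sort after 0/10); the sketch converts them to 64-bit integers first.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: parse a textual LSN such as "0/19E0F060" into a
 * 64-bit position. The part before the slash is the high 32 bits, the
 * part after it the low 32 bits. Returns true on success. */
bool parse_lsn(const char *text, uint64_t *out)
{
    unsigned int hi, lo;
    int consumed = 0;

    if (text == NULL || out == NULL)
        return false;
    if (sscanf(text, "%X/%X%n", &hi, &lo, &consumed) != 2)
        return false;
    if (text[consumed] != '\0')   /* reject trailing garbage */
        return false;

    *out = ((uint64_t) hi << 32) | lo;
    return true;
}

/* Hypothetical helper: true once a peer's replay_location has reached or
 * passed the LSN captured right after commit. Unparseable input counts
 * as "not reached". */
bool lsn_reached(const char *replay_location, const char *commit_lsn)
{
    uint64_t replay, commit;

    if (!parse_lsn(replay_location, &replay) ||
        !parse_lsn(commit_lsn, &commit))
        return false;
    return replay >= commit;
}
```

You would call lsn_reached(replay_location_text, commit_lsn_text) each time you re-read pg_stat_replication, and stop polling once it returns true.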

It's not perfect. Other work could be done between when you commit and when you capture the server's current LSN. There's no way around that, but at worst you wait slightly too long. If you're using BDR you shouldn't be caring about microseconds or even milliseconds anyway, since it's an asynchronous replication solution.

The principles are pretty similar to measuring replication lag for normal physical standby servers, so I suggest reading some docs on that. The exception is that pg_last_xact_replay_timestamp() won't work for logical replication, so you can't get lag that way; you have to use the LSNs and do your own timing client-side.
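As a rough sketch of the client-side timing, the C code below starts a monotonic clock, polls a caller-supplied check until it succeeds or a timeout passes, and returns the elapsed milliseconds. Everything named here (wait_for_replay_ms, replayed_fn, replayed_after_n_polls) is hypothetical. In a real program the check would run a query against pg_stat_replication over libpq and compare LSNs; the stand-in check below just succeeds after a fixed number of polls so the timing logic can be tried without a database. The measured time includes polling and query overhead, so treat it as an upper bound on the lag.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime and nanosleep */
#endif
#include <stdbool.h>
#include <time.h>

/* Hypothetical callback type: returns true once the peer has replayed
 * past the LSN captured after commit. */
typedef bool (*replayed_fn)(void *ctx);

static double elapsed_ms(const struct timespec *start,
                         const struct timespec *end)
{
    return (double) (end->tv_sec - start->tv_sec) * 1000.0 +
           (double) (end->tv_nsec - start->tv_nsec) / 1e6;
}

/* Poll check(ctx) every poll_interval_us microseconds. Returns the
 * elapsed time in milliseconds once check() succeeds, or -1.0 if
 * timeout_ms passes first. */
double wait_for_replay_ms(replayed_fn check, void *ctx,
                          long poll_interval_us, double timeout_ms)
{
    struct timespec start, now, pause;

    pause.tv_sec = poll_interval_us / 1000000L;
    pause.tv_nsec = (poll_interval_us % 1000000L) * 1000L;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (;;) {
        bool done = check(ctx);

        clock_gettime(CLOCK_MONOTONIC, &now);
        if (done)
            return elapsed_ms(&start, &now);
        if (elapsed_ms(&start, &now) >= timeout_ms)
            return -1.0;
        nanosleep(&pause, NULL);
    }
}

/* Stand-in check for experimentation only: ctx points to an int counter,
 * and the check succeeds once it has been called that many times. */
bool replayed_after_n_polls(void *ctx)
{
    int *remaining = ctx;
    return --(*remaining) <= 0;
}
```

CLOCK_MONOTONIC is used rather than wall-clock time so that clock adjustments on the client don't distort the measurement. Since each of your threads has its own connection, each would run its own wait loop with its own context.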