
Multi-region availability for a Postgres database so we can offer our clients 4+ nines of availability?

Original post: 2022-09-16 14:50:43 | Tags: postgresql, google-cloud-platform, google-cloud-sql, high-availability

We use Google Cloud SQL, which offers an SLA of 99.95% when deployed in High Availability mode. Google's high availability stays within the same region, just in a different zone. We have clients who are asking for an SLA of 99.99%, and are willing to pay for 99.999%.
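To put those percentages in concrete terms, here is a minimal sketch that converts an availability target into the downtime it permits. It assumes a 365-day year and a 30-day month; real SLAs define their own measurement window, so treat the numbers as rough.

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # assumption: non-leap year
MINUTES_PER_MONTH = 30 * 24 * 60   # assumption: 30-day month


def allowed_downtime_minutes(availability_percent: float, period_minutes: int) -> float:
    """Return the downtime (in minutes) an availability target permits over a period."""
    return period_minutes * (1 - availability_percent / 100)


for target in (99.95, 99.99, 99.999):
    per_year = allowed_downtime_minutes(target, MINUTES_PER_YEAR)
    per_month = allowed_downtime_minutes(target, MINUTES_PER_MONTH)
    print(f"{target}%: {per_year:.1f} min/year, {per_month:.2f} min/month")
```

Under these assumptions the yearly budgets come to roughly 263, 53, and 5 minutes respectively.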

The only thing we've thought of is creating a read replica in another region that we could fail over to if there is such an outage. Such a failover would be manual, though. We would have to:

1. Take down our primary database.
2. Promote the read replica.
3. Redeploy our servers, changing the environment variable for our database.

With everything being so manual, it seems difficult to offer this as part of our SLA, since such a change would take ~30 minutes. We would need to set up something automatic.
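A rough way to see how a ~30-minute manual failover compares with the SLA targets mentioned in the question is sketched below. It counts only the failover time itself and assumes a 365-day year; detection time and any other downtime would make the picture worse.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: non-leap year


def achieved_availability(outages_per_year: int, minutes_per_outage: float) -> float:
    """Availability (percent) if each outage costs a fixed number of minutes."""
    downtime = outages_per_year * minutes_per_outage
    return 100 * (1 - downtime / MINUTES_PER_YEAR)


def meets_target(achieved_percent: float, target_percent: float) -> bool:
    """True if the achieved availability is at least the target."""
    return achieved_percent >= target_percent


one_failover = achieved_availability(outages_per_year=1, minutes_per_outage=30)
two_failovers = achieved_availability(outages_per_year=2, minutes_per_outage=30)
print(f"one 30-minute failover per year:  {one_failover:.4f}%")
print(f"two 30-minute failovers per year: {two_failovers:.4f}%")
print("one failover meets 99.99%: ", meets_target(one_failover, 99.99))
print("one failover meets 99.999%:", meets_target(one_failover, 99.999))
```

Under these assumptions a single 30-minute failover per year still fits a 99.99% target, a second one does not, and even one exceeds the 99.999% budget.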

Surely this isn't a common problem. There has to be a better way to achieve higher availability for Postgres. What do other companies do?

1 Answer

I have been managing a PostgreSQL high-availability cluster. We ran a failover test on VMware on premises with a multi-region deployment. You need the following steps for failover:

1. Promote the standby server in the DR region using pg_promote.
2. Run CHECKPOINT on the promoted node, so that the other nodes that join this newly promoted node get up-to-date information.
3. Join the other cluster nodes to this newly promoted node using pg_rewind, which syncs only the differences. If you deploy a new node, use pg_basebackup to join it instead.
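The steps above can be sketched as a small plan builder. This only assembles the SQL statements and command lines as strings and does not execute anything. The paths, host names, user, and connection string are hypothetical placeholders, and calling `pg_promote()` as a SQL function assumes PostgreSQL 12 or later. Note also that pg_rewind needs the target server to be stopped, and requires either `wal_log_hints` or data checksums to be enabled.

```python
import shlex
from typing import List


def promotion_sql() -> List[str]:
    """SQL to run on the standby in the DR region (steps 1 and 2)."""
    return [
        "SELECT pg_promote();",  # assumes PostgreSQL 12+
        "CHECKPOINT;",
    ]


def rejoin_command(pgdata: str, new_primary_conninfo: str) -> str:
    """Step 3 for an existing node: pg_rewind syncs only what diverged."""
    return shlex.join([
        "pg_rewind",
        f"--target-pgdata={pgdata}",
        f"--source-server={new_primary_conninfo}",
    ])


def fresh_node_command(pgdata: str, host: str, user: str) -> str:
    """Step 3 for a brand-new node: take a full base backup instead."""
    return shlex.join([
        "pg_basebackup",
        "-h", host,
        "-U", user,
        "-D", pgdata,
        "-R",            # write standby configuration for the new node
        "-X", "stream",  # stream WAL while the backup runs
    ])


# Hypothetical values, for illustration only.
for statement in promotion_sql():
    print(statement)
print(rejoin_command("/var/lib/postgresql/data",
                     "host=dr-primary.example user=replicator dbname=postgres"))
print(fresh_node_command("/var/lib/postgresql/data", "dr-primary.example", "replicator"))
```

Keeping the plan as data makes it easy to review or log before whatever automation you choose actually runs it.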

Bonus tip: if replication slots are supported, create a replication slot for each of the other cluster nodes as soon as you perform step 1.
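A minimal sketch of the bonus tip: generating the statements that create one physical replication slot per remaining node, to be run on the newly promoted primary. The node names are hypothetical. Slot names may contain only lowercase letters, digits, and underscores, so the helper normalizes them first.

```python
import re
from typing import Iterable, List


def slot_name_for(node_name: str) -> str:
    """Map a node name to a valid replication slot name."""
    name = re.sub(r"[^a-z0-9_]", "_", node_name.lower())
    return name[:63]  # PostgreSQL names are limited to 63 bytes by default


def create_slot_statements(node_names: Iterable[str]) -> List[str]:
    """One statement per node, to run on the newly promoted primary."""
    return [
        f"SELECT pg_create_physical_replication_slot('{slot_name_for(n)}');"
        for n in node_names
    ]


# Hypothetical node names, for illustration only.
for statement in create_slot_statements(["node-a.dc1", "node-b.dc1"]):
    print(statement)
```

Each rejoining standby would then reference its slot through `primary_slot_name`, so the new primary retains the WAL that node still needs.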

All of the above steps need to be performed by a sidecar monitoring process, using some tool.
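As one hedged illustration of what such a sidecar could look like, the sketch below keeps the decision logic separate from the database calls: a probe function reports whether the primary is reachable, and failover is triggered only after several consecutive failures. The threshold, the probe, and the `on_failover` callback are all assumptions, and the sketch does nothing about fencing the old primary or avoiding split brain.

```python
from typing import Callable


class FailoverMonitor:
    """Trigger failover after N consecutive failed health probes."""

    def __init__(self, probe: Callable[[], bool], on_failover: Callable[[], None],
                 failure_threshold: int = 3) -> None:
        self.probe = probe
        self.on_failover = on_failover
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0
        self.failed_over = False

    def tick(self) -> bool:
        """Run one probe; return True if this tick triggered the failover."""
        if self.failed_over:
            return False
        try:
            healthy = self.probe()
        except Exception:
            healthy = False  # a probe that raises counts as a failure
        if healthy:
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.failed_over = True
            self.on_failover()
            return True
        return False


# Simulated probe results: one success, then three failures in a row.
results = iter([True, False, False, False])
events = []
monitor = FailoverMonitor(probe=lambda: next(results),
                          on_failover=lambda: events.append("promote standby"))
for _ in range(4):
    monitor.tick()
print(events)  # prints ['promote standby']
```

In a real sidecar, `tick()` would run on a timer, the probe would open a connection to the primary, and `on_failover` would carry out the promotion steps listed above.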