
Which built-in Postgres replication fits my Django-based use-case best?

I've noticed that Postgres now has built-in replication, including synchronous replication, streaming replication, and some other variants. It even provides the ability to control synchrony for specific operations at the application level (e.g., use synchronous replication for important stuff like money transfers, but maybe not for less critical things like user comments, etc.).
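
To make that concrete, here's roughly what I mean (a sketch from Django; the table names and functions are made up, not anything from my actual project):

    # Sketch of per-transaction durability control in PostgreSQL, driven from
    # Django 1.5. Assumes a configured "default" database connection; the
    # tables and functions here are purely illustrative.
    from django.db import connection, transaction

    def record_comment(user_id, body):
        with transaction.commit_on_success():  # Django 1.5-era transaction API
            cursor = connection.cursor()
            # Non-critical write: don't wait for the standby (or even the
            # local WAL flush). SET LOCAL affects only this transaction.
            cursor.execute("SET LOCAL synchronous_commit TO off")
            cursor.execute(
                "INSERT INTO comments (user_id, body) VALUES (%s, %s)",
                [user_id, body],
            )

    def record_payment(user_id, amount):
        with transaction.commit_on_success():
            cursor = connection.cursor()
            # Critical write: keep the default (synchronous) behaviour, so
            # COMMIT waits for whatever durability guarantee is configured.
            cursor.execute(
                "INSERT INTO payments (user_id, amount) VALUES (%s, %s)",
                [user_id, amount],
            )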

I'm working on software using Django 1.5 (i.e., the development version) and will possibly need synchronous replication (there will be commerce-related transactions going on).

Do you think the built-in tools are best for the job in most cases, and do you have any thoughts on one variant of the built-in replication vs. another, in terms of ease of use, quality, etc.?

One final thing: Slony and PGPool II seem to be pretty popular (Slony in particular) for replication. Is there A) a particular technical reason for their popularity over built-in replication, or B) is it just because a lot of people are using versions that don't have built-in replication, or C) have I been under a rock and is PG's built-in replication already quite popular?

Update (more details)

I have only 2 physical servers, and they're located in the same rack. My intention is to provide a slave which can automatically become the master, should something go catastrophically wrong on one machine (or even something simple like a double power-supply failure, etc.). I don't mind if my clients experience downtime during an automatic failover, so long as the downtime is a few minutes or so, not an hour or more.

I would like zero data loss, and am willing to spend more time in the failover process for that. Is there a way to make this trade-off without going for synchronous replication (e.g., streaming logs without write-back confirmation or something)?

What strategy or variant of replication would you recommend?

I think you misunderstand the benefits and costs of synchronous commit in replication. In PostgreSQL, replication works by continuously recovering the slaves up to the master's state, using the standard crash-recovery features of PostgreSQL. In the event that, for example, the power goes out, you can be sure that the write-ahead log segments will be replayed on both master and slave. With asynchronous commit, the commit is written to the WAL, and the application and the slave are notified more or less at the same time, depending on network characteristics, etc.

Where synchronous commit comes in handy is when two things are true:

  1. You have more than one slave (this is critical!) and
  2. You need durability guarantees that asynchronous commits can't offer you.

With synchronous commit, the database waits until it hears back from a configurable number of slaves before telling the application that the commit has happened. This offers durability guarantees in a few cases where asynchronous commit cannot.
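
Concretely, this waiting is configured on the master; a minimal sketch (the standby names are made up):

    # postgresql.conf on the master -- a sketch; the standby names are
    # made up. Each name must match the application_name a standby
    # connects with (set in that standby's primary_conninfo).
    synchronous_commit = on
    synchronous_standby_names = 'standby1, standby2'
    # In the 9.1/9.2-era releases this is a priority list: the first
    # connected entry is THE synchronous standby, and the next takes over
    # if it drops off. Waiting on a count of standbys only came later.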

For example, suppose your master server takes a bullet through a RAID array and immediately crashes (sorry, I couldn't think of a better example with good hardware). Or suppose someone trips on a power cord and not only powers off the server but corrupts the software RAID device. In this case it is possible that a couple of transactions were never replicated and the WAL is unrecoverable, so those transactions are lost. With synchronous commit, the application would have waited until the durability guarantees were met.

One thing this means is that if you do synchronous commit with only one slave, a crash of either the master or the slave will block commits, so your availability will be worse than it would have been with just one server. It also means that if your slave is geographically removed, you introduce significant latency into your application's commits.

Switching from async to sync commit is not a big change, but generally I think you get the most out of sync commit when you have already done as much as you can for assurance and availability on the hardware side. Start with async and move up when you can't optimize your async setup any further.

Re: "Slony and PGPool II seem to be pretty popular (Slony, in particular) for replication. Is there A) a particular, technical reason for their popularity over built-in replication or B) is it just because a lot of people are using versions that don't have built-in replication, or C) have I been under a rock and PG built-in replication is already quite popular?"

Slony is popular because it has been around for quite a long time, while the built-in PostgreSQL replication is relatively new. Cascading replication built into PostgreSQL is even newer, and it is something Slony-I has had from the start.

There are two main advantages to Slony-I. First, you can replicate between differing versions of PostgreSQL, whereas with the built-in replication system the two servers must not only run the same version but also be binary compatible. The other advantage is that with Slony-I you can replicate only certain tables instead of the whole database cluster. The disadvantages of Slony-I are numerous, and include poor user-friendliness, no synchronous commit, and difficult DDL (schema) changes. I believe that use of the built-in replication in Postgres will quickly exceed the Slony-I user base, if it hasn't already done so.

As far as I remember, PGPool II uses statement-based replication (like what MySQL has had built in), and it is definitely the least desirable option, in my opinion.

I would use the built-in hot standby/streaming replication in PostgreSQL. You can start with synchronous commit turned on and turn it off if you don't need it or the penalty is too high, or vice versa. Over a LAN, asynchronous replication seems to reach the slave on the order of a hundred milliseconds (from my own experience).
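
For reference, a minimal 9.1/9.2-era sketch of that setup (the host name, user, password, and IP address are all made up):

    # postgresql.conf on the master
    wal_level = hot_standby        # write enough WAL for a hot standby
    max_wal_senders = 3            # allow streaming replication connections

    # pg_hba.conf on the master: let the standby connect for replication
    host  replication  replicator  192.168.1.2/32  md5

    # recovery.conf on the standby (restored from a base backup of the master)
    standby_mode = 'on'
    primary_conninfo = 'host=master-host user=replicator password=secret application_name=standby1'

    # postgresql.conf on the standby
    hot_standby = on               # serve read-only queries while replicating

Note that automatic failover is not built in: you still need an external watchdog to promote the standby (touch the trigger_file named in recovery.conf, or run pg_ctl promote) when the master dies.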
