
What is the CouchDB replication protocol? Is it like Git?

Original post: 2011-01-22 05:42:10. Tags: git, couchdb

Is there technical documentation describing how replication between two Couches works?

What is the basic overview of CouchDB replication? What are some noteworthy characteristics about it?

5 Answers

Unfortunately, there is no detailed documentation describing the replication protocol. There is only the reference implementation built into CouchDB, and Filipe Manana's rewrite of the same, which will probably become the new implementation in the future.

However, this is the general idea:

Key points

If you know Git, then you know how Couch replication works. Replicating is very similar to pushing or pulling with distributed source managers like Git.

CouchDB replication does not have its own protocol. A replicator simply connects to two DBs as a client, then reads from one and writes to the other. Push replication is reading the local data and updating the remote DB; pull replication is vice versa.

Fun fact 1: The replicator is actually an independent Erlang application, in its own process. It connects to both couches, then reads records from one and writes them to the other.
Fun fact 2: CouchDB has no way of knowing who is a normal client and who is a replicator (let alone whether the replication is push or pull). It all looks like client connections. Some of them read records. Some of them write records.
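
To make the push/pull distinction concrete, here is a minimal sketch of the JSON body you would POST to CouchDB's /_replicate endpoint, which takes a "source" and a "target". The database name and the remote host below are made up for illustration, and the sketch only builds the request bodies; it does not send anything.

```python
import json


def replication_request(source: str, target: str) -> str:
    """Build the JSON body for a POST to /_replicate (body only, nothing is sent)."""
    return json.dumps({"source": source, "target": target})


# Hypothetical names: a local database and the same database on a remote Couch.
LOCAL = "notes"
REMOTE = "http://couch.example.com:5984/notes"

# Push: read the local data, write it to the remote DB.
push = replication_request(source=LOCAL, target=REMOTE)

# Pull: read the remote data, write it to the local DB.
pull = replication_request(source=REMOTE, target=LOCAL)

print(push)
print(pull)
```

The two requests differ only in which side is the source, which matches the point above: push and pull are the same read-then-write operation seen from different ends.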

Everything flows from the data model

The replication algorithm is trivial, uninteresting. A trained monkey could design it. It's simple because the cleverness is in the data model, which has these useful characteristics:

Every record in CouchDB is completely independent of all others. That sucks if you want to do a JOIN or a transaction, but it's awesome if you want to write a replicator. Just figure out how to replicate one record, and then repeat that for each record.
Like Git, records have a linked-list revision history. A record's revision ID is the checksum of its own data. Subsequent revision IDs are checksums of: the new data, plus the revision ID of the previous.
In addition to application data ( {"name": "Jason", "awesome": true} ), every record stores the evolutionary timeline of all previous revision IDs leading up to itself.
- Exercise: Take a moment of quiet reflection. Consider any two different records, A and B. If A's revision ID appears in B's timeline, then B definitely evolved from A. Now consider Git's fast-forward merges. Do you hear that? That is the sound of your mind being blown.
Git isn't really a linear list. It has forks, when one parent has multiple children. CouchDB has that too.
- Exercise: Compare two different records, A and B. A's revision ID does not appear in B's timeline; however, one revision ID, C, is in both A's and B's timeline. Thus A didn't evolve from B. B didn't evolve from A. But rather, A and B have a common ancestor C. In Git, that is a "fork." In CouchDB, it's a "conflict."
- In Git, if both children go on to develop their timelines independently, that's cool. Forks totally support that.
- In CouchDB, if both children go on to develop their timelines independently, that's cool too. Conflicts totally support that.
- Fun fact 3: CouchDB "conflicts" do not correspond to Git "conflicts." A Couch conflict is a divergent revision history, what Git calls a "fork." For this reason the CouchDB community pronounces "conflict" with a silent n: "co-flicked."
Git also has merges, when one child has multiple parents. CouchDB sort of has that too.
- In the data model, there is no merge. The client simply marks one timeline as deleted and continues to work with the only extant timeline.
- In the application, it feels like a merge. Typically, the client merges the data from each timeline in an application-specific way. Then it writes the new data to the timeline. In Git, this is like copying and pasting the changes from branch A into branch B, then committing to branch B and deleting branch A. The data was merged, but there was no git merge.
- These behaviors are different because, in Git, the timeline itself is important; but in CouchDB, the data is important and the timeline is incidental; it's just there to support replication. That is one reason why CouchDB's built-in revisioning is inappropriate for storing revision data like a wiki page.
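
The two exercises above can be sketched in a few lines of Python. This is a toy model, not CouchDB's actual revision algorithm: the helper names (next_rev, extend, relate) are invented here, and the checksum is simply SHA-256 over the parent revision ID plus the JSON data, which is an assumption made for illustration.

```python
import hashlib
import json
from typing import List, Optional


def next_rev(data: dict, parent: Optional[str]) -> str:
    """Toy revision ID: a generation number plus a checksum of (parent ID, data)."""
    generation = 1 if parent is None else int(parent.split("-")[0]) + 1
    payload = json.dumps([parent, data], sort_keys=True).encode()
    return f"{generation}-{hashlib.sha256(payload).hexdigest()}"


def extend(timeline: List[str], data: dict) -> List[str]:
    """Return a new timeline (oldest first) with one more revision appended."""
    parent = timeline[-1] if timeline else None
    return timeline + [next_rev(data, parent)]


def relate(a: List[str], b: List[str]) -> str:
    """Classify how two timelines of the same record relate to each other."""
    if a[-1] == b[-1]:
        return "identical"
    if a[-1] in b or b[-1] in a:
        return "fast-forward"   # one tip appears in the other's history
    if set(a) & set(b):
        return "conflict"       # common ancestor, but neither evolved from the other
    return "unrelated"


# C is the common ancestor; A and B are two independent edits of C.
c = extend([], {"name": "Jason"})
a = extend(c, {"name": "Jason", "awesome": True})
b = extend(c, {"name": "Jason", "awesome": False})

print(relate(c, a))
print(relate(a, b))
```

Here relate(c, a) is the first exercise (A evolved from C, so it is a fast-forward), and relate(a, b) is the second (A and B share C but diverged, which CouchDB calls a conflict).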

Final notes

At least one sentence in this writeup (possibly this one) is complete BS.

Thanks Jason for the excellent overview! Jens Alfke, who is working on TouchDB and its replication for Couchbase, has (unofficially) described the CouchDB replication algorithm itself, if you're interested in the technical details of how the "standard" CouchDB replication protocol tends to work.

To summarize the steps he's outlined:

1. Figure out how far any previous replication got.
2. Get the source database's _changes since that point.
3. Use _revs_diff on a batch of changes to see which are needed on the target.
4. Copy any missing revision metadata and current document data plus attachments from source to target, posting to _bulk_docs both as an optimization and so that the docs are stored differently from how the usual higher-level MVCC handling on PUT would store them.
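
As a rough illustration of those four steps, here is a self-contained Python sketch that runs against in-memory stand-ins rather than real servers. ToyDB and its methods are hypothetical; they only mimic the roles of _changes, _revs_diff, and _bulk_docs, and the checkpoint is kept on the target alone, which is a simplification. A real replicator talks HTTP and handles attachments, batching, and revision histories, none of which is modeled here.

```python
class ToyDB:
    """In-memory stand-in for a CouchDB database (hypothetical, not a real client)."""

    def __init__(self):
        self.revs = {}    # (doc_id, rev) -> document body
        self.log = []     # ordered (doc_id, rev); position + 1 is the sequence number
        self.local = {}   # replication checkpoints, in the spirit of _local documents

    def changes(self, since):
        """Plays the role of _changes: everything after a sequence number."""
        return [(seq, doc_id, rev)
                for seq, (doc_id, rev) in enumerate(self.log, 1) if seq > since]

    def revs_diff(self, candidates):
        """Plays the role of _revs_diff: which of these revisions are missing here?"""
        return [key for key in candidates if key not in self.revs]

    def bulk_docs(self, docs):
        """Plays the role of _bulk_docs: store revisions as given, IDs unchanged."""
        for doc_id, rev, body in docs:
            if (doc_id, rev) not in self.revs:
                self.revs[(doc_id, rev)] = body
                self.log.append((doc_id, rev))


def replicate(source, target, rep_id="toy-replication"):
    """Run the four steps once; return how many revisions were copied."""
    since = target.local.get(rep_id, 0)                  # 1. previous checkpoint
    changes = source.changes(since)                      # 2. changes since then
    if not changes:
        return 0
    missing = target.revs_diff([(d, r) for _, d, r in changes])          # 3. what is needed
    target.bulk_docs([(d, r, source.revs[(d, r)]) for d, r in missing])  # 4. copy it over
    target.local[rep_id] = changes[-1][0]                # remember how far we got
    return len(missing)


a, b = ToyDB(), ToyDB()
a.bulk_docs([("doc1", "1-abc", {"n": 1}), ("doc2", "1-def", {"n": 2})])
first = replicate(a, b)    # copies both revisions
second = replicate(a, b)   # nothing new since the checkpoint
a.bulk_docs([("doc1", "2-xyz", {"n": 10})])
third = replicate(a, b)    # copies only the new revision
print(first, second, third)
```

The checkpoint is what makes repeated runs cheap: the second call asks for changes after sequence 2 and gets none, so it never reaches the diff or copy steps.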

I've glossed over many details here, and would recommend reading through the original explanation as well.

The documentation for CouchDB v2.0.0 covers the replication algorithm much more extensively. It has diagrams, example intermediate responses, and example errors, and it uses the "MUST", "SHALL", etc. language of IETF RFCs.

The specifics for 2.0.0 (still unreleased as of January 2016) are a bit different from 1.x, but the basics are still as @natevw described.

At Apache CouchDB Conf 2013, Benjamin Young introduced replication.io in his "Replication, FTW!" talk. It's an ongoing effort to define, and eventually mint, the spec for HTTP-based master-master replication.