What's the difference between Paxos and W+R>N in Cassandra?

Dynamo-like databases (e.g. Cassandra) can enforce consistency by means of quorums: the number of synchronously written replicas (W) and the number of replicas read (R) are chosen so that W+R>N, where N is the replication factor. On the other hand, Paxos-based systems like Zookeeper are also used as consistent, fault-tolerant storage.

What is the difference between these two approaches? Does Paxos provide guarantees that the W+R>N scheme does not?

Yes, Paxos provides guarantees that Dynamo-like systems and their read-write quorums do not. The difference lies in how failures are handled and in what happens during a write. After a successful write, both kinds of systems behave similarly: the data is saved and remains available for reading until it is overwritten or deleted.

The difference appears during a write and after failures. When you write to an eventually consistent system, until W nodes have answered successfully the data may have been written to some nodes and not to others, and there is no guarantee that the whole system agrees on the current value. If you read the data back at this point, some clients may get the new data and some the old. In other words, the system is not immediately consistent, because writes aren't atomic across nodes in these systems. There are usually mechanisms to "heal" an inconsistency like this, and "eventually" the system becomes consistent again (i.e. reads once again always return the same value, until something new is written). This is why such systems are often called "eventually consistent". Inconsistencies can (and will) appear, but they are eventually dealt with and reconciled.

With Paxos, writes can be made atomic across nodes, so inconsistencies between nodes can be avoided. The Paxos algorithm makes it possible to guarantee that non-faulty nodes never disagree on the outcome of a write at any point in time: either the write succeeded everywhere or nowhere. There will never be any inconsistent reads (if it's correctly implemented and all the assumptions hold, of course). This comes at a cost, however. Mainly, the system may need to delay some requests, or be unavailable, when for example too many nodes (or the communication links between them) aren't working. This is necessary to ensure that no inconsistent replies are given.

To summarize: the main difference is that Dynamo-like systems can return inconsistent results for some time during writes or after failures (but will eventually recover), whereas Paxos-based systems can guarantee that there are never any such inconsistencies, at the price of sometimes being unavailable and delaying requests instead.

Paxos and the W+R>N quorum try to solve slightly different problems. Paxos is usually described as a way to replicate a state machine, but in fact it is more of a distributed log: each item written to the log gets an index, and the different servers eventually hold the same log items with the same indices. (A replicated state machine can be built on top of this by writing the state machine's inputs to the log; each server then replays the state machine on the agreed inputs in index order.) You can read more about Paxos in a blog post I wrote.
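To make the "log plus replay" idea concrete, here is a minimal sketch. It assumes the log has already been agreed on (the consensus step itself is not shown), and the `Replica` class and the `set`/`delete` operations are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica that applies agreed log entries in index order."""
    state: dict = field(default_factory=dict)
    applied_upto: int = 0  # number of log entries applied so far

    def replay(self, log):
        # Apply only the entries not yet applied, strictly in index order.
        for index in range(self.applied_upto, len(log)):
            op, key, value = log[index]
            if op == "set":
                self.state[key] = value
            elif op == "delete":
                self.state.pop(key, None)
            self.applied_upto = index + 1


# The agreed log: the position in the list plays the role of the log index.
agreed_log = [
    ("set", "x", 1),
    ("set", "y", 2),
    ("set", "x", 3),
    ("delete", "y", None),
]

fast, slow = Replica(), Replica()
fast.replay(agreed_log)
slow.replay(agreed_log[:2])  # a lagging replica has only seen a prefix
slow.replay(agreed_log)      # it catches up later

print(fast.state, slow.state)  # both end up as {'x': 3}
```

Because every replica applies the same entries in the same order, a replica that lags behind only ever sees an older prefix of the same history, never a conflicting one.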

The W+R>N quorum solves the problem of sharing a single value among multiple servers. In academia this is called a "shared register". A shared register has two operations, read and write, and we expect a read to return the value of the most recent preceding write.
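As a rough illustration of a quorum-based shared register, the sketch below assumes a single writer that numbers its own writes, and no concurrency or failures. `QuorumRegister` and its methods are hypothetical names, not any database's API.

```python
class QuorumRegister:
    """Toy single-writer shared register replicated on n nodes (illustrative only)."""

    def __init__(self, n, w, r):
        self.n, self.w, self.r = n, w, r
        # Each replica stores (version, value); version 0 means "never written".
        self.replicas = [(0, None) for _ in range(n)]
        self.next_version = 1  # counter kept by the single writer

    def write(self, value, acked_by):
        """Store value on the replicas in acked_by; needs at least w of them."""
        if len(set(acked_by)) < self.w:
            raise ValueError("write did not reach a write quorum")
        for i in acked_by:
            self.replicas[i] = (self.next_version, value)
        self.next_version += 1

    def read(self, asked):
        """Ask the replicas in asked (at least r) and return the newest value seen."""
        if len(set(asked)) < self.r:
            raise ValueError("read did not reach a read quorum")
        version, value = max((self.replicas[i] for i in asked), key=lambda pair: pair[0])
        return value


# W+R>N (2+2>3): every read set overlaps every completed write set.
strict = QuorumRegister(n=3, w=2, r=2)
strict.write("v1", acked_by=[0, 1])
strict_result = strict.read(asked=[1, 2])  # replica 1 is in both sets

# W+R<=N (1+1<=3): a read can miss the write entirely.
loose = QuorumRegister(n=3, w=1, r=1)
loose.write("v1", acked_by=[0])
loose_result = loose.read(asked=[2])

print(strict_result, loose_result)  # v1 None
```

The version number is what lets a reader pick the newest value out of the replies it receives. Real systems use timestamps or version vectors for this, and need extra care once there are several concurrent writers.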

So, Paxos and the W+R>N quorum live in different domains and have different properties (e.g., Paxos keeps an ordered list of items). However, Paxos can be used to implement a shared register, and a W+R>N quorum can be used to implement a distributed log (although very inefficiently).

Having said all of the above, W+R>N quorums are sometimes not implemented in their "fully robust" form, because that requires more than one communication round. Thus, in systems that want low latency, the implementation of W+R>N quorums may provide weaker properties (e.g., conflicting values can coexist).

To sum up: theoretically, Paxos and W+R>N can achieve the same goals. Practically, doing so would be very inefficient, and each is better suited to something slightly different. Even more practically, W+R>N isn't always implemented fully, thus sacrificing some consistency properties for speed.

Update: Paxos supports a very general failure model: messages can be dropped, and nodes can crash and restart. The W+R>N quorum scheme has different implementations, many of which assume a less general failure model. So the difference between the two also depends on which failures each is assumed to tolerate.

Paxos is non-trivial to implement, and expensive enough that many systems that use it also rely on hints, or use it only for leader election or similar. However, it does provide guaranteed consistency in the presence of failures, subject of course to the limits of its particular failure model.

The first quorum-based systems I saw assumed some sort of leader or transaction infrastructure that would ensure enough consistency for you to trust that the quorum mechanism worked. This infrastructure might well be Paxos-based.

Looking at descriptions such as https://cloudant.com/blog/dynamo-and-couchdb-clusters/ , it would appear that Dynamo is not based on an infrastructure that guarantees consistency for its quorum system. So is it being very clever, or cutting corners? According to http://muratbuffalo.blogspot.co.uk/2010/11/dynamo-amazons-highly-available-key.html , "The Dynamo system emphasizes availability to the extent of sacrificing consistency. The abstract reads 'Dynamo sacrifices consistency under certain failure scenarios'. Actually, later it becomes clear that Dynamo sacrifices consistency even in the absence of failures: Dynamo may become inconsistent in the presence of multiple concurrent write requests since the replicas may diverge due to multiple coordinators." (end quote)

So it would appear that, compared with quorums as implemented in Dynamo, Paxos provides stronger reliability guarantees.

There is no difference. The definition of a quorum system only requires that any two quorums have a non-empty intersection. A simple majority quorum is an example, NOT the definition. Take a look at Dr. Lamport's later paper "Vertical Paxos", where he gives some other possible quorum configurations.
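The intersection property can be checked mechanically. The sketch below is only an illustration: the helper names are made up, and the grid layout (a quorum is one full row plus one full column) is a textbook-style non-majority construction chosen for this example, not a configuration taken from the "Vertical Paxos" paper.

```python
from itertools import combinations


def pairwise_intersecting(quorums):
    """True if every two quorums share at least one node."""
    return all(a & b for a, b in combinations(quorums, 2))


def majority_quorums(n):
    """All subsets of n nodes that form a strict majority of minimum size."""
    return [frozenset(c) for c in combinations(range(n), n // 2 + 1)]


def grid_quorums(side):
    """Quorums on a side x side grid: one full row plus one full column."""
    quorums = []
    for row in range(side):
        for col in range(side):
            row_cells = {(row, c) for c in range(side)}
            col_cells = {(r, col) for r in range(side)}
            quorums.append(frozenset(row_cells | col_cells))
    return quorums


majority = majority_quorums(5)  # quorums of 3 out of 5 nodes
grid = grid_quorums(4)          # quorums of 7 out of 16 nodes, less than a majority
disjoint = [frozenset({0, 1}), frozenset({2, 3})]

print(pairwise_intersecting(majority))  # True
print(pairwise_intersecting(grid))      # True
print(pairwise_intersecting(disjoint))  # False
```

The grid quorums intersect because the row of one quorum always crosses the column of the other, even though each quorum contains only 7 of the 16 nodes.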

In steady state, the multi-decree Paxos protocol (a.k.a. Multi-Paxos) is essentially just a two-phase commit. Ballot number changes are only needed when the leader fails.

Zookeeper's replication protocol (ZAB) and Raft are both close relatives of Paxos. The differences lie mainly in failure detection and in the transition after a leader fails.

As mentioned in other answers, in an R+W>N system there are no transactions, so during a write some nodes will have the newer value and some the older one. Take a system where n=3, r=2, and w=2, and for clarity assume the 3 nodes are named A, B, and C. Consider this scenario: a write is in progress, node A has been updated, and B and C are still in the process of receiving the updated value. Clients reading from A and B will see the newer value (resolved using version vectors or last-write-wins), while clients reading from B and C will see the old value. This kind of read behavior is not linearizable. Such issues do not occur in properly linearizable systems, such as those built on Paxos or Raft.
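The scenario can be reproduced with a few lines of Python. This is a toy model, not Cassandra's read path: the `replicas` dictionary, the integer versions (standing in for version vectors or timestamps), and the `quorum_read` helper are all assumptions made for this illustration.

```python
# Replica state as (version, value). A write of version 2 is in progress
# and has reached only node A so far.
replicas = {
    "A": (2, "new"),
    "B": (1, "old"),
    "C": (1, "old"),
}


def quorum_read(nodes):
    """Read from r=2 replicas and return the value with the highest version."""
    version, value = max((replicas[name] for name in nodes), key=lambda pair: pair[0])
    return value


first_read = quorum_read(["A", "B"])   # quorum includes the updated node
second_read = quorum_read(["B", "C"])  # quorum misses the updated node

print(first_read, second_read)  # new old
```

If the second read starts after the first one has returned, a client has observed the value going "back in time" from new to old, which is exactly what linearizability forbids.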
