
Why is it legit to use no-op to fill gaps between Paxos events?

Original post: 2015-04-14 04:27:21. Tags: algorithm, cloud, distributed-computing, paxos, consensus

I am learning the Paxos algorithm ( http://research.microsoft.com/en-us/um/people/lamport/pubs/paxos-simple.pdf ) and there is one point I do not understand.

We know that events follow a time order, and it can happen that, say, events 1-5 and 10 are decided, but 6-9 and everything from 11 onward are not yet. The paper above says we simply fill the gap at 6-9 with no-op values and record new events from 11 on.

So in this case, since event 10 is already recorded, we know that some events must have happened between 5 and 10 but were not recorded by Paxos because of some failure. If we simply fill in no-op values, these events will be lost from our record.

Even worse, if, as the paper linked above says, events are in fact commands from the client, then missing a few commands in the middle might make the entire sequence of operations illegal (if none of the commands can be skipped, or if their order matters).

So why is it still legit for Paxos to fill gaps between events with no-op values, given that the entire set of records might become invalid because of them, as described above? Also, is there a better way to recover from such gaps than using no-ops?

2 Answers

This is a multi-part answer.

Proposing no-op values is the way to discover commands that haven't reached the node yet. We don't simply fill each slot in the gap with a no-op command: we propose that each slot be filled with a no-op. If any of the peers has already accepted a command for that slot, it returns that command in its Prepare-ack message, and the proposer uses that command in the Accept round instead of the no-op.

For example, assume a node was behind a temporary network partition and was unable to take part with the others for slots 6-9. It realizes it missed something when it learns the command in slot 10. It then proposes no-ops to learn what was decided in those slots.
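To make the mechanism concrete, here is a minimal single-process sketch of one Paxos round for one slot. It is an illustration under assumptions, not code from the paper: the `Acceptor` class, the `fill_slot` function, and the `NOOP` marker are hypothetical names, each slot has its own list of in-memory acceptors, and there is no networking or failure handling.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

NOOP = "no-op"  # hypothetical marker for a command that does nothing


@dataclass
class Acceptor:
    promised: int = -1                           # highest ballot promised so far
    accepted: Optional[Tuple[int, str]] = None   # (ballot, value) last accepted

    def prepare(self, ballot: int):
        """Phase 1: promise to ignore lower ballots and report any accepted value."""
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def accept(self, ballot: int, value: str) -> bool:
        """Phase 2: accept the value unless a higher ballot was promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def fill_slot(acceptors: List[Acceptor], ballot: int, default: str = NOOP):
    """Run one round for a single slot; `default` is used only if no value was accepted."""
    majority = len(acceptors) // 2 + 1
    replies = [a.prepare(ballot) for a in acceptors]
    promises = [acc for ok, acc in replies if ok]
    if len(promises) < majority:
        return None  # round failed; a real proposer would retry with a higher ballot
    prior = [p for p in promises if p is not None]
    # If any acceptor reported a value, the proposer must adopt the one
    # with the highest ballot instead of its own no-op.
    value = max(prior, key=lambda p: p[0])[1] if prior else default
    acks = sum(a.accept(ballot, value) for a in acceptors)
    return value if acks >= majority else None


# Slot 7 was accepted by a majority while our node was partitioned.
slot7 = [Acceptor(), Acceptor(), Acceptor()]
for a in slot7[:2]:
    a.prepare(1)
    a.accept(1, "set x=1")

# Slot 8 was never proposed by anyone.
slot8 = [Acceptor(), Acceptor(), Acceptor()]

print(fill_slot(slot7, ballot=2))  # set x=1
print(fill_slot(slot8, ballot=2))  # no-op
```

In this sketch the no-op wins only in the slot where no acceptor had accepted anything, which is the sense in which proposing a no-op discovers existing commands rather than overwriting them.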

Practical implementations also have an out-of-band learning protocol to learn lots of transitions in bulk.

A command isn't a command until it is fully decided; until then it is just a proposed command. Paxos is about choosing between contending commands from multiple clients. Clients must be prepared to have their commands rejected because another client's command was chosen instead.

Practical implementations are all about choosing the order of client commands. Their world view is that of a write-ahead log, and they are placing the commands in that log. They retry in the next slot if their command wasn't chosen. (There are many ways to reduce the contention; Lamport mentions forwarding requests to a leader, as is done in Multi-Paxos.)
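The retry-in-the-next-slot idea can be sketched as follows. This is only an illustration: `propose` is a hypothetical stand-in for running a full Paxos round on one slot, modeled here by a dictionary in which the first value written to a slot wins, and commands are assumed to carry unique identifiers so they can be compared.

```python
def propose(log: dict, slot: int, command: str) -> str:
    """Hypothetical stand-in for a Paxos round: returns the value chosen for `slot`."""
    return log.setdefault(slot, command)


def append_with_retry(log: dict, command: str, start_slot: int) -> int:
    """Keep proposing `command` in successive slots until it is the chosen value."""
    slot = start_slot
    while True:
        if propose(log, slot, command) == command:
            return slot
        slot += 1  # another client's command was chosen here; try the next slot


log = {6: "client-A:cmd-1"}  # slot 6 was already won by another client
print(append_with_retry(log, "client-B:cmd-1", start_slot=6))  # 7
```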

Practical systems also have some means of knowing whether a command is invalid before proposing it, such as knowing its set of reads and its set of writes. This is important for two reasons. First, it's an asynchronous, multi-client system, and anything could have changed by the time the client's command reaches the server. Second, if two concurrent commands do not conflict, then both should be able to succeed.
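One common way to use read and write sets is a pairwise conflict check, sketched below. The `Command` type and the `conflicts` function are hypothetical names for illustration, and the rule used (a conflict exists when one command writes a key the other reads or writes) is an assumption about how such a system might define conflicts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    name: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


def conflicts(a: Command, b: Command) -> bool:
    """True if either command writes a key that the other reads or writes."""
    return bool(a.writes & (b.reads | b.writes)) or bool(b.writes & (a.reads | a.writes))


deposit = Command("deposit", reads=frozenset({"acct1"}), writes=frozenset({"acct1"}))
audit = Command("audit", reads=frozenset({"acct1", "acct2"}))
rename = Command("rename", writes=frozenset({"acct3.owner"}))

print(conflicts(deposit, audit))   # True: audit reads what deposit writes
print(conflicts(deposit, rename))  # False: disjoint keys, both can succeed
```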

The system model allows commands (messages) to be lost by the network anyway. If a message is lost, the client is expected to eventually retry the request, so it is fine to drop some of them. If the commands of a client have to be executed in client order, then either the client only sends commands synchronously, or the commands have to be ordered at a higher level in the library and kept in some client-session object before being executed.
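A client-session object of the kind mentioned above could look like the following sketch. It assumes each client numbers its commands with consecutive sequence numbers starting at 0; the `ClientSession` class and its fields are hypothetical and not taken from any particular library.

```python
class ClientSession:
    """Buffers one client's commands and releases them strictly in sequence order."""

    def __init__(self):
        self.next_seq = 0    # sequence number of the next command to execute
        self.pending = {}    # out-of-order commands waiting for their turn
        self.executed = []   # commands released for execution, in client order

    def deliver(self, seq: int, command: str) -> None:
        if seq < self.next_seq:
            return  # duplicate caused by a client retry; already executed
        self.pending[seq] = command
        # Release every command that is now contiguous with what was executed.
        while self.next_seq in self.pending:
            self.executed.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


session = ClientSession()
session.deliver(1, "b")   # arrives early, is held back
session.deliver(0, "a")   # unblocks both "a" and "b"
session.deliver(1, "b")   # retry of an executed command, ignored
session.deliver(2, "c")
print(session.executed)   # ['a', 'b', 'c']
```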