
How to understand the time complexity of Kademlia node operation

I'm learning about the Kademlia network by reading the classic paper Kademlia: A Peer-to-peer Information System Based on the XOR Metric. I want to understand the complexity of its operations but still cannot figure it out.

In the section "3 Sketch of proof", the paper gives two definitions:

  1. Depth of a node (h): 160 − i, where i is the smallest index of a non-empty bucket.
  2. Node y's bucket height in node x: the index of the bucket into which x would insert y, minus the index of x's least significant empty bucket.

And three conclusions:

  1. With overwhelming probability the height of any given node will be within a constant of log n for a system with n nodes.
  2. The bucket height of the closest node to an ID in the kth-closest node will likely be within a constant of log k.
  3. If none of this node's h most significant k-buckets is empty, the lookup procedure will find a node half as close (or rather whose distance is one bit shorter) in each step, and thus turn up the node in h − log k steps.

So my questions are:

  1. What are the "least significant empty bucket" and the "most significant k-buckets"?
  2. How can depth and bucket height be explained visually?
  3. How should I understand the second and third conclusions, i.e. why log k and h − log k?

It has been a while since I actually read the paper, so I'm mostly piecing this together from my implementation experience rather than matching the concepts in my head to the paper's formal definitions. Take the following with a grain of salt.


What is "least significant empty bucket" and "most significant k-buckets"?

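That basically refers to the buckets sorted by XOR distance relative to the node's own ID. The most significant buckets are the ones whose distance from your ID is decided by the high-order bits (the far half, quarter, ... of the keyspace); the least significant ones cover the IDs closest to yours.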

How to explain the depth and bucket height in visual way?

Each bucket covers a keyspace region, e.g. from 0x0000 to 0x0FFF (IDs simplified to 2 bytes) for 1/16th of the keyspace. This can be expressed as a CIDR-like mask, 0x0/4 (4 prefix bits). That's more or less the depth of a bucket.

There are several ways to organize a routing table. The "correct" way is to represent it as a tree or sorted list keyed on the lowest ID covered by each bucket. This approach allows the arbitrary bucket split operations that some routing table optimizations call for, and can also be used to implement node multihoming.

A simplified implementation may instead use a fixed-length array and put each bucket at the position given by the number of prefix bits it shares with the node's own ID. I.e. position 0 in the array has 0 shared prefix bits: it is the most distant bucket, the one covering 50% of the keyspace, where the most significant bit is the inverse of the MSB of the node's own ID.

In that case the depth is simply the array position.
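A minimal sketch of that fixed-array layout, assuming IDs are handled as plain Python integers. The names `bucket_position` and `ID_BITS` are made up for illustration and are not taken from the paper or from any particular DHT library:

```python
ID_BITS = 160  # ID width used in the paper; pass a smaller value for toy examples


def bucket_position(own_id: int, other_id: int, id_bits: int = ID_BITS) -> int:
    """Array position of other_id's bucket = number of leading bits shared with own_id."""
    distance = own_id ^ other_id  # XOR metric
    if distance == 0:
        raise ValueError("a node does not store itself in its routing table")
    # The highest set bit of the XOR distance is the first bit where the IDs differ.
    return id_bits - distance.bit_length()


if __name__ == "__main__":
    own = 0x2A63  # 16-bit toy ID
    print(bucket_position(own, 0xAA63, id_bits=16))  # MSB differs: most distant bucket
    print(bucket_position(own, 0x2A62, id_bits=16))  # only the last bit differs
```

With 16-bit IDs, an ID whose most significant bit differs lands at position 0, and an ID differing only in the final bit lands at position 15.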

How to understand the second and third conclusions, say, why log k and h - log k?

Say you are looking for an ID that is as far away as possible from your own node's ID. Then you will only have one bucket covering that part of the keyspace, namely the one covering the half of the keyspace whose most significant bit differs from yours. So you ask one (or several) nodes from that bucket. Because their IDs have the first bit in common with your lookup target, they are more or less guaranteed to have split that region in two or more, i.e. to have at least double the keyspace coverage for the target space. So they can provide information that is at least 1 bit better.

As you query nodes closer to the target, they will also have better keyspace coverage near the target region, because that region is also closer to their own node ID.

Rinse and repeat until there are no closer nodes to be found.

Since each hop shaves off at least 1 bit of distance, you basically need O(log(n)) hops, where n is the network size. That is because network size basically dictates the average distance between nodes and thus the bucket depth needed for your home bucket.
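To see the hop count grow slowly with network size, here is a toy simulation. It is only a sketch under loud assumptions: 16-bit IDs, buckets of size 4, routing tables filled from global knowledge rather than learned from traffic, and a greedy lookup that asks one node at a time instead of Kademlia's parallel queries. All function names are invented for the example.

```python
import random

ID_BITS = 16  # toy ID width (assumption; real Kademlia uses 160)
K = 4         # toy bucket size (assumption)


def build_table(own, all_ids, k=K, bits=ID_BITS):
    """Fixed-array routing table: position p holds up to k nodes sharing p prefix bits."""
    table = [[] for _ in range(bits)]
    for other in all_ids:
        if other == own:
            continue
        pos = bits - (own ^ other).bit_length()
        if len(table[pos]) < k:
            table[pos].append(other)
    return table


def lookup(start, target, tables):
    """Greedy lookup: hop to the closest known node until no closer one is known."""
    current, hops = start, 0
    while True:
        known = [n for bucket in tables[current] for n in bucket]
        best = min(known + [current], key=lambda n: n ^ target)
        if best == current:
            return current, hops
        current, hops = best, hops + 1


def simulate(n_nodes, lookups=200, seed=1):
    """Average hop count over random lookups in a network of n_nodes."""
    rng = random.Random(seed)
    ids = rng.sample(range(2 ** ID_BITS), n_nodes)
    tables = {i: build_table(i, ids) for i in ids}
    total = 0
    for _ in range(lookups):
        start, target = rng.choice(ids), rng.randrange(2 ** ID_BITS)
        _, hops = lookup(start, target, tables)
        total += hops
    return total / lookups


if __name__ == "__main__":
    for n in (64, 256, 1024):
        print(n, "nodes, average hops:", simulate(n))
```

Each hop lands on a node that shares at least one more prefix bit with the closest existing node, so the loop cannot take more than `ID_BITS` hops; in practice the average should stay well below that and rise roughly with log(n).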

If the target key is closer to your own ID you will need fewer steps since you will have better coverage of that region of the keyspace.

Since k is a constant (the number of nodes per bucket), so is log k. Double the number of nodes in a bucket and it will have twice the resolution of the given keyspace region, and thus will (probabilistically) yield a node that is one bit closer to the target than a bucket of size k/2. I.e. you have to double the number of entries per bucket for each additional bit per hop you wish to save.


Edit: Here's a visualization of an actual single-homed BitTorrent DHT routing table, sorted by prefix, i.e. not relative to the local node ID:

Node ID: 2A631C8E 7655EF7B C6E99D8A 3BF810E2 1428BFD4
buckets: 23 / entries: 173
000...   entries:8 replacements:8
00100...   entries:8 replacements:0
0010100...   entries:8 replacements:2
0010101000...   entries:8 replacements:4
00101010010...   entries:8 replacements:7
001010100110000...   entries:8 replacements:3
0010101001100010...   entries:8 replacements:3
00101010011000110000...   entries:8 replacements:1
001010100110001100010...   entries:3 replacements:0
0010101001100011000110...   entries:6 replacements:0
0010101001100011000111...   entries:6 replacements:0
0010101001100011001...   entries:8 replacements:2
001010100110001101...   entries:8 replacements:1
00101010011000111...   entries:8 replacements:2
00101010011001...   entries:7 replacements:0
0010101001101...   entries:8 replacements:0
001010100111...   entries:8 replacements:0
001010101...   entries:8 replacements:1
00101011...   entries:7 replacements:0
001011...   entries:8 replacements:0
0011...   entries:8 replacements:8
01...   entries:8 replacements:8
1...   entries:8 replacements:8
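The dump above can be checked mechanically. The sketch below copies the 23 prefixes and the node ID from the listing and verifies two things: the buckets tile the whole keyspace without overlap, and exactly one bucket (the home bucket) contains the node's own ID. The helper names are hypothetical.

```python
from fractions import Fraction

NODE_ID_HEX = "2A631C8E7655EF7BC6E99D8A3BF810E21428BFD4"

# Bucket prefixes copied from the routing table dump, in the same order.
PREFIXES = [
    "000", "00100", "0010100", "0010101000", "00101010010",
    "001010100110000", "0010101001100010", "00101010011000110000",
    "001010100110001100010", "0010101001100011000110",
    "0010101001100011000111", "0010101001100011001",
    "001010100110001101", "00101010011000111", "00101010011001",
    "0010101001101", "001010100111", "001010101", "00101011",
    "001011", "0011", "01", "1",
]


def keyspace_share(prefix):
    """Fraction of the keyspace covered by a bucket with this bit prefix."""
    return Fraction(1, 2 ** len(prefix))


def is_prefix_free(prefixes):
    """True if no bucket prefix is a prefix of another, i.e. buckets do not overlap."""
    return not any(a != b and b.startswith(a) for a in prefixes for b in prefixes)


def home_buckets(node_id_hex, prefixes):
    """Buckets whose prefix matches the node's own ID."""
    bits = bin(int(node_id_hex, 16))[2:].zfill(len(node_id_hex) * 4)
    return [p for p in prefixes if bits.startswith(p)]


if __name__ == "__main__":
    print("total coverage:", sum(keyspace_share(p) for p in PREFIXES))
    print("non-overlapping:", is_prefix_free(PREFIXES))
    print("home bucket:", home_buckets(NODE_ID_HEX, PREFIXES))
```

Note how the prefix lengths run from 1 bit up to 22 bits: buckets get narrower the closer they are to the node's own ID, which is the depth discussed above.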

The accepted answer is great!

  1. I think the explanation around h - log k can be simplified. This is how I think about h - log k.

Consider things from a particular node u's perspective.

For your k closest nodes, you have complete information about the neighbourhood. This is because you store contacts in buckets of size k, and there are not enough keys close to you to fill those buckets, so the lowest buckets in the tree will always have spare room. So you will know all of your k closest neighbours.

Now, how high is the kth closest neighbour in the tree?

there is 1 key 1 bit away (final bit differing)
there are 2 keys 2 bits away (last two bits differing)
there are 4 keys 3 bits away (last 3 bits differing)

so the height n of the kth closest node satisfies

1 + 2 + 4 + ... + 2^(n-1) = k
=> 2^n = k + 1
=> n = log(k+1)

So the kth closest node is at a height of about log(k).
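The counting argument can be checked with a few lines of Python. This is a sketch of the idealised case the argument assumes, where every possible ID near u is occupied; `height_of_kth_closest` and `brute_force_height` are illustrative names only.

```python
def height_of_kth_closest(k: int) -> int:
    """Height of the k-th closest ID when every ID exists.

    The k-th closest ID to u is u ^ k, so it differs from u only within
    the lowest k.bit_length() bits.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return k.bit_length()


def brute_force_height(u: int, k: int, bits: int = 8) -> int:
    """Same quantity, found by sorting a small, fully populated keyspace by XOR distance."""
    others = sorted((x for x in range(2 ** bits) if x != u), key=lambda x: x ^ u)
    return (others[k - 1] ^ u).bit_length()


if __name__ == "__main__":
    for k in (1, 3, 7, 8, 20):
        print(k, height_of_kth_closest(k), brute_force_height(0b10110010, k))
```

For k = 7 both functions give 3, matching n = log(7 + 1) with a base-2 logarithm.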

What this tells us is that when you get a search for a node whose distance is <= log k (the height of the kth node), you can answer immediately, since you know the complete neighbourhood. You don't need to spend log k steps obtaining 1 bit of information per step, as you must when the requested node is farther away.

So when we search for a node whose depth is h, we query nodes whose depth decreases by 1 per step in the worst case, until we reach a node for which the requested node's depth is log k; that node can answer the query immediately.

  2. To answer your first question mathematically: with overwhelming probability the node's height is O(log n).

Consider a network where the keys are M bits long and there are N nodes. We are looking at the routing tree from a particular node u's perspective. This tree will be crowded towards the higher-order bits, since 1/2 of the possible keys fall in the top bucket, 1/4 in the second, and so on.

So what is the probability that your first q slots
in the routing tree, covering distances
from 2^0 - 2^1 up to 2^(q-1) - 2^q, are empty?
This requires that all N nodes fall in buckets above q.

For a key to land in a bucket above q, you
need its maximum prefix match with u to be shorter than M-q bits.

There are 2^M total keys, of which 2^q keys
have the same prefix of length (M-q) as the node u.
So the favourable cases are 2^M - 2^q.
Total cases are 2^M.
Assume all N key draws are independent.
So the probability that the q lowest slots are empty is (1 - 1/2^(M-q))^N

So we plug in q = M - c*log(N), which would mean
that there are c*log(N) filled buckets
with the M - c*log(N) lower buckets empty:

P = (1 - 1/2^(c*log(N)))^N
  = (1 - 1/N^c)^N
which is approximately equal to
1 - 1/N^(c-1)

And so the probability goes to 1 as c increases, and we are very likely to have only the c*log(N) top slots filled in the routing table.
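A quick numeric check of that last approximation, as a sketch with made-up function names. It compares the exact expression (1 - 1/N^c)^N with the first-order estimate 1 - 1/N^(c-1) for a few values of N and c:

```python
def exact(n: int, c: float) -> float:
    """Probability (1 - 1/N^c)^N from the derivation above."""
    return (1 - n ** -c) ** n


def approx(n: int, c: float) -> float:
    """First-order approximation 1 - 1/N^(c-1)."""
    return 1 - n ** (1 - c)


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        for c in (2, 3):
            print(n, c, exact(n, c), approx(n, c))
```

The two columns agree closely, and both move towards 1 as c grows.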
