
How does DHT work?

Original post: 2011-10-08 02:51:05. Tags: dht, rationale

I got the basic idea of a DHT from Wikipedia:

Store Data:

In a DHT network, every node is responsible for a specific range of the key-space. To store a file in the DHT, first hash the file's name to get the file's key; second, send a message put(key, file-content) to any node of the DHT. The message is relayed to the node responsible for key, and that node stores the pair (key, file-content).
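A minimal sketch of how "responsible for a specific range of the key-space" can be decided, assuming SHA-1 keys (160 bits) and the Chord-style rule that each node owns the keys between its predecessor's id (exclusive) and its own id (inclusive). Kademlia, mentioned in the answer below, uses XOR distance instead, so treat this as one illustrative rule rather than how every DHT works. `key_of` and `responsible_node` are hypothetical names.

```python
from __future__ import annotations

import hashlib
from bisect import bisect_left


def key_of(name: str) -> int:
    """Hash a file name to an integer key in [0, 2**160) using SHA-1."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


def responsible_node(node_ids: list[int], key: int) -> int:
    """Return the id of the node whose key-space range contains `key`.

    Assumed Chord-style rule: a node owns (predecessor_id, own_id], so the
    owner is the first node id >= key, wrapping around past the largest id.
    """
    ids = sorted(node_ids)
    i = bisect_left(ids, key)
    return ids[i % len(ids)]
```

For example, `responsible_node([10, 20, 30], 15)` gives `20`, and a key above the largest id wraps around to `10`.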

Get Data:

When getting a file from the DHT, first hash the file's name to get the key; second, send a message get(key) to any node and relay the message until...
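The "relay the message" step can be sketched by letting each node know only its successor on the ring and forwarding a request until it reaches the node that owns the key. This is a deliberately naive assumption: with only successor pointers a lookup can take O(n) hops, which is why real DHTs keep extra routing state (Chord's finger tables, Kademlia's k-buckets) to get O(log n) hops. `Node`, `build_ring` and `in_range` are hypothetical names, and the ownership rule is the Chord-style `(predecessor, self]` range.

```python
from __future__ import annotations

import hashlib


def key_of(name: str) -> int:
    """Hash a name to a 160-bit integer key (SHA-1)."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


def in_range(key: int, start: int, end: int) -> bool:
    """True if key lies in the ring interval (start, end], allowing wrap-around."""
    if start < end:
        return start < key <= end
    return key > start or key <= end  # wraps past zero, or a single-node ring


class Node:
    def __init__(self, node_id: int) -> None:
        self.id = node_id
        self.pred: Node = self  # predecessor on the ring
        self.succ: Node = self  # successor on the ring
        self.store: dict[int, bytes] = {}

    def owns(self, key: int) -> bool:
        return in_range(key, self.pred.id, self.id)

    def route(self, key: int) -> Node:
        """Relay the request hop by hop along successors until the owner is reached."""
        node = self
        while not node.owns(key):
            node = node.succ
        return node

    def put(self, key: int, value: bytes) -> None:
        self.route(key).store[key] = value

    def get(self, key: int) -> bytes | None:
        return self.route(key).store.get(key)


def build_ring(ids: list[int]) -> list[Node]:
    """Link nodes in id order; the last node's successor is the first."""
    nodes = [Node(i) for i in sorted(ids)]
    for a, b in zip(nodes, nodes[1:] + nodes[:1]):
        a.succ, b.pred = b, a
    return nodes
```

Starting from any node, `put` and `get` for the same key end at the same owner, so a value stored via one node can be read back via any other.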

Questions:

To store a file, we can hash the file's name to get its key, but Wikipedia says:

In the real world the key k could be a hash of a file's content rather than a hash of a file's name to provide content-addressable storage, so that renaming of the file does not prevent users from finding it.
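A small sketch of the difference the quote points at, assuming SHA-1 as the hash: a name-derived key changes when the file is renamed, while a content-derived key does not. It also hints at the next question: the searcher is not expected to compute the hash from content they already have. Instead, the key is assumed to come from whoever published the file (for example inside a link), and the downloaded bytes can then be checked against it. The function names are hypothetical.

```python
import hashlib


def name_key(name: str) -> str:
    """Key derived from the file name (changes if the file is renamed)."""
    return hashlib.sha1(name.encode("utf-8")).hexdigest()


def content_key(content: bytes) -> str:
    """Key derived from the file's bytes (content-addressable)."""
    return hashlib.sha1(content).hexdigest()


def verify(content: bytes, expected_key: str) -> bool:
    """Check that downloaded bytes really match the key that was looked up."""
    return content_key(content) == expected_key


data = b"quarterly numbers"
# Renaming the same bytes gives a different name-derived key.
name_keys_match = name_key("report.pdf") == name_key("report-final.pdf")
# The publisher shares this key; it stays valid whatever the file is called.
published_key = content_key(data)
```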

Hash the file's content? How am I supposed to know the file's content? If I already know the file's content, then WHY would I search for it in the DHT?

According to the wiki, every participating node will spare some space to store files. So does that mean that if I participate in a DHT, I have to spare 10G of disk space to store the files whose key falls into the specific key-space I'm responsible for?
If I should indeed spare some disk space to store those files, then how should I store the (key, file-content) pairs on disk? I mean, should the files be arranged into a B-tree or something on my disk?
When a query arrives, how does my computer respond? I assume it first checks the queried key and, if it is in my key-space, finds the corresponding file on my disk. Right?
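One possible answer to the storage questions, as a sketch under assumptions: nothing DHT-specific is required on disk, so this hypothetical `LocalStore` writes one file per key, named by the key in hex, and lets the file system do the indexing (a B-tree or an embedded key-value database would work as well). On a query it first checks whether the key falls in its own range `(pred_id, my_id]`, a Chord-style ownership rule assumed here, and answers "forward" otherwise, meaning the query should be relayed to another node.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def in_range(key: int, start: int, end: int) -> bool:
    """True if key lies in the ring interval (start, end], allowing wrap-around."""
    if start < end:
        return start < key <= end
    return key > start or key <= end


class LocalStore:
    """Hypothetical on-disk store: one file per key, named by the key in hex."""

    def __init__(self, root: Path, pred_id: int, my_id: int) -> None:
        self.root = root
        self.pred_id, self.my_id = pred_id, my_id  # this node owns (pred_id, my_id]
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key: int) -> Path:
        return self.root / f"{key:040x}"  # 160-bit key -> 40 hex digits

    def save(self, key: int, content: bytes) -> None:
        self._path(key).write_bytes(content)

    def handle_get(self, key: int) -> tuple[str, bytes | None]:
        if not in_range(key, self.pred_id, self.my_id):
            return ("forward", None)  # not my key-space: relay to another node
        path = self._path(key)
        if not path.exists():
            return ("missing", None)  # my key-space, but nothing stored under it
        return ("found", path.read_bytes())


# Example: a node that owns keys in (100, 200].
with tempfile.TemporaryDirectory() as tmp:
    store = LocalStore(Path(tmp), pred_id=100, my_id=200)
    store.save(150, b"file-content")
    results = [store.handle_get(k) for k in (150, 160, 250)]
```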

1 Answer

A DHT is just an algorithm. At its base it provides distributed key-value PUT and GET operations, similar to a normal Map or associative array found in many programming languages.

Due to real-world limitations such as untrustworthy nodes, failure rates and so on, actual DHT implementations don't provide an arbitrary-length PUT(<uint8[]>, <uint8[]>) operation.

Example:

The Kademlia implementation used by BitTorrent, for example, provides the following interface:

PUT(uint8[20], uint16)
GET(uint8[20]) -> List<Pair<IP, uint16>>, where the list represents only a randomly sampled subset of the actual data

As you can see, compared to more generic associative arrays it is actually a specialized, asymmetric interface. The IP address is always derived from the PUT sender's source address, i.e. it cannot be set explicitly. And GET returns a list instead of a single value, so it implements a MultiMap or Map<List>, if you want to look at it that way.
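The asymmetric interface can be modeled as a MultiMap with two twists: PUT takes only the port and records the sender's source IP, and GET returns a random sample instead of everything stored. `PeerTable` is a simplified, hypothetical model of what a single node keeps; in the real BitTorrent DHT (BEP 5) the corresponding messages are `announce_peer` and `get_peers`, and details such as write tokens and routing to the closest nodes are left out here.

```python
from __future__ import annotations

import random
from collections import defaultdict

InfoHash = bytes  # 20 bytes, i.e. uint8[20]


class PeerTable:
    """Simplified model of one node's store: MultiMap<infohash, (ip, port)>."""

    def __init__(self, sample_size: int = 8, rng: random.Random | None = None) -> None:
        self.peers: defaultdict[InfoHash, set[tuple[str, int]]] = defaultdict(set)
        self.sample_size = sample_size
        self.rng = rng or random.Random()

    def put(self, info_hash: InfoHash, port: int, sender_ip: str) -> None:
        # The caller supplies only the port; sender_ip is taken from the packet's
        # source address by the network layer, so it cannot be chosen freely.
        if len(info_hash) != 20 or not 0 <= port <= 0xFFFF:
            raise ValueError("expected a 20-byte key and a uint16 port")
        self.peers[info_hash].add((sender_ip, port))

    def get(self, info_hash: InfoHash) -> list[tuple[str, int]]:
        # Return a random subset, not the full list of stored peers.
        entries = sorted(self.peers.get(info_hash, ()))
        return self.rng.sample(entries, min(self.sample_size, len(entries)))
```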

In BitTorrent's case, a hash is used as the content descriptor, and peers that have the content announce themselves on the DHT. If someone wants the file(s), they look up IP/port pairs on the DHT, contact those peers via a separate protocol, and then download the data.
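The BitTorrent flow described above, sketched end to end with a plain in-memory MultiMap standing in for the whole DHT. Assumptions: the content descriptor here is simply the SHA-1 of the raw bytes (BitTorrent v1 actually hashes the bencoded `info` dictionary of the torrent), and `connect` is a hypothetical callback standing in for the separate peer-to-peer download protocol. Checking the downloaded data against the hash is what lets a searcher trust data from strangers.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict
from typing import Callable, Optional

# Stand-in for the entire DHT: MultiMap from a 20-byte key to (ip, port) pairs.
dht: defaultdict[bytes, set[tuple[str, int]]] = defaultdict(set)

# Hypothetical peer-protocol callback: (ip, port, info_hash) -> data or None.
Connect = Callable[[str, int, bytes], Optional[bytes]]


def content_id(data: bytes) -> bytes:
    # Assumption: SHA-1 of the raw bytes (BitTorrent v1 hashes the bencoded info dict).
    return hashlib.sha1(data).digest()


def announce(info_hash: bytes, my_ip: str, my_port: int) -> None:
    """A peer that has the content registers itself under the content's hash."""
    dht[info_hash].add((my_ip, my_port))


def find_peers(info_hash: bytes) -> list[tuple[str, int]]:
    return sorted(dht.get(info_hash, ()))


def fetch(info_hash: bytes, connect: Connect) -> bytes | None:
    """Look up peers in the DHT, then download from them over a separate protocol."""
    for ip, port in find_peers(info_hash):
        data = connect(ip, port, info_hash)
        if data is not None and content_id(data) == info_hash:  # reject wrong data
            return data
    return None


# Demo: one seeder, with a dict standing in for the peer-to-peer transfer.
seeders = {("192.0.2.10", 6881): b"hello world"}
wanted = content_id(b"hello world")
announce(wanted, "192.0.2.10", 6881)
result = fetch(wanted, lambda ip, port, ih: seeders.get((ip, port)))
```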

But other uses for a DHT are also possible, e.g. storing signed structured data, tweet-like text snippets, or whatever else. It always depends on your application's needs.
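As an example of a different use, here is a sketch that stores tweet-like structured items under the SHA-1 of their canonical JSON encoding, so a reader can verify that what an untrustworthy node returns really matches the key. It covers only immutable, unsigned data (signing would need a public-key library), and a plain dict stands in for the DHT; all names are hypothetical.

```python
from __future__ import annotations

import hashlib
import json
from typing import Any


def encode(item: dict[str, Any]) -> bytes:
    """Canonical JSON, so equal items always give identical bytes (and keys)."""
    return json.dumps(item, sort_keys=True, separators=(",", ":")).encode("utf-8")


def put_item(dht: dict[bytes, bytes], item: dict[str, Any]) -> bytes:
    blob = encode(item)
    key = hashlib.sha1(blob).digest()  # 20-byte key derived from the value itself
    dht[key] = blob
    return key


def get_item(dht: dict[bytes, bytes], key: bytes) -> dict[str, Any] | None:
    blob = dht.get(key)
    if blob is None or hashlib.sha1(blob).digest() != key:
        return None  # missing, or altered by an untrustworthy node
    return json.loads(blob)
```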

It's just a basic building block.