Unable to create a torrent's info hash

I'm having trouble finding the problem with how I'm generating the info hash for a torrent file. This is the code I have so far:

InputStream input = null;
try {
    MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
    input = new FileInputStream(file);
    StringBuilder builder = new StringBuilder();
    while (!builder.toString().endsWith("4:info")) {
       builder.append((char) input.read()); // It's ASCII anyway.
    }
    ByteArrayOutputStream output = new ByteArrayOutputStream();
    for (int data; (data = input.read()) > -1; output.write(data));
    sha1.update(output.toByteArray(), 0, output.size() - 1);
    this.infoHash = sha1.digest();
    System.out.println(new String(Hex.encodeHex(infoHash)));
} catch (NoSuchAlgorithmException | IOException e) {
     e.printStackTrace();
} finally {
    if (input != null) try { input.close(); } catch (IOException ignore) {}
}

Below are my expected and actual hashes:

Expected: d4d44272ee5f5bf887a9c85ad09ae957bc55f89d
Actual: 4d753474429d817b80ff9e0c441ca660ec5d2450

The torrent I'm trying to generate an info hash for can be found here (Ubuntu 14.04 Desktop amd64).

Let me know if I can provide any more info, thanks!

Exceptions contain 4 useful bits of info: type, message, trace, and cause. You're tossing away 3 of those 4. Also, code is part of a process, and when an error occurs, generally that process cannot be finished at all. And yet on exceptions your process continues. Stop doing this; you've written code that only hurts you. Remove the try and the catch, and add a throws clause to your method signature. If you can't, the go-to default (and update your IDE's template if it generated this code) is throw new RuntimeException("Unhandled", e);. This is shorter, does not destroy any of the 4 interesting bits of info, and ends the process.
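As a minimal sketch of the two options just described (the class and method names here are hypothetical, and hashing a whole file is only a stand-in for the real work), the first method declares its checked exceptions and the second wraps them when the signature cannot change:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, for illustration only.
final class HashErrors {
    // Preferred: declare the checked exceptions and let the caller decide.
    static byte[] sha1OfFile(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        return sha1.digest(Files.readAllBytes(file));
    }

    // Fallback when the signature cannot declare checked exceptions:
    // wrap and rethrow, keeping the original exception as the cause.
    static byte[] sha1OfFileUnchecked(Path file) {
        try {
            return sha1OfFile(file);
        } catch (IOException | NoSuchAlgorithmException e) {
            throw new RuntimeException("Unhandled", e);
        }
    }
}
```

Either way the failure stops the work instead of letting it continue with a half-read file, and the original type, message, trace, and cause all stay available to whoever catches it.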

Separately, the notion that the right way to handle an IOException from an input stream's close method is to just ignore it is also false. It is highly unlikely to throw, but if it does, you should assume you didn't read every byte. As that would be one explanation for a mismatched hash, ignoring it is misguided.

Finally, use the proper language constructs: there is a try-with-resources statement that would work far better here.
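A rough sketch of what that could look like, assuming Java 9 or later for InputStream.readAllBytes (the class and method names are made up for this example):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, for illustration only.
final class ReadWithResources {
    static byte[] readFully(Path file) throws IOException {
        // The stream is closed automatically when the block exits, and an
        // IOException thrown by close() propagates instead of being ignored.
        try (InputStream in = Files.newInputStream(file)) {
            return in.readAllBytes();
        }
    }
}
```

This replaces the null check, the finally block, and the nested try around close() from the question.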

You're calling update with output.size() - 1; unless you intentionally want to ignore the last byte, this is a mistake: you're lopping off the last byte read.

Reading bytes into a builder, and then for every byte converting the builder to a string and checking how it ends, is incredibly inefficient; even for a file as small as 1MB that'll cause quite a grind.

Reading a single byte at a time from a raw FileInputStream is about as inefficient, because every read causes file access (reading 1 byte is as expensive as reading a whole buffer full, so it's about 50000 times slower than it needs to be).
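To illustrate only the I/O pattern, here is a sketch that feeds a digest in chunks rather than one byte per read. The 8192-byte buffer size and the names are arbitrary choices for this example, and note that it hashes the entire file, which is not the info hash:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, for illustration only.
final class ChunkedDigest {
    static byte[] sha1(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192]; // arbitrary chunk size
            // Each read() call fills as much of the buffer as is available,
            // so the number of file accesses drops by orders of magnitude.
            for (int n; (n = in.read(buffer)) != -1; ) {
                sha1.update(buffer, 0, n);
            }
        }
        return sha1.digest();
    }
}
```

Wrapping the stream in a BufferedInputStream is another common way to get a similar effect while keeping a byte-at-a-time loop.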

Here's how to do this with a somewhat newer API; look how much nicer this code reads. It also behaves better under erroneous conditions:

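import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;

// Read the whole file in one go instead of byte by byte.
byte[] data = Files.readAllBytes(Paths.get(fileName));
var search = "4:info".getBytes(StandardCharsets.US_ASCII);
int searchIdx = -1;
// Find the first occurrence of the marker; <= so a marker at the very end is still checked.
for (int i = 0; searchIdx == -1 && i <= data.length - search.length; i++) {
    for (int j = 0; j < search.length; j++) {
        if (data[i + j] != search[j]) break;
        // Full match: point at the first byte after the marker, where the info value starts.
        if (j == search.length - 1) searchIdx = i + search.length;
    }
}
if (searchIdx == -1) throw new IOException("Input torrent file does not contain marker");

var sha1 = MessageDigest.getInstance("SHA-1");
// Hashes from the start of the info value through the end of the file,
// so this range also covers the outer dictionary's closing 'e'.
sha1.update(data, searchIdx, data.length - searchIdx);
byte[] hash = sha1.digest();
// Render the digest as lowercase hex.
StringBuilder hex = new StringBuilder();
for (byte h : hash) hex.append(String.format("%02x", h));
System.out.println(hex);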

While rzwitserloot's answer covers some general Java coding practices, there are also correctness issues at the BitTorrent level.

You are using string processing on a structured data format, which is pretty much the same mistake as attempting to parse HTML with regex. In this case you're assuming that the only place the data can contain the string 4:info is the top-level dictionary key for the info dict, and that the info dictionary is the last entry of the top-level dictionary.

Instead, you should either use a proper bencoding decoder-encoder to extract the info dict and re-encode it for hashing, or use a tokenizer to find the exact byte range covering the info value. Note that you need a validating parser for the former, while the latter can also handle some out-of-spec edge cases. Unless you want to implement them yourself, you may want to find a library that handles this for you.
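To make the tokenizer approach concrete, here is a minimal, non-validating sketch. Everything in it (the InfoRange class and its skip, infoRange, and infoHashHex methods) is hypothetical and written for this illustration, not taken from any library. It walks the bencoded structure just far enough to find where the top-level info value starts and ends, then hashes exactly that byte range:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical, non-validating bencode tokenizer, for illustration only.
final class InfoRange {
    private static byte at(byte[] d, int pos) {
        if (pos < 0 || pos >= d.length) throw new IllegalArgumentException("truncated input at " + pos);
        return d[pos];
    }

    private static int indexOf(byte[] d, char target, int from) {
        for (int i = from; i < d.length; i++) if (d[i] == target) return i;
        throw new IllegalArgumentException("missing '" + target + "' after " + from);
    }

    // Returns the index just past the bencoded value that starts at pos.
    static int skip(byte[] d, int pos) {
        byte c = at(d, pos);
        if (c == 'i') return indexOf(d, 'e', pos + 1) + 1;   // integer: i<digits>e
        if (c == 'l' || c == 'd') {                          // list or dict: items until 'e'
            pos++;
            while (at(d, pos) != 'e') pos = skip(d, pos);
            return pos + 1;
        }
        if (c >= '0' && c <= '9') {                          // byte string: <length>:<bytes>
            int colon = indexOf(d, ':', pos);
            int len = Integer.parseInt(new String(d, pos, colon - pos, StandardCharsets.US_ASCII));
            if (len > d.length - colon - 1) throw new IllegalArgumentException("string overruns input at " + pos);
            return colon + 1 + len;
        }
        throw new IllegalArgumentException("unexpected byte at " + pos);
    }

    // Returns {start, end} of the top-level "info" value, end exclusive.
    static int[] infoRange(byte[] d) {
        if (at(d, 0) != 'd') throw new IllegalArgumentException("top level is not a dictionary");
        int pos = 1;
        while (at(d, pos) != 'e') {
            byte k = at(d, pos);
            if (k < '0' || k > '9') throw new IllegalArgumentException("dictionary key is not a string at " + pos);
            int keyEnd = skip(d, pos);
            int colon = indexOf(d, ':', pos);
            String key = new String(d, colon + 1, keyEnd - colon - 1, StandardCharsets.ISO_8859_1);
            int valueEnd = skip(d, keyEnd);
            if (key.equals("info")) return new int[] {keyEnd, valueEnd};
            pos = valueEnd;
        }
        throw new IllegalArgumentException("no top-level info key");
    }

    // SHA-1 over exactly the bytes of the info value, as lowercase hex.
    static String infoHashHex(byte[] torrent) throws NoSuchAlgorithmException {
        int[] r = infoRange(torrent);
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        sha1.update(torrent, r[0], r[1] - r[0]);
        StringBuilder hex = new StringBuilder();
        for (byte b : sha1.digest()) hex.append(String.format("%02x", b));
        return hex.toString();
    }
}
```

Because it only looks at keys of the top-level dictionary, a 4:info byte sequence inside a comment or any other value is not mistaken for the key, and keys that come after info are left out of the hash. The sketch does not check key ordering or integer syntax, and it recurses once per nesting level, so a maintained library is still the safer choice for untrusted input.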

Additionally, you're assuming that the data is ASCII. Bencoding is in fact a binary format that just tends to use ASCII by convention in some places. You should operate on byte arrays directly. Your input is already binary and the hasher expects binary, so it is quite circuitous to go through strings.
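As a small illustration of why going through strings is risky in general (the class and method names are hypothetical), the sketch below decodes bytes as UTF-8 text and encodes them again. Pure ASCII survives the round trip, but arbitrary binary data, such as the raw SHA-1 digests stored in a torrent's pieces field, generally does not:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
final class BinaryRoundTrip {
    // Decodes the bytes as UTF-8 text and encodes the text again.
    static byte[] viaUtf8String(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    // True only if the bytes come back unchanged.
    static boolean survivesUtf8(byte[] raw) {
        return Arrays.equals(raw, viaUtf8String(raw));
    }
}
```

A byte such as 0xFF is never valid UTF-8, so the decoder substitutes a replacement character and the original value is lost. Keeping everything as byte[] from file to hasher avoids the question of charsets altogether.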
