PHP: Calculating File HASH for Files Larger than 2GB

Could you please advise how to calculate a file hash for files larger than 2GB in PHP?

The only PHP function known to me is:

string hash_file ( string $algo , string $filename [, bool $raw_output = false ] )

This function, however, has a limitation: it returns a hash only for files smaller than 2GB. For larger files, hash_file() throws an error.

Here are some constraints/requests:

  • should work on a 64-bit Ubuntu Linux server
  • compatible with PHP 5+
  • there should be no file size limit
  • should be as fast as possible

This is all the information I have now. Thank you very much.


UPDATE

I have a solution that is more practical and efficient than calculating a hash from more than 2GB of data.

I have realized that I do not have to generate a hash from complete files that are over 2GB. To uniquely identify a file, calculating a hash from, say, the first 10KB of its data should be sufficient. Moreover, it will be faster than hashing more than 2GB. In other words, the ability to calculate a hash from a data string over 2GB is probably not necessary at all.
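The prefix idea above could be sketched roughly as follows. The helper name `hash_file_prefix`, the default length of 10240 bytes, and the use of md5 in the usage line are assumptions made for illustration, not something stated in the question.

```php
<?php
// Hypothetical helper: hash only the first $length bytes of a file.
// Returns the hex digest, or false if the file cannot be read.
function hash_file_prefix($algo, $filename, $length = 10240)
{
    $handle = fopen($filename, 'rb');
    if ($handle === false) {
        return false;
    }

    // Read at most $length bytes; shorter files are read in full.
    $data = stream_get_contents($handle, $length);
    fclose($handle);

    if ($data === false) {
        return false;
    }

    return hash($algo, $data);
}

// Usage (the path is a placeholder):
// $id = hash_file_prefix('md5', '/full/path/to/file');
```

Only the leading bytes are read, so the cost does not depend on the file size. Keep in mind that two files sharing the same first 10KB yield the same digest, so this identifies files only when their leading bytes differ.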

I will wait for your reactions. In a couple of days, I will close this question.

I would use exec() to run a local hashing program in the shell and return the value to the PHP script. Here's an example with md5, but any available algorithm can be used.

  $results = array();
  $filename = '/full/path/to/file';
  // Escape the path so spaces or shell metacharacters cannot break the command
  exec('md5sum ' . escapeshellarg($filename), $results);

Then parse the result array (the output of the shell command).
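A minimal sketch of that parsing step, assuming GNU md5sum, which prints one line of the form `<32 hex characters>  <filename>` (prefixed with a backslash when the file name contains special characters). The helper names `parse_md5sum_line` and `md5_via_shell` are hypothetical.

```php
<?php
// Hypothetical helper: extract the digest from one line of md5sum output.
// Returns the 32-character hex digest, or null if the line does not match.
function parse_md5sum_line($line)
{
    if (preg_match('/^\\\\?([0-9a-f]{32})\s/', $line, $matches)) {
        return $matches[1];
    }
    return null;
}

// Hypothetical helper: run md5sum on a file and return its digest,
// or null if the command fails or prints nothing usable.
function md5_via_shell($filename)
{
    $output = array();
    $status = 0;
    exec('md5sum ' . escapeshellarg($filename), $output, $status);

    if ($status !== 0 || !isset($output[0])) {
        return null;
    }

    return parse_md5sum_line($output[0]);
}

// Usage (the path is a placeholder):
// $digest = md5_via_shell('/full/path/to/file');
```

Checking the exit status before parsing avoids mistaking an error message for a digest.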

In general, I like to avoid doing anything directly in PHP that requires more than 1GB of memory, especially when running under php-fpm or as an Apache module; call it a prejudice reinforced over time. This is definitely my advice when a native application can accomplish the goal and you don't particularly need cross-platform portability (such as running on both Linux and Windows machines).
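Where shelling out is not an option, a pure-PHP alternative (not part of the answer above) is to feed the file to the hash extension in chunks, so memory use is bounded by the chunk size rather than the file size. This is only a sketch: the helper name `hash_file_chunked` and the 1MB default chunk size are assumptions, and it presumes a 64-bit PHP build whose file functions can read past 2GB.

```php
<?php
// Hypothetical helper: hash a file incrementally in fixed-size chunks.
// Returns the hex digest, or false if the file cannot be read.
function hash_file_chunked($algo, $filename, $chunkSize = 1048576)
{
    $handle = fopen($filename, 'rb');
    if ($handle === false) {
        return false;
    }

    $context = hash_init($algo);

    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false) {
            fclose($handle);
            return false;
        }
        // Only one chunk is held in memory at a time.
        hash_update($context, $chunk);
    }

    fclose($handle);
    return hash_final($context);
}

// Usage (the path is a placeholder):
// $digest = hash_file_chunked('md5', '/full/path/to/file');
```

Whether this is faster or slower than the shell approach depends on the environment, so it is worth measuring both on the target server.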
