Very slow to generate MD5 for large file using Java
I am using Java to generate the MD5 hash for some files. I need to generate one MD5 for several files with a total size of about 1 gigabyte. Here's my code:
// Requires java.io.*, java.security.MessageDigest, java.security.NoSuchAlgorithmException
private String generateMD5(SequenceInputStream inputStream) {
    if (inputStream == null) {
        return null;
    }
    MessageDigest md;
    try {
        int read = 0;
        byte[] buf = new byte[2048];
        md = MessageDigest.getInstance("MD5");
        while ((read = inputStream.read(buf)) > 0) {
            md.update(buf, 0, read);
        }
        byte[] hashValue = md.digest();
        // Hex-encode the digest; new String(hashValue) would misread the raw bytes as text.
        return String.format("%032x", new java.math.BigInteger(1, hashValue));
    } catch (NoSuchAlgorithmException e) {
        return null;
    } catch (IOException e) {
        return null;
    } finally {
        try {
            inputStream.close();
        } catch (IOException e) {
            // ...
        }
    }
}
This seems to run forever. How can I make it more efficient?
You may want to use the Fast MD5 library. It's much faster than Java's built-in MD5 provider, and getting a hash is as simple as:
String hash = MD5.asHex(MD5.getHash(new File(filename)));
Be aware that the slow speed may also be due to slow file I/O.
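If file I/O is the suspect, one thing to try with only the built-in MessageDigest is a larger read buffer. The following is a minimal sketch, not a benchmarked fix: the class name MultiFileMd5, the method md5Hex, and the 64 KiB buffer size are assumptions made for illustration. It feeds several files, in order, into a single digest and returns the result as hex.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

class MultiFileMd5 {
    // Assumed buffer size; measure on your own hardware before settling on a value.
    private static final int BUFFER_SIZE = 64 * 1024;

    /** Feeds every file, in list order, into one MD5 digest and returns it as lowercase hex. */
    static String md5Hex(List<Path> files) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buf = new byte[BUFFER_SIZE];
        for (Path file : files) {
            try (InputStream in = Files.newInputStream(file)) {
                int read;
                while ((read = in.read(buf)) != -1) {
                    md.update(buf, 0, read);
                }
            }
        }
        return toHex(md.digest());
    }

    /** Converts raw digest bytes to a hex string, two characters per byte. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Because all files go through the same MessageDigest instance, the result matches hashing their concatenation, which is what a SequenceInputStream would produce.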
I rewrote your code with NIO; it looks something like this:
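// Requires java.io.*, java.nio.ByteBuffer, java.nio.channels.FileChannel, java.security.*
private static String generateMD5(FileInputStream inputStream) {
    if (inputStream == null) {
        return null;
    }
    MessageDigest md;
    try {
        md = MessageDigest.getInstance("MD5");
        FileChannel channel = inputStream.getChannel();
        ByteBuffer buff = ByteBuffer.allocate(2048);
        while (channel.read(buff) != -1) {
            buff.flip();      // switch the buffer from filling to draining
            md.update(buff);  // consume the bytes just read
            buff.clear();     // reset the buffer for the next read
        }
        byte[] hashValue = md.digest();
        // Hex-encode the digest; new String(hashValue) would misread the raw bytes as text.
        return String.format("%032x", new java.math.BigInteger(1, hashValue));
    } catch (NoSuchAlgorithmException e) {
        return null;
    } catch (IOException e) {
        return null;
    } finally {
        try {
            inputStream.close();
        } catch (IOException e) {
            // ignore a failure while closing
        }
    }
}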
On my machine, it takes about 30 seconds to generate the MD5 hash for a large file. I tested your code as well, and the results indicate that NIO doesn't improve the program's performance.
Then I timed the I/O and the MD5 computation separately. The numbers show that slow file I/O is the bottleneck: about 5/6 of the time is spent on I/O.
Using the Fast MD5 library mentioned by @Sticky, it takes only 15 seconds to generate the MD5 hash, which is a remarkable improvement.
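For anyone who wants to reproduce the I/O versus hashing split on their own machine, here is one rough way to measure it. This is only a sketch and not the code used for the numbers above: the class name Md5Timing, the method measure, and the 64 KiB buffer are assumptions, and System.nanoTime around individual calls gives an approximation rather than a precise profile.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Md5Timing {
    /** Returns {nanoseconds spent in read(), nanoseconds spent in update() and digest()}. */
    static long[] measure(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buf = new byte[64 * 1024]; // assumed buffer size
        long ioNanos = 0;
        long hashNanos = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (true) {
                long t0 = System.nanoTime();
                int read = in.read(buf);
                long t1 = System.nanoTime();
                ioNanos += t1 - t0;
                if (read == -1) {
                    break;
                }
                md.update(buf, 0, read);
                hashNanos += System.nanoTime() - t1;
            }
        }
        long t2 = System.nanoTime();
        md.digest();
        hashNanos += System.nanoTime() - t2;
        return new long[] {ioNanos, hashNanos};
    }
}
```

Keep in mind that the operating system's file cache can make a second run over the same file look much faster on the I/O side, so compare runs under similar conditions.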
Whenever speed is an issue and you download a file from a URL and want to calculate its MD5 at the same time (i.e. without saving the file, reopening it, and reading it again just to get its MD5), my solution at https://stackoverflow.com/a/11189634/1082681 might be helpful. It is based on Bloodwulf's code snippet in this thread (thanks!) and just extends it a bit.
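The general idea of hashing while the bytes stream past can be sketched with the JDK's DigestInputStream. This is not the code from the linked answer, just an illustration of the technique; the class name StreamingMd5 and the method copyAndHash are hypothetical. For a download, the input could come from something like URL.openStream().

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class StreamingMd5 {
    /**
     * Copies everything from in to out and returns the MD5 (lowercase hex)
     * of the bytes that passed through. Closes in, leaves out open.
     */
    static String copyAndHash(InputStream in, OutputStream out)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        // DigestInputStream updates md with every byte that is read through it.
        try (DigestInputStream dis = new DigestInputStream(in, md)) {
            byte[] buf = new byte[64 * 1024]; // assumed buffer size
            int read;
            while ((read = dis.read(buf)) != -1) {
                out.write(buf, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

The data is read only once: the same pass that writes it to its destination also produces the hash.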