
Maximizing SHA-1 Hash Performance in Java

I'm writing a Java library that needs to compute SHA-1 hashes. During a common task, the JVM spends about 70% of its time in sun.security.provider.SHA.implCompress, 10% in java.util.zip.Inflater.inflate, and 2% in sun.security.provider.ByteArrayAccess.b2iBig64 (according to the NetBeans profiler).

I can't seem to find the right Google search keywords to turn up relevant results, and I'm not very familiar with the SHA-1 algorithm itself. How can I get the most performance out of an SHA-1 MessageDigest? Is there a certain chunk size I should be digesting in, or multiples of certain sizes I should try?

To answer some questions you're thinking about asking:

  • Yes, I'm digesting as I read the files (MessageDigest.update), so bytes are only digested once (see the sketch after this list).
  • The SHA-1 digests are being used as checksums, usually for files that also need to be zlib-inflated.
  • No, I can't use a different hash.
  • Yes, I know zlib already uses checksums, but external requirements specify the use of SHA-1 hashes on top of that. I can't come up with a good reason why (+1 if you can) :-)
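For concreteness, here is a minimal sketch of that digest-as-you-read pattern using DigestInputStream; the class name, method, and path argument are hypothetical, and the 8192-byte buffer is an arbitrary choice:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class StreamingSha1 {
    public static byte[] sha1(String path) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        // DigestInputStream updates the digest as bytes pass through,
        // so each byte is read and hashed exactly once.
        try (InputStream in = new DigestInputStream(new FileInputStream(path), md)) {
            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) {
                // Discard the data; we only want the side effect of digesting.
            }
        }
        return md.digest();
    }
}
```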

Maybe you could call native code written in C. There must be plenty of super-optimized SHA-1 libraries out there.

SHA-1 has a block size of 64 bytes, so multiples of that are probably best; otherwise the implementation will need to copy partial blocks into buffers.
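To illustrate what that means for the read loop: a buffer sized as a whole number of 64-byte blocks means update() never has to carry a partial block over between calls. The helper below is a hypothetical sketch (all names are made up):

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;

final class BlockAlignedDigest {
    // 8192 = 128 * 64: a whole number of SHA-1 blocks per update() call,
    // so the digest never has to buffer a trailing partial block.
    private static final int BUF_SIZE = 64 * 128;

    static void digestStream(InputStream in, MessageDigest md) throws IOException {
        byte[] buf = new byte[BUF_SIZE];
        int n;
        while ((n = in.read(buf)) != -1) {
            // For file streams n is usually the full buffer, except the last chunk.
            md.update(buf, 0, n);
        }
    }
}
```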

Are you running on a multi-core computer? You could run the zlib decompression and SHA-1 hashing in separate threads, using something like java.util.concurrent.SynchronousQueue to hand off each decompressed block from one thread to the other. That way you can have one core hashing one block while another core is decompressing the next block.

(You could try one of the other BlockingQueue implementations that have some storage capacity, but I don't think it would help much. The decompression is much faster than the hashing, so the zlib thread would quickly fill up the queue and then have to wait to put each new block, just like with the SynchronousQueue.)
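A rough sketch of that hand-off, assuming the decompressed data is exposed as an InputStream (for example an InflaterInputStream); all names here are hypothetical. Note the sketch hands off 8 KB chunks rather than single 64-byte blocks, since per-block synchronization would likely cost more than it saves:

```java
import java.io.InputStream;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.concurrent.SynchronousQueue;

public class PipelinedSha1 {
    private static final byte[] EOF = new byte[0]; // sentinel marking end of stream

    public static byte[] hashPipelined(InputStream inflated) throws Exception {
        SynchronousQueue<byte[]> queue = new SynchronousQueue<>();
        MessageDigest md = MessageDigest.getInstance("SHA-1");

        // Consumer thread: takes blocks off the queue and digests them.
        Thread hasher = new Thread(() -> {
            try {
                byte[] block;
                while ((block = queue.take()) != EOF) {
                    md.update(block);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        hasher.start();

        // Producer (this thread): reads decompressed data and hands it off.
        byte[] buf = new byte[8192];
        int n;
        while ((n = inflated.read(buf)) != -1) {
            queue.put(Arrays.copyOf(buf, n)); // the queue needs its own copy
        }
        queue.put(EOF);
        hasher.join(); // join() makes the hasher's updates visible here
        return md.digest();
    }
}
```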

I know you said you've optimized I/O already, but are you using asynchronous I/O? For maximum performance you don't want to hash one block and then ask the OS to read the next block, you want to ask the OS to read the next block and then hash the one you already have while the disk is busy fetching the next one. However, the OS probably does some readahead already, so this may not make a big difference.
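One way to overlap disk and CPU explicitly is double buffering with AsynchronousFileChannel (Java 7+). This is only a sketch under that assumption, not a tuned implementation, and the 64 KB buffer size is an arbitrary choice:

```java
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.util.concurrent.Future;

public class OverlappedSha1 {
    public static byte[] sha1(String path) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        try (AsynchronousFileChannel ch =
                 AsynchronousFileChannel.open(Paths.get(path), StandardOpenOption.READ)) {
            ByteBuffer reading = ByteBuffer.allocate(64 * 1024); // buffer the channel fills
            ByteBuffer hashing = ByteBuffer.allocate(64 * 1024); // buffer we digest
            long pos = 0;
            Future<Integer> pending = ch.read(reading, pos);
            int n;
            while ((n = pending.get()) > 0) {
                pos += n;
                // Swap roles: the buffer just filled becomes the one we hash,
                // and the spare buffer becomes the target of the next read.
                ByteBuffer filled = reading;
                reading = hashing;
                hashing = filled;
                reading.clear();
                pending = ch.read(reading, pos); // disk fetches ahead...
                hashing.flip();
                md.update(hashing);              // ...while the CPU hashes
            }
        }
        return md.digest();
    }
}
```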

But beyond all that, a cryptographic hash function is a complex thing; it's just going to take time to run. Maybe you need a faster computer. :-)

Have you tried switching the file processing to a memory-mapped file? Performance for those tends to be significantly better than regular I/O and stream-based NIO.
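A minimal sketch of that approach, assuming the input is an ordinary file on disk; the 64 MB mapping window is an arbitrary choice (a single map() call is limited to 2 GB):

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.security.MessageDigest;

public class MappedSha1 {
    public static byte[] sha1(String path) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        try (RandomAccessFile raf = new RandomAccessFile(path, "r");
             FileChannel ch = raf.getChannel()) {
            long size = ch.size();
            long window = 64L * 1024 * 1024; // map in 64 MB windows
            for (long pos = 0; pos < size; ) {
                long len = Math.min(window, size - pos);
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
                md.update(buf); // reads straight from the OS page cache
                pos += len;
            }
        }
        return md.digest();
    }
}
```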
