
SHA1 encoding in Haskell

I have a list of file paths and want to compute the SHA-1 hash of each file, storing the resulting hashes in a list. It should be as general as possible, so the files may be text or binary. My questions are:

  1. What packages should be used and why?
  2. How consistent is the approach? By that I mean: could other programs that compute SHA-1 themselves (e.g. sha1sum) produce different results?

The cryptohash package is probably the simplest to use. Just read your input into a lazy ByteString [1] and use the hashlazy function to get a strict ByteString containing the raw bytes of the resulting hash. Here's a small sample program which you can use to compare the output with that of sha1sum.

import Crypto.Hash.SHA1 (hashlazy)
import qualified Data.ByteString as Strict
import qualified Data.ByteString.Lazy as Lazy
import System.Process (system)
import Text.Printf (printf)

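-- Read the file lazily and return its raw SHA-1 digest.
hashFile :: FilePath -> IO Strict.ByteString
hashFile = fmap hashlazy . Lazy.readFile

-- Render each byte of the digest as two lowercase hex digits.
toHex :: Strict.ByteString -> String
toHex bytes = Strict.unpack bytes >>= printf "%02x"

-- Print our digest, then run sha1sum on the same file for comparison.
test :: FilePath -> IO ()
test path = do
  hashFile path >>= putStrLn . toHex
  _ <- system $ "sha1sum " ++ path
  return ()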

Since this reads plain bytes, not characters, there are no encoding issues, and it should always give the same result as sha1sum:

> test "/usr/share/dict/words"
d6e483cb67d6de3b8cfe8f4952eb55453bb99116
d6e483cb67d6de3b8cfe8f4952eb55453bb99116  /usr/share/dict/words
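
On the consistency question: SHA-1 is a fixed, standardized function of the input bytes, so any correct implementation should agree on the digest. One way to check this without relying on a particular local file is to hash a widely published test vector. The following is a minimal sketch, assuming the cryptohash package is installed and using its strict `hash` function; the name `abcDigest` is made up for illustration.

```haskell
import qualified Crypto.Hash.SHA1 as SHA1
import qualified Data.ByteString as Strict
import qualified Data.ByteString.Char8 as Char8
import Text.Printf (printf)

-- Hex-encode a digest, same approach as toHex in the answer.
toHex :: Strict.ByteString -> String
toHex bytes = Strict.unpack bytes >>= printf "%02x"

-- SHA-1 of the three ASCII bytes "abc", a commonly used test vector.
-- (abcDigest is a hypothetical name, not part of any library.)
abcDigest :: String
abcDigest = toHex (SHA1.hash (Char8.pack "abc"))

main :: IO ()
main = do
  putStrLn abcDigest
  -- Compare against the published digest for "abc".
  print (abcDigest == "a9993e364706816aba3e25717850c26c9cd0d89d")
```

Running `printf abc | sha1sum` in a shell should report the same digest, since both tools see the same three bytes.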

This also works for any of the hashes supported by the cryptohash package. Just change the import to e.g. Crypto.Hash.SHA256 to use a different hash.
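
As an illustration of swapping the hash, here is a sketch of the same file-hashing function built on Crypto.Hash.SHA256 instead. It assumes the cryptohash package; `hashFile256` is a hypothetical name, and the small `main` that takes paths from the command line is only there to make the example runnable.

```haskell
import qualified Crypto.Hash.SHA256 as SHA256
import qualified Data.ByteString as Strict
import qualified Data.ByteString.Lazy as Lazy
import System.Environment (getArgs)
import Text.Printf (printf)

-- Same shape as hashFile, but yielding a 32-byte SHA-256 digest
-- instead of a 20-byte SHA-1 digest. (hashFile256 is a made-up name.)
hashFile256 :: FilePath -> IO Strict.ByteString
hashFile256 = fmap SHA256.hashlazy . Lazy.readFile

-- Render each byte of the digest as two lowercase hex digits.
toHex :: Strict.ByteString -> String
toHex bytes = Strict.unpack bytes >>= printf "%02x"

main :: IO ()
main = do
  paths <- getArgs
  -- Print one hex digest per path given on the command line.
  mapM_ (\path -> hashFile256 path >>= putStrLn . toHex) paths
```

The output for a given file can be compared with that of sha256sum in the same way as the SHA-1 version is compared with sha1sum.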

[1] Using lazy ByteStrings avoids loading the entire file into memory at once, which is important when working with large files.
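
The question asks for the hashes of a whole list of files. A minimal sketch of that, assuming the cryptohash package, is to map the file-hashing step over the list. Because Lazy.readFile reads on demand, this version forces each digest with `evaluate` before moving on, so that each file should be fully consumed before the next one is opened; without that, a long list could leave many files open at once. The names `hashFileNow` and `hashFiles` are made up for illustration.

```haskell
import Control.Exception (evaluate)
import Crypto.Hash.SHA1 (hashlazy)
import qualified Data.ByteString as Strict
import qualified Data.ByteString.Lazy as Lazy
import System.Environment (getArgs)
import Text.Printf (printf)

-- Hash one file. evaluate forces the digest, which makes hashlazy
-- consume the whole lazy input before this action returns.
-- (hashFileNow is a hypothetical name.)
hashFileNow :: FilePath -> IO Strict.ByteString
hashFileNow path = Lazy.readFile path >>= evaluate . hashlazy

-- Hash every path in the list, keeping results in input order.
hashFiles :: [FilePath] -> IO [Strict.ByteString]
hashFiles = mapM hashFileNow

-- Render each byte of the digest as two lowercase hex digits.
toHex :: Strict.ByteString -> String
toHex bytes = Strict.unpack bytes >>= printf "%02x"

main :: IO ()
main = do
  paths <- getArgs
  digests <- hashFiles paths
  -- Print in a "digest  path" layout similar to sha1sum's output.
  mapM_ (\(p, d) -> putStrLn (toHex d ++ "  " ++ p)) (zip paths digests)
```

Returning the raw digests and hex-encoding only when printing keeps the list compact; if hex strings are what you want to store, `map toHex` over the result does that.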

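As to @hammar's answer: it is excellent, but you can use the Base16 library (the base16-bytestring package) instead of writing your own toHex. Its encode function returns a ByteString rather than a String, so print it with the ByteString version of putStrLn:

import qualified Data.ByteString.Base16 as B16
import qualified Data.ByteString.Char8 as Char8

hashFile path >>= Char8.putStrLn . B16.encode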


 