PowerShell 7.0: how to compute the hash sum of a big file read in chunks

The script should copy files and compute their hash sums. My goal is to write a function that reads the file once instead of three times (read_for_copy + read_for_hash + read_for_another_copy) to minimize network load. So I tried to read a chunk of the file, compute the MD5 hash sum, and write the chunk out to several places. File sizes may vary from 100 MB up to 2 TB, and maybe more. There is no need to verify that the copies are identical at this point; I just need to compute the hash sum of the initial files.

I am stuck on computing the hash sum:

    $ifile = "C:\Users\User\Desktop\inputfile"
    $ofile = "C:\Users\User\Desktop\outputfile_1"
    $ofile2 = "C:\Users\User\Desktop\outputfile_2"
    
    $md5 = new-object -TypeName System.Security.Cryptography.MD5CryptoServiceProvider
    $bufferSize = 10mb
    $stream = [System.IO.File]::OpenRead($ifile)
    $makenew = [System.IO.File]::OpenWrite($ofile)
    $makenew2 = [System.IO.File]::OpenWrite($ofile2)
    $buffer = new-object Byte[] $bufferSize
    
    while ( $stream.Position -lt $stream.Length ) {
       
     $bytesRead = $stream.Read($buffer, 0, $bufferSize)
     $makenew.Write($buffer, 0, $bytesread) 
     $makenew2.Write($buffer, 0, $bytesread) 
    
     # I am stuck here
     $hash = [System.BitConverter]::ToString($md5.ComputeHash($buffer)) -replace "-",""      
            
            }
    
    $stream.Close()
    $makenew.Close()
    $makenew2.Close()

How can I collect the chunks of data to compute the hash of the whole file?

And an extra question: is it possible to calculate the hash and write the data out in parallel? Especially taking into account that workflow {parallel{}} is not supported from PowerShell version 6 onward?

Many thanks

If you want to handle input buffering manually, you need to use the TransformBlock / TransformFinalBlock methods exposed by $md5:

    # Read() returns 0 at the end of the stream, which ends the loop
    while($bytesRead = $stream.Read($buffer, 0, $bufferSize))
    {
        # Write to file copies
        $makenew.Write($buffer, 0, $bytesRead)
        $makenew2.Write($buffer, 0, $bytesRead)

        # Feed next chunk to MD5 CSP
        $null = $md5.TransformBlock($buffer, 0, $bytesRead, $null, 0)
    }

    # Complete the hashing routine ($null = keeps the returned bytes out of the output)
    $null = $md5.TransformFinalBlock([byte[]]::new(0), 0, 0)

    # Grab hash value from CSP
    $hash = [BitConverter]::ToString($md5.Hash).Replace('-','')
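To check that feeding chunks through TransformBlock gives the same result as hashing everything at once, the following minimal sketch hashes an in-memory byte array both ways. The test data, the chunk size, and the variable names are assumptions made for illustration, and MD5.Create() is used here in place of the MD5CryptoServiceProvider constructor.

```powershell
# Hypothetical test data: 3,000,000 bytes, deliberately not a multiple of the chunk size.
$data = [byte[]]::new(3000000)
[System.Random]::new(42).NextBytes($data)

$chunkSize = 1mb
$buffer    = [byte[]]::new($chunkSize)
$source    = [System.IO.MemoryStream]::new($data)
$md5       = [System.Security.Cryptography.MD5]::Create()

try {
    # Read returns 0 at the end of the stream, which ends the loop.
    while (($bytesRead = $source.Read($buffer, 0, $buffer.Length)) -gt 0) {
        # Hash only the bytes actually read, not the whole buffer.
        $null = $md5.TransformBlock($buffer, 0, $bytesRead, $null, 0)
    }
    # An empty final block completes the hash.
    $null = $md5.TransformFinalBlock([byte[]]::new(0), 0, 0)
    $chunkedHash = [System.BitConverter]::ToString($md5.Hash).Replace('-', '')
}
finally {
    $md5.Dispose()
    $source.Dispose()
}

# Reference value: hash the same bytes in a single call.
$md5Whole  = [System.Security.Cryptography.MD5]::Create()
$wholeHash = [System.BitConverter]::ToString($md5Whole.ComputeHash($data)).Replace('-', '')
$md5Whole.Dispose()

# Outputs True when both approaches agree.
$chunkedHash -eq $wholeHash
```

The comparison also illustrates why the code in the question went wrong: ComputeHash($buffer) hashes each full buffer on its own, including stale bytes after a short read, instead of accumulating state across chunks.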

> My goal is to write a function that reads the file once instead of three times (read_for_copy + read_for_hash + read_for_another_copy) to minimize network load

I'm not entirely sure what you mean by network load here. If the source file is on a remote file share but the new copies go onto a local file system, you can minimize network load by copying the source file once and then using that local copy as the source for both the second copy and the hash calculation:

    $ifile = '\\remoteMachine\c$\Users\User\Desktop\inputfile'
    $ofile = "C:\Users\User\Desktop\outputfile_1"
    $ofile2 = "C:\Users\User\Desktop\outputfile_2"

    # Copy remote -> local
    Copy-Item -Path $ifile -Destination $ofile
    # Copy local -> local
    Copy-Item -Path $ofile -Destination $ofile2

    # Hash local file stream
    $md5 = New-Object -TypeName System.Security.Cryptography.MD5CryptoServiceProvider
    $stream = [System.IO.File]::OpenRead($ofile)
    try {
        $hash = [BitConverter]::ToString($md5.ComputeHash($stream)).Replace('-','')
    }
    finally {
        # Release the file handle even if hashing fails
        $stream.Close()
    }

FWIW, passing the file stream object directly to $md5.ComputeHash($stream) is likely to be faster than buffering the input manually.
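For comparison, here is a minimal sketch that hashes a local file by passing the stream to ComputeHash and then again with the built-in Get-FileHash cmdlet. The sample file name and its contents are hypothetical stand-ins for the local copy.

```powershell
# Hypothetical sample file in the temp directory, standing in for the local copy.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'hash-demo.bin'
[System.IO.File]::WriteAllBytes($path, [byte[]](1..200))

# Option 1: let ComputeHash read the stream itself.
$md5    = [System.Security.Cryptography.MD5]::Create()
$stream = [System.IO.File]::OpenRead($path)
try {
    $streamHash = [System.BitConverter]::ToString($md5.ComputeHash($stream)).Replace('-', '')
}
finally {
    $stream.Dispose()
    $md5.Dispose()
}

# Option 2: the built-in cmdlet, which returns the hash as a hex string.
$cmdletHash = (Get-FileHash -LiteralPath $path -Algorithm MD5).Hash

# Outputs True when both options agree.
$streamHash -eq $cmdletHash

Remove-Item -LiteralPath $path
```

Either option reads the file in a separate pass, so it fits the copy-first approach above rather than the single-read loop.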

Final listing

    $ifile = "C:\Users\User\Desktop\inputfile"
    $ofile = "C:\Users\User\Desktop\outputfile_1"
    $ofile2 = "C:\Users\User\Desktop\outputfile_2"

    $md5 = New-Object -TypeName System.Security.Cryptography.MD5CryptoServiceProvider
    $bufferSize = 1mb
    $stream = [System.IO.File]::OpenRead($ifile)
    # Create() truncates an existing file; OpenWrite() would leave stale bytes
    # behind if the destination already exists and is longer than the source.
    $makenew = [System.IO.File]::Create($ofile)
    $makenew2 = [System.IO.File]::Create($ofile2)
    $buffer = New-Object Byte[] $bufferSize

    try {
        while ( $stream.Position -lt $stream.Length )
        {
            $bytesRead = $stream.Read($buffer, 0, $bufferSize)
            $makenew.Write($buffer, 0, $bytesRead)
            $makenew2.Write($buffer, 0, $bytesRead)

            # Feed the chunk to MD5; the return value is not needed
            $null = $md5.TransformBlock($buffer, 0, $bytesRead, $null, 0)
        }

        # Finish hashing with an empty final block, then read the result
        $null = $md5.TransformFinalBlock([byte[]]::new(0), 0, 0)
        $hash = [BitConverter]::ToString($md5.Hash).Replace('-','')
        $hash
    }
    finally {
        # Close() flushes pending writes, so separate Flush() calls are not needed
        $stream.Close()
        $makenew.Close()
        $makenew2.Close()
    }
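On the extra question about hashing and writing in parallel: one possible approach, sketched here under assumptions and not taken from the answers above, is to start both writes with WriteAsync, hash the chunk while they are in flight, and wait for both before reusing the buffer. The temp file names and sizes are hypothetical, and whether this is actually faster depends on the storage involved.

```powershell
# Hypothetical temp files standing in for the source and the two copies.
$tmp   = [System.IO.Path]::GetTempPath()
$src   = Join-Path $tmp 'parallel-demo-src.bin'
$dst1  = Join-Path $tmp 'parallel-demo-copy1.bin'
$dst2  = Join-Path $tmp 'parallel-demo-copy2.bin'
$bytes = [byte[]]::new(5mb + 123)
[System.Random]::new(1).NextBytes($bytes)
[System.IO.File]::WriteAllBytes($src, $bytes)

$md5     = [System.Security.Cryptography.MD5]::Create()
$buffer  = [byte[]]::new(1mb)
$reader  = [System.IO.File]::OpenRead($src)
$writer1 = [System.IO.File]::Create($dst1)
$writer2 = [System.IO.File]::Create($dst2)
try {
    while (($n = $reader.Read($buffer, 0, $buffer.Length)) -gt 0) {
        # Start both writes without waiting for them.
        $w1 = $writer1.WriteAsync($buffer, 0, $n)
        $w2 = $writer2.WriteAsync($buffer, 0, $n)
        # Hash the same chunk while the writes are in flight.
        $null = $md5.TransformBlock($buffer, 0, $n, $null, 0)
        # The next Read overwrites the buffer, so wait for both writes first.
        $w1.Wait()
        $w2.Wait()
    }
    $null = $md5.TransformFinalBlock([byte[]]::new(0), 0, 0)
    $hash = [System.BitConverter]::ToString($md5.Hash).Replace('-', '')
}
finally {
    $reader.Dispose()
    $writer1.Dispose()
    $writer2.Dispose()
    $md5.Dispose()
}

# Sanity check: both copies should hash to the value computed during the copy.
$copyHashes = Get-FileHash -LiteralPath $dst1, $dst2 -Algorithm MD5 | ForEach-Object Hash
$copiesMatch = ($copyHashes[0] -eq $hash) -and ($copyHashes[1] -eq $hash)
$hash
$copiesMatch

Remove-Item -LiteralPath $src, $dst1, $dst2
```

A single shared buffer keeps the sketch simple but forces a wait at the end of every iteration; overlapping the next read with the pending writes would need at least two buffers.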
