
Getting odd results from Node.js crypto library

I'm trying to write a function to recursively get the md5sum of all the files in a directory but I'm getting a different result each time it's run without any files being modified.

Code I'm getting these results from:

var crypto = require('crypto');
var fs = require('fs');
var path = require('path');

function _deepMD5(dir, md5){
    var files = fs.readdirSync(dir);
    for(var i = 0; i < files.length; i++){
        var fp = dir+path.sep+files[i];
        if(fs.lstatSync(fp).isDirectory()){
            _deepMD5(fp, md5);
        }
        else{
            var fh = fs.openSync(fp, 'r');
            var chunkSize=1024;
            var buffer=new Buffer(chunkSize, 'binary');
            while(fs.readSync(fh, buffer, 0, chunkSize, null) != 0){
                md5.update(buffer);
            }
        }
    }
}

function deepMD5(dir){
    var md5sum = crypto.createHash('md5');
    _deepMD5(dir, md5sum);
    return md5sum.digest('hex');
}

console.log(deepMD5("."));

When you create a new buffer with new Buffer(size), its memory is not cleared. So you start with a buffer holding whatever happened to be in that memory before - this is where the changes between runs come from.
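To make the allocation behaviour concrete, here is a minimal sketch using the newer Buffer API rather than the `new Buffer(size)` call from the question (which is deprecated in current Node.js). It assumes a Node.js version that provides `Buffer.alloc` and `Buffer.allocUnsafe`; the variable names are only illustrative.

```javascript
// Sketch only: contrasts the two allocation calls of the newer Buffer API.
const zeroed = Buffer.alloc(1024);        // always returned zero-filled
const unsafe = Buffer.allocUnsafe(1024);  // contents are NOT guaranteed to be zero

// Every byte of a Buffer.alloc buffer is 0.
const allZero = zeroed.every(function (byte) { return byte === 0; });
console.log('Buffer.alloc is zero-filled:', allZero);

// An allocUnsafe buffer has to be cleared explicitly before its
// contents can be relied on.
unsafe.fill(0);
```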

Next, you read 1024 bytes and update the hash with that. However, that read actually reads up to 1024 bytes and returns the number of bytes actually read. You need to take that return value into account. Otherwise, every time you go through a file whose size is not a multiple of 1024 bytes, you update the hash with extra stuff at the end (something random if it's the first read, or something left over from the previous read).
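The partial final read can be seen in a small experiment. This is a sketch under assumptions: the temporary file, its 1500-byte size, and the fill bytes are invented for illustration and are not part of the question.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical 1500-byte file: 1024 bytes of 'a' followed by 476 bytes of 'b'.
const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'readsync-'));
const tmpFile = path.join(tmpDir, 'sample.bin');
fs.writeFileSync(tmpFile, Buffer.concat([
    Buffer.alloc(1024, 0x61),
    Buffer.alloc(476, 0x62)
]));

const chunkSize = 1024;
const buffer = Buffer.alloc(chunkSize);
const fd = fs.openSync(tmpFile, 'r');
const counts = [];
let bytesRead;
do {
    // readSync returns how many bytes it actually placed in the buffer.
    bytesRead = fs.readSync(fd, buffer, 0, chunkSize, null);
    counts.push(bytesRead);
} while (bytesRead !== 0);
fs.closeSync(fd);

// Clean up the temporary file and directory.
fs.unlinkSync(tmpFile);
fs.rmdirSync(tmpDir);

console.log(counts); // [ 1024, 476, 0 ]

// After the short second read, the tail of the buffer still holds 'a'
// bytes from the first read; hashing the whole buffer would include them.
const staleTail = buffer[1023] === 0x61;
console.log('tail is left over from the first read:', staleTail); // true
```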

So whenever you read fewer than chunkSize bytes, you want to slice out only the bytes that actually came from the most recent read and pass that slice to update:

var length;
while((length = fs.readSync(fh, buffer, 0, chunkSize, null)) != 0){
    if(length == chunkSize)
        md5.update(buffer);
    else
        md5.update(buffer.slice(0, length));
}

For efficiency's sake, I've avoided slicing when we don't have to. Of course, you could just slice every time if you prefer the shorter code and aren't worried about performance.
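Putting this together, the "slice every time" variant could look roughly like the sketch below. It is not the asker's exact code: the helper name `hashFileInto`, the use of `Buffer.alloc` and `subarray`, the `closeSync` call, and the sorting of directory entries are all my own assumptions, added to keep the example self-contained and order-independent.

```javascript
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');

// Hypothetical helper: feed exactly the bytes of one file into the hash.
function hashFileInto(fp, hash, chunkSize) {
    const buffer = Buffer.alloc(chunkSize);
    const fd = fs.openSync(fp, 'r');
    try {
        let length;
        while ((length = fs.readSync(fd, buffer, 0, chunkSize, null)) !== 0) {
            // subarray returns a view (no copy) of the bytes just read.
            hash.update(buffer.subarray(0, length));
        }
    } finally {
        fs.closeSync(fd); // release the descriptor even if a read throws
    }
}

function deepMD5(dir) {
    const md5 = crypto.createHash('md5');
    (function walk(current) {
        // Sorting is an addition here, so the digest does not depend on
        // the order in which readdirSync happens to return entries.
        fs.readdirSync(current).sort().forEach(function (name) {
            const fp = path.join(current, name);
            if (fs.lstatSync(fp).isDirectory()) {
                walk(fp);
            } else {
                hashFileInto(fp, md5, 1024);
            }
        });
    })(dir);
    return md5.digest('hex');
}
```

Calling `console.log(deepMD5("."))` as in the question should then print the same digest on every run, as long as no file in the tree changes.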

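There is a very small change required in your code to get repeatable results: you need to reset your new buffer after creation using buffer.fill(0).

The new Buffer() expression allocates memory but does not clear it, so you have to do that manually. If all your files were at least 1024 bytes in size, you probably would not notice the issue, because the first read would overwrite the whole buffer. But if there is at least one file smaller than 1024 bytes, the issue is very likely to happen. Note that zero-filling only makes the result stable between runs: the final partial chunk of each file is still hashed together with whatever follows it in the buffer, so the digest is not the MD5 of the file contents alone.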

Corrected _deepMD5 function:

function _deepMD5(dir, md5){
    var files = fs.readdirSync(dir);
    console.info("running with files: ", files)
    for(var i = 0; i < files.length; i++){
        var fp = dir+path.sep+files[i];
        if(fs.lstatSync(fp).isDirectory()){
            _deepMD5(fp, md5);
        }
        else{
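            var fh = fs.openSync(fp, 'r');
            var chunkSize=1024;
            var buffer=new Buffer(chunkSize, 'binary');
            buffer.fill(0); // clear the uninitialized memory so every run hashes the same bytes
            while(fs.readSync(fh, buffer, 0, chunkSize, null) != 0){
                md5.update(buffer); // note: always hashes the full 1024-byte buffer
            }
            fs.closeSync(fh); // release the file descriptor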
        }
    }
}

I hope that will help.

