
Piping Amazon S3 download stream through Node.js spawn causing incomplete downloads

I'm currently trying to download an encrypted file from Amazon S3 and pipe it through a GPG decryption process that I'm spawning. I'm using the aws-sdk for Node ( https://github.com/aws/aws-sdk-js ).

The download usually works, but on a slower network connection (which I'm testing by throttling my network), the download will hang for a few seconds and the file will be truncated (the sizes don't match up), leaving it corrupt. I believe the problem lies somewhere in the spawn, which might not be closing or finishing the response correctly because of the slower stream.

My code to download the file:

let params = {Bucket: Bucket, Key: Key};
let readStream = s3.getObject(params).createReadStream();
gpg.decrypt(readStream, res); // res is the Express response object

My code to decrypt the file, and then pipe it to the response:

gpg.decrypt = function(inputStream, res) {
    let cp = require('child_process');
    let decryptionArgs = ['--decrypt', '--batch', '--yes', '--no-tty', '--passphrase', 'mypassphrase'];
    let gpg = cp.spawn('gpg', decryptionArgs);
    inputStream.on('error', (err) => res.status(500).json(err));
    gpg.on('error', (err) => res.status(500).json(err));
    inputStream.pipe(gpg.stdin);
    gpg.stdout.pipe(res);
}

I'm setting the Content-Type to application/octet-stream and the Content-Disposition to attachment; filename="thefilename"
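
For context, the response headers are set roughly like this in the Express route handler (the route path here is a placeholder, not my actual route):

// Sketch of the Express route; the path and filename are placeholders.
app.get('/download', (req, res) => {
    res.set('Content-Type', 'application/octet-stream');
    res.set('Content-Disposition', 'attachment; filename="thefilename"');

    let params = {Bucket: Bucket, Key: Key};
    let readStream = s3.getObject(params).createReadStream();
    gpg.decrypt(readStream, res);
});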

UPDATE: I figured out the issue, in case this helps someone. On a slower network connection (tested by throttling my network), gpg.stdout would become unpiped from the response. I confirmed this by setting an event listener for the unpipe event. I was able to solve the issue by writing the output to the response myself instead: I still pipe the input file stream into my gpg spawn, but instead of piping the output to the response, I write to the response chunk by chunk:

gpg.decrypt = function(inputStream, res) {
    let cp = require('child_process');
    let decryptionArgs = ['--decrypt', '--batch', '--yes', '--no-tty', '--passphrase', 'mypassphrase'];
    let gpg = cp.spawn('gpg', decryptionArgs);
    inputStream.on('error', (err) => res.status(500).json(err));
    gpg.on('error', (err) => res.status(500).json(err));

    gpg.on('close', () => res.end());
    gpg.stdout.on('data', (chunk) => res.write(chunk));
    inputStream.pipe(gpg.stdin);
}
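
For anyone debugging something similar, the diagnostic I used looked roughly like this (a sketch, added inside gpg.decrypt; the unpipe event fires on the destination stream, here the response):

// Diagnostic sketch: the response emits 'unpipe' when gpg.stdout stops piping into it.
res.on('unpipe', (src) => {
    console.log('res was unpiped, source was gpg.stdout:', src === gpg.stdout);
});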

I am no JS expert, but have done quite a bit of low-level raw HTTP work, and here's the problem I see.

inputStream.on('error', (err) => res.status(500) ...

This seems like it would only be able to (correctly) handle errors that occurred very early in the process of fetching the file.

If an error occurs later, once the output has started, it's going to be too late, since the HTTP response has already started and a 200 OK has already been sent to the client. Once you have started streaming a response... you see the dilemma here.

I see no obvious and simple way to simultaneously stream a response and handle a late error gracefully, and depending on how the environment handles this, the JSON error might even make its way into the tail of the response, turning a truncated file into a truncated file with noise at the end.
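
One way to at least keep the JSON out of an already-started response is to check whether the headers have been sent and abort the connection instead; a rough sketch (handleStreamError is a made-up helper, not from the question's code):

// Sketch: send a JSON error only if the response hasn't started yet;
// otherwise tear the connection down so the client sees a failed transfer
// instead of a "successful" truncated file.
function handleStreamError(res, err) {
    if (!res.headersSent) {
        res.status(500).json({message: 'download failed'});
    } else {
        res.destroy(err); // abort mid-stream rather than appending JSON noise
    }
}

inputStream.on('error', (err) => handleStreamError(res, err));
gpg.on('error', (err) => handleStreamError(res, err));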

But if you know the expected byte length of the output after decryption, it seems like you should be able to set the Content-Length response header. When the transfer fails and the output is truncated, the user agent (browser, curl, etc.) should eventually realize that the content is too short and report an error. There's no way to go back in time and do much else; the only other option would be not to stream at all: download, decrypt, verify, then return... but that may not be viable for you either.
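
A sketch of that idea, assuming the decrypted size is known up front (decryptedSize is hypothetical here and would have to come from your own bookkeeping, e.g. metadata stored alongside the object):

// Sketch: advertise the expected plaintext length so the client can detect
// a truncated transfer. decryptedSize is hypothetical.
res.set('Content-Length', String(decryptedSize));
res.set('Content-Type', 'application/octet-stream');
res.set('Content-Disposition', 'attachment; filename="thefilename"');
gpg.stdout.pipe(res);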
