
How to zip large files in S3 and then download it

Greetings. I'm posting here because I currently have a problem: I need to download files from S3 through a Lambda function, all compressed into a single .zip file.

The code I show below works as long as the generated file is at most 5 GB. Recently I tried to download 60 assets of 500 MB each, and the generated zip contained only 7 correct files; the rest show up as damaged.

The algorithm does its job, but I suspect that, because the files are handled as streams, the Lambda runs out of memory, which also contributes to the problem. What comes to mind is to split everything into parts and chunks (see the sketch after the code below), but so far I have not found anything that works for me. Has this happened to anyone? Please help.

const archiver = require('archiver');
const aws = require('aws-sdk');
const stream = require('stream');
const debug = false; // assumed local-testing flag; not defined in the original snippet
const REGION = debug ? 'myRegion' : process.env.REGION;
const FOLDER_LOCATION = debug ? 'myDownloads' : process.env.FOLDER_LOCATION;
const io = require('socket.io-client');
const s3 = new aws.S3({ apiVersion: '2006-03-01', region: REGION });
const { API_DEV, API_QA, API_PRO } = require('./constants');
let isSocketConnected = false;
let socket;

const main = async (uuid, asset, nameZip, client, arrayObjects, channel, env) => {
  const api = env === 'dev'
    ? API_DEV
    : env === 'qa'
      ? API_QA
      : API_PRO;

  socket = io(api, { path: '/sockets' });
  socket.on('connect', () => {
    console.log('socket connected');
    isSocketConnected = true;
  });

  socket.on('disconnect', () => {
    console.log('socket disconnected');
    isSocketConnected = false;
  });

  const bkt = env === 'dev'
    ? 'bunkey-develop'
    : env === 'qa'
      ? 'bunkey-qa'
      : 'bunkey-prod';

  const s3DownloadStreams = arrayObjects.map(o => {
    // keep the last two path segments of the URL: the folder and the file name
    const [folder, fullName] = o.url.split('/').slice(-2);
    const [fileName, ext] = fullName.split('.');
    return {
      stream: s3.getObject({ Bucket: bkt, Key: `${folder}/${fileName}.${ext}` }).createReadStream(),
      filename: `${o.name}.${ext}`,
    };
  });

  const streamPassThrough = new stream.PassThrough();
  const params = {
    ACL: 'public-read',
    Body: streamPassThrough,
    Bucket: bkt,
    ContentType: 'application/zip',
    Key: `${FOLDER_LOCATION}/${nameZip.replace(/\//g, '-')}.zip`,
    StorageClass: 'STANDARD_IA',
  };

  const s3Upload = s3.upload(params, error => {
    if (error) {
      console.error(`Got error creating stream to s3 ${error.name} ${error.message} ${error.stack}`);
      throw error;
    }
  });

  const archive = archiver('zip', {
    gzip: true,
    zlib: {
      level: 9,
    }
  });

  archive.on('error', error => {
    throw new Error(`${error.name} ${error.code} ${error.message} ${error.path} ${error.stack}`);
  });

  new Promise((resolve, reject) => {
    s3Upload.on('close', resolve);
    s3Upload.on('end', resolve);
    s3Upload.on('error', reject);

    archive.pipe(streamPassThrough);
    s3DownloadStreams.forEach(streamDetails => archive.append(streamDetails.stream, { name: streamDetails.filename }));

    archive.finalize();
  }).catch(async error => {
    await handleSocketEmit(env, { uuid, channel, status: 'error', message: error.message });
    throw new Error(`${error.message}`);
  });

  const result = await s3Upload.promise();
  if (result && result.Location) {
    await handleSocketEmit(env, { uuid, asset, status: 'success', client, nameZip, channel, url: result.Location });
    await handleSocketDestroy();
    return { statusCode: 200, body: result.Location };
  } else {
    await handleSocketEmit(env, { uuid, channel, status: 'error' });
    await handleSocketDestroy();
    return { statusCode: 500 };
  }
};

const handleSocketDestroy = async () => {
  socket.close();
  socket.destroy();
};

const handleSocketEmit = async (env, msg) => {
  try {
    if (isSocketConnected) {
      socket.emit('request_lambda_download', msg);
    } else {
      setTimeout(async () => {
        await handleSocketEmit(env, msg);
      }, 1000);
    }
  } catch (error) {
    console.log('handleSocketEmit.err: ', error);
  }
};

exports.handler = async (event) => {
  const { uuid, asset, nameZip, client, arrayObjects, channel, env } = event;
  const result = await main(uuid, asset, nameZip, client, arrayObjects, channel, env);
  return result;
};
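
For context, this is roughly what I mean by splitting the upload into parts: the AWS SDK v2 s3.upload accepts a second options argument (partSize and queueSize) that controls the multipart chunk size and how many parts are uploaded concurrently from the PassThrough stream. The snippet below is only an illustrative drop-in variant of the s3.upload(...) call above, and so far this alone has not solved the damaged entries for me:

// Drop-in variant of the s3.upload(params, callback) call above, with explicit
// multipart tuning. partSize and queueSize are standard AWS SDK v2 ManagedUpload
// options; the numbers are just examples.
const s3Upload = s3.upload(params, {
  partSize: 16 * 1024 * 1024, // upload the zip stream in 16 MB parts
  queueSize: 2,               // keep at most 2 parts in flight to limit memory use
}, error => {
  if (error) {
    console.error(`Got error creating stream to s3 ${error.name} ${error.message}`);
    throw error;
  }
});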

It appears your requirement is to download uncompressed objects from Amazon S3, create a zip, and then upload the zip back to Amazon S3.

Your problems appear to be stemming from the fact that the Lambda function is streaming content and manipulating it in memory, rather than on disk. AWS Lambda functions only have 512MB of disk space allocated, which can make it difficult to manipulate potentially large files.

If you wish to keep using AWS Lambda to do this work, then I would recommend:

- Attaching an Amazon EFS file system to the Lambda function, so it has real working disk space rather than only the 512MB /tmp
- Downloading the objects from Amazon S3 onto the EFS file system
- Zipping them there with normal file operations
- Uploading the resulting zip back to Amazon S3

This avoids all the streaming and memory requirements. It can actually run with much lower (minimum?) memory settings, which means the Lambda function runs at a much lower cost.
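
A rough sketch of that approach is below, using the same Node.js / aws-sdk / archiver stack as the question. The /mnt/zipwork mount path, the bundle.zip name, and the zipOnEfs helper are made up for illustration, and the EFS file system itself has to be attached to the Lambda function (VPC, mount target, access point) outside of the code:

const fs = require('fs');
const path = require('path');
const archiver = require('archiver');
const aws = require('aws-sdk');
const s3 = new aws.S3({ apiVersion: '2006-03-01', region: process.env.REGION });

const MOUNT = '/mnt/zipwork'; // hypothetical EFS mount path configured on the function

const zipOnEfs = async (bucket, keys, zipKey) => {
  // 1. Stream each object from S3 to a file on the EFS mount (disk, not memory).
  for (const key of keys) {
    const localPath = path.join(MOUNT, path.basename(key));
    await new Promise((resolve, reject) => {
      const read = s3.getObject({ Bucket: bucket, Key: key }).createReadStream();
      const write = fs.createWriteStream(localPath);
      read.on('error', reject);
      write.on('error', reject);
      write.on('finish', resolve);
      read.pipe(write);
    });
  }

  // 2. Zip the downloaded files, writing the archive to EFS as well.
  const zipPath = path.join(MOUNT, 'bundle.zip');
  await new Promise((resolve, reject) => {
    const output = fs.createWriteStream(zipPath);
    const archive = archiver('zip', { zlib: { level: 9 } });
    output.on('close', resolve);
    archive.on('error', reject);
    archive.pipe(output);
    keys.forEach(key =>
      archive.file(path.join(MOUNT, path.basename(key)), { name: path.basename(key) }));
    archive.finalize();
  });

  // 3. Upload the finished zip back to S3 from disk.
  const result = await s3.upload({
    Bucket: bucket,
    Key: zipKey,
    Body: fs.createReadStream(zipPath),
    ContentType: 'application/zip',
  }).promise();

  return result.Location;
};

In a real function the temporary files on EFS would also need to be cleaned up afterwards, since the mount persists between invocations.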
