
Transferring a large number of large files to S3

I am transferring about 31 TB of data from a remote server to an S3 bucket. The data consists of 4,500 files ranging in size from 69 MB to 25 GB. I am using s4cmd put for this, wrapped in a bash script, upload.sh:

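#!/bin/bash

# Upload every matching file, one at a time. The glob is expanded by the
# loop itself, and "$i" is quoted so paths containing spaces are not split.
for i in /path/to/*.fastq.gz
do
    echo "$i"
    s4cmd put --sync-check -c 10 "$i" s3://bucket-name/directory/
done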

Then I use qsub to submit the job:

qsub -cwd -e error.txt -o output.txt -l h_vmem=10G -l mem_free=8G -l m_mem_free=8G -pe smp 10 upload.sh

This is taking way too long - it took 10 hours to upload ~20 files. Can someone suggest alternatives or modifications to my command?

Thanks!

Your case may be one where copying the data onto physical media and shipping it by regular mail is faster and cheaper than transferring it over the internet. AWS supports such a "protocol" and has a name for it: AWS Snowball.

Snowball is a petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data into and out of the AWS cloud. Using Snowball addresses common challenges with large-scale data transfers including high network costs, long transfer times, and security concerns. Transferring data with Snowball is simple, fast, secure, and can be as little as one-fifth the cost of high-speed Internet.

With Snowball, you don't need to write any code or purchase any hardware to transfer your data. Simply create a job in the AWS Management Console and a Snowball appliance will be automatically shipped to you*. Once it arrives, attach the appliance to your local network, download and run the Snowball client to establish a connection, and then use the client to select the file directories that you want to transfer to the appliance. The client will then encrypt and transfer the files to the appliance at high speed. Once the transfer is complete and the appliance is ready to be returned, the E Ink shipping label will automatically update and you can track the job status via Amazon Simple Notification Service (SNS), text messages, or directly in the Console.

* Snowball is currently available in select regions. Your location will be verified once a job has been created in the AWS Management Console.

The smaller Snowball device holds 50 TB, which is a good fit for your 31 TB.
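
To put a rough number on why shipping may win here, the rate reported in the question (about 20 files in 10 hours) can be extrapolated to all 4,500 files. This is only a back-of-the-envelope sketch: it assumes the first 20 files were typical of the rest, which may not hold given the wide range of file sizes, and the function name estimate_hours is made up for illustration.

```bash
#!/bin/bash
# Rough extrapolation from the figures in the question. Assumes the first
# 20 files were representative of the whole set (an assumption, since
# sizes range from 69 MB to 25 GB).

# estimate_hours TOTAL_FILES FILES_DONE HOURS_SPENT
# Prints the projected total hours at the observed files-per-hour rate
# (integer arithmetic, so the result is rounded down).
estimate_hours() {
    local total_files=$1 files_done=$2 hours_spent=$3
    echo $(( total_files * hours_spent / files_done ))
}

hours=$(estimate_hours 4500 20 10)
echo "Projected upload time: ${hours} hours (about $(( hours / 24 )) days)"
```

At the observed rate this works out to 2250 hours, or roughly three months of continuous uploading, which is the kind of timescale where a shipped appliance becomes worth considering.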

There is also a similar service, AWS Import/Export Disk, where you ship your own hardware (hard drives) instead of using their special device:

To use AWS Import/Export Disk:

  • Prepare a portable storage device (see the Product Details page for supported devices).
  • Submit a Create Job request. You'll get a job ID with a digital signature used to authenticate your device.
  • Print out your pre-paid shipping label.
  • Securely identify and authenticate your device. For Amazon S3, place the signature file on the root directory of your device. For Amazon EBS or Amazon Glacier, tape the signature barcode to the exterior of the device.
  • Attach your pre-paid shipping label to the shipping container and ship your device along with its interface connectors, and power supply to AWS.

When your package arrives, it will be processed and securely transferred to an AWS data center, where your device will be attached to an AWS Import/Export station. After the data load completes, the device will be returned to you.
