
How to upload large files from HDFS to S3

I have an issue uploading a large file (larger than 5 GB) from HDFS to S3. Is there a way to upload the file directly from HDFS to S3, using multipart upload, without first downloading it to the local file system?

For copying data between HDFS and S3, you should use s3DistCp. s3DistCp is optimized for AWS and efficiently copies a large number of files in parallel, including across S3 buckets.

For usage of s3DistCp, refer to the documentation here: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/UsingEMR_s3distcp.html

The code for s3DistCp is available here: https://github.com/libin/s3distcp
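For illustration, a hedged sketch of what an s3DistCp run might look like on an EMR cluster; the bucket name and HDFS paths below are placeholders, not taken from the original question:

    # Run on the EMR master node; copies an HDFS directory to S3 in parallel.
    # hdfs:///data/large-files/ and s3://my-bucket/backup/ are placeholder paths.
    s3-dist-cp \
      --src hdfs:///data/large-files/ \
      --dest s3://my-bucket/backup/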

If you are using Hadoop 2.7.1 or later, use the s3a:// filesystem to talk to S3. It supports multi-part uploads, which is what you need here.
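As a rough sketch of that approach (the bucket name, paths, and credential values are placeholders), a direct HDFS-to-S3 copy over s3a can be driven with hadoop distcp:

    # Copy a large file from HDFS to S3 via the s3a connector (Hadoop 2.7.1+).
    # Credentials can also come from core-site.xml or instance roles;
    # the -D values and s3a://my-bucket/ path are placeholders.
    hadoop distcp \
      -Dfs.s3a.access.key=YOUR_ACCESS_KEY \
      -Dfs.s3a.secret.key=YOUR_SECRET_KEY \
      -Dfs.s3a.multipart.size=104857600 \
      hdfs:///data/large-file \
      s3a://my-bucket/data/

Because s3a issues multipart uploads under the hood, this avoids both the local-disk round trip and the 5 GB single-PUT limit.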

Update: September 2016

I should add that we are reworking the S3A output stream for Hadoop 2.8; the current one buffers multipart uploads in the heap and falls over when you generate bulk data faster than your network can push it to S3.
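Once on Hadoop 2.8, the heap-buffering problem can be sidestepped by telling S3A to buffer upload blocks on local disk instead. A minimal sketch, assuming the Hadoop 2.8 S3A properties fs.s3a.fast.upload and fs.s3a.fast.upload.buffer, with placeholder paths:

    # Hadoop 2.8+: buffer multipart blocks on local disk rather than in the JVM heap.
    # The HDFS path and bucket are placeholders.
    hadoop distcp \
      -Dfs.s3a.fast.upload=true \
      -Dfs.s3a.fast.upload.buffer=disk \
      hdfs:///data/large-file \
      s3a://my-bucket/data/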
