简体   繁体   中英

How to copy large amount of files from S3 folder to another

I'm trying to move large amount of files(around 300Kb max size each file) from S3 folder to another.

I'm using AWS sdk for java, and tried to move around 1500 files.

it took too much time, and the number of files may be increase to 10,000.

for each copy of file, need to delete from the source folder as there is no method to move file.

this what i tried:

public void moveFiles(String fromKey, String toKey) {
    Stream<S3ObjectSummary> objectSummeriesStream = this.getObjectSummeries(fromKey);
    objectSummeriesStream.forEach(file ->
        {
            this.s3Bean.copyObject(bucketName, file.getKey(), bucketName, toKey);
            this.s3Bean.deleteObject(bucketName, file.getKey());
        });

}

private Stream<S3ObjectSummary> getObjectSummeries(String key) {

    // get the files that their prefix is "key" (can be consider as Folders).
    ListObjectsRequest listObjectsRequest = new ListObjectsRequest().withBucketName(this.bucketName)
        .withPrefix(key);
    ObjectListing outFilesList = this.s3Bean.listObjects(listObjectsRequest);
    return outFilesList.getObjectSummaries()
        .stream()
        .filter(x -> !x.getKey()
            .equals(key));
}

If you are using Java application you can try to use several threads to copy files:

private ExecutorService executorService = Executors.fixed(20);

public void moveFiles(String fromKey, String toKey) {
    Stream<S3ObjectSummary> objectSummeriesStream = 
    this.getObjectSummeries(fromKey);
    objectSummeriesStream.forEach(file ->
    {
        executorService.submit(() ->
            this.s3Bean.copyObject(bucketName, file.getKey(), bucketName, toKey);
            this.s3Bean.deleteObject(bucketName, file.getKey());
        )};
    });

}

This should speed up the process.

An alternative might be using AWS-lambda. Once the file appear in source bucket you can, for example, put event in the SQS FIFO queue. The lambda will start single file copy by this event. If I am not mistaken in parallel you can start up to 500 instances of lambdas. Should be fast.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM