How to copy a large number of files from one S3 folder to another
I'm trying to move a large number of files (each at most about 300 KB) from one S3 folder to another.
I'm using the AWS SDK for Java and tried to move around 1,500 files.
It took too much time, and the number of files may increase to 10,000.
For each file, I have to delete it from the source folder after copying it, because there is no method to move a file.
This is what I tried:
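import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;
import com.amazonaws.services.s3.model.ListObjectsRequest;
import com.amazonaws.services.s3.model.ObjectListing;
import com.amazonaws.services.s3.model.S3ObjectSummary;

public void moveFiles(String fromKey, String toKey) {
    Stream<S3ObjectSummary> objectSummeriesStream = this.getObjectSummeries(fromKey);
    objectSummeriesStream.forEach(file -> {
        // Keep the file name: swap the source prefix for the destination prefix,
        // otherwise every file is copied onto the same key (toKey).
        String destinationKey = toKey + file.getKey().substring(fromKey.length());
        this.s3Bean.copyObject(bucketName, file.getKey(), bucketName, destinationKey);
        this.s3Bean.deleteObject(bucketName, file.getKey());
    });
}

private Stream<S3ObjectSummary> getObjectSummeries(String key) {
    // Get the objects whose key starts with "key" (these can be considered a folder).
    ListObjectsRequest listObjectsRequest = new ListObjectsRequest()
            .withBucketName(this.bucketName)
            .withPrefix(key);
    ObjectListing listing = this.s3Bean.listObjects(listObjectsRequest);
    List<S3ObjectSummary> summaries = new ArrayList<>(listing.getObjectSummaries());
    // listObjects returns at most 1,000 keys per call, so fetch the remaining pages.
    while (listing.isTruncated()) {
        listing = this.s3Bean.listNextBatchOfObjects(listing);
        summaries.addAll(listing.getObjectSummaries());
    }
    // Skip the "folder" placeholder object itself.
    return summaries.stream()
            .filter(x -> !x.getKey().equals(key));
}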
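Whatever the speed, moving a whole prefix has to give each object its own destination key. Copying every object to the single key `toKey` overwrites the same object again and again, and because each source is deleted afterwards, all but the last file are lost. A minimal sketch of the key rewrite in plain Java, so it runs without AWS; `S3KeyMapper` and `destinationKey` are hypothetical names, not part of the AWS SDK:

```java
// Hypothetical helper (not part of the AWS SDK): maps a key under one "folder"
// prefix to the same relative path under another prefix.
class S3KeyMapper {
    static String destinationKey(String fromPrefix, String toPrefix, String sourceKey) {
        if (!sourceKey.startsWith(fromPrefix)) {
            throw new IllegalArgumentException(sourceKey + " is not under " + fromPrefix);
        }
        // Keep everything after the source prefix: the file name and any sub-folders.
        return toPrefix + sourceKey.substring(fromPrefix.length());
    }
}
```

For example, `S3KeyMapper.destinationKey("in/", "out/", "in/sub/c.json")` returns `"out/sub/c.json"`.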
If you are using a Java application, you can try using several threads to copy the files:
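import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Stream;
import com.amazonaws.services.s3.model.S3ObjectSummary;

private final ExecutorService executorService = Executors.newFixedThreadPool(20);

public void moveFiles(String fromKey, String toKey) {
    Stream<S3ObjectSummary> objectSummeriesStream =
            this.getObjectSummeries(fromKey);
    objectSummeriesStream.forEach(file ->
            executorService.submit(() -> {
                // Copy, then delete in the same task: if copyObject throws,
                // the source file is not deleted.
                String destinationKey = toKey + file.getKey().substring(fromKey.length());
                this.s3Bean.copyObject(bucketName, file.getKey(), bucketName, destinationKey);
                this.s3Bean.deleteObject(bucketName, file.getKey());
            }));
}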
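This should speed up the process, because up to 20 copy/delete pairs run at the same time instead of one after another. Note that moveFiles returns as soon as the tasks are submitted, so copies may still be running after it returns.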
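A runnable sketch of the thread-pool approach that also waits for every copy to finish before returning. To keep it self-contained, it assumes a `ConcurrentHashMap` as a stand-in for the bucket (with the SDK, the two map operations would be the `copyObject` and `deleteObject` calls from the code above); `ParallelMove` and `moveAll` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch only: a thread-safe Map stands in for the S3 bucket so this runs without AWS.
// With the SDK, put/remove below would be s3.copyObject(...) and s3.deleteObject(...).
class ParallelMove {
    static int moveAll(Map<String, byte[]> bucket, String fromPrefix, String toPrefix, int threads) {
        // List the keys to move first (like listing the prefix), skipping the "folder" key.
        List<String> keys = new ArrayList<>();
        for (String key : bucket.keySet()) {
            if (key.startsWith(fromPrefix) && !key.equals(fromPrefix)) {
                keys.add(key);
            }
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (String key : keys) {
                futures.add(pool.submit(() -> {
                    String target = toPrefix + key.substring(fromPrefix.length());
                    bucket.put(target, bucket.get(key)); // "copy"
                    bucket.remove(key);                  // "delete", only after the copy
                }));
            }
            // Block until every task is done; get() rethrows a task's failure.
            for (Future<?> f : futures) {
                try {
                    f.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("moving a file failed", e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
        return keys.size();
    }
}
```

Collecting the `Future`s and calling `get()` on each makes `moveAll` block until every file has been moved and rethrows a failure instead of silently dropping it; the pool is shut down in `finally` so its threads do not keep the JVM alive.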
An alternative might be to use AWS Lambda. Once a file appears in the source bucket, you can, for example, put an event into an SQS queue. Each event then triggers a Lambda function that copies that single file. If I am not mistaken, you can run up to 500 Lambda instances in parallel, so this should be fast.
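The queue-driven idea can be tried locally before wiring up SQS and Lambda. This is only a simulation under stated assumptions: a `BlockingQueue` plays the role of the queue, each worker thread plays the role of one Lambda instance that handles one file per message, and `QueueDrivenMove` and `run` are hypothetical names. Unlike real SQS and Lambda, the workers simply stop once the queue is empty.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Local simulation only: the BlockingQueue stands in for an SQS queue and each
// worker thread stands in for one Lambda instance. Each message is one object key.
class QueueDrivenMove {
    static int run(Map<String, byte[]> bucket, BlockingQueue<String> events,
                   String fromPrefix, String toPrefix, int workers) {
        AtomicInteger moved = new AtomicInteger();
        Thread[] pool = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            pool[i] = new Thread(() -> {
                String key;
                // Take one message at a time until the queue is drained.
                while ((key = events.poll()) != null) {
                    byte[] data = bucket.get(key);
                    // "Copy" to the destination prefix, then "delete" the source.
                    bucket.put(toPrefix + key.substring(fromPrefix.length()), data);
                    bucket.remove(key);
                    moved.incrementAndGet();
                }
            });
            pool[i].start();
        }
        // Wait for every worker to finish.
        for (Thread t : pool) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return moved.get();
    }
}
```

The sketch assumes each key is enqueued exactly once and the map is thread-safe (for example a `ConcurrentHashMap`).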