
How to upload multipart to Amazon S3 asynchronously using the Java SDK

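In my Java application I need to write data to S3. I don't know the size in advance and it is usually large, so, as recommended in the AWS S3 documentation, I am using the Java AWS SDK (low-level API) to write the data to the S3 bucket.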

In my application I provide S3BufferedOutputStream, an OutputStream implementation that other classes in the application can use to write to the S3 bucket.

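I store the data in a buffer, and whenever the incoming data no longer fits in the buffer, I upload the buffer's contents as a single UploadPartRequest. Here is the implementation of the write method of S3BufferedOutputStream: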

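@Override
public void write(byte[] b, int off, int len) throws IOException {
    this.assertOpen();
    int o = off, l = len;
    int size;
    // While the remaining input is larger than the free space in the buffer,
    // fill the buffer, upload it as one part, and continue with the rest.
    while (l > (size = this.buf.length - this.position)) {
        System.arraycopy(b, o, this.buf, this.position, size);
        this.position += size;
        flushBufferAndRewind(); // uploads the full buffer as one part and rewinds position
        o += size;
        l -= size;
    }
    // The remainder fits in the buffer; keep it until the next write or close().
    System.arraycopy(b, o, this.buf, this.position, l);
    this.position += l;
}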
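The buffering loop above can be exercised without S3. The sketch below is a hypothetical, stdlib-only stand-in for S3BufferedOutputStream: PartChunkingStream and its partSink callback are names introduced here for illustration, and the callback marks where the real class would send an UploadPartRequest. It reuses the same while loop and flushes the final, possibly shorter, buffer on close(). Keep in mind that S3 requires every part except the last to be at least 5 MiB, so a real buffer must be at least that large; the tiny buffer in the usage note is only for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for S3BufferedOutputStream: same buffering loop, but each
// full buffer goes to partSink instead of being sent as an UploadPartRequest.
final class PartChunkingStream extends OutputStream {
    private final byte[] buf;
    private final Consumer<byte[]> partSink; // hypothetical: receives one part's bytes
    private int position;
    private boolean closed;

    PartChunkingStream(int bufferSize, Consumer<byte[]> partSink) {
        this.buf = new byte[bufferSize];
        this.partSink = partSink;
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        if (closed) throw new IOException("stream is closed");
        int o = off, l = len;
        int size;
        // Fill the free space, emit the full buffer as a part, repeat with the rest.
        while (l > (size = buf.length - position)) {
            System.arraycopy(b, o, buf, position, size);
            position += size;
            flushBufferAndRewind();
            o += size;
            l -= size;
        }
        // The remainder fits; keep it until the next write or close().
        System.arraycopy(b, o, buf, position, l);
        position += l;
    }

    private void flushBufferAndRewind() {
        partSink.accept(Arrays.copyOf(buf, position)); // copy, because buf is reused
        position = 0;
    }

    @Override
    public void close() {
        if (closed) return;
        closed = true;
        if (position > 0) flushBufferAndRewind(); // the last part may be shorter
    }

    // Writes data in a single call and returns the size of every part emitted.
    static List<Integer> partSizes(byte[] data, int bufferSize) {
        List<Integer> sizes = new ArrayList<>();
        try (PartChunkingStream out = new PartChunkingStream(bufferSize, p -> sizes.add(p.length))) {
            out.write(data, 0, data.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sizes;
    }
}
```

For example, PartChunkingStream.partSizes(new byte[10], 4) emits parts of 4, 4 and 2 bytes. Because the loop uses a strict `>`, a buffer that is exactly full is not flushed until more data arrives or the stream is closed.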

The whole implementation is similar to this: code repo

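My problem here is that each UploadPartRequest is done synchronously, so we have to wait for one part to finish uploading before we can upload the next. And because I'm using the low-level AWS S3 API, I can't benefit from the parallel uploads that TransferManager provides.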

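Is there a way to achieve parallel uploads with the low-level SDK? Or can I make some code changes so that the uploads run asynchronously without corrupting the uploaded data and while keeping the data in order?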
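On the ordering concern: S3 assembles a multipart object by part number (1 to 10,000), not by the order in which parts arrive, so parts can be uploaded in parallel as long as each chunk receives its part number when it is cut from the stream. The sketch below models that with an in-memory store; FakeMultipartStore is hypothetical and is not an SDK class.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical in-memory model of S3's behaviour, not an SDK class: parts may arrive
// in any order, and complete() concatenates them in ascending part-number order.
final class FakeMultipartStore {
    private final ConcurrentSkipListMap<Integer, byte[]> parts = new ConcurrentSkipListMap<>();

    void uploadPart(int partNumber, byte[] data) {
        if (partNumber < 1 || partNumber > 10_000) {
            throw new IllegalArgumentException("part number must be 1..10000: " + partNumber);
        }
        parts.put(partNumber, data.clone());
    }

    byte[] complete() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] p : parts.values()) { // values() iterates in ascending key order
            out.write(p, 0, p.length);
        }
        return out.toByteArray();
    }

    // Uploads three chunks concurrently, submitting them in reverse order.
    static String demo() {
        List<byte[]> chunks = Arrays.asList(
                "Hello, ".getBytes(StandardCharsets.UTF_8),
                "multipart ".getBytes(StandardCharsets.UTF_8),
                "world".getBytes(StandardCharsets.UTF_8));
        FakeMultipartStore store = new FakeMultipartStore();
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = chunks.size() - 1; i >= 0; i--) {
                final int partNumber = i + 1; // fixed when the chunk is cut, not when it finishes
                final byte[] chunk = chunks.get(i);
                futures.add(pool.submit(() -> store.uploadPart(partNumber, chunk)));
            }
            for (Future<?> f : futures) {
                f.get(); // rethrows any upload failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return new String(store.complete(), StandardCharsets.UTF_8);
    }
}
```

FakeMultipartStore.demo() submits the three parts to a thread pool in reverse order, yet the assembled result is still "Hello, multipart world".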

You should consider using the AWS SDK for Java V2. You are referencing V1, not the latest Amazon S3 Java API. If you are new to V2, start here:

Get started with the AWS SDK for Java 2.x

To perform asynchronous operations with the Amazon S3 Java API, use S3AsyncClient.

To see how to upload an object with this client, look at this code example:

import software.amazon.awssdk.core.async.AsyncRequestBody;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3AsyncClient;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;
import software.amazon.awssdk.services.s3.model.PutObjectResponse;
import java.nio.file.Paths;
import java.util.concurrent.CompletableFuture;

/**
 * To run this AWS code example, ensure that you have set up your development environment, including your AWS credentials.
 *
 * For information, see this documentation topic:
 *
 * https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/get-started.html
 */

public class S3AsyncOps {

    public static void main(String[] args) {

        final String USAGE = "\n" +
                "Usage:\n" +
                "    S3AsyncOps <bucketName> <key> <path>\n\n" +
                "Where:\n" +
                "    bucketName - the name of the Amazon S3 bucket (for example, bucket1). \n\n" +
                "    key - the name of the object (for example, book.pdf). \n" +
                "    path - the local path to the file (for example, C:/AWS/book.pdf). \n";

        if (args.length != 3) {
            System.out.println(USAGE);
            System.exit(1);
        }

        String bucketName = args[0];
        String key = args[1];
        String path = args[2];

        Region region = Region.US_WEST_2;
        S3AsyncClient client = S3AsyncClient.builder()
                .region(region)
                .build();

        PutObjectRequest objectRequest = PutObjectRequest.builder()
                .bucket(bucketName)
                .key(key)
                .build();

        // Put the object into the bucket
        CompletableFuture<PutObjectResponse> future = client.putObject(objectRequest,
                AsyncRequestBody.fromFile(Paths.get(path))
        );
        future.whenComplete((resp, err) -> {
            try {
                if (resp != null) {
                    System.out.println("Object uploaded. Details: " + resp);
                } else {
                    // Handle error
                    err.printStackTrace();
                }
            } finally {
                // Only close the client when you are completely done with it
                client.close();
            }
        });

        future.join();
    }
}

This uses the S3AsyncClient to upload an object in a single request. To perform a multipart upload, you need to use this method:

S3AsyncClient.createMultipartUpload(CreateMultipartUploadRequest), documented at https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/s3/S3AsyncClient.html

For a multipart upload example that uses the synchronous S3 client, see:

https://github.com/awsdocs/aws-doc-sdk-examples/blob/main/javav2/example_code/s3/src/main/java/com/example/s3/S3ObjectOperations.java

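That is your solution: start the upload with the createMultipartUpload method of the S3AsyncClient object, send the parts with uploadPart, and finish with completeMultipartUpload. Each of these calls returns a CompletableFuture, so several parts can be in flight at the same time.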
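The create, upload-parts, complete flow can be chained with CompletableFuture so that every part is in flight at once. The sketch below is stdlib-only so it runs without AWS: AsyncMultipartApi, InMemoryMultipartApi and CompletedPartInfo are hypothetical stand-ins whose methods mirror, but do not reproduce the signatures of, S3AsyncClient's createMultipartUpload, uploadPart (which takes an AsyncRequestBody such as AsyncRequestBody.fromBytes) and completeMultipartUpload.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Hypothetical async API: each method stands in for one S3AsyncClient multipart call.
interface AsyncMultipartApi {
    CompletableFuture<String> createMultipartUpload();                                  // -> uploadId
    CompletableFuture<String> uploadPart(String uploadId, int partNumber, byte[] data); // -> ETag
    CompletableFuture<byte[]> completeMultipartUpload(String uploadId, List<CompletedPartInfo> parts);
}

// Part number plus ETag: what S3 needs to stitch the parts together.
final class CompletedPartInfo {
    final int partNumber;
    final String eTag;

    CompletedPartInfo(int partNumber, String eTag) {
        this.partNumber = partNumber;
        this.eTag = eTag;
    }
}

// In-memory fake so the sketch runs without AWS; uploadPart runs on the common pool.
final class InMemoryMultipartApi implements AsyncMultipartApi {
    private final Map<String, Map<Integer, byte[]>> uploads = new ConcurrentHashMap<>();

    @Override
    public CompletableFuture<String> createMultipartUpload() {
        String uploadId = UUID.randomUUID().toString();
        uploads.put(uploadId, new ConcurrentHashMap<>());
        return CompletableFuture.completedFuture(uploadId);
    }

    @Override
    public CompletableFuture<String> uploadPart(String uploadId, int partNumber, byte[] data) {
        return CompletableFuture.supplyAsync(() -> {
            uploads.get(uploadId).put(partNumber, data.clone());
            return "etag-" + partNumber;
        });
    }

    @Override
    public CompletableFuture<byte[]> completeMultipartUpload(String uploadId, List<CompletedPartInfo> parts) {
        return CompletableFuture.supplyAsync(() -> {
            Map<Integer, byte[]> stored = uploads.remove(uploadId);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (CompletedPartInfo p : parts) { // caller passes parts in ascending order
                byte[] bytes = stored.get(p.partNumber);
                out.write(bytes, 0, bytes.length);
            }
            return out.toByteArray();
        });
    }
}

final class AsyncMultipartUploader {
    // Starts the upload, sends every part without waiting for the previous one,
    // and completes the upload once all part futures have finished.
    static CompletableFuture<byte[]> upload(AsyncMultipartApi api, List<byte[]> chunks) {
        return api.createMultipartUpload().thenCompose(uploadId -> {
            List<CompletableFuture<CompletedPartInfo>> parts = new ArrayList<>();
            for (int i = 0; i < chunks.size(); i++) {
                final int partNumber = i + 1; // S3 part numbers start at 1
                parts.add(api.uploadPart(uploadId, partNumber, chunks.get(i))
                        .thenApply(eTag -> new CompletedPartInfo(partNumber, eTag)));
            }
            return CompletableFuture.allOf(parts.toArray(new CompletableFuture<?>[0]))
                    // join() cannot block here: allOf has already completed every future
                    .thenCompose(ignored -> api.completeMultipartUpload(uploadId,
                            parts.stream().map(CompletableFuture::join).collect(Collectors.toList())));
        });
    }
}
```

With the real client the shape is the same: compose on the uploadId from CreateMultipartUploadResponse, turn each UploadPartResponse's eTag() into a CompletedPart, and call abortMultipartUpload if any part fails, which this sketch omits.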

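Here is some sample code from a class of mine. It submits the parts to an ExecutorService and keeps the returned Futures. This was written for the v1 Java SDK; if you are using the v2 SDK, you can use the async client instead of an explicit thread pool: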

public synchronized void uploadPart(byte[] data, boolean isLastPart)
{
    partNumber++;
    logger.debug("submitting part {} for s3://{}/{}", partNumber, bucket, key);

    final UploadPartRequest request = new UploadPartRequest()
                                      .withBucketName(bucket)
                                      .withKey(key)
                                      .withUploadId(uploadId)
                                      .withPartNumber(partNumber)
                                      .withPartSize(data.length)
                                      .withInputStream(new ByteArrayInputStream(data))
                                      .withLastPart(isLastPart);

    futures.add(
        executor.submit(new Callable<PartETag>()
        {
            @Override
            public PartETag call() throws Exception
            {
                int localPartNumber = request.getPartNumber();
                logger.debug("uploading part {} for s3://{}/{}", localPartNumber, bucket, key);
                UploadPartResult response = client.uploadPart(request);
                String etag = response.getETag();
                logger.debug("uploaded part {} for s3://{}/{}; etag is {}", localPartNumber, bucket, key, etag);
                return new PartETag(localPartNumber, etag);
            }
        }));
}

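Note: this method is synchronized so that part numbers are assigned in the same order in which the parts are submitted, and no two parts share a number. The uploads themselves still run concurrently on the executor; S3 assembles the object by part number, not by the order in which the uploads finish.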
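To see what the synchronized keyword buys: it makes "take the next part number" and "append the future to the list" one atomic step, so the list order always matches the part numbers, while the uploads still run in parallel on the pool. The hypothetical OrderedPartSubmitter below keeps only that skeleton; its task returns the part number where the real one would call client.uploadPart.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical skeleton of the uploader above: the lock covers only numbering and
// enqueueing; the "upload" task runs on the pool and just returns its part number.
final class OrderedPartSubmitter {
    private final ExecutorService executor = Executors.newFixedThreadPool(4);
    private final List<Future<Integer>> futures = new ArrayList<>();
    private int partNumber = 0;

    synchronized void uploadPart(byte[] data) {
        partNumber++;
        final int localPartNumber = partNumber; // captured while holding the lock
        futures.add(executor.submit(() -> {
            // real code: client.uploadPart(request) and return a PartETag
            return localPartNumber;
        }));
    }

    // Waits for every part and returns the part numbers in list order.
    List<Integer> awaitAll() throws InterruptedException, ExecutionException {
        List<Future<Integer>> snapshot;
        synchronized (this) {
            snapshot = new ArrayList<>(futures);
        }
        List<Integer> numbers = new ArrayList<>();
        for (Future<Integer> f : snapshot) {
            numbers.add(f.get());
        }
        return numbers;
    }

    // Calls uploadPart from several threads at once, then collects the results.
    static List<Integer> demo(int writers, int partsPerWriter) {
        OrderedPartSubmitter submitter = new OrderedPartSubmitter();
        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < writers; w++) {
            Thread t = new Thread(() -> {
                for (int i = 0; i < partsPerWriter; i++) {
                    submitter.uploadPart(new byte[0]);
                }
            });
            threads.add(t);
            t.start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
            return submitter.awaitAll();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            submitter.executor.shutdown();
        }
    }
}
```

OrderedPartSubmitter.demo(4, 25) calls uploadPart from four threads and gets back the part numbers 1 to 100 in list order, with no gaps or duplicates.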

Once all parts have been submitted, you can use this method to wait for them to finish and then complete the upload:

public void complete()
{
    logger.debug("waiting for upload tasks of s3://{}/{}", bucket, key);
    List<PartETag> partTags = new ArrayList<>();
    for (Future<PartETag> future : futures)
    {
        try
        {
            partTags.add(future.get());
        }
        catch (Exception e)
        {
            throw new RuntimeException(String.format("failed to complete upload task for s3://%s/%s", bucket, key), e);
        }
    }

    logger.debug("completing multi-part upload for s3://{}/{}", bucket, key);
    CompleteMultipartUploadRequest request = new CompleteMultipartUploadRequest()
                                              .withBucketName(bucket)
                                              .withKey(key)
                                              .withUploadId(uploadId)
                                              .withPartETags(partTags);
    client.completeMultipartUpload(request);
    logger.debug("completed multi-part upload for s3://{}/{}", bucket, key);
}

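You also need an abort() method that cancels any outstanding parts and aborts the upload. That, and the rest of the class, is left as an exercise for the reader.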
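One possible shape for abort(), as a hedged sketch: cancel the outstanding part futures, wait briefly for running tasks to stop, then abort the upload so S3 discards the parts it already stored (those parts are billed as storage until the upload is completed or aborted). AbortableUpload is hypothetical; its abortUpload callback stands in for the v1 call client.abortMultipartUpload(new AbortMultipartUploadRequest(bucket, key, uploadId)).

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical abort() helper. abortUpload stands in for the v1 call
// client.abortMultipartUpload(new AbortMultipartUploadRequest(bucket, key, uploadId)).
final class AbortableUpload {
    private final ExecutorService executor;
    private final List<? extends Future<?>> futures;
    private final Runnable abortUpload;

    AbortableUpload(ExecutorService executor, List<? extends Future<?>> futures, Runnable abortUpload) {
        this.executor = executor;
        this.futures = futures;
        this.abortUpload = abortUpload;
    }

    void abort() {
        // 1. Cancel parts that have not started and interrupt the ones in flight.
        for (Future<?> f : futures) {
            f.cancel(true);
        }
        executor.shutdownNow();
        // 2. Wait up to 30 seconds for running tasks to stop before aborting.
        try {
            executor.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // 3. Ask S3 to discard the upload and any parts it already stored.
        abortUpload.run();
    }
}
```

A natural place to call it is the catch block in complete(), so that a failed part does not leave an orphaned upload behind.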


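Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.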

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM