
Java S3 upload of a large file (~1.5 TB) errors out with ResetException. File is read/processed via InputStream

My application runs in Java. I have a large file that I encrypt and upload to S3. Since the file is too large to hold in memory, I use a PipedInputStream/PipedOutputStream pair for the encryption. A BufferedInputStream wraps the PipedInputStream and is then passed to the S3 PutObjectRequest. I have already computed the size of the encrypted object and set it on the ObjectMetadata. Here are the relevant code snippets:

PipedInputStream pis = new PipedInputStream(uploadFileInfo.getPout(), MAX_BUFFER_SIZE);
BufferedInputStream bis = new BufferedInputStream(pis, MAX_BUFFER_SIZE);
LOG.info("Is mark supported? " + bis.markSupported());
PutObjectRequest putObjectRequest = new PutObjectRequest(uploadFileInfo.getS3TargetBucket(),
                        uploadFileInfo.getS3TargetObjectKey() + ".encrypted",
                        bis, metadata);
// Set the read limit to more than the expected stream size, i.e. 20 MB
// https://github.com/aws/aws-sdk-java/issues/427
LOG.info("set read limit to " + (MAX_BUFFER_SIZE + 1));

putObjectRequest.getRequestClientOptions().setReadLimit(MAX_BUFFER_SIZE + 1);
Upload upload = transferManager.upload(putObjectRequest);

My stack trace shows that the reset() call on the BufferedInputStream throws the exception:

[UPLOADER_TRACKER] ERROR com.xxx.yyy.zzz.handler.TrackProgressHandler - Exception from S3 transfer 
com.amazonaws.ResetException: The request to the service failed with a retryable reason, but resetting the request input stream has failed. See exception.getExtraInfo or debug-level logging for the original failure that caused this retry.;  If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.resetRequestInputStream(AmazonHttpClient.java:1423)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1240)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5052)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4998)
    at com.amazonaws.services.s3.AmazonS3Client.doUploadPart(AmazonS3Client.java:3734)
    at com.amazonaws.services.s3.AmazonS3Client.uploadPart(AmazonS3Client.java:3719)
    at com.amazonaws.services.s3.transfer.internal.UploadCallable.uploadPartsInSeries(UploadCallable.java:258)
    at com.amazonaws.services.s3.transfer.internal.UploadCallable.uploadInParts(UploadCallable.java:189)
    at com.amazonaws.services.s3.transfer.internal.UploadCallable.call(UploadCallable.java:121)
    at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:143)
    at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:48)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Resetting to invalid mark
    at java.io.BufferedInputStream.reset(BufferedInputStream.java:448)
    at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:120)
    at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:120)
    at com.amazonaws.services.s3.internal.InputSubstream.reset(InputSubstream.java:110)
    at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:120)
    at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:120)
    at com.amazonaws.services.s3.internal.InputSubstream.reset(InputSubstream.java:110)
    at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:120)
    at com.amazonaws.services.s3.internal.MD5DigestCalculatingInputStream.reset(MD5DigestCalculatingInputStream.java:105)
    at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:120)
    at com.amazonaws.event.ProgressInputStream.reset(ProgressInputStream.java:168)
    at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:120)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.resetRequestInputStream(AmazonHttpClient.java:1421)
    ... 22 more
[UPLOADER_TRACKER] ERROR com.xxx.yyy.zzz.handler.TrackProgressHandler - Reset exception caught ==> If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)

However, I already set the readLimit to MAX_BUFFER_SIZE + 1, which is the reliability tip from AWS. Has anyone run into this before? Aside: since I encrypt the file on the fly, I have to work with input streams rather than a File or FileInputStream. I also do not have permission to write to the local disk.
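For reference, the piped encrypt-while-reading pipeline described above can be sketched as follows (a minimal, self-contained version using a throwaway AES key; the class and method names are illustrative, not the application's actual code):

```java
import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

public class PipedEncryptDemo {

    // Encrypts plaintext on a writer thread and reads the ciphertext back
    // through a PipedInputStream, mirroring the setup described in the question.
    public static byte[] encryptThroughPipe(byte[] plaintext) throws Exception {
        SecretKey key = KeyGenerator.getInstance("AES").generateKey();
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, key);

        PipedOutputStream pout = new PipedOutputStream();
        PipedInputStream pin = new PipedInputStream(pout, 64 * 1024);

        // Writer side: encrypt into the pipe (in the real app this feeds S3)
        Thread writer = new Thread(() -> {
            try (OutputStream enc = new CipherOutputStream(pout, cipher)) {
                enc.write(plaintext);
            } catch (IOException ignored) {
            }
        });
        writer.start();

        // Reader side: the stream a PutObjectRequest would consume
        ByteArrayOutputStream ciphertext = new ByteArrayOutputStream();
        try (BufferedInputStream bis = new BufferedInputStream(pin)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = bis.read(buf)) != -1) {
                ciphertext.write(buf, 0, n);
            }
        }
        writer.join();
        return ciphertext.toByteArray();
    }
}
```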

I think you are misreading that advice. Quoting from the link you provided, with emphasis added:

For example, if the maximum expected size of the stream is 100,000 bytes, set the read limit to 100,001 (100,000 + 1) bytes. Mark and reset will then always work for 100,000 bytes or fewer. Be aware that this may cause some streams to buffer that number of bytes in memory.

As I interpret it, this setting configures the client to buffer content from the source stream locally, for source streams that do not themselves support mark/reset. That is consistent with the documentation for RequestClientOptions.DEFAULT_STREAM_BUFFER_SIZE:

Used to enable mark-and-reset for non-mark-and-resettable, non-file input streams.

In other words, it exists to buffer the entire source stream on the client, not to specify how large a chunk is sent from the source stream. In your case, I believe it is being ignored, because (1) you are not buffering the entire stream, and (2) the stream you pass in does implement mark/reset itself.
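The failure in your stack trace can be reproduced locally without AWS at all: BufferedInputStream invalidates its mark once more bytes than the mark's read limit have been consumed, and reset() then throws the same "Resetting to invalid mark". A minimal demonstration (a sketch; the helper name is mine):

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;

public class MarkResetDemo {

    // Marks the stream with the given read limit, consumes bytesToRead bytes,
    // then tries to reset. Returns false if reset() fails with IOException
    // ("Resetting to invalid mark"), as the SDK sees on a part retry.
    public static boolean canReset(int readLimit, int bytesToRead) {
        byte[] data = new byte[1024];
        // Deliberately small internal buffer so the mark-limit check kicks in
        BufferedInputStream bis =
                new BufferedInputStream(new ByteArrayInputStream(data), 16);
        try {
            bis.mark(readLimit);
            for (int i = 0; i < bytesToRead; i++) {
                bis.read();
            }
            bis.reset();
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

Reading fewer bytes than the read limit leaves the mark valid; reading past it invalidates the mark, which is exactly what happens when a part larger than the configured readLimit has to be retried.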

A multipart upload, which is what TransferManager is doing in your example, breaks the input stream into chunks of at least 5 MB (the actual chunk size depends on the declared size of the stream; for a 1.5 TB file it works out to roughly 158 MiB). These chunks are uploaded with the UploadPart API call, which attempts to send an entire chunk at once. If a part fails for a retryable reason, the client tries to reset the stream to the beginning of that chunk.

You could make this work by setting the read limit on the BufferedInputStream to a size large enough to hold a single part. The calculation TransferManager uses is the file size divided by 10,000 (the maximum number of parts in a multipart upload), so, again, roughly 158 MiB. To be safe, I would use 200 MiB (also because I'm sure you have even larger files).
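The part-size arithmetic can be sketched like this (it mirrors the divide-by-10,000 rule described above; the constant and method names are my own, not the SDK's):

```java
public class PartSizeCalc {

    static final long MIN_PART_SIZE = 5L * 1024 * 1024; // S3 minimum part size
    static final int MAX_PARTS = 10_000;                // S3 part-count limit

    // Approximation of the part size TransferManager will choose:
    // ceiling(contentLength / 10,000), but never below 5 MB.
    static long optimalPartSize(long contentLength) {
        long partSize = (contentLength + MAX_PARTS - 1) / MAX_PARTS; // integer ceil
        return Math.max(partSize, MIN_PART_SIZE);
    }

    public static void main(String[] args) {
        long fileSize = 1_500L * 1024 * 1024 * 1024; // ~1.5 TiB
        System.out.println(optimalPartSize(fileSize) / (1024 * 1024) + " MiB per part");
    }
}
```

The read limit you pass to setReadLimit would then need to be at least one byte more than this part size, with the BufferedInputStream's buffer sized to match.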

However, if it were me, I would probably use the low-level multipart upload methods directly. In my opinion, the main benefit of TransferManager is its ability to upload files, where it can use multiple threads to perform concurrent part uploads. With a stream, you have to process each part sequentially anyway.
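A minimal sketch of the sequential chunking a low-level multipart upload would need (the AWS calls themselves appear only in comments; readFully and partLengths are hypothetical helper names, not SDK methods):

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class SequentialParts {

    // Fills buf as far as the stream allows, looping because read() may
    // return short counts. Returns the number of bytes actually read.
    static int readFully(InputStream in, byte[] buf) throws IOException {
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n < 0) break; // end of stream
            off += n;
        }
        return off;
    }

    // Walks the stream one part-sized chunk at a time. With the real SDK,
    // each full buffer would be sent as one uploadPart(...) call wrapping
    // the chunk in a ByteArrayInputStream, and the returned ETags collected
    // for completeMultipartUpload(...).
    static List<Integer> partLengths(InputStream in, int partSize) throws IOException {
        List<Integer> lengths = new ArrayList<>();
        byte[] buf = new byte[partSize];
        int len;
        while ((len = readFully(in, buf)) > 0) {
            lengths.add(len);
            if (len < partSize) break; // last (short) part
        }
        return lengths;
    }
}
```

Because each part is fully buffered in memory before being sent, a retry simply resends the same byte array, and the stream never needs to support reset at all.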

Actually, if it were me, I would seriously reconsider uploading a single 1.5 TB file. Yes, you can do it. But I can't imagine you download the entire file every time you want to read it; more likely you download a byte range. In that case, you may find it just as easy to work with 1,500 files of 1 GiB each.

This appears to be a known issue with the S3 SDK and BufferedInputStream.

See https://github.com/aws/aws-sdk-java/issues/427#issuecomment-273550783

The simplest solution (even if not ideal) is to download the file locally and pass a File object to the S3 SDK, like this:

InputStream inputStream = ...;
File tempFile = File.createTempFile("upload-temp", "");
tempFile.deleteOnExit();                                // clean up on JVM exit
FileUtils.copyInputStreamToFile(inputStream, tempFile); // org.apache.commons.io.FileUtils, or any other copy utility
PutObjectRequest putObjectRequest = new PutObjectRequest(bucket, key, tempFile);
Upload upload = transferManager.upload(putObjectRequest);
