Upload an InputStream to AWS S3 asynchronously (non-blocking) using AWS SDK for Java, version 2

When I upload an InputStream to S3 synchronously (the blocking way), it works.

S3Client s3Client = S3Client.builder().build();
s3Client.putObject(objectRequest, RequestBody.fromInputStream(inputStream,STREAM_SIZE));

But when I try the same with S3AsyncClient, there is no .fromInputStream method on AsyncRequestBody.

S3AsyncClient s3AsyncClient = S3AsyncClient.builder().build();
s3AsyncClient.putObject(objectRequest, AsyncRequestBody.fromInputStream(inputStream,STREAM_SIZE)); // error no method named 'fromInputStream'

And I can't use .fromByteBuffer, because it would load the entire stream into memory, which I don't want.

I would like to know why AsyncRequestBody has no method for reading from an InputStream. Are there any alternatives?

For anyone using Kotlin and coroutines: here is a Kotlin wrapper that creates an AsyncRequestBody from an InputStream. By default the wrapper runs in a background thread, but you can pass an explicit CoroutineScope and run it inside your own coroutine, which avoids creating a separate thread.

import io.ktor.util.cio.*
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.DelicateCoroutinesApi
import kotlinx.coroutines.GlobalScope
import kotlinx.coroutines.launch
import org.reactivestreams.Subscriber
import org.reactivestreams.Subscription
import software.amazon.awssdk.core.async.AsyncRequestBody
import java.io.InputStream
import java.nio.ByteBuffer
import java.util.*

@OptIn(DelicateCoroutinesApi::class)
class StreamAsyncRequestBody(
  inputStream: InputStream,
  private val coroutineScope: CoroutineScope = GlobalScope
) :
  AsyncRequestBody {
  private val inputChannel =
    inputStream.toByteReadChannel(context = coroutineScope.coroutineContext)

  override fun subscribe(subscriber: Subscriber<in ByteBuffer>) {
    subscriber.onSubscribe(object : Subscription {
      private var done: Boolean = false

      override fun request(n: Long) {
        if (!done) {
          if (inputChannel.isClosedForRead) {
            complete()
          } else {
            coroutineScope.launch {
              inputChannel.read {
                subscriber.onNext(it)
                if (inputChannel.isClosedForRead) {
                  complete()
                }
              }
            }
          }
        }
      }

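      private fun complete() {
        // Mark the subscription as finished before signalling completion,
        // so a re-entrant request() call from onComplete() is ignored.
        synchronized(this) {
          done = true
        }
        subscriber.onComplete()
      }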

      override fun cancel() {
        synchronized(this) {
          done = true
        }
      }
    })
  }

  override fun contentLength(): Optional<Long> = Optional.empty()
}

Example usage:

// s3Client is assumed to be an S3AsyncClient; coroutineScope comes from kotlinx.coroutines.
suspend fun s3Put(objectRequest: PutObjectRequest, inputStream: InputStream) = coroutineScope {
  s3Client.putObject(objectRequest, StreamAsyncRequestBody(inputStream, this))
}

If you use Java, you will need to write your own wrapper and use something other than Kotlin coroutines to run it. Alternatively, you could create an Executor with a fixed number of threads: if too many uploads run at once they will queue behind one another, but they won't create an unbounded number of threads and stall the entire program.
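
The fixed-size Executor idea can be sketched in plain Java as follows. This is only an illustration under stated assumptions: BlockingUpload and BoundedUploader are hypothetical names, and the drain method merely counts bytes in place of a real blocking call such as s3Client.putObject(objectRequest, RequestBody.fromInputStream(inputStream, STREAM_SIZE)).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a blocking upload call (for example the
// synchronous S3Client.putObject with RequestBody.fromInputStream).
interface BlockingUpload {
    long upload(InputStream in) throws IOException;
}

final class BoundedUploader implements AutoCloseable {
    private final ExecutorService pool;
    private final BlockingUpload delegate;

    BoundedUploader(int threads, BlockingUpload delegate) {
        // At most `threads` uploads block at once; the rest wait in the queue.
        this.pool = Executors.newFixedThreadPool(threads);
        this.delegate = delegate;
    }

    // Returns immediately; the blocking read happens on a pool thread.
    CompletableFuture<Long> uploadAsync(InputStream in) {
        return CompletableFuture.supplyAsync(() -> {
            try (InputStream stream = in) {
                return delegate.upload(stream);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, pool);
    }

    @Override
    public void close() {
        pool.shutdown();
    }

    // Demo "upload": drains the stream and reports how many bytes it read.
    static long drain(InputStream in) throws IOException {
        long total = 0;
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        try (BoundedUploader uploader = new BoundedUploader(4, BoundedUploader::drain)) {
            CompletableFuture<Long> result =
                uploader.uploadAsync(new ByteArrayInputStream(new byte[100_000]));
            System.out.println("uploaded bytes: " + result.join());
        }
    }
}
```

The caller never blocks, but each in-flight upload still occupies one pool thread while it reads from the stream, so the pool size is the effective concurrency limit.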


EDIT: Fixed the code. I didn't test the previous version; I have tested this version with a few uploads and it worked. Of course, that doesn't mean it's bug-free :)

After some research, this is what I found:

  1. InputStream is blocking by nature, so whenever you read from one, some thread will block. In @jakobeha's answer, toByteReadChannel returns a channel that is still backed by blocking reads. In terms of performance, this is roughly equivalent to calling the synchronous S3Client.putObject() with RequestBody.fromInputStream() on a background thread, which you can do by wrapping the call in a CompletableFuture.
  2. Other AsyncRequestBody implementations, such as FileAsyncRequestBody, use NIO (non-blocking I/O) with callbacks. That may be why the AWS team did not include fromInputStream in AsyncRequestBody: an InputStream cannot be read in a fully non-blocking way, so such a method would have caused confusion.
  3. If you want a highly scalable solution, the best approach is to avoid InputStream altogether: find where the InputStream originates and use an alternative that supports non-blocking channels. In my case I used Java Flow, converted it to a Publisher, and passed that to AsyncRequestBody.fromPublisher().
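
As a rough illustration of point 3, the sketch below publishes ByteBuffer chunks through the JDK's java.util.concurrent.Flow API and consumes them on demand. The names FlowChunks and ByteCounter are hypothetical, and the subscriber only counts bytes. In a real upload, the Flow.Publisher would presumably be bridged to a Reactive Streams Publisher (for instance with FlowAdapters.toPublisher from the reactive-streams library) and handed to AsyncRequestBody.fromPublisher(). That bridging step is an assumption and is not shown here.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

final class FlowChunks {

    // Subscriber that pulls one chunk at a time and totals the bytes it sees.
    static final class ByteCounter implements Flow.Subscriber<ByteBuffer> {
        final CompletableFuture<Long> total = new CompletableFuture<>();
        private Flow.Subscription subscription;
        private long count;

        @Override
        public void onSubscribe(Flow.Subscription s) {
            subscription = s;
            s.request(1); // demand-driven: ask for the first chunk only
        }

        @Override
        public void onNext(ByteBuffer chunk) {
            count += chunk.remaining();
            subscription.request(1); // ask for the next chunk when ready
        }

        @Override
        public void onError(Throwable t) {
            total.completeExceptionally(t);
        }

        @Override
        public void onComplete() {
            total.complete(count);
        }
    }

    // Publishes `chunks` buffers of `chunkSize` bytes and returns the total received.
    static long publishAndCount(int chunks, int chunkSize) {
        ByteCounter counter = new ByteCounter();
        try (SubmissionPublisher<ByteBuffer> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(counter);
            for (int i = 0; i < chunks; i++) {
                // submit() may block if the subscriber's buffer is full;
                // offer() is the non-blocking variant.
                publisher.submit(ByteBuffer.wrap(new byte[chunkSize]));
            }
        } // close() signals onComplete after the submitted chunks are delivered
        return counter.total.join();
    }

    public static void main(String[] args) {
        System.out.println(publishAndCount(10, 1024)); // prints 10240
    }
}
```

The subscriber controls the pace through request(1), so no thread sits blocked inside a read() call while waiting for data.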
