Akka Streams run flow asynchronously

I tested a simple async flow to see whether it runs asynchronously, and I'm surprised that it doesn't. Do I need some additional configuration?

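import akka.actor.ActorSystem;
import akka.stream.ActorMaterializer;
import akka.stream.Materializer;
import akka.stream.javadsl.Flow;
import akka.stream.javadsl.Keep;
import akka.stream.javadsl.Sink;
import akka.stream.javadsl.Source;
import com.google.common.collect.Lists;
import javax.annotation.PostConstruct;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
class StreamingConfiguration
{
    private final ActorSystem actorSystem;

    StreamingConfiguration(ActorSystem actorSystem)
    {
        this.actorSystem = actorSystem;
    }

    @Bean
    Materializer materializer(ActorSystem actorSystem)
    {
        return ActorMaterializer.create(actorSystem);
    }

    // A @PostConstruct method must not take parameters, so the test
    // builds its materializer from the injected ActorSystem instead.
    @PostConstruct
    public void test()
    {
        Materializer materializer = ActorMaterializer.create(actorSystem);

        // Busy-waits for 3 seconds per element to simulate CPU-bound work
        var takePart = Flow.of(String.class).map(path -> {
            var start = System.currentTimeMillis();
            while (System.currentTimeMillis() - start < 3000) {}
            return path;
        });

        Source.from(Lists.newArrayList("A", "B", "C", "D"))
            .via(takePart.async())
            .toMat(Sink.fold("", (arg1, arg2) -> arg1), Keep.right())
            .run(materializer)
            .toCompletableFuture()
            .join();
    }
}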

I can see that the materializer uses the default fork-join-pool dispatcher.

EDIT: Sorry, but your example also doesn't work. It still takes ~12 seconds to finish when using mapAsync. I tried flatMapMerge with the same result :/

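import akka.japi.function.Function;
import akka.stream.javadsl.Keep;
import akka.stream.javadsl.Sink;
import akka.stream.javadsl.Source;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

// Simulates a slow call; the sleep runs on the calling thread
// before the CompletionStage is returned.
Function<String, CompletionStage<String>> blocking = s -> {
    try
    {
        Thread.sleep(3000);
    } catch (InterruptedException e)
    {
        e.printStackTrace();
    }
    return CompletableFuture.completedFuture(s);
};

Source.from(List.of("A", "B", "C", "D"))
        .mapAsync(4, blocking)
        .toMat(Sink.fold("", (arg1, arg2) -> arg1), Keep.right())
        .run(actorSystem)
        .toCompletableFuture()
        .join();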

Akka Streams by default materializes stream stages into a single actor: this avoids the overhead of passing messages between stream stages, but it does mean that the second element of the stream won't be consumed until the first element has worked its way through the stream.
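To make the single-actor behaviour concrete, here is a plain-Java model (not Akka code). The `Stage`/`Result` types are hypothetical, and the 5 us / 3 s / 15 us costs are the illustrative figures this answer uses below. One loop on one thread takes each element through every stage before pulling the next, so the per-element costs simply add up.

```java
import java.util.ArrayList;
import java.util.List;

class FusedSketch {
    // A stage with a name and a simulated (illustrative) processing cost in microseconds.
    record Stage(String name, long costMicros) {}

    // The order in which (stage, element) pairs were processed, plus the simulated elapsed time.
    record Result(List<String> log, long elapsedMicros) {}

    // Fused execution: a single "actor" takes each element through every stage
    // before it pulls the next element, so the costs simply add up.
    static Result runFused(List<String> elements, List<Stage> stages) {
        List<String> log = new ArrayList<>();
        long clockMicros = 0;
        for (String element : elements) {
            for (Stage stage : stages) {
                clockMicros += stage.costMicros();
                log.add(stage.name() + ":" + element);
            }
        }
        return new Result(log, clockMicros);
    }

    public static void main(String[] args) {
        List<Stage> stages = List.of(
                new Stage("Source", 5), new Stage("takePart", 3_000_000), new Stage("Sink", 15));
        Result result = runFused(List.of("A", "B", "C", "D"), stages);
        System.out.println(result.log());
        System.out.println(result.elapsedMicros() + " us"); // prints "12000080 us", i.e. 4 * (3 s + 20 us)
    }
}
```

The log starts `Source:A, takePart:A, Sink:A, Source:B`: element B is not touched until A has reached the Sink.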

The async operator in a stream means that the stream up to that point will be executed in its own actor. In your example code:

  • The Source will be an actor
  • The takePart flow will be an actor
  • The Sink will be an actor

Each of these will still not allow more than one element to be in process at a time: the gain over not having async is that the Source and Sink can each have an element in process at the same time as takePart has an element in process. There's also a small implicit buffer in downstream stages to improve throughput, but that can often be ignored.

In this stream, the takePart stage takes 3 seconds to process an element and the Source and Sink take a few microseconds (for the sake of illustration, we'll say that the Source takes 5 microseconds and the Sink takes 15 microseconds). So the rough chronology is (ignoring the buffer):

  • time 0: takePart signals demand to Source
  • time 5 us: Source emits A to takePart
  • time 3 seconds + 5 us: takePart emits A to Sink, signals demand to Source
  • time 3 seconds + 10 us: Source emits B to takePart
  • time 3 seconds + 20 us: Sink processes A, signals demand to takePart
  • time 6 seconds + 10 us: takePart emits B to Sink, signals demand to Source
  • time 6 seconds + 15 us: Source emits C to takePart
  • time 6 seconds + 25 us: Sink processes B, signals demand to takePart
  • time 9 seconds + 15 us: takePart emits C to Sink, signals demand to Source
  • time 9 seconds + 20 us: Source emits D to takePart
  • time 9 seconds + 30 us: Sink processes C, signals demand to takePart
  • time 12 seconds + 20 us: takePart emits D to Sink, signals demand to Source; Source completes, takePart completes
  • time 12 seconds + 35 us: Sink processes D, completes
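The chronology follows mechanically from a few rules, which this plain-Java model (not Akka code; the class and method names are hypothetical) encodes under the same assumptions as above. Each stage is its own actor with one element in process at a time. There is no buffering. A stage emits only after its downstream has signalled demand, and the Source produces only on demand.

```java
import java.util.Arrays;

class AsyncBoundarySketch {
    // Simulated timeline for Source -> takePart -> Sink, each stage in its own actor,
    // one element in process per stage, no buffering. Returns the time (in microseconds)
    // at which the Sink finishes each element.
    static long[] sinkDoneTimes(int n, long dSource, long dTake, long dSink) {
        long[] sinkDone = new long[n];
        long takeDemandAt = 0; // takePart signals demand at time 0, then whenever it emits
        long sinkDemandAt = 0; // Sink signals demand at time 0, then whenever it finishes
        for (int i = 0; i < n; i++) {
            // The Source is idle by the time demand arrives, so it starts immediately.
            long sourceEmit = takeDemandAt + dSource;          // Source emits element i to takePart
            long takeDone = sourceEmit + dTake;                // takePart finishes element i
            long takeEmit = Math.max(takeDone, sinkDemandAt);  // emit once the Sink has asked for it
            sinkDone[i] = takeEmit + dSink;                    // Sink processes element i
            takeDemandAt = takeEmit;
            sinkDemandAt = sinkDone[i];
        }
        return sinkDone;
    }

    public static void main(String[] args) {
        long[] done = sinkDoneTimes(4, 5, 3_000_000, 15);
        // prints [3000020, 6000025, 9000030, 12000035]
        System.out.println(Arrays.toString(done));
    }
}
```

The printed values are exactly the "Sink processes ..." times in the list above, ending at 12 seconds + 35 us.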

Absent the async, the stream would complete in 4 * (3 sec + 20 us), so the async saved 45 us (cumulatively, async in this stream saves 15 us for every element after the first): not much of a gain. A pipelined stream at full utilization has its throughput gated by the slowest section. You can imagine a highway where the speed limit drops: if the traffic is heavy enough to saturate the highway, the speed on the stretch before the drop will be the speed limit after the drop. You get the best results if each side of the async boundary processes elements at about the same rate.
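Written out with the answer's illustrative figures ($n = 4$ elements, $d_{\text{src}} = 5\ \mu\text{s}$, $d_{\text{take}} = 3\ \text{s}$, $d_{\text{sink}} = 15\ \mu\text{s}$) and the same no-buffer assumption as the chronology, the comparison is:

```latex
\begin{aligned}
T_{\text{fused}} &= n\,(d_{\text{src}} + d_{\text{take}} + d_{\text{sink}}) = 4\,(3\ \text{s} + 20\ \mu\text{s}) = 12\ \text{s} + 80\ \mu\text{s} \\
T_{\text{async}} &= n\,(d_{\text{src}} + d_{\text{take}}) + d_{\text{sink}} = 4\,(3\ \text{s} + 5\ \mu\text{s}) + 15\ \mu\text{s} = 12\ \text{s} + 35\ \mu\text{s} \\
T_{\text{fused}} - T_{\text{async}} &= (n - 1)\,d_{\text{sink}} = 3 \times 15\ \mu\text{s} = 45\ \mu\text{s}
\end{aligned}
```

The saving is $(n-1)\,d_{\text{sink}}$ because, with the boundary in place, the Sink's work on every element except the last overlaps with takePart's work on the next one. The Source's 5 us does not overlap, since it only produces on demand.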

There is, somewhat confusingly, another usage of "async" in the Akka Streams API, used to denote stages that communicate with asynchronous processes by obtaining Futures (Scala) or CompletionStages (Java): the process completing the Future/CompletionStage may run on a different thread, and the stream stage often includes some limit on the number of Futures/CompletionStages it allows to be in flight at a time. mapAsync is an example of this.
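The two properties described here can be modelled in plain Java without Akka: a cap on how many CompletionStages are in flight, and output in input order. The helper `orderedMapAsync` below is hypothetical and is not how Akka implements mapAsync. For one thing, it returns a list at the end instead of emitting each result downstream as soon as the oldest one completes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

class MapAsyncSketch {
    // Hypothetical helper with mapAsync-like semantics: at most `parallelism`
    // stages are in flight, and results come out in input order.
    static <A, B> List<B> orderedMapAsync(List<A> input, int parallelism,
                                          Function<A, CompletionStage<B>> f) {
        if (parallelism < 1) throw new IllegalArgumentException("parallelism must be >= 1");
        Deque<CompletableFuture<B>> inFlight = new ArrayDeque<>();
        List<B> output = new ArrayList<>();
        for (A element : input) {
            if (inFlight.size() == parallelism) {
                // Window is full: wait for the oldest element before starting another.
                output.add(inFlight.removeFirst().join());
            }
            inFlight.addLast(f.apply(element).toCompletableFuture());
        }
        while (!inFlight.isEmpty()) {
            output.add(inFlight.removeFirst().join()); // drain the rest, still in input order
        }
        return output;
    }

    // Sleeps without a checked exception, restoring the interrupt flag if interrupted.
    static void sleepMillis(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        long start = System.nanoTime();
        List<String> out = orderedMapAsync(List.of("A", "B", "C", "D"), 4,
                s -> CompletableFuture.supplyAsync(() -> { sleepMillis(300); return s; }, pool));
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        pool.shutdown();
        System.out.println(out + " after ~" + elapsedMs + " ms");
    }
}
```

With four pool threads and a parallelism of 4, the four 300 ms tasks overlap, so the run takes roughly 300 ms instead of 1.2 s.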

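In Scala (I am generally unfamiliar with the Java future APIs), this would be something like the following (the imports and the implicit ActorSystem/ExecutionContext are a minimal setup so the snippet compiles):

import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.{ExecutionContext, Future}

implicit val system: ActorSystem = ActorSystem("example")
// Runs the Future bodies (including the blocking sleep) on the default dispatcher
implicit val ec: ExecutionContext = system.dispatcher

def blockOnElement(e: String): Future[String] = Future {
  Thread.sleep(3000)
  e
}

Source(List("A", "B", "C", "D"))
  .mapAsync(4)(blockOnElement)
  .runWith(Sink.fold[String, String]("") { (acc, _) => acc })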

In that example, assuming sufficient (more than 4) threads in the dispatcher, the entire stream should finish in about 3 seconds and 80 us (assuming the 5/15 us delays above), since the Source and Sink will still combine to spend 20 us on every element.
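Because the example above is Scala, here is a plain-Java sketch (no Akka; class and method names are hypothetical, and the delay is shortened to 200 ms) of the distinction that matters when porting it. In `blockOnElement` the sleep sits inside `Future { ... }`, so it runs on another thread after the future has been handed back. A function that sleeps and then returns `CompletableFuture.completedFuture(s)` has already blocked its caller for the full delay. In Akka's Java DSL, the non-blocking variant is the kind of function you would pass to `mapAsync(4, ...)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

class BlockingVsAsyncSketch {
    static final long DELAY_MS = 200; // stands in for the 3-second sleep

    static void sleepMillis(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // The caller's thread sleeps, then gets back a stage that is already complete.
    static CompletionStage<String> blockThenComplete(String s) {
        sleepMillis(DELAY_MS);
        return CompletableFuture.completedFuture(s);
    }

    // Like Scala's Future { ... }: the stage is returned at once, and the sleep runs on `pool`.
    static CompletionStage<String> sleepOnPool(String s, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> { sleepMillis(DELAY_MS); return s; }, pool);
    }

    // Calls f for every element first, then waits for all results; returns elapsed milliseconds.
    static long timeAll(List<String> input, Function<String, CompletionStage<String>> f) {
        long start = System.nanoTime();
        List<CompletableFuture<String>> started = new ArrayList<>();
        for (String s : input) {
            started.add(f.apply(s).toCompletableFuture());
        }
        started.forEach(CompletableFuture::join);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<String> input = List.of("A", "B", "C", "D");
        System.out.println("block then complete: ~" + timeAll(input, BlockingVsAsyncSketch::blockThenComplete) + " ms");
        System.out.println("sleep on pool:       ~" + timeAll(input, s -> sleepOnPool(s, pool)) + " ms");
        pool.shutdown();
    }
}
```

The first variant takes about four delays because each call blocks before the next one can start. The second takes about one delay because the four sleeps overlap on the pool.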

In addition to @Alec's mention of flatMapMerge, it's often useful to run a substream in mapAsync by using Source.single and Sink.head: the materialized value of the sink will be a Future/CompletionStage of the output element, and mapAsync will in turn preserve ordering downstream (in contrast to flatMapMerge).
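The ordering difference can be modelled in plain Java (no Akka; this covers only the ordering behaviour, not how either operator is implemented). A Source.single/Sink.head substream yields exactly one element through its CompletionStage, so here each substream is just one task. The mapAsync-like version reads results in input order. The flatMapMerge-like version reads them as they complete, since flatMapMerge emits elements from its substreams as they become available. The per-element delays are hypothetical, chosen so that later elements finish first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class OrderingSketch {
    // Hypothetical per-element delays: later elements finish first.
    static final Map<String, Long> DELAY_MS = Map.of("A", 400L, "B", 300L, "C", 200L, "D", 100L);

    static String work(String s) {
        try {
            Thread.sleep(DELAY_MS.get(s));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return s;
    }

    // mapAsync-like: start everything, then read results in input order.
    static List<String> inInputOrder(List<String> input, ExecutorService pool) {
        List<CompletableFuture<String>> futures = new ArrayList<>();
        for (String s : input) {
            futures.add(CompletableFuture.supplyAsync(() -> work(s), pool));
        }
        List<String> out = new ArrayList<>();
        for (CompletableFuture<String> f : futures) {
            out.add(f.join());
        }
        return out;
    }

    // flatMapMerge-like: read results in whatever order they complete.
    static List<String> inCompletionOrder(List<String> input, ExecutorService pool) {
        ExecutorCompletionService<String> ecs = new ExecutorCompletionService<>(pool);
        for (String s : input) {
            ecs.submit(() -> work(s));
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < input.size(); i++) {
            try {
                out.add(ecs.take().get());
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<String> input = List.of("A", "B", "C", "D");
        System.out.println(inInputOrder(input, pool)); // always [A, B, C, D]
        System.out.println(inCompletionOrder(input, pool));
        pool.shutdown();
    }
}
```

With these delays the completion-order list typically comes out reversed, even though the same four tasks ran in both cases.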
