Akka Streams run flow asynchronously
I tested whether a simple async flow runs asynchronously, and I'm surprised to find that it doesn't. Do I need some additional configuration?
@Configuration
class StreamingConfiguration
{
    @Bean
    Materializer materializer(ActorSystem actorSystem)
    {
        return ActorMaterializer.create(actorSystem);
    }

    @PostConstruct
    public void test(Materializer materializer)
    {
        var takePart = Flow.of(String.class).map(path -> {
            var start = System.currentTimeMillis();
            while (System.currentTimeMillis() - start < 3000) {}
            return path;
        });

        Source.from(Lists.newArrayList("A", "B", "C", "D"))
            .via(takePart.async())
            .toMat(Sink.fold("", (arg1, arg2) -> arg1), Keep.right())
            .run(materializer)
            .toCompletableFuture()
            .join();
    }
}
I can see that the materializer uses the default fork-join-pool dispatcher.
EDIT: sorry, but your example doesn't work either. It still takes about 12 seconds to finish when using mapAsync. I tried flatMapMerge with the same result :/
Function<String, CompletionStage<String>> blocking = s -> {
    try
    {
        Thread.sleep(3000);
    } catch (InterruptedException e)
    {
        e.printStackTrace();
    }
    return CompletableFuture.completedFuture(s);
};

Source.from(List.of("A", "B", "C", "D"))
    .mapAsync(4, blocking)
    .toMat(Sink.fold("", (arg1, arg2) -> arg1), Keep.right())
    .run(actorSystem)
    .toCompletableFuture()
    .join();
Akka Streams by default materializes stream stages into a single actor: this avoids the overhead of passing messages between stream stages, but it does mean that the second element of the stream won't be consumed until the first element has worked its way through the stream.
The async operator in a stream means that the stream up to that point will be executed in its own actor. In your example code:
- Source will be an actor
- the takePart flow will be an actor
- Sink will be an actor

Each of these will still not allow more than one element to be in process at a time: the gain over not having async is that the Source and Sink can have an element in process at the same time as takePart has an element in process. There's also a small implicit buffer in downstream stages to improve throughput, but that can often be ignored.
In this stream, the takePart stage takes 3 seconds to process an element and the Source and Sink take a few microseconds (for the sake of illustration, we'll say that the Source takes 5 microseconds and the Sink takes 15 microseconds). So the rough chronology is (ignoring the buffer):
- time 0: takePart signals demand to Source
- time 5 us: Source emits A to takePart
- time 3 seconds + 5 us: takePart emits A to Sink, signals demand to Source
- time 3 seconds + 10 us: Source emits B to takePart
- time 3 seconds + 20 us: Sink processes A, signals demand to takePart
- time 6 seconds + 10 us: takePart emits B to Sink, signals demand to Source
- time 6 seconds + 15 us: Source emits C to takePart
- time 6 seconds + 25 us: Sink processes B, signals demand to takePart
- time 9 seconds + 15 us: takePart emits C to Sink, signals demand to Source
- time 9 seconds + 20 us: Source emits D to takePart
- time 9 seconds + 30 us: Sink processes C, signals demand to takePart
- time 12 seconds + 20 us: takePart emits D to Sink, signals demand to Source, Source completes, takePart completes
- time 12 seconds + 35 us: Sink processes D, completes
Absent the async, the stream would complete in 4 * (3 sec + 20 us), that is, 12 seconds + 80 us. The async therefore saved 45 us (cumulatively, async in this stream would save 15 us for every element after the first), which is not much of a gain. A pipelined stream at full utilization has throughput gated by the slowest section (you can imagine a highway where the speed limit drops: if the traffic is heavy enough to saturate the highway, the speed on the highway before the speed limit drop will be the speed limit after the drop): you get the best results if each side of the async processes elements at about the same rate.
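The arithmetic above can be reproduced with a small model. This is only a sketch under the same simplifying assumptions as the chronology (no buffers, the illustrative 5 us / 3 s / 15 us costs, and the Source emitting only after takePart signals demand); the class and method names are made up for illustration and are not part of Akka.

```java
public class AsyncBoundaryTimeline {
    // All times are in microseconds; the costs are the illustrative figures from the text.
    static final long SOURCE_US = 5;
    static final long TAKE_PART_US = 3_000_000;
    static final long SINK_US = 15;

    /** Fused stream: each element passes through every stage before the next one starts. */
    static long fusedTotalMicros(int elements) {
        return elements * (SOURCE_US + TAKE_PART_US + SINK_US);
    }

    /** Stream with async boundaries around takePart, ignoring the implicit buffers. */
    static long asyncTotalMicros(int elements) {
        long takePartDone = 0; // when takePart emitted the previous element and signalled demand
        long sinkDone = 0;     // when the Sink finished the previous element
        for (int i = 0; i < elements; i++) {
            long emitted = takePartDone + SOURCE_US;   // Source reacts to demand
            takePartDone = emitted + TAKE_PART_US;     // takePart works on one element at a time
            sinkDone = Math.max(takePartDone, sinkDone) + SINK_US;
        }
        return sinkDone;
    }

    public static void main(String[] args) {
        long fused = fusedTotalMicros(4);
        long async = asyncTotalMicros(4);
        System.out.println("fused: " + fused + " us");          // 12000080
        System.out.println("async: " + async + " us");          // 12000035
        System.out.println("saved: " + (fused - async) + " us"); // 45
    }
}
```

Changing the three constants so that the stages cost roughly the same makes the saving from the async boundaries much more visible, which is the point of the highway analogy.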
There is, somewhat confusingly, another usage of "async" in the Akka Streams API, used to denote stages which communicate with asynchronous processes by obtaining Futures (Scala) or CompletionStages (Java): the process completing the Future/CompletionStage may run on a different thread, and the stream stage often includes some limit on the number of Futures/CompletionStages it will allow to be in flight at a time. mapAsync is an example of this.
In Scala (I am generally unfamiliar with the Java future APIs), this would be something like (ignoring setting an implicit ExecutionContext, etc.):
def blockOnElement(e: String): Future[String] = Future {
  Thread.sleep(3000)
  e
}

Source(List("A", "B", "C", "D"))
  .mapAsync(4)(blockOnElement)
  .runWith(Sink.fold("") { (acc, _) => acc })
In that, assuming sufficient (more than 4) threads in the dispatcher, the entire stream should finish (assuming the 5/15 us delays above) in about 3 seconds and 80 us (the Source and Sink will still combine to spend 20 us on every element).
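On the Java side, the function in the question's EDIT sleeps on the calling thread and only then returns an already-completed CompletionStage, so whoever calls it is held up for the full 3 seconds per element. The JDK-only sketch below (no Akka involved) contrasts that with handing the work to a pool via CompletableFuture.supplyAsync. The loop in elapsedMillis is an assumption standing in for a stage that calls the function once per element on its own thread; the class name, method names, and the shortened delay are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

public class BlockingVsOffloaded {
    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Like the question's function: blocks the caller, then returns a completed stage. */
    static Function<String, CompletionStage<String>> blocksCaller(long delayMillis) {
        return s -> {
            sleep(delayMillis);
            return CompletableFuture.completedFuture(s);
        };
    }

    /** Returns immediately; the sleep happens on a thread of the given executor. */
    static Function<String, CompletionStage<String>> offloaded(long delayMillis, Executor pool) {
        return s -> CompletableFuture.supplyAsync(() -> {
            sleep(delayMillis);
            return s;
        }, pool);
    }

    /** Calls f once per element on the current thread, waits for all results, returns elapsed ms. */
    static long elapsedMillis(List<String> elements, Function<String, CompletionStage<String>> f) {
        long start = System.nanoTime();
        List<CompletableFuture<String>> inFlight = new ArrayList<>();
        for (String e : elements) {
            inFlight.add(f.apply(e).toCompletableFuture());
        }
        inFlight.forEach(CompletableFuture::join);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        List<String> elements = List.of("A", "B", "C", "D");
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Roughly 4 x 300 ms, because each call blocks the calling thread.
            System.out.println("blocking:  " + elapsedMillis(elements, blocksCaller(300)) + " ms");
            // Roughly 300 ms, because the four sleeps overlap on the pool.
            System.out.println("offloaded: " + elapsedMillis(elements, offloaded(300, pool)) + " ms");
        } finally {
            pool.shutdown();
        }
    }
}
```

A function shaped like offloaded is the Java counterpart of wrapping the body in Future { ... } in the Scala snippet; whether a dedicated pool or another executor is appropriate depends on the application.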
In addition to @Alec's mention of flatMapMerge, it's often useful to run a substream in mapAsync by using Source.single and Sink.head: the materialized value of the sink will be a Future/CompletionStage of the output element and the mapAsync will in turn preserve ordering downstream (in contrast to flatMapMerge).