
Parallelism in Akka Streams in comparison with Akka

I have been exploring Akka Streams, but I don't understand how to achieve the kind of parallelism we get with plain Akka actors. Let's say Actor A consumes data from Kafka and writes it to S3, Actor B consumes from Kafka and writes it to Postgres, and Actor C reads from the DB and produces to another Kafka topic. All three actors can be in different actor systems and need not depend on one another. How do I achieve something similar using Akka Streams? I believe Akka Streams have phases where A does something and pipes it to B, and so on until we reach the sink. I realise there is a mapAsync that can be used to parallelise things, but I am not sure how it would work in this context, or what it means for ordering guarantees.

Single Source

For the particular use case you listed, you can use BroadcastHub to "fan out" each data item from Kafka to each of the Sink values you listed:

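import akka.NotUsed
import akka.stream.scaladsl.{BroadcastHub, Keep, RunnableGraph, Sink, Source}

// Placeholder: substitute the record type you read from Kafka.
type Data = ???

val kafkaSource: Source[Data, _] = ???

// Materializing the hub sink yields a Source that any number of consumers can attach to.
val runnableGraph: RunnableGraph[Source[Data, NotUsed]] =
  kafkaSource.toMat(BroadcastHub.sink(bufferSize = 256))(Keep.right)

// run() needs an implicit Materializer (or, on Akka 2.6+, an implicit ActorSystem) in scope.
val kafkaHub: Source[Data, NotUsed] = runnableGraph.run()

val s3Sink: Sink[Data, _] = ???

val postgresSink: Sink[Data, _] = ???

// Each run() attaches one more consumer to the same hub.
kafkaHub.to(s3Sink).run()
kafkaHub.to(postgresSink).run()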

Multiple Sources

One important drawback of the above implementation is that "the rate of the producer will be automatically adapted to the slowest consumer", so a slow S3 write would also hold back the Postgres write.

Therefore, if you're able to make multiple connections to the ultimate source, running one independent stream per connection will likely perform better by maximizing concurrency:

// Factory: each call creates a fresh, independent connection to the source.
val kafkaSource: () => Source[Data, _] = ???

// stream 1
kafkaSource().to(s3Sink).run()

// stream 2
kafkaSource().to(postgresSink).run()
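
On the mapAsync point raised in the question: mapAsync runs up to `parallelism` futures concurrently within a single stream but still emits results downstream in the order the elements arrived, while mapAsyncUnordered emits each result as soon as it completes. The sketch below is only an illustration, assuming Akka 2.6 or later (where an implicit ActorSystem provides the materializer); `write` is a hypothetical stand-in for an asynchronous write to S3 or Postgres, not part of any library.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.{ExecutionContext, Future}

object MapAsyncOrderingSketch extends App {
  // Assumption: Akka 2.6+, so the implicit ActorSystem also supplies the Materializer.
  implicit val system: ActorSystem = ActorSystem("map-async-sketch")
  implicit val ec: ExecutionContext = system.dispatcher

  // Hypothetical asynchronous operation standing in for an S3 or Postgres write.
  def write(n: Int): Future[Int] = Future(n * 2)

  Source(1 to 10)
    // Up to 4 writes may be in flight at once; results are still emitted in upstream order.
    .mapAsync(parallelism = 4)(write)
    // Swap in .mapAsyncUnordered(4)(write) if downstream ordering does not matter.
    .runWith(Sink.foreach(println))
    .onComplete(_ => system.terminate())
}
```

Ordering here is per stream: two separately materialized streams, as in the Multiple Sources approach, make no ordering promises relative to each other.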
