Multiple Kafka Partitions to Akka Streams
Hi, I am working with Kafka and Akka Streams. In Kafka, for a topic MyTestTopic, I have 3 partitions, and data is being pushed into the topic at high concurrency, roughly 1000 QPS, and it will only go higher than that.
Below is my code for my Akka Streams Kafka consumer:
final ConsumerSettings<String, byte[]> consumerSettings =
    ConsumerSettings.create(kafkaConfig, new StringDeserializer(), new ByteArrayDeserializer())
        .withBootstrapServers("127.0.0.1:9092")
        .withGroupId("TestConsumerGroup")
        .withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
        .withProperty(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false")
        .withProperty(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, String.valueOf(timeout));

ActorMaterializer materializer = ActorMaterializer.create(system);

RestartSource.onFailuresWithBackoff(
        java.time.Duration.ofSeconds(3),
        java.time.Duration.ofSeconds(3000),
        0.2,
        () -> Consumer.atMostOnceSource(consumerSettings, Subscriptions.topics("MyTestTopic"))
            .mapAsyncUnordered(10,
                record -> ask(rootHandler, new StreamData(record), Duration.ofSeconds(timeout))))
    .to(Sink.foreach(App::sinkParser))
    .run(materializer);
My Question: How do I set up multiple Akka Stream consumers to listen to different Kafka partitions, since multiple partitions feeding a single instance of Akka Streams seems like a bottleneck? Is Akka Clustering the answer to this, keeping 2 seed nodes on static servers and multiple Akka Stream consumers on auto scale in a cloud-based environment? I can't seem to figure it out, I need help, thanks.
There are a couple of ways to approach this, depending on particulars you haven't elaborated on:
If you're reasonably sure that one node can handle processing all the messages, you can set up multiple streams, up to one stream per partition.
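A minimal sketch of that approach (assuming the consumerSettings and materializer from your code, and a hypothetical handleRecord method returning a CompletionStage) might look like:

```java
// Sketch: one stream per partition of MyTestTopic, using a manual partition
// assignment instead of a group subscription. consumerSettings and materializer
// are taken from the question; handleRecord is a hypothetical handler.
import org.apache.kafka.common.TopicPartition;
import akka.kafka.Subscriptions;
import akka.kafka.javadsl.Consumer;
import akka.stream.javadsl.Sink;

int partitionCount = 3; // MyTestTopic has 3 partitions
for (int p = 0; p < partitionCount; p++) {
  Consumer.plainSource(
          consumerSettings,
          Subscriptions.assignment(new TopicPartition("MyTestTopic", p)))
      .mapAsyncUnordered(10, record -> handleRecord(record)) // hypothetical handler
      .runWith(Sink.ignore(), materializer);
}
```

Note that with Subscriptions.assignment the Kafka group coordinator no longer balances partitions for you, so this only makes sense while all the streams live in a single process.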
An evolution of this would be to use a CommittablePartitionedSource so that you dynamically create as many streams as there are partitions. Note that you'll need to manually commit offsets (e.g. using Committer.sink).
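As a rough sketch (again assuming consumerSettings, materializer, and system from your code, and a hypothetical handleRecord returning a CompletionStage), this could look something like:

```java
// Sketch: one inner stream per dynamically assigned partition, committing
// offsets with Committer.sink. consumerSettings, materializer, and system are
// taken from the question; handleRecord is a hypothetical handler.
import akka.kafka.CommitterSettings;
import akka.kafka.Subscriptions;
import akka.kafka.javadsl.Committer;
import akka.kafka.javadsl.Consumer;
import akka.stream.javadsl.Sink;

CommitterSettings committerSettings = CommitterSettings.create(system);

Consumer.committablePartitionedSource(
        consumerSettings, Subscriptions.topics("MyTestTopic"))
    .mapAsyncUnordered(3, pair ->            // up to 3 partitions in flight
        pair.second()                        // the per-partition inner source
            .mapAsync(10, msg ->
                handleRecord(msg.record())   // hypothetical handler
                    .thenApply(done -> msg.committableOffset()))
            .runWith(Committer.sink(committerSettings), materializer))
    .runWith(Sink.ignore(), materializer);
```

When the consumer group rebalances, completed inner streams are replaced by new ones for the newly assigned partitions, so you don't have to manage that yourself.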
You can have one stream per instance and deploy up to as many instances as you have partitions; with the same consumer group, the instances will coordinate the partition assignments among themselves. When deploying multiple instances, you may or may not need Akka Cluster, depending on the nature of what the actor you're asking is doing.
If no state is being maintained in the actor per message, you likely don't need Akka clustering. (Note that this would encompass the actor doing a read-modify-write on an external datastore: if you can ensure that the messages affecting a given row are in the same Kafka partition, you might even be able to do without ACID in that external datastore.)
If the actors are themselves stateful (e.g. they're shadowing some IoT device), then you almost certainly want the combination of Akka Cluster, Akka Cluster Sharding, and Akka Persistence. Going all the way with this does have some advantages over actors doing read-modify-write on an external datastore (e.g. most of the reads can be eliminated in favor of tracking state in the actor and embracing event sourcing).