Multiple Kafka Partitions to Akka Streams
Hi, I am working with Kafka and Akka Streams. In Kafka, for a topic MyTestTopic, I have 3 partitions, and data is being pushed into the topic at high concurrency, roughly 1000 QPS, and it will only go higher than that.
Below is my code for my Akka Streams Kafka consumer:
final ConsumerSettings<String, byte[]> consumerSettings =
    ConsumerSettings.create(kafkaConfig, new StringDeserializer(), new ByteArrayDeserializer())
        .withBootstrapServers("127.0.0.1:9092")
        .withGroupId("TestConsumerGroup")
        .withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
        .withProperty(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false")
        .withProperty(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, String.valueOf(timeout));

ActorMaterializer materializer = ActorMaterializer.create(system);

RestartSource.onFailuresWithBackoff(
        java.time.Duration.ofSeconds(3),
        java.time.Duration.ofSeconds(3000),
        0.2,
        () -> Consumer.atMostOnceSource(consumerSettings, Subscriptions.topics("MyTestTopic"))
            .mapAsyncUnordered(10,
                record -> ask(rootHandler, new StreamData(record), Duration.ofSeconds(timeout))))
    .to(Sink.foreach(App::sinkParser))
    .run(materializer);
My Question: How do I set up multiple Akka Stream consumers to listen to different Kafka partitions, since multiple partitions feeding a single instance of Akka Streams seems like a bottleneck? Is Akka Clustering the answer to this, keeping 2 seed nodes on static servers and multiple Akka Stream consumers on auto scale in a cloud-based environment? I can't seem to figure it out, I need help, thanks.
There are a couple of ways to approach this, depending on particulars you haven't elaborated on:
If you're reasonably sure that one node can handle processing all the messages, you can set up multiple streams, up to one stream per partition.
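A minimal sketch of that approach (assuming the consumerSettings and materializer from your code, and a hypothetical handleRecord method returning a CompletionStage) might look like:

```java
// Sketch: one stream per partition of MyTestTopic, using a manual partition
// assignment instead of a group subscription. consumerSettings and materializer
// are taken from the question; handleRecord is a hypothetical handler.
import org.apache.kafka.common.TopicPartition;
import akka.kafka.Subscriptions;
import akka.kafka.javadsl.Consumer;
import akka.stream.javadsl.Sink;

int partitionCount = 3; // MyTestTopic has 3 partitions
for (int p = 0; p < partitionCount; p++) {
  Consumer.plainSource(
          consumerSettings,
          Subscriptions.assignment(new TopicPartition("MyTestTopic", p)))
      .mapAsyncUnordered(10, record -> handleRecord(record)) // hypothetical handler
      .runWith(Sink.ignore(), materializer);
}
```

Note that with Subscriptions.assignment the Kafka group coordinator no longer balances partitions for you, so this only makes sense while all the streams live in a single process.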
An evolution of this would be to use a CommittablePartitionedSource so that you dynamically create as many streams as there are partitions. Note that you'll need to manually commit offsets (e.g. using Committer.sink).
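As a rough sketch (again assuming consumerSettings, materializer, and system from your code, and a hypothetical handleRecord returning a CompletionStage), this could look something like:

```java
// Sketch: one inner stream per dynamically assigned partition, committing
// offsets with Committer.sink. consumerSettings, materializer, and system are
// taken from the question; handleRecord is a hypothetical handler.
import akka.kafka.CommitterSettings;
import akka.kafka.Subscriptions;
import akka.kafka.javadsl.Committer;
import akka.kafka.javadsl.Consumer;
import akka.stream.javadsl.Sink;

CommitterSettings committerSettings = CommitterSettings.create(system);

Consumer.committablePartitionedSource(
        consumerSettings, Subscriptions.topics("MyTestTopic"))
    .mapAsyncUnordered(3, pair ->            // up to 3 partitions in flight
        pair.second()                        // the per-partition inner source
            .mapAsync(10, msg ->
                handleRecord(msg.record())   // hypothetical handler
                    .thenApply(done -> msg.committableOffset()))
            .runWith(Committer.sink(committerSettings), materializer))
    .runWith(Sink.ignore(), materializer);
```

When the consumer group rebalances, completed inner streams are replaced by new ones for the newly assigned partitions, so you don't have to manage that yourself.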
You can have one stream per instance and deploy up to as many instances as you have partitions; with the same consumer group, the instances will coordinate the partition assignments among themselves. When deploying multiple instances, you may or may not need Akka Cluster, depending on the nature of what the actor you're asking is doing.
If no state is being maintained in the actor per message, you likely don't need Akka clustering. (Note that this would encompass the actor doing a read-modify-write on an external datastore: if you can ensure that the messages affecting a given row are in the same Kafka partition, you might even be able to do without ACID in that external datastore.)
If the actors are themselves stateful (e.g. they're shadowing some IoT device), then you almost certainly want the combination of Akka Cluster, Akka Cluster Sharding, and Akka Persistence. Going all the way with this does have some advantages over actors doing read-modify-write on an external datastore (e.g. most of the reads can be eliminated in favor of tracking state in the actor and embracing event sourcing).