Adjusting parallelism based on the number of partitions assigned in Consumer.committablePartitionedSource

I am trying to use Consumer.committablePartitionedSource() and create a stream per partition, as shown below:

    public void setup() {
        control = Consumer.committablePartitionedSource(consumerSettings,
                Subscriptions.topics("chat").withPartitionAssignmentHandler(new PartitionAssignmentListener()))
                .mapAsyncUnordered(Integer.MAX_VALUE, pair -> setupSource(pair, committerSettings))
                .toMat(Sink.ignore(), Consumer::createDrainingControl)
                .run(Materializer.matFromSystem(actorSystem));
    }

    private CompletionStage<Done> setupSource(Pair<TopicPartition, Source<ConsumerMessage.CommittableMessage<String, String>, NotUsed>> pair, CommitterSettings committerSettings) {
        LOGGER.info("SETTING UP PARTITION-{} SOURCE", pair.first().partition());
        return pair.second().mapAsync(16, msg -> CompletableFuture.supplyAsync(() -> consumeMessage(msg), actorSystem.dispatcher())
                .thenApply(param -> msg.committableOffset()))
                .withAttributes(ActorAttributes.supervisionStrategy(ex -> Supervision.restart()))
                .runWith(Committer.sink(committerSettings), Materializer.matFromSystem(actorSystem));
    }

While setting up the source for each partition, I use a parallelism value that I want to change based on the number of partitions assigned to the node. I can do that on the first assignment of partitions to the node. But as new nodes join the cluster, assigned partitions are revoked and reassigned. On such a rebalance the stream does not re-emit the partitions that are already assigned (due to the Kafka cooperative rebalancing protocol), so I get no opportunity to reconfigure their parallelism.

Here I am sharing the same dispatcher across all sources, and if I keep the same parallelism after a rebalance, I feel that each partition's messages do not get a fair chance of being processed. Am I correct? Please correct me if not.
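To make the concern concrete, here is a minimal plain-Java sketch of the arithmetic involved. It uses no Akka or Kafka classes. The class `ParallelismBudget`, both method names, and the idea of a node-wide "budget" are hypothetical names introduced only for illustration; they are not part of Alpakka Kafka.

```java
// Illustration only: plain arithmetic, no Akka or Kafka classes involved.
// All names here are hypothetical and not part of any library.
final class ParallelismBudget {
    private ParallelismBudget() {}

    // Total messages that can be in flight on one node when every
    // partition stream runs its own mapAsync(perPartition).
    static int totalInFlight(int perPartition, int assignedPartitions) {
        return perPartition * assignedPartitions;
    }

    // Splits an assumed node-wide budget evenly across the assigned
    // partitions, never going below 1 per partition.
    static int perPartitionParallelism(int totalBudget, int assignedPartitions) {
        if (totalBudget < 1 || assignedPartitions < 1) {
            throw new IllegalArgumentException("budget and partition count must be >= 1");
        }
        return Math.max(1, totalBudget / assignedPartitions);
    }
}
```

With a fixed per-partition value of 16, the total in-flight work on the node grows and shrinks with the number of assigned partitions; keeping the total constant instead would require recomputing the per-partition value on every rebalance, which is exactly what cannot be done for partitions that are not re-emitted.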

If I understand you correctly, you want a fixed parallelism across a dynamically changing number of Sources that come and go as Kafka rebalances topic partitions.

Have a look at the first example in the Alpakka Kafka documentation. It can be adapted to your example like this:

 Consumer.DrainingControl<Done> control =
      Consumer.committablePartitionedSource(consumerSettings, Subscriptions.topics("chat"))
              .wireTap(p -> LOGGER.info("SETTING UP PARTITION-{} SOURCE", p.first().partition()))
              .flatMapMerge(Integer.MAX_VALUE, Pair::second)
              .mapAsync(
                16,
                msg -> CompletableFuture
                         .supplyAsync(() -> consumeMessage(msg),
                                      actorSystem.dispatcher())
                         .thenApply(param -> msg.committableOffset()))
              .withAttributes(
                ActorAttributes.supervisionStrategy(
                  ex -> Supervision.restart()))
              .toMat(Committer.sink(committerSettings), Consumer::createDrainingControl)
              .run(Materializer.matFromSystem(actorSystem));

So basically, Consumer.committablePartitionedSource() emits a Source whenever Kafka assigns a partition to this consumer, and completes that Source when the previously assigned partition is rebalanced away from this consumer.

flatMapMerge takes those Sources and merges the messages they emit.

All those messages compete in the mapAsync stage to get processed. The fairness of this competition really comes down to the flatMapMerge above, which should give all the Sources an equal chance to emit their messages. Regardless of how many Sources are outputting messages, they all share one fixed parallelism here, which I believe is what you're after.
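The effect of one shared limit can be modelled in plain Java, without Akka. This is only a rough sketch of the idea, not how mapAsync is implemented: a fair `Semaphore` stands in for the shared parallelism, and a round-robin loop stands in for the merge. The class `SharedParallelismModel` and its method are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Plain-Java model (an assumption for illustration, not Akka's implementation):
// all partitions share one pool of permits, so the in-flight count stays
// bounded no matter how many partitions are currently assigned.
final class SharedParallelismModel {
    private SharedParallelismModel() {}

    // Processes partitions * messagesPerPartition simulated messages with at
    // most `limit` in flight and returns the highest concurrency observed.
    static int maxObservedInFlight(int limit, int partitions, int messagesPerPartition) {
        Semaphore permits = new Semaphore(limit, true); // shared by all partitions
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxSeen = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(limit * 2);
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        try {
            // Round-robin over partitions, roughly imitating a fair merge.
            for (int m = 0; m < messagesPerPartition; m++) {
                for (int p = 0; p < partitions; p++) {
                    permits.acquireUninterruptibly(); // blocks when the limit is reached
                    futures.add(CompletableFuture.runAsync(() -> {
                        try {
                            int now = inFlight.incrementAndGet();
                            maxSeen.accumulateAndGet(now, Math::max);
                            Thread.sleep(1); // stand-in for consumeMessage(msg)
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        } finally {
                            inFlight.decrementAndGet();
                            permits.release();
                        }
                    }, pool));
                }
            }
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        } finally {
            pool.shutdown();
        }
        return maxSeen.get();
    }
}
```

Calling `SharedParallelismModel.maxObservedInFlight(16, 3, 100)` and then the same with 12 partitions should in both cases return a value no greater than 16, which mirrors the point above: the limit belongs to the merged stream, not to each partition.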

All those messages eventually reach the Committer.sink, which handles offset committing.
