Kafka Streams processing speed slows when empty processor is added

Consider this Kafka Streams driver:

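import java.util.Properties;

import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.processor.WallclockTimestampExtractor;

// ProtoDeserializer, ProtoSerializer and Message are the application's own protobuf classes.
public class TestDriver {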

    private static final String SOURCE = "SOURCE";

    public static void main(String[] args) throws Exception {

        ProtoDeserializer<Message> protoDeserializer = new ProtoDeserializer<>(Message.parser());
        ProtoSerializer<Message> protoSerializer = new ProtoSerializer<>();

        StringDeserializer stringDeserializer = new StringDeserializer();
        StringSerializer stringSerializer = new StringSerializer();

        Topology topologyBuilder = new Topology();
        topologyBuilder.addSource(SOURCE, stringDeserializer, protoDeserializer, "input-messages")
            .addProcessor(DummyProcessor.NAME, DummyProcessor::new, SOURCE)
            .addSink("MAIN", "output-messages", stringSerializer, protoSerializer, DummyProcessor.NAME);

        KafkaStreams streams = new KafkaStreams(topologyBuilder, getConfig());
        streams.cleanUp();
        streams.start();

        System.out.println(streams.toString());

        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));

    }

    private static Properties getConfig() {
        Properties config = new Properties();
        config.put(StreamsConfig.CLIENT_ID_CONFIG, "test.stream-processor");
        config.put(StreamsConfig.APPLICATION_ID_CONFIG, "test.stream-processor");
        config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker-1:9092,broker-2:9092,broker-3:9092");
        config.put(StreamsConfig.REPLICATION_FACTOR_CONFIG, 3);
        config.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 10);
        config.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG, WallclockTimestampExtractor.class);
        return config;
    }
}

The problem is that when no processor is added to the topology (no .addProcessor() call), the throughput from source to sink is fine: I currently produce 25k messages/s and the application keeps up without any problem.

However, when DummyProcessor is added, it suddenly processes at most 3k messages/s (600k bytes).

DummyProcessor does basically nothing:

public class DummyProcessor extends AbstractProcessor<String, Message> {

    public static final String NAME = "DUMMY_PROCESSOR";

    public void process(String key, Message originalMessage) {
        context().forward(key, originalMessage);
        context().commit();
    }
}

Does adding a single "empty" processor really add this much overhead to Streams performance? What causes it? Is Kafka Streams smart enough that, when there is no processor, it skips the protobuf serde and simply forwards the data it receives? Is there any way to speed it up?

At this speed I would need thousands more CPU threads to process all my data, since 25k messages/s is only 1% of what I have. That sounds like a lot.

The issue is caused by requesting a commit too often.

You don't need to call ProcessorContext#commit() at all, so the context().commit() line in DummyProcessor can simply be removed. Kafka Streams commits on its own, at the interval set by the commit.interval.ms property (default: 30000 ms). If exactly-once semantics are enabled, the default is different (100 ms). You can find details at https://kafka.apache.org/documentation/#streamsconfigs .
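As a rough illustration of the property mentioned above, here is a minimal sketch of setting the commit interval explicitly. It uses only java.util.Properties and the literal key "commit.interval.ms" (the value behind StreamsConfig.COMMIT_INTERVAL_MS_CONFIG), so it runs without the Kafka libraries. The class name, the helper method and the 5000 ms value are made up for the example; in the driver from the question the same put() call would go into getConfig().

```java
import java.util.Properties;

public class CommitIntervalConfigSketch {

    // Hypothetical helper: builds only the commit-related part of a Streams config.
    static Properties commitConfig(long commitIntervalMs) {
        Properties config = new Properties();
        // "commit.interval.ms" is the key behind StreamsConfig.COMMIT_INTERVAL_MS_CONFIG.
        // The value is stored as a string so that getProperty() can read it back.
        config.put("commit.interval.ms", Long.toString(commitIntervalMs));
        return config;
    }

    public static void main(String[] args) {
        // 5000 ms is an arbitrary example value, not a recommendation.
        Properties config = commitConfig(5_000L);
        System.out.println("commit.interval.ms = " + config.getProperty("commit.interval.ms"));
    }
}
```

Lowering the interval this way keeps commits periodic, rather than tying them to every processed record.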

If some use case needs more frequent commits, you can call ProcessorContext#commit(). But remember that the commit is not performed immediately: the call only sets a flag requesting that a commit be made as soon as possible.
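To make the flag behaviour concrete, here is a toy model in plain Java. It is not Kafka Streams' actual code: the class, its fields and the loop are all invented for illustration, and the time-based commit driven by commit.interval.ms is left out on purpose. It only shows that when every process() call requests a commit, the processing loop ends up committing once per record.

```java
public class CommitFlagSketch {

    private boolean commitRequested = false; // set by requestCommit(), cleared by the loop
    private int commits = 0;                 // number of (simulated) commits performed

    // Stands in for ProcessorContext#commit(): it only records the request.
    void requestCommit() {
        commitRequested = true;
    }

    // Stands in for the processing loop: handle one record, then honour a pending request.
    int run(int records, boolean requestCommitPerRecord) {
        for (int i = 0; i < records; i++) {
            if (requestCommitPerRecord) {
                requestCommit(); // what DummyProcessor in the question does for every record
            }
            if (commitRequested) {
                commits++;       // a real commit is comparatively expensive
                commitRequested = false;
            }
        }
        return commits;
    }

    public static void main(String[] args) {
        System.out.println(new CommitFlagSketch().run(1000, true));  // prints 1000
        System.out.println(new CommitFlagSketch().run(1000, false)); // prints 0
    }
}
```

In this model, 1000 records with a per-record request lead to 1000 commits, while without the request the loop performs none and the periodic commit (omitted here) would do the work.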
