
Kafka Streams with single partition to pause on error

I have a single Kafka broker with a single partition. The requirement is to do the following:

  1. Read from this partition
  2. Transform the message by invoking a REST API
  3. Publish the transformed message to another REST API
  4. Push the response message to another topic

I am using Kafka Streams to achieve this with the following code:

StreamsBuilder builder = new StreamsBuilder();
KStream<Object, Object> consumerStream = builder.stream(kafkaConfiguration.getConsumerTopic());
consumerStream = consumerStream.map(getKeyValueMapper(keyValueMapperClassName));
consumerStream.to(kafkaConfiguration.getProducerTopic(), Produced.with(lStringKeySerde, lAvroValueSerde));
return builder.build();

The following is my configuration:

        streamsConfig.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, String.join(",", bootstrapServers));
        if (schemaRegistry != null && schemaRegistry.length > 0) {
            streamsConfig.put(KafkaAvroSerializerConfig.SCHEMA_REGISTRY_URL_CONFIG, String.join(",", schemaRegistry));          
        }
        streamsConfig.put(this.keySerializerKeyName, keyStringSerializerClassName);
        streamsConfig.put(this.valueSerialzerKeyName, valueAVROSerializerClassName);
        streamsConfig.put(StreamsConfig.APPLICATION_ID_CONFIG, applicationId);
        streamsConfig.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
        streamsConfig.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 1000);
        streamsConfig.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG, FailOnInvalidTimestamp.class);
        streamsConfig.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, "exactly_once");
        streamsConfig.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 30000);
        streamsConfig.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 1);
        streamsConfig.put(StreamsConfig.REPLICATION_FACTOR_CONFIG, 1);
        streamsConfig.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG, DeserializationExceptionHandler.class);
        streamsConfig.put(StreamsConfig.DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG, ProductionExceptionHandler.class);
        streamsConfig.put(StreamsConfig.TOPOLOGY_OPTIMIZATION, StreamsConfig.OPTIMIZE);
        streamsConfig.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, compressionMode);

I was looking for a mechanism to do the following in my KeyValueMapper:

  1. If any of the REST APIs is down, I catch the exception
  2. I would like to keep looping on the same offset until the system is back up, OR pause consumption until the system is back up
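A minimal sketch of the loop-on-the-same-offset idea in plain Java (the `restCall` function and its failure mode are assumptions; in the real application this loop would live inside the KeyValueMapper so the record is never forwarded, and hence its offset never committed, until the call succeeds):

```java
import java.time.Duration;
import java.util.function.Function;

public class RetryUntilUp {

    // Keep retrying the same record until the downstream service answers.
    // Because the mapper thread blocks here, Kafka Streams never advances
    // past this record's offset.
    static <T, R> R callUntilAvailable(Function<T, R> restCall, T input,
                                       Duration backoff) throws InterruptedException {
        while (true) {
            try {
                return restCall.apply(input);      // invoke the REST API
            } catch (RuntimeException apiDown) {
                Thread.sleep(backoff.toMillis());  // wait, then retry the same message
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated service that fails twice, then succeeds.
        int[] attempts = {0};
        Function<String, String> flaky = s -> {
            if (attempts[0]++ < 2) throw new RuntimeException("service down");
            return s.toUpperCase();
        };
        String out = callUntilAvailable(flaky, "payload", Duration.ofMillis(10));
        System.out.println(out); // prints "PAYLOAD" after two failed attempts
    }
}
```

Note that while this thread is blocked, the consumer stops polling, which is exactly the `max.poll.interval.ms` concern discussed in the answer below.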

I've checked the following links but they do not seem to help.

How to run kafka streams effectively with single app instance and single topic partitions?

The following link talks about KafkaTransactionManager, but I guess that would not work given the way the KStream is initialized above.

Kafka transaction failed but commits offset anyway

Any help / pointers in this direction would be much appreciated.

What you want to do is not really supported. Pausing the consumer is not possible in Kafka Streams.

You can "halt" processing only by looping within your KeyValueMapper; however, in that case the consumer may drop out of the consumer group. In your case, with a single input topic partition, you can only have a single thread in a single KafkaStreams instance anyway, so it would not affect any other member of the group (as there are none). The problem is that committing the offset would fail after the thread drops out of the group. Hence, after the thread rejoins the group it would fetch an older offset and reprocess some data (i.e., you get duplicate data processing). To avoid dropping out of the consumer group, you could set the max.poll.interval.ms config to a high value (maybe even Integer.MAX_VALUE); given that you have a single member in the consumer group, setting a high value should be ok.
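Concretely, this would be one extra line in the configuration block shown in the question (using Integer.MAX_VALUE is an assumption; any value comfortably larger than your worst-case outage works):

```java
// Let the blocking KeyValueMapper loop for a long time without the
// single consumer being evicted from its one-member consumer group.
streamsConfig.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, Integer.MAX_VALUE);
```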

Another alternative might be to use a transform() with a state store. If you cannot make the REST calls, you put the data into the store and retry later. This way the consumer would not drop out of the group. However, reading new data would never stop, and you would need to buffer all data in the store until the REST API can be called again. You should be able to slow down reading new data (to reduce the amount of data you need to buffer) by "sleeping" in your Transformer; you just need to ensure that you don't violate the max.poll.interval.ms config (the default is five minutes).
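The buffering idea above, illustrated with a plain in-memory queue standing in for the state store (a real implementation would use a KeyValueStore inside a Transformer, with a punctuator invoking retryPending periodically; all names here are illustrative):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Function;

public class BufferAndRetry {
    // Stand-in for the Kafka Streams state store.
    private final Deque<String> pending = new ArrayDeque<>();
    private final Function<String, String> restCall;

    BufferAndRetry(Function<String, String> restCall) {
        this.restCall = restCall;
    }

    // Called once per incoming record: try the REST call;
    // on failure, park the record instead of blocking the thread.
    void process(String record) {
        try {
            forward(restCall.apply(record));
        } catch (RuntimeException apiDown) {
            pending.addLast(record); // buffer until the API is back
        }
    }

    // Called periodically (a punctuator in real Kafka Streams):
    // drain the buffer in order while the API keeps answering.
    void retryPending() {
        while (!pending.isEmpty()) {
            String record = pending.peekFirst();
            try {
                forward(restCall.apply(record));
                pending.removeFirst();
            } catch (RuntimeException stillDown) {
                return; // preserve order; try again on the next punctuation
            }
        }
    }

    int buffered() {
        return pending.size();
    }

    void forward(String transformed) {
        // In the real topology this would be context.forward(...).
        System.out.println(transformed);
    }
}
```

Because process() returns immediately instead of looping, the consumer keeps polling and never violates max.poll.interval.ms; the trade-off is the unbounded buffer growth described above.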
