Kafka Streams with single partition to pause on error
I have a single Kafka broker with a single partition. The requirement was to do the following:

I am using Kafka Streams to achieve this, with the following code:
StreamsBuilder builder = new StreamsBuilder();
KStream<Object, Object> consumerStream = builder.stream(kafkaConfiguration.getConsumerTopic());
consumerStream = consumerStream.map(getKeyValueMapper(keyValueMapperClassName));
consumerStream.to(kafkaConfiguration.getProducerTopic(), Produced.with(lStringKeySerde, lAvroValueSerde));
return builder.build();
Following is my configuration:
streamsConfig.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, String.join(",", bootstrapServers));
if (schemaRegistry != null && schemaRegistry.length > 0) {
streamsConfig.put(KafkaAvroSerializerConfig.SCHEMA_REGISTRY_URL_CONFIG, String.join(",", schemaRegistry));
}
streamsConfig.put(this.keySerializerKeyName, keyStringSerializerClassName);
streamsConfig.put(this.valueSerialzerKeyName, valueAVROSerializerClassName);
streamsConfig.put(StreamsConfig.APPLICATION_ID_CONFIG, applicationId);
streamsConfig.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
streamsConfig.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 1000);
streamsConfig.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG, FailOnInvalidTimestamp.class);
streamsConfig.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, "exactly_once");
streamsConfig.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 30000);
streamsConfig.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 1);
streamsConfig.put(StreamsConfig.REPLICATION_FACTOR_CONFIG, 1);
streamsConfig.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG, DeserializationExceptionHandler.class);
streamsConfig.put(StreamsConfig.DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG, ProductionExceptionHandler.class);
streamsConfig.put(StreamsConfig.TOPOLOGY_OPTIMIZATION, StreamsConfig.OPTIMIZE);
streamsConfig.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, compressionMode);
I was looking for a mechanism to do the following in my KeyValueMapper:
I've checked the following links, but they do not seem to help.
How to run kafka streams effectively with single app instance and single topic partitions?
The following link talks about KafkaTransactionManager, but I guess that would not work given the way the KStream is initialized above:

Kafka transaction failed but commits offset anyway

Any help / pointers in this direction would be much appreciated.
What you want to do is not really supported. Pausing the consumer is not possible in Kafka Streams.

You can "halt" processing only if you loop within your KeyValueMapper; however, in that case the consumer may drop out of the consumer group. In your case, with a single input topic partition, you can only have a single thread in a single KafkaStreams instance anyway, so it would not affect any other member of the group (as there are none). However, the problem will be that committing the offset would fail after the thread dropped out of the group. Hence, after the thread rejoins the group, it would fetch an older offset and reprocess some data (i.e., you get duplicate data processing).

To avoid dropping out of the consumer group, you could set the max.poll.interval.ms config to a high value (maybe even Integer.MAX_VALUE) -- given that you have a single member in the consumer group, setting a high value should be ok.
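To illustrate, that consumer-side setting can be passed through the same Streams configuration map built in the question (a sketch; `streamsConfig` is assumed to be the map shown above):

```java
// Raise the consumer's max.poll.interval.ms so that a long "halt" inside the
// topology does not cause the single group member to be kicked out.
// Kafka Streams forwards consumer-level configs to its embedded consumer.
streamsConfig.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, Integer.MAX_VALUE);
```

With a single-member group this only delays rebalance detection for that one member, so the usual downside of a huge value (slow failure detection of other members) does not apply here.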
Another alternative might be to use a transform() with a state store. If you cannot make the REST calls, you put the data into the store and retry later. This way the consumer would not drop out of the group. However, reading new data would never stop, and you would need to buffer all data in the store until the REST API can be called again. You should be able to slow down reading new data (to reduce the amount of data you need to buffer) by "sleeping" in your Transformer -- you just need to ensure that you don't violate the max.poll.interval.ms config (default is 5 minutes).
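A minimal sketch of that buffer-and-retry idea, assuming a hypothetical callRestApi() method that returns false while the endpoint is unavailable; the store name "retry-buffer" and the key/value types are illustrative, not from the question:

```java
import java.time.Duration;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.PunctuationType;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.KeyValueStore;

// Parks records in a state store when the REST call fails and retries them
// on a wall-clock punctuation schedule. Sketch only, not production code.
public class BufferingTransformer implements Transformer<String, Object, KeyValue<String, Object>> {

    private KeyValueStore<String, Object> buffer;

    @Override
    public void init(ProcessorContext context) {
        // The store must be registered on the topology and connected to this transformer.
        buffer = context.getStateStore("retry-buffer");
        // Periodically retry everything sitting in the buffer.
        context.schedule(Duration.ofSeconds(10), PunctuationType.WALL_CLOCK_TIME, ts -> {
            try (KeyValueIterator<String, Object> it = buffer.all()) {
                while (it.hasNext()) {
                    KeyValue<String, Object> kv = it.next();
                    if (callRestApi(kv.value)) {
                        buffer.delete(kv.key);  // delivered, drop from buffer
                    } else {
                        break;                  // endpoint still down, try again later
                    }
                }
            }
        });
    }

    @Override
    public KeyValue<String, Object> transform(String key, Object value) {
        if (!callRestApi(value)) {
            buffer.put(key, value);  // park the record and retry via punctuation
            return null;             // emit nothing downstream for now
        }
        return KeyValue.pair(key, value);
    }

    @Override
    public void close() { }

    // Hypothetical: returns false while the downstream REST endpoint is unavailable.
    private boolean callRestApi(Object value) {
        return true;
    }
}
```

To wire it up, you would add the store with builder.addStateStore(...) and attach the transformer with consumerStream.transform(BufferingTransformer::new, "retry-buffer") in place of the map() call, naming the same store in both places.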