
Kafka Streams with single partition to pause on error

I have a single Kafka broker with a single partition. The requirement is to do the following:

  1. Read from this partition
  2. Transform message by invoking a REST API
  3. Publish the transformed message to another REST API
  4. Push the response message to another topic
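The four steps above can be sketched as a plain function, with the two REST APIs modeled as function parameters (hypothetical names, no Kafka or HTTP dependency) so the flow is visible in isolation:

```java
import java.util.function.UnaryOperator;

// Illustrative sketch only: the two REST APIs are stand-in functions.
public class PipelineSketch {
    // Step 2: transform the consumed message via the first REST API.
    // Step 3: publish the transformed message to the second REST API.
    // Step 4: the returned response is what would go to the output topic.
    static String process(String consumedMessage,
                          UnaryOperator<String> transformApi,
                          UnaryOperator<String> publishApi) {
        String transformed = transformApi.apply(consumedMessage);
        return publishApi.apply(transformed);
    }

    public static void main(String[] args) {
        // Stub "REST APIs" for illustration.
        String out = process("payload",
                msg -> msg + "-transformed",
                msg -> "response-for-" + msg);
        System.out.println(out); // response-for-payload-transformed
    }
}
```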

I am using Kafka Streams to achieve this, with the following code:

StreamsBuilder builder = new StreamsBuilder();
KStream<Object, Object> consumerStream = builder.stream(kafkaConfiguration.getConsumerTopic());
consumerStream = consumerStream.map(getKeyValueMapper(keyValueMapperClassName));
consumerStream.to(kafkaConfiguration.getProducerTopic(), Produced.with(lStringKeySerde, lAvroValueSerde));
return builder.build();

Following is my configuration:

        streamsConfig.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, String.join(",", bootstrapServers));
        if (schemaRegistry != null && schemaRegistry.length > 0) {
            streamsConfig.put(KafkaAvroSerializerConfig.SCHEMA_REGISTRY_URL_CONFIG, String.join(",", schemaRegistry));          
        }
        streamsConfig.put(this.keySerializerKeyName, keyStringSerializerClassName);
        streamsConfig.put(this.valueSerialzerKeyName, valueAVROSerializerClassName);
        streamsConfig.put(StreamsConfig.APPLICATION_ID_CONFIG, applicationId);
        streamsConfig.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
        streamsConfig.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 1000);
        streamsConfig.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG, FailOnInvalidTimestamp.class);
        streamsConfig.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, "exactly_once");
        streamsConfig.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 30000);
        streamsConfig.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 1);
        streamsConfig.put(StreamsConfig.REPLICATION_FACTOR_CONFIG, 1);
        streamsConfig.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG, DeserializationExceptionHandler.class);
        streamsConfig.put(StreamsConfig.DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG, ProductionExceptionHandler.class);
        streamsConfig.put(StreamsConfig.TOPOLOGY_OPTIMIZATION,StreamsConfig.OPTIMIZE);
        streamsConfig.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, compressionMode);

I was looking for a mechanism to do the following in my KeyValueMapper:

  1. If any of the REST APIs is down, I catch the exception
  2. I would like to keep retrying the same offset until the system is back up, OR pause consumption until the system is back up

I've checked the following links but they do not seem to help.

How to run kafka streams effectively with single app instance and single topic partitions?

The following link talks about KafkaTransactionManager, but I guess that would not work with the way the KStream is initialized above:

Kafka transaction failed but commits offset anyway

Any help / pointers in this direction would be much appreciated.

What you want to do is not really supported. Pausing the consumer is not possible in Kafka Streams.

You can "halt" processing only if you loop within your KeyValueMapper; however, in that case the consumer may drop out of the consumer group. In your case, with a single input topic partition, you can only have a single thread in a single KafkaStreams instance anyway, hence it would not affect any other member of the group (as there are none). However, the problem is that committing the offset would fail after the thread has dropped out of the group. Hence, after the thread rejoins the group, it would fetch an older offset and reprocess some data (i.e., you get duplicate data processing). To avoid dropping out of the consumer group, you could set the max.poll.interval.ms config to a high value (maybe even Integer.MAX_VALUE) -- given that you have a single member in the consumer group, setting a high value should be OK.
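The "loop within your KeyValueMapper" idea can be sketched as a blocking retry with a fixed backoff and a time budget kept well below max.poll.interval.ms. This is a generic sketch (the helper name and parameters are illustrative), not Kafka Streams API code:

```java
import java.util.function.Supplier;

// Illustrative retry loop: keep attempting the REST call with a fixed
// backoff until it succeeds or the time budget is exhausted. The budget
// should stay safely below max.poll.interval.ms so the consumer does not
// drop out of the group.
public class BlockingRetry {
    static <T> T callWithRetry(Supplier<T> restCall, long backoffMs, long budgetMs) {
        long deadline = System.currentTimeMillis() + budgetMs;
        while (true) {
            try {
                return restCall.get();           // attempt the REST call
            } catch (RuntimeException e) {
                if (System.currentTimeMillis() + backoffMs > deadline) {
                    throw e;                     // budget exhausted; give up
                }
                try {
                    Thread.sleep(backoffMs);     // wait before retrying
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
            }
        }
    }
}
```

In the mapper you would wrap each REST call in such a loop, so a record is only forwarded once the downstream system has accepted it.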

Another alternative might be to use a transform() with a state store. If you cannot make the REST calls, you put the data into the store and retry later. This way the consumer would not drop out of the group. However, reading new data would never stop, and you would need to buffer all data in the store until the REST API can be called again. You should be able to slow down reading new data (to reduce the amount of data you need to buffer) by "sleeping" in your Transformer -- you just need to ensure that you don't violate the max.poll.interval.ms config (default is 5 minutes).
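The buffer-and-retry pattern behind this alternative can be shown in isolation. Here an in-memory deque stands in for the Kafka Streams state store, and the REST call is a stand-in predicate (all names are hypothetical):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

// Illustrative sketch: records that cannot be delivered are parked in a
// buffer (here an in-memory deque, playing the role of the state store)
// and retried on later invocations, preserving order.
public class BufferingForwarder {
    private final Deque<String> pending = new ArrayDeque<>();

    // tryDeliver returns true when the downstream REST API accepts a record.
    void process(String record, Predicate<String> tryDeliver) {
        pending.addLast(record);                 // buffer first, keep order
        // Drain as much of the buffer as the downstream system accepts.
        while (!pending.isEmpty() && tryDeliver.test(pending.peekFirst())) {
            pending.removeFirst();
        }
    }

    int backlog() { return pending.size(); }
}
```

In a real Transformer, pending would be a persistent state store so the backlog survives restarts, and draining could also be driven by a punctuation schedule rather than only by new records arriving.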
