
Azure Event Hub Kafka: org.apache.kafka.common.errors.TimeoutException for some of the records

I have an ArrayList containing 80 to 100 records and am trying to stream and send each individual record (the POJO, not the entire list) to a Kafka topic (Event Hub). A cron job is scheduled roughly every hour to send these records (POJOs) to the Event Hub.

I am able to see messages being sent to the Event Hub, but after 3 to 4 successful runs I get the following exception (within a run, several messages are sent and several fail with the exception below):

    Expiring 14 record(s) for eventhubname: 30125  ms has passed since batch creation plus linger time

Following is the producer configuration used:

    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    props.put(ProducerConfig.ACKS_CONFIG, "1");
    props.put(ProducerConfig.RETRIES_CONFIG, "3");

Message retention period - 7 days; partitions - 6. I am using Spring Kafka (2.2.3) to send the events; the method that performs the Kafka send is marked @Async:

    @Async
    protected void send(ProducerRecord<String, String> record) {
        kafkatemplate.send(record);
    }
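Since the send is asynchronous, the TimeoutException surfaces later in the producer's background sender thread, not in the @Async method itself. A minimal sketch of the same send with a result callback attached, so failures are at least logged (assuming spring-kafka 2.2's ListenableFuture-based API; the ProducerRecord parameter is an assumption, since the original snippet does not show where record comes from):

    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.springframework.kafka.support.SendResult;
    import org.springframework.scheduling.annotation.Async;
    import org.springframework.util.concurrent.ListenableFutureCallback;

    @Async
    protected void send(ProducerRecord<String, String> record) {
        kafkatemplate.send(record).addCallback(
            new ListenableFutureCallback<SendResult<String, String>>() {
                @Override
                public void onSuccess(SendResult<String, String> result) {
                    // Record was acknowledged by the broker (acks=1: leader only).
                }

                @Override
                public void onFailure(Throwable ex) {
                    // The "Expiring N record(s) ..." TimeoutException lands here
                    // once a batch expires; log it so failed sends are visible.
                }
            });
    }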

Expected: no exception to be thrown from Kafka. Actual: org.apache.kafka.common.errors.TimeoutException is being thrown.

Prakash - we have seen a number of issues where spiky producer patterns hit batch timeouts.

The problem here is that the producer has two TCP connections that can go idle for > 4 mins - at that point, Azure load balancers close out the idle connections. The Kafka client is unaware that the connections have been closed so it attempts to send a batch on a dead connection, which times out, at which point retry kicks in.

  • Set connections.max.idle.ms to < 4 mins – this allows the Kafka client's network-client layer to gracefully handle connection close for the producer's message-sending TCP connection
  • Set metadata.max.age.ms to < 4 mins – this is effectively a keep-alive for the producer's metadata TCP connection (a config sketch follows this list)
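A minimal sketch of those two settings on the producer Properties shown in the question (180000 ms is an illustrative value; anything comfortably under the 4-minute idle window works):

    // Close idle connections before Azure's ~4-minute load-balancer timeout,
    // so the client reconnects instead of sending on a dead socket.
    props.put(ProducerConfig.CONNECTIONS_MAX_IDLE_MS_CONFIG, "180000"); // 3 min

    // Refresh metadata on the same cadence - effectively a keep-alive for the
    // producer's metadata TCP connection.
    props.put(ProducerConfig.METADATA_MAX_AGE_CONFIG, "180000"); // 3 min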

Feel free to reach out to the EH product team on GitHub; we are fairly good about responding to issues - https://github.com/Azure/azure-event-hubs-for-kafka

This exception indicates you are queueing records faster than they can be sent. Once a record is added to a batch, there is a time limit for sending that batch, to ensure it is sent within a specified duration. This is controlled by the producer configuration parameter request.timeout.ms. If a batch has been queued longer than that limit, the exception is thrown and the records in that batch are removed from the send queue.
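For illustration, a hedged sketch of raising that limit on the same Properties object; the "30125 ms" in the error is consistent with the default request.timeout.ms of 30 seconds, and 60000 below is an arbitrary example value:

    // Allow queued batches more time before they are expired and removed
    // from the send queue.
    props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, "60000");

    // A small linger lets the producer group records into fewer, larger
    // requests, reducing per-record overhead under bursty load.
    props.put(ProducerConfig.LINGER_MS_CONFIG, "5");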

Please check the question below, which covers a similar issue; it might help:

Kafka producer TimeoutException: Expiring 1 record(s)

You can also check when-does-the-apache-kafka-client-throw-a-batch-expired-exception/34794261#34794261 for more details about the batch-expired exception.

Also, implement a proper retry policy; a sketch follows.
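A minimal sketch of what a producer-side retry policy might look like (values are illustrative; delivery.timeout.ms requires kafka-clients 2.1+, so it is shown commented out):

    // Retry transient send failures, with a pause between attempts.
    props.put(ProducerConfig.RETRIES_CONFIG, "5");
    props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, "1000");

    // On kafka-clients 2.1+, delivery.timeout.ms bounds the total time spent
    // on a send, including retries:
    // props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, "120000");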

Note that this does not account for network issues on the sender side; with network issues you will not be able to send to the Event Hub at all.

Hope it helps.
