Guidelines to handle Timeout exception for Kafka Producer?
I often get Timeout exceptions due to various reasons in my Kafka producer. I am using all the default values for producer config currently.
I have seen the following Timeout exceptions:
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for topic-1-0: 30001 ms has passed since last append
I have the following questions:
What are the general causes of these Timeout exceptions?
What are the general guidelines for handling the Timeout exception?
Are Timeout exceptions retriable, and is it safe to retry them?
I am using Kafka v2.1.0 and Java 11.
Thanks in advance.
"What are the general causes of these Timeout exceptions?"
The most common cause that I saw earlier was stale metadata: one broker went down, and the topic partitions on that broker were failed over to other brokers. However, the topic metadata was not updated properly, and the client still tries to talk to the failed broker to either fetch metadata or publish messages. That causes the timeout exception.
Network connectivity issues. This can be easily diagnosed with
telnet broker_host broker_port
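The telnet check above can also be scripted. Here is a small shell sketch (the broker host names are placeholders, not from the original post) that tests TCP reachability of each broker from the producer host using bash's /dev/tcp:

```shell
# Check TCP reachability of each Kafka broker from the producer host.
# broker1/broker2 are placeholder host:port pairs -- substitute your own.
check_broker() {
  local host=${1%%:*} port=${1##*:}
  # Open a TCP connection via bash's /dev/tcp; 3-second timeout.
  if timeout 3 bash -c "echo > /dev/tcp/$host/$port" 2>/dev/null; then
    echo "$1 reachable"
  else
    echo "$1 NOT reachable"
  fi
}

for b in broker1:9092 broker2:9092; do
  check_broker "$b"
done
```

Any broker reported as NOT reachable points to a network, DNS, or firewall problem rather than a Kafka configuration issue.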
The broker is overloaded. This can happen if the broker is saturated with a high workload, or hosts too many topic partitions.
To handle the timeout exceptions, the general practice is:
Rule out broker-side issues: make sure that the topic partitions are fully replicated, and the brokers are not overloaded.
Fix host name resolution or network connectivity issues, if there are any.
Tune parameters such as request.timeout.ms, delivery.timeout.ms, etc. My past experience is that the default values work fine in most cases.
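As a sketch of what tuning those parameters looks like, here is a minimal Java example that builds the producer Properties with the timeout-related settings made explicit. The values are illustrative starting points (close to the Kafka defaults), not recommendations, and the bootstrap server address is a placeholder:

```java
import java.util.Properties;

public class ProducerTimeoutConfig {

    // Build producer properties with the timeout-related settings made explicit.
    static Properties timeoutTunedProps(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // How long to wait for a response to a single request before it is
        // retried or failed.
        props.put("request.timeout.ms", "30000");
        // Upper bound on the total time to report success or failure of send(),
        // covering batching delay, in-flight time, and retries. Must be at least
        // linger.ms + request.timeout.ms.
        props.put("delivery.timeout.ms", "120000");
        // How long send()/partitionsFor() may block while fetching metadata --
        // this is the 60000 ms in "Failed to update metadata after 60000 ms".
        props.put("max.block.ms", "60000");
        return props;
    }

    public static void main(String[] args) {
        Properties props = timeoutTunedProps("broker1:9092");
        System.out.println(props.getProperty("delivery.timeout.ms"));
    }
}
```

Raising these values only hides the symptom if the real cause is a broker or network problem, so tune them after ruling out the issues above.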
The default Kafka config values, both for producers and brokers, are conservative enough that, under general circumstances, you shouldn't run into any timeouts. Those problems typically point to a flaky/lossy network between the producer and the brokers.
The exception you're getting, Failed to update metadata, usually means one of the brokers is not reachable by the producer, and the effect is that it cannot get the metadata.
For your second question, Kafka will automatically retry sending messages that were not fully ack'ed by the brokers. It's up to you whether to catch and retry when you get a timeout on the application side, but if you're hitting 1+ minute timeouts, retrying is probably not going to make much of a difference. You're going to have to figure out the underlying network/reachability problems with the brokers anyway.
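If you do decide to catch and retry on the application side, the following hypothetical helper (not part of the Kafka API; a stdlib-only sketch) shows what a retry loop with exponential backoff might look like. In real code you would catch only Kafka's retriable exceptions, such as TimeoutException, rather than every Exception:

```java
import java.util.concurrent.Callable;

public class RetryWithBackoff {

    // Hypothetical helper: retry an operation that may throw a retriable
    // exception, sleeping with exponential backoff between attempts.
    static <T> T retry(Callable<T> op, int maxAttempts, long initialBackoffMs)
            throws Exception {
        long backoff = initialBackoffMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) { // real code: catch only retriable exceptions
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoff);
                    backoff *= 2; // double the wait before the next attempt
                }
            }
        }
        throw last; // all attempts exhausted
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated send: the first two attempts "time out", the third succeeds.
        String result = retry(() -> {
            calls[0]++;
            if (calls[0] < 3) throw new RuntimeException("simulated timeout");
            return "sent";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
        // -> sent after 3 attempts
    }
}
```

Note that retrying a produce request can cause duplicate messages unless the producer is configured with enable.idempotence=true.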
In my experience, the cause is usually a network problem; a quick check is to run nc -z broker-ip 9092 from the server running the producer to verify that each broker is reachable.
I suggest using the following properties while constructing the Producer config:
kafka.acks=1
kafka.retries=3
timeout.ms=200
retry.backoff.ms=50
dataLogger.kafka.delivery.timeout.ms=1200
producer.send(record, new Callback {
  override def onCompletion(recordMetadata: RecordMetadata, e: Exception): Unit = {
    if (e == null) {
      logger.debug(s"KafkaLogger : Message Sent $record to Topic ${recordMetadata.topic()}, Partition ${recordMetadata.partition()}, Offset ${recordMetadata.offset()}")
    } else {
      logger.error(s"Exception while sending message $record to Error topic: $e")
    }
  }
})
producer.close(1000, TimeUnit.MILLISECONDS)
The Timeout Exception will also happen if the value of advertised.listeners (protocol://host:port) is not reachable by the producer or consumer.
Check the configuration of the advertised.listeners property with the following command:
cat $KAFKA_HOME/config/server.properties
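For reference, a correctly configured server.properties might look like the following sketch (the host name is a placeholder); the advertised host:port must be resolvable and reachable from the client machines:

```properties
# listeners: the address the broker binds to
listeners=PLAINTEXT://0.0.0.0:9092
# advertised.listeners: the address the broker hands back to clients;
# producers/consumers must be able to resolve and reach this host:port
advertised.listeners=PLAINTEXT://kafka-broker.example.com:9092
```

A common failure mode is advertising an internal host name (or a Docker container name) that clients outside that network cannot resolve, which surfaces as metadata timeouts on the client.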