Messages sent to Kafka REST-Proxy are rejected with the error “This server is not the leader for that topic-partition”

Our development team and our environment support team have run into trouble with the Kafka rest-proxy from the Confluent Platform, and the two teams disagree about how it works.

First of all, we have an environment of 5 Kafka brokers, with 64 partitions and a replication factor of 3.

For now, all of our calls to rest-proxy use the following structure:

curl -X POST \
  http://somehost:8082/topics/test \
  -H 'content-type: application/vnd.kafka.avro.v1+json' \
  -d '{  
   "value_schema_id":1,   
   "records":[  
      { "foo":"bar" }]}'

This kind of call succeeds 98.4% of the time. When I make it more than 2,000 times, we never receive an OK response from partition 62 (1 of 64 partitions, or about 1.6%). The error rate was 10.9% when 7 partitions were returning errors, right before the support team recycled schema-registry.
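As a quick sanity check on those percentages, here is a minimal sketch. It assumes records are spread evenly over the 64 partitions, which is an assumption on my part and not something we measured. The function name is made up for illustration.

```python
# Sanity check: share of calls expected to fail if records are spread
# evenly across partitions (assumed) and some partitions reject writes.
TOTAL_PARTITIONS = 64  # partition count from the environment described above


def failure_rate_percent(failing_partitions: int, total: int = TOTAL_PARTITIONS) -> float:
    """Percentage of calls that land on a failing partition."""
    return 100.0 * failing_partitions / total


if __name__ == "__main__":
    for failing in (1, 7):
        rate = failure_rate_percent(failing)
        print(f"{failing} failing partition(s): {rate:.1f}% of calls")
```

Both observed error rates match whole partitions failing, which is why I suspect the partitions and not the individual calls.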

Now, when a call goes to partition 62, we receive the following response:

{
    "offsets": [
        {
            "partition": null,
            "offset": null,
            "error_code": 50003,
            "error": "This server is not the leader for that topic-partition."
        }
    ],
    "key_schema_id": null,
    "value_schema_id": 1
}
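To count how many calls hit this error, a response body like the one above can be checked programmatically. This is only a sketch: the helper names are hypothetical, and the only error code it knows about is the 50003 shown in the response.

```python
import json

# Error code copied from the response above; other codes are not covered here.
NOT_LEADER_ERROR_CODE = 50003


def failed_offsets(response_body: str) -> list:
    """Return the entries of "offsets" that carry a non-null error_code."""
    payload = json.loads(response_body)
    return [entry for entry in payload.get("offsets", [])
            if entry.get("error_code") is not None]


def hit_not_leader(response_body: str) -> bool:
    """True if any record was rejected with the 'not the leader' error code."""
    return any(entry["error_code"] == NOT_LEADER_ERROR_CODE
               for entry in failed_offsets(response_body))


if __name__ == "__main__":
    sample = (
        '{"offsets": [{"partition": null, "offset": null, "error_code": 50003, '
        '"error": "This server is not the leader for that topic-partition."}], '
        '"key_schema_id": null, "value_schema_id": 1}'
    )
    print(hit_not_leader(sample))
```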

The error is the same when I send the messages to that specific partition by adding "/partitions/62" to the URL.

Support says rest-proxy is not smart enough ("it's just a proxy", they say) to pick a valid partition and post to that partition's leader broker. They say it selects the partition at random and then selects a broker at random to post to, which can send the message to a replica or even to a broker that doesn't host the partition. They recommended that we change our calls to fetch the topic metadata before posting, specify the partition and broker ourselves, and handle the round-robin assignment on the application side, which doesn't make sense to me.

On the dev side, my understanding is that rest-proxy uses the Apache kafka-client to post messages to the brokers, so it is smart enough to post to the leader broker for the given partition, and the kafka-client library also handles the round-robin when no partition is specified. This looks to me like an environment issue related to that partition rather than a problem with the calling application itself, since the same calls work without problems in other environments with the same configuration.

To sum up, my questions are:

  1. Am I correct that rest-proxy is smart enough to handle the partition round-robin and to post to the leader?
  2. Should the application be handling the logic in question 1? (If so, I don't see the reason for using rest-proxy instead of kafka-client directly.)
  3. Does this look like a problem in environment orchestration to you too?

I hope this is clear enough for you to give me some help!

Thanks in advance!

I do not use rest-proxy, but this error likely indicates that a NotLeaderForPartitionException occurs during the calls. It means that the leader of the partition has changed but the producer is still using stale metadata. It happened to me when replication between brokers failed due to an internal error in the Kafka server. You can check for this in the server logs.
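To illustrate what "stale metadata" means here, below is a toy model. It is not kafka-client or rest-proxy code, and every class and method name in it is made up. It only shows the idea of a producer that caches partition leaders and refreshes the cache when a broker rejects a write.

```python
# Toy model only: this is NOT kafka-client code, just the idea of a
# producer that caches partition leaders and refreshes them on rejection.


class NotLeaderError(Exception):
    """Raised by the toy cluster when a write reaches a non-leader broker."""


class ToyCluster:
    def __init__(self, leaders):
        self.leaders = dict(leaders)  # partition -> current leader broker id

    def write(self, partition, broker):
        if self.leaders[partition] != broker:
            raise NotLeaderError(
                f"broker {broker} is not the leader for partition {partition}"
            )
        return "ok"


class ToyProducer:
    def __init__(self, cluster):
        self.cluster = cluster
        self.cached = dict(cluster.leaders)  # metadata snapshot, may go stale

    def send(self, partition, max_retries=1):
        for attempt in range(max_retries + 1):
            try:
                return self.cluster.write(partition, self.cached[partition])
            except NotLeaderError:
                if attempt == max_retries:
                    raise
                # Refresh the stale snapshot before retrying.
                self.cached = dict(self.cluster.leaders)


if __name__ == "__main__":
    cluster = ToyCluster({62: 1})
    producer = ToyProducer(cluster)
    cluster.leaders[62] = 3      # leadership moves after the snapshot was taken
    print(producer.send(62))     # first attempt is rejected, the retry succeeds
```

In this model a single refresh clears the error. If the error persists for one partition no matter how often you retry, as in the question, I would look at the brokers rather than at the client.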

In our case I checked the topic with ./kafka-topics.sh --describe --zookeeper zookeeper_ip:2181 --topic test, and it showed that the replicas on one of the brokers were not in sync (ISR column). Restarting that broker helped: the replicas became synchronised and the error disappeared.
