Messages sent to Kafka REST-Proxy being rejected by “This server is not the leader for that topic-partition” error

Our development team and our environment support team disagree about how the Kafka rest-proxy from the Confluent Platform behaves.

We have an environment of 5 Kafka brokers, with 64 partitions and a replication factor of 3.

All our calls to rest-proxy currently use the following structure:

curl -X POST \
  http://somehost:8082/topics/test \
  -H 'content-type: application/vnd.kafka.avro.v1+json' \
  -d '{
   "value_schema_id": 1,
   "records": [
      { "foo": "bar" }
   ]
}'

This call succeeds 98.4% of the time. When I make it over 2,000 times, we never receive an OK response from partition 62, which is 1 of the 64 partitions, or about 1.6% of them. The error rate used to be 10.9%, when 7 partitions were returning errors, right before the support team recycled schema-registry.

Now, when a call goes to partition 62, we receive the following response:
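The percentages follow from the partition count if records are spread evenly over the 64 partitions. A quick check, as a sketch (the function name is made up for illustration):

```shell
# Share of calls expected to land on failing partitions, assuming records
# are distributed evenly across all partitions.
# $1 = number of failing partitions, $2 = total number of partitions
partition_share() {
  awk -v bad="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * bad / total }'
}

partition_share 1 64   # prints 1.6  (1/64 = 1.5625%)
partition_share 7 64   # prints 10.9 (7/64 = 10.9375%)
```

Both observed error rates match a whole number of failing partitions, which suggests the failures are tied to specific partitions rather than spread at random.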

{
    "offsets": [
        {
            "partition": null,
            "offset": null,
            "error_code": 50003,
            "error": "This server is not the leader for that topic-partition."
        }
    ],
    "key_schema_id": null,
    "value_schema_id": 1
}

The error is the same when I send the messages to that specific partition by appending "/partitions/62" to the URL.
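One way to see which broker the proxy itself believes is the leader is to ask it for the partition's metadata. The sketch below assumes the REST Proxy v1 metadata endpoint GET /topics/{topic}/partitions/{partition} is reachable on the same host and port as the produce calls; the function names are made up for illustration.

```shell
# Build the metadata URL for one partition of a topic.
# $1 = host:port of the REST Proxy, $2 = topic, $3 = partition id
partition_metadata_url() {
  printf 'http://%s/topics/%s/partitions/%s\n' "$1" "$2" "$3"
}

# Fetch the metadata. The response lists the partition's "leader" broker id
# and, for each replica, whether it is "in_sync".
get_partition_metadata() {
  curl -s -H 'Accept: application/vnd.kafka.v1+json' \
    "$(partition_metadata_url "$1" "$2" "$3")"
}

# Example (needs network access to the proxy):
# get_partition_metadata somehost:8082 test 62
```

Comparing the reported leader and in_sync flags for partition 62 with those of a healthy partition may show whether the problem lies in the cluster rather than in the calling application.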

Support says rest-proxy is not smart enough ("it's just a proxy", they say) to choose a valid partition and post to the leader broker of that partition. According to them, it randomly selects a partition and then randomly selects a broker to post to, which can lead it to post to replicas or even to brokers that don't have the partition. They recommended that we change our calls to fetch the topic metadata before posting, specify the partition and broker ourselves, and handle the round-robin assignment on the application side, which doesn't make sense to me.

On the dev side, my understanding is that rest-proxy uses the Apache kafka-client to post messages to the brokers, so it is smart enough to post to the leader broker of a given partition, and the kafka-client library also handles the round-robin when no partition is specified. It looks to me like an environment issue related to that partition rather than a problem with the calling app (the same calls work without problems in other environments with the same configuration).
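The behaviour described in the previous paragraph can be modelled in a few lines. This is only a simplified illustration, not the actual kafka-client code: the function name and the broker ids are hypothetical, and how keyless records are spread over partitions depends on the client version (newer clients use a sticky strategy instead of strict round-robin).

```shell
# Simplified model: keyless records go round-robin over the partitions, and
# each record is addressed to that partition's leader as listed in the
# client's cached metadata.
# $1 = number of records
# $2 = leader broker id per partition, space separated (hypothetical values)
route_records() {
  awk -v count="$1" -v leaders="$2" 'BEGIN {
    n = split(leaders, leader, " ")
    for (i = 0; i < count; i++) {
      p = i % n    # round-robin partition choice
      printf "record %d -> partition %d -> broker %s\n", i, p, leader[p + 1]
    }
  }'
}

# Four partitions led by brokers 3, 1, 2 and 5.
route_records 5 "3 1 2 5"
```

In this model the broker is never picked at random: it is always looked up from the metadata. If that cached metadata is stale, the broker rejects the request with a "not the leader" error, which the producer normally treats as retriable after refreshing its metadata.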

To sum up, my questions are:

  1. Am I correct that rest-proxy is smart enough to handle the partition round-robin and to post to the leader?
  2. Should the application be handling the logic in question 1? (In that case I don't see the reason for using rest-proxy instead of kafka-client directly.)
  3. Does it look like a problem in the environment orchestration to you too?

I hope this was clear enough for you to help!

Thanks in advance!

I do not use rest-proxy, but this error likely indicates that a NotLeaderForPartitionException occurs during the calls. It means that the leader of the partition has changed but the producer is still using stale metadata. I ran into this error when replication between brokers failed because of an internal error in the Kafka server. This can be checked in the server logs.

In our case I checked the topic with ./kafka-topics.sh --describe --zookeeper zookeeper_ip:2181 --topic test, and it showed that the replicas on one of the brokers were not in sync (ISR column). Restarting that broker helped: the replicas became synchronized and the error disappeared.
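With 64 partitions the describe output is long, so a small filter can help spot the affected ones. The sketch below assumes the usual per-partition line layout of kafka-topics.sh (labels "Partition:", "Replicas:" and "Isr:", each followed by its value); the function name is made up for illustration.

```shell
# Print the ids of partitions whose in-sync replica list is shorter than
# their assigned replica list. Reads kafka-topics.sh --describe output on stdin.
under_replicated() {
  awk '{
    part = ""; replicas = ""; isr = ""
    for (i = 1; i < NF; i++) {
      if ($i == "Partition:") part = $(i + 1)
      if ($i == "Replicas:")  replicas = $(i + 1)
      if ($i == "Isr:")       isr = $(i + 1)
    }
    # Compare the number of comma-separated broker ids in each list.
    if (part != "" && split(isr, a, ",") < split(replicas, b, ","))
      print part
  }'
}

# Example (needs access to the cluster):
# ./kafka-topics.sh --describe --zookeeper zookeeper_ip:2181 --topic test | under_replicated
```

kafka-topics.sh also accepts --under-replicated-partitions together with --describe, which may be simpler if it is available in your version.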
