
Kafka deployment on Strimzi

I'm trying to deploy Kafka with Strimzi. The problem is that it's exposing the Kafka brokers as load balancers and assigning each of them an external IP. I want the Kafka brokers to be available internally and exposed through a single load balancer only. Below is my deployment file.

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    version: 3.1.0
    replicas: 2
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: external
        port: 9094
        type: loadbalancer
        tls: false
    config:
      offsets.topic.replication.factor: 2
      transaction.state.log.replication.factor: 2
      transaction.state.log.min.isr: 2
      default.replication.factor: 2
      min.insync.replicas: 2
      inter.broker.protocol.version: "3.1"
    storage:
      type: ephemeral
  zookeeper:
    replicas: 2
    storage:
      type: ephemeral

Screenshot of the cluster below:

[screenshot: three LoadBalancer services with external IPs assigned]

As you can see, there are 3 load balancers with external IPs assigned, whereas I wanted a single load balancer with an external IP in front of the 2 Kafka brokers.

This is because of how Kafka is designed. Clients need direct access to each broker in the cluster, so the load balancer, while convenient for exposing the cluster, does not really load-balance anything. It just routes the connection. You can find more details about how and why it works like this in this blog post series: https://strimzi.io/blog/2019/04/17/accessing-kafka-part-1/

Yes, this behavior is correct; it follows from Kafka's client discovery protocol. So first let's understand how that works (a short client sketch after the list illustrates the flow):

  1. A Kafka client connects to any one of the brokers for its first connection (this is done via the Kubernetes bootstrap service).
  2. The broker returns the metadata for one or more topics, including which broker is the leader for each partition.
  3. After getting the details of the desired partition leader, the client opens a new connection directly to that specific broker. Even if that happens to be the same broker as in the first connection (#1), the client terminates the existing connection and starts a new one.
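
To make this concrete, here is a minimal Java sketch of steps 1 and 2, assuming the Kafka Java client is on the classpath. The address 203.0.113.10:9094 is only a placeholder for the external IP of the bootstrap load balancer of the cluster above. The client talks to the bootstrap address once and prints each broker's advertised address, which is where it will connect next.

import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.Node;

public class DiscoverBrokers {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // The bootstrap address is only used for the very first connection (step 1).
        // "203.0.113.10" is a placeholder for the bootstrap load balancer's external IP.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "203.0.113.10:9094");

        try (AdminClient admin = AdminClient.create(props)) {
            // The metadata returned here (step 2) contains each broker's advertised address.
            // The client opens direct connections to these addresses, not to the bootstrap LB.
            for (Node node : admin.describeCluster().nodes().get()) {
                System.out.printf("broker %d advertises %s:%d%n",
                        node.id(), node.host(), node.port());
            }
        }
    }
}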

Now, since the Kafka client connects to brokers directly for sending and receiving records, the load balancer only comes into the picture for the initial connection, where it routes the client to any one of the available brokers. Suppose we used the load balancer for the subsequent connections as well: it would connect the client to an arbitrary available broker, which might or might not be the leader of the partition the client wants to talk to. Kafka avoids this by using the discovery protocol described above.
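
For the producing side, a minimal sketch along the same lines (again with a placeholder bootstrap IP and a hypothetical topic name my-topic): only the bootstrap load balancer is configured, but the records themselves travel over direct connections to the partition leaders' advertised addresses, i.e. the per-broker load balancers.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProduceViaBootstrap {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Only the bootstrap load balancer is listed here; it is used just to fetch metadata.
        // "203.0.113.10" is a placeholder external IP, "my-topic" a placeholder topic name.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "203.0.113.10:9094");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The record is sent over a direct connection to the leader of the target
            // partition (one of the per-broker load balancers), not through the bootstrap LB.
            producer.send(new ProducerRecord<>("my-topic", "key", "value"));
            producer.flush();
        }
    }
}

In short, exposing each broker individually, which is what Strimzi does with the per-broker load balancers, is exactly what makes these direct connections possible from outside the cluster.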
