
Sending messages to Kafka unbuffered using kafkacat

I have a single-node Kafka instance running locally via docker-compose.
(system: Mac/Arm64, image: wurstmeister/kafka:2.13-2.6.0)

I want to use kafkacat (kcat, installed via Homebrew) to instantly produce and consume messages to and from Kafka.

Here is a minimal script:

#!/usr/bin/env bash

NUM_MESSAGES=${1:-3}  # use arg1 or use default=3
KCAT_ARGS="-q -u -c $NUM_MESSAGES -b localhost:9092 -t unbuffered"

log() { echo "$*" 1>&2; }

producer() {
    log "starting producer"
    for i in $(seq 1 "$NUM_MESSAGES"); do
        echo "msg $i"
        log "produced: msg $i"
        sleep 1
    done | kcat $KCAT_ARGS -P
}

consumer() {
    log "starting consumer"
    kcat $KCAT_ARGS -C -o end | while read -r line; do
        log "consumed: $line"
    done
}

producer&
consumer&
wait

I would expect (roughly) the following output:

starting producer
starting consumer
produced: msg 1
consumed: msg 1
produced: msg 2
consumed: msg 2
produced: msg 3
consumed: msg 3

However, the output shows the produced and consumed messages fully batched into two groups, even though the producer and the consumer are running in parallel:

starting producer
starting consumer
produced: msg 1
produced: msg 2
produced: msg 3
consumed: msg 1
consumed: msg 2
consumed: msg 3

Here are some kcat options and Kafka producer properties, with the values I have already tried, to change the producer behavior.

# kcat options having no effect on the test case
-u  # unbuffered output
-T  # act like `tee` and echo input

# kafka properties having no effect on the test case
-X queue.buffering.max.messages=1
-X queue.buffering.max.kbytes=1
-X batch.num.messages=1
-X queue.buffering.max.ms=100
-X socket.timeout.ms=100
-X max.in.flight.requests.per.connection=1
-X auto.commit.interval.ms=100
-X request.timeout.ms=100
-X message.timeout.ms=100
-X offset.store.sync.interval.ms=1
-X message.copy.max.bytes=100
-X socket.send.buffer.bytes=100
-X linger.ms=1
-X delivery.timeout.ms=100

None of the options above had any effect on the pipeline.

What am I missing?

Edit: It seems to be a flushing issue with either kcat or librdkafka. Maybe the -X properties are not being applied correctly.

Here are the current observations (will edit them as I learn more):

  • When sending a larger payload of 10000 messages with a smaller delay in the script, kcat will produce several batches of messages. It seems to be size-based, but not configurable by any of the -X options.

  • The batches are then correctly picked up by the consumer, so it must be a producer issue.

  • I also tried the script in Docker with the current kafkacat from the Alpine repos. This one seems to flush a bit earlier, with less data needed to fill the "hidden" buffer. The -X options also had no effect.

  • The -X properties do seem to be validated: if I set out-of-range values, kcat (or maybe librdkafka) complains. However, setting low values for any of the timeout and buffer-size properties has no effect.

  • When calling kcat once per message (which is a bit of overkill), the messages are produced instantly.

The question remains:

How do I tell a Kafka pipeline to produce my first message instantly?

If you have an example in Go, that would also help, since I am seeing similar behavior with a small Go program using kafka-go. I may post a separate question if I can strip that down to a postable format.

UPDATE: I tried using a Bitnami image on a pure Linux host. Producing and consuming via kafkacat work as expected on that system. I will post an answer once I know more.

Here is how I solved the problem.

The issue was not in the Kafka Docker images. They all work as expected, although I was able to crash the Java-based Kafkas just by firing up kcat against them. I later added rpk (RedPanda, a non-Java "Kafka"), which was much more stable in my single-node setup.

Findings

  • Using kcat, I did not find any way to produce messages instantly without buffering. It stubbornly ignores all of the -X args listed above (edenhill/kcat version 1.7.0, macOS, Arm64).
  • Sending single messages works. When closing the input pipe, kcat will flush the output buffer.
  • Consuming messages instantly via kcat is possible and works by default.
  • Other Kafka clients do not have this issue. I created a small kafka-go example that just works as expected; no extensive buffering by default (a producer sketch follows this list).
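
A minimal sketch of such a kafka-go producer, assuming the segmentio/kafka-go Writer and the broker address and topic from the bash script above (a starting point, not a drop-in replacement for the script):

// producer.go - produce a few messages without noticeable client-side buffering.
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/segmentio/kafka-go"
)

func main() {
	w := &kafka.Writer{
		Addr:         kafka.TCP("localhost:9092"),
		Topic:        "unbuffered",
		BatchSize:    1,                     // a batch is "full" after a single message
		BatchTimeout: 10 * time.Millisecond, // don't wait long for a batch to fill up
		// Async is false by default, so WriteMessages blocks until the write finishes.
	}
	defer w.Close()

	for i := 1; i <= 3; i++ {
		msg := fmt.Sprintf("msg %d", i)
		if err := w.WriteMessages(context.Background(), kafka.Message{Value: []byte(msg)}); err != nil {
			log.Fatalf("produce failed: %v", err)
		}
		log.Printf("produced: %s", msg)
		time.Sleep(time.Second)
	}
}

kafka-go still batches internally and flushes a batch when it is full or when BatchTimeout expires (one second by default), so BatchSize and BatchTimeout are the knobs that matter for single-message latency.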

Conclusion

  • Do not use kcat to produce messages via long-running pipes.
  • Use kafka-go or a similar client even for small health checks and other "scripts" (see the reader sketch below).
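
For the consuming side of such a small script, a matching reader sketch (again segmentio/kafka-go; StartOffset mirrors the -o end of the kcat consumer and assumes a single-partition test topic):

// consumer.go - tail the topic from the end, similar to `kcat -C -o end`.
package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers:     []string{"localhost:9092"},
		Topic:       "unbuffered",
		StartOffset: kafka.LastOffset, // start at the end of the partition; only applies when no GroupID is set
	})
	defer r.Close()

	// Without a GroupID this reads partition 0 of the topic.
	// Log the next three messages as they arrive.
	for i := 0; i < 3; i++ {
		m, err := r.ReadMessage(context.Background())
		if err != nil {
			log.Fatalf("consume failed: %v", err)
		}
		log.Printf("consumed: %s", string(m.Value))
	}
}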
