Sending messages to Kafka unbuffered using kafkacat

Question

I have a single-node Kafka instance running locally via docker-compose.
(system: Mac/Arm64, image: wurstmeister/kafka:2.13-2.6.0)

I want to use kafkacat (kcat, installed via Homebrew) to instantly produce messages to and consume messages from Kafka.

Here is a minimal script:

#!/usr/bin/env bash

NUM_MESSAGES=${1:-3}  # use arg1 or use default=3
KCAT_ARGS="-q -u -c $NUM_MESSAGES -b localhost:9092 -t unbuffered"

log() { echo "$*" 1>&2; }

producer() {
    log "starting producer"
    for i in $(seq 1 "$NUM_MESSAGES"); do  # produce as many messages as the consumer expects
        echo "msg $i"
        log "produced: msg $i"
        sleep 1
    done | kcat $KCAT_ARGS -P
}

consumer() {
    log "starting consumer"
    kcat $KCAT_ARGS -C -o end | while IFS= read -r line; do
        log "consumed: $line"
    done
}

producer&
consumer&
wait

I would expect (roughly) the following output:

starting producer
starting consumer
produced: msg 1
consumed: msg 1
produced: msg 2
consumed: msg 2
produced: msg 3
consumed: msg 3

However, I only get output in which the produced and consumed messages are fully batched into two groups, even though the consumer and producer are running in parallel:

starting producer
starting consumer
produced: msg 1
produced: msg 2
produced: msg 3
consumed: msg 1
consumed: msg 2
consumed: msg 3
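
To make the batching easier to see, each consumed line can be prefixed with the time at which it was read. This is only a sketch: stamp_lines is a hypothetical helper name (not a kcat feature), and it assumes the broker and topic from the script above. Lines that arrive in one burst will show (nearly) the same timestamp.

```shell
#!/usr/bin/env bash
# Sketch: prefix every line read from stdin with the wall-clock time at which
# it was read. stamp_lines is a hypothetical helper, not part of kcat.

stamp_lines() {
    local line
    while IFS= read -r line; do
        # One-second resolution keeps this portable between macOS and Linux.
        printf '%s %s\n' "$(date +%H:%M:%S)" "$line"
    done
}

# Example, using the broker and topic from the script above:
#   kcat -q -u -b localhost:9092 -t unbuffered -C -o end | stamp_lines
```

With the one-second sleep in the producer, instantly delivered messages should carry timestamps about one second apart.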

Here are the kcat options and Kafka producer properties I have already tried in order to change the producer behavior:

# kcat options having no effect on the test case
-u  # unbuffered output
-T  # act like `tee` and echo input

# kafka properties having no effect on the test case
-X queue.buffering.max.messages=1
-X queue.buffering.max.kbytes=1
-X batch.num.messages=1
-X queue.buffering.max.ms=100
-X socket.timeout.ms=100
-X max.in.flight.requests.per.connection=1
-X auto.commit.interval.ms=100
-X request.timeout.ms=100
-X message.timeout.ms=100
-X offset.store.sync.interval.ms=1
-X message.copy.max.bytes=100
-X socket.send.buffer.bytes=100
-X linger.ms=1
-X delivery.timeout.ms=100

None of the options above had any effect on the pipeline.

What am I missing?

Edit: It seems to be a flushing issue with either kcat or librdkafka. Maybe the -X properties are not used correctly.

Here are the current observations (I will edit them as I learn more):

When sending a larger payload of 10000 messages with a smaller delay in the script, kcat produces several batches of messages. The batching seems to be size-based, but it is not configurable through any of the -X options.
The batches are then also correctly picked up by the consumer, so it must be a producer issue.
I also tried the script in Docker with the current kafkacat from the Alpine repos. That build seems to flush a bit earlier, with less data needed to fill the "hidden" buffer. The -X options had no effect there either.
The -X properties do seem to be validated: if I set out-of-range values, kcat (or maybe librdkafka) complains. However, setting low values for any of the timeout and buffer-size properties has no effect.
When calling kcat once per message (which is a bit of an overkill), the messages are produced instantly.

The question remains:

How do I tell a Kafka pipeline to instantly produce my first message?

An example in Go would also help, since I am making similar observations with a small Go program using kafka-go. I may post a separate question if I can strip that down to a postable format.

UPDATE: I tried using a bitnami image on a pure Linux host. Producing and consuming via kafkacat works as expected on that system. I will post an answer once I know more.

Answer 1

Here is how I solved the problem.

The issue was not in the Kafka docker images. They all work as expected, although I was able to crash the Java-based Kafkas by just firing up kcat against them. I later added rpk (RedPanda, a non-Java "Kafka"), which was much more stable in my single-node setup.

Findings

Using kcat, I did not find any way of producing messages instantly without buffering. In my tests it ignored all -X args. (edenhill/kcat version 1.7.0, macOS, Arm64)
Sending single messages works. When the input pipe is closed, kcat flushes its output buffer.
Consuming messages instantly via kcat is possible and works by default.
Other Kafka clients do not have this issue. I created a small kafka-go example that works as expected, with no extensive buffering by default.
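
The finding that kcat flushes once its input pipe closes suggests a workaround for low message rates: start one short-lived kcat producer per message. The following is a minimal sketch, assuming the broker and topic from the question; produce_each_line is a hypothetical helper name. Spawning a process and a broker connection per message is only reasonable for occasional messages.

```shell
#!/usr/bin/env bash
# Sketch: run one short-lived kcat producer per input line, so that every
# message is flushed when that kcat process reaches end-of-input and exits.
# produce_each_line is a hypothetical helper, not part of kcat.

BROKER="${BROKER:-localhost:9092}"  # broker from the question (assumption)
TOPIC="${TOPIC:-unbuffered}"        # topic from the question (assumption)

produce_each_line() {
    local line
    while IFS= read -r line; do
        # kcat receives exactly one line, then sees EOF and flushes on exit.
        printf '%s\n' "$line" | kcat -q -b "$BROKER" -t "$TOPIC" -P
        echo "produced: $line" 1>&2
    done
}

# Example:
#   for i in 1 2 3; do echo "msg $i"; sleep 1; done | produce_each_line
```

This trades throughput for latency, so it fits the "one message now and then" case rather than a long-running, high-volume pipe.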

Conclusion

Do not use kcat to produce messages via long-running pipes.
Use kafka-go or a similar client, even for small health checks and other "scripts".
