
Sending messages to Kafka unbuffered using kafkacat

I have a single-node Kafka instance running locally via docker-compose.
(system: Mac/Arm64, image: wurstmeister/kafka:2.13-2.6.0)

I want to use kafkacat (kcat, installed via Homebrew) to instantly produce messages to Kafka and consume them from it.

Here is a minimal script:

#!/usr/bin/env bash

NUM_MESSAGES=${1:-3}  # use arg1 or use default=3
KCAT_ARGS="-q -u -c $NUM_MESSAGES -b localhost:9092 -t unbuffered"

log() { echo "$*" 1>&2; }

producer() {
    log "starting producer"
    for i in $(seq 1 "$NUM_MESSAGES"); do  # send NUM_MESSAGES messages
        echo "msg $i"
        log "produced: msg $i"
        sleep 1
    done | kcat $KCAT_ARGS -P
}

consumer() {
    log "starting consumer"
    kcat $KCAT_ARGS -C -o end | while IFS= read -r line; do  # read each line verbatim
        log "consumed: $line"
    done
}

producer&
consumer&
wait

I would expect (roughly) the following output:

starting producer
starting consumer
produced: msg 1
consumed: msg 1
produced: msg 2
consumed: msg 2
produced: msg 3
consumed: msg 3

However, the output I get has the produced and consumed messages fully batched into two groups, even though the consumer and the producer are running in parallel:

starting producer
starting consumer
produced: msg 1
produced: msg 2
produced: msg 3
consumed: msg 1
consumed: msg 2
consumed: msg 3
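Since both loops write to the same stderr, the grouped output alone does not show when each message actually arrived. A minimal diagnostic sketch (an addition, not part of the original script) is to swap in a log() that prefixes a wall-clock timestamp. With the one-second sleep in the producer, the "produced" lines should be about a second apart; if all "consumed" lines then share one later timestamp, the messages were held back and delivered as a single batch.

```shell
#!/usr/bin/env bash
# Diagnostic variant of the script's log(): prefix each message with the
# current wall-clock time (HH:MM:SS) and write it to stderr, as before.
log() {
    printf '%s %s\n' "$(date +%H:%M:%S)" "$*" 1>&2
}
```

Calling `log "produced: msg 1"` prints a line such as `12:00:01 produced: msg 1` to stderr.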

Here are some kafkacat/Kafka producer properties and the values I have already tried in order to change the producer behavior.

# kcat options having no effect on the test case
-u  # unbuffered output
-T  # act like `tee` and echo input

# kafka properties having no effect on the test case
-X queue.buffering.max.messages=1
-X queue.buffering.max.kbytes=1
-X batch.num.messages=1
-X queue.buffering.max.ms=100
-X socket.timeout.ms=100
-X max.in.flight.requests.per.connection=1
-X auto.commit.interval.ms=100
-X request.timeout.ms=100
-X message.timeout.ms=100
-X offset.store.sync.interval.ms=1
-X message.copy.max.bytes=100
-X socket.send.buffer.bytes=100
-X linger.ms=1
-X delivery.timeout.ms=100
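For reference, here is one way to pass such -X properties to kcat, sketched under the assumption that the question's broker address and topic are kept. It uses bash arrays instead of the KCAT_ARGS string, so no argument depends on unquoted word splitting. Note that in librdkafka, linger.ms is an alias of queue.buffering.max.ms and delivery.timeout.ms is an alias of message.timeout.ms, while auto.commit.interval.ms and offset.store.sync.interval.ms are consumer properties; several entries in the list above therefore overlap or do not affect the producer at all. The array names and the produce() helper are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: kcat arguments as bash arrays (array/function names are hypothetical).
NUM_MESSAGES=${1:-3}  # use arg1 or default to 3, as in the question

# Options shared by producer and consumer, one array element per word.
COMMON_ARGS=(-q -u -c "$NUM_MESSAGES" -b localhost:9092 -t unbuffered)

# Example producer-side librdkafka properties taken from the list above.
# linger.ms is an alias of queue.buffering.max.ms in librdkafka.
PRODUCER_ARGS=(
    -X linger.ms=1
    -X batch.num.messages=1
)

# Produce stdin to the topic; each array element is passed as one argument.
produce() {
    kcat "${COMMON_ARGS[@]}" "${PRODUCER_ARGS[@]}" -P
}
```

Usage: `printf 'msg 1\n' | produce`.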

None of the options above had any effect on the pipeline.

What am I missing?

Edit: It seems to be a flushing issue in either kcat or librdkafka. Maybe the -X properties are not being applied correctly.

Here are my current observations (I will update them as I learn more):

  • When sending a larger payload of 10000 messages with a shorter delay in the script, kcat produces several batches of messages. The batching seems to be size-based, but it is not configurable through any of the -X options.

  • These batches are then correctly picked up by the consumer, so it must be a producer issue.

  • I also tried the script in Docker with the current kafkacat from the Alpine repos. This version seems to flush a bit earlier, i.e. less data is needed to fill the "hidden" buffer. The -X options had no effect there either.

  • The -X properties do seem to be validated: if I set out-of-range values, kcat (or maybe librdkafka) complains. However, setting low values for any of the timeout and buffer-size properties has no effect.

  • When calling kcat once per message (which is a bit of overkill), the messages are produced instantly.
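The last observation can be turned into a workaround: start one short-lived kcat producer per input line. Each process reads a single message, reaches EOF on its stdin, flushes and exits, so the message should be sent immediately, at the cost of a new broker connection per message. This is a minimal sketch assuming the question's broker and topic; the KCAT, BROKER and TOPIC variables and the produce_each function are hypothetical names, and KCAT is overridable so the loop can be tested without a broker.

```shell
#!/usr/bin/env bash
# Workaround sketch: one kcat producer process per input line.
# KCAT, BROKER, TOPIC and produce_each are hypothetical names; KCAT can be
# overridden (e.g. with a stub) for testing without a running broker.
KCAT=${KCAT:-kcat}
BROKER=${BROKER:-localhost:9092}
TOPIC=${TOPIC:-unbuffered}

produce_each() {
    local line
    # Read stdin line by line; each line becomes its own kcat invocation,
    # whose stdin closes right after the message, forcing kcat to flush.
    while IFS= read -r line; do
        printf '%s\n' "$line" | "$KCAT" -q -b "$BROKER" -t "$TOPIC" -P
    done
}
```

In the question's script, the producer loop's `| kcat $KCAT_ARGS -P` could then be replaced with `| produce_each`.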

The question remains:

How do I tell a Kafka pipeline to produce my first message instantly?

If you have an example in Go, that would also help, since I see similar behavior in a small Go program using kafka-go. I may post a separate question if I can strip it down to a postable format.

UPDATE: I tried a bitnami image on a plain Linux host. Producing and consuming via kafkacat works as expected on that system. I will post an answer once I know more.

Here is how I solved the problem.

The issue was not in the Kafka Docker images. They all work as expected, although I was able to crash the Java-based Kafka brokers just by firing up kcat against them. I later added rpk (Redpanda, a non-Java "Kafka"), which was much more stable in my single-node setup.

Findings

  • Using kcat, I did not find any way of producing messages instantly without buffering. In my tests it effectively ignored all -X arguments. (edenhill/kcat version 1.7.0, macOS, Arm64)
  • Sending single messages works: when the input pipe is closed, kcat flushes its output buffer.
  • Consuming messages instantly via kcat is possible and works by default.
  • Other Kafka clients do not have this issue. I created a small kafka-go example that works as expected, with no extensive buffering by default.

Conclusion

  • Do not use kcat to produce messages via long-running pipes.
  • Use kafka-go or a similar client, even for small health checks and other "scripts".
