Java, How to get number of messages in a topic in apache kafka

I am using Apache Kafka for messaging. I have implemented the producer and consumer in Java. How can we get the number of messages in a topic?

It is not Java, but it may be useful:

./bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
  --broker-list <broker>:<port> \
  --topic <topic-name> \
  | awk -F  ":" '{sum += $3} END {print sum}'

The only way that comes to mind from a consumer's point of view is to actually consume the messages and count them.

The Kafka broker exposes JMX counters for the number of messages received since start-up, but you cannot know how many of them have already been purged.

In most common scenarios, messages in Kafka are best seen as an infinite stream, and getting a discrete value for how many are currently kept on disk is not relevant. Furthermore, things get more complicated when dealing with a cluster of brokers that each hold a subset of the messages in a topic.
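If you do want a point-in-time count from the consumer side, here is a minimal sketch of the consume-and-count approach. The broker address, topic name, throwaway group id, and stop condition are assumptions; the offset-based approaches further down are much cheaper:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class CountByConsuming {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");             // assumed broker address
        props.put("group.id", "count-" + System.currentTimeMillis()); // throwaway group
        props.put("auto.offset.reset", "earliest");                   // start from the beginning
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        long count = 0;
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // assumed topic name
            int emptyPolls = 0;
            while (emptyPolls < 5) { // stop after a few consecutive empty polls
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                if (records.isEmpty()) {
                    emptyPolls++;
                } else {
                    emptyPolls = 0;
                    count += records.count();
                }
            }
        }
        System.out.println("Messages consumed: " + count);
    }
}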

Since ConsumerOffsetChecker is no longer supported, you can use this command to check all messages in a topic:

bin/kafka-run-class.sh kafka.admin.ConsumerGroupCommand \
    --group my-group \
    --bootstrap-server localhost:9092 \
    --describe

where LAG is the count of messages in the topic partition that the group has not yet consumed.

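For reference, the same total LAG can be computed from Java with the AdminClient. This is a sketch only; the broker address and the group id my-group mirror the command above and should be adjusted to your setup:

import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.ListOffsetsResult;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class GroupLag {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

        try (Admin admin = Admin.create(props)) {
            // Committed offsets of the group (CURRENT-OFFSET in the CLI output).
            // Only partitions with a committed offset are returned.
            Map<TopicPartition, OffsetAndMetadata> committed = admin
                    .listConsumerGroupOffsets("my-group")
                    .partitionsToOffsetAndMetadata().get();

            // Latest offsets of the same partitions (LOG-END-OFFSET in the CLI output).
            Map<TopicPartition, OffsetSpec> latestSpec = committed.keySet().stream()
                    .collect(Collectors.toMap(tp -> tp, tp -> OffsetSpec.latest()));
            Map<TopicPartition, ListOffsetsResult.ListOffsetsResultInfo> latest =
                    admin.listOffsets(latestSpec).all().get();

            // Total LAG = sum over partitions of (log end offset - committed offset).
            long lag = committed.entrySet().stream()
                    .mapToLong(e -> latest.get(e.getKey()).offset() - e.getValue().offset())
                    .sum();
            System.out.println("Total lag for my-group: " + lag);
        }
    }
}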

You can also try kafkacat. It is an open-source tool that reads messages from a topic and partition and prints them to stdout. Here is a sample that reads the last 10 messages from the sample-kafka-topic topic and then exits:

kafkacat -b localhost:9092 -t sample-kafka-topic -p 0 -o -10 -e

I actually use this for benchmarking my POC. The tool you want is ConsumerOffsetChecker. You can run it from a bash script as shown below.

bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker  --topic test --zookeeper localhost:2181 --group testgroup

In the resulting output, 999 is the number of messages currently in the topic.

Update: ConsumerOffsetChecker has been deprecated since 0.10.0; you may want to start using ConsumerGroupCommand.

Sometimes the interest is in knowing the number of messages in each partition, for example, when testing a custom partitioner. The following steps have been tested with Kafka 0.10.2.1-2 from Confluent 3.2. Given a Kafka topic, kt, and the following command line:

$ kafka-run-class kafka.tools.GetOffsetShell \
  --broker-list host01:9092,host02:9092,host02:9092 --topic kt

This prints sample output showing the count of messages in each of the three partitions:

kt:2:6138
kt:1:6123
kt:0:6137

The number of lines could be more or less depending on the number of partitions for the topic.

Use https://prestodb.io/docs/current/connector/kafka-tutorial.html

It is a powerful SQL engine, provided by Facebook, that connects to several data sources (Cassandra, Kafka, JMX, Redis, ...).

PrestoDB runs as a server with optional workers (there is a standalone mode without extra workers); you then use a small executable JAR (the Presto CLI) to issue queries.

Once the Presto server is properly configured, you can use traditional SQL:

SELECT count(*) FROM TOPIC_NAME;

Apache Kafka command to get the unhandled messages on all partitions of a topic:

kafka-run-class kafka.tools.ConsumerOffsetChecker 
    --topic test --zookeeper localhost:2181 
    --group test_group

Prints:

Group      Topic        Pid Offset          logSize         Lag             Owner
test_group test         0   11051           11053           2               none
test_group test         1   10810           10812           2               none
test_group test         2   11027           11028           1               none

Column 6 is the number of unhandled messages. Add them up like this:

kafka-run-class kafka.tools.ConsumerOffsetChecker 
    --topic test --zookeeper localhost:2181 
    --group test_group 2>/dev/null | awk 'NR>1 {sum += $6} 
    END {print sum}'

awk reads the rows, skips the header line, adds up the 6th column, and prints the sum at the end.

Prints:

5

Using the Java client of Kafka 2.11-1.0.0, you can do the following:

    // props holds the usual consumer configuration (bootstrap.servers, group.id, deserializers, ...)
    KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
    consumer.subscribe(Collections.singletonList("test"));
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(100);
        for (ConsumerRecord<String, String> record : records) {
            System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());

            // after each message, query the end offsets of the topic's partitions
            Set<TopicPartition> partitions = consumer.assignment();
            Map<TopicPartition, Long> offsets = consumer.endOffsets(partitions);
            for (TopicPartition partition : offsets.keySet()) {
                System.out.printf("partition %s is at %d%n", partition.topic(), offsets.get(partition));
            }
        }
    }

The output is something like this:

offset = 10, key = null, value = un
partition test is at 13
offset = 11, key = null, value = deux
partition test is at 13
offset = 12, key = null, value = trois
partition test is at 13

Run the following (assuming kafka-console-consumer.sh is on the path):

kafka-console-consumer.sh  --from-beginning \
--bootstrap-server yourbroker:9092 --property print.key=true  \
--property print.value=false --property print.partition \
--topic yourtopic --timeout-ms 5000 | tail -n 10|grep "Processed a total of"

To get the count of all messages stored for the topic, you can seek the consumer to the beginning and end of the stream for each partition and sum the results:

// Determine all partitions of the topic and assign them to this consumer.
List<TopicPartition> partitions = consumer.partitionsFor(topic).stream()
        .map(p -> new TopicPartition(topic, p.partition()))
        .collect(Collectors.toList());
consumer.assign(partitions);

// Record the end offset of every partition...
consumer.seekToEnd(Collections.emptySet());
Map<TopicPartition, Long> endPartitions = partitions.stream()
        .collect(Collectors.toMap(Function.identity(), consumer::position));

// ...then seek back to the beginning and sum the differences.
consumer.seekToBeginning(Collections.emptySet());
System.out.println(partitions.stream()
        .mapToLong(p -> endPartitions.get(p) - consumer.position(p)).sum());

I had this same question and this is how I am doing it, from a KafkaConsumer, in Kotlin:

val messageCount = consumer.listTopics().entries.filter { it.key == topicName }
    .map {
        it.value.map { topicInfo -> TopicPartition(topicInfo.topic(), topicInfo.partition()) }
    }.map { consumer.endOffsets(it).values.sum() - consumer.beginningOffsets(it).values.sum()}
    .first()

Very rough code, as I only just got this working, but basically you subtract the topic's beginning offsets from its end offsets, and that gives the current message count for the topic.

You can't rely on the end offset alone, because other configuration (cleanup policy, retention.ms, etc.) may end up deleting old messages from your topic. Offsets only "move" forward, so it is the beginning offset that moves forward, closer to the end offset (or eventually to the same value, if the topic contains no messages right now).

Basically, the end offset represents the overall number of messages that have gone through the topic, and the difference between the two represents the number of messages the topic contains right now.

In recent versions of Kafka Manager, there is a column titled Summed Recent Offsets.


Excerpt from the Kafka docs:

Deprecations in 0.9.0.0

The kafka-consumer-offset-checker.sh (kafka.tools.ConsumerOffsetChecker) has been deprecated. Going forward, please use kafka-consumer-groups.sh (kafka.admin.ConsumerGroupCommand) for this functionality.

I am running a Kafka broker with SSL enabled for both server and client. The commands I use are below:

kafka-consumer-groups.sh --bootstrap-server Broker_IP:Port --list --command-config /tmp/ssl_config
kafka-consumer-groups.sh --bootstrap-server Broker_IP:Port --command-config /tmp/ssl_config --describe --group group_name_x

where /tmp/ssl_config is as follows:

security.protocol=SSL
ssl.truststore.location=truststore_file_path.jks
ssl.truststore.password=truststore_password
ssl.keystore.location=keystore_file_path.jks
ssl.keystore.password=keystore_password
ssl.key.password=key_password

If you have access to the server's JMX interface, the start and end offsets are present at:

kafka.log:type=Log,name=LogStartOffset,topic=TOPICNAME,partition=PARTITIONNUMBER
kafka.log:type=Log,name=LogEndOffset,topic=TOPICNAME,partition=PARTITIONNUMBER

(you need to replace TOPICNAME and PARTITIONNUMBER). Bear in mind that you need to check each replica of a given partition, or find out which broker is the leader for that partition (and this can change over time).
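For reference, here is a minimal sketch of reading these MBeans from Java over remote JMX. It assumes JMX is enabled on the broker (e.g. JMX_PORT=9999) and that the gauges expose their value through a Value attribute; host, topic, and partition are placeholders:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class LogOffsetsViaJmx {
    public static void main(String[] args) throws Exception {
        // Assumed JMX endpoint of the partition leader (JMX_PORT=9999 on the broker).
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://broker-host:9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbeans = connector.getMBeanServerConnection();

            ObjectName start = new ObjectName(
                    "kafka.log:type=Log,name=LogStartOffset,topic=TOPICNAME,partition=0");
            ObjectName end = new ObjectName(
                    "kafka.log:type=Log,name=LogEndOffset,topic=TOPICNAME,partition=0");

            // These metrics are gauges; "Value" is assumed to be the attribute holding the offset.
            long startOffset = ((Number) mbeans.getAttribute(start, "Value")).longValue();
            long endOffset = ((Number) mbeans.getAttribute(end, "Value")).longValue();

            System.out.println("Messages currently in partition 0: " + (endOffset - startOffset));
        } finally {
            connector.close();
        }
    }
}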

Alternatively, you can use the Kafka Consumer methods beginningOffsets and endOffsets.

We can use the simple Java code below to get the message count for a topic:

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9091");
props.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

String topic = "topic";
List<PartitionInfo> parts = consumer.partitionsFor(topic);
List<TopicPartition> partitions = new ArrayList<>();
for (PartitionInfo p : parts) {
    partitions.add(new TopicPartition(topic, p.partition()));
}
consumer.assign(partitions);

// Message count = sum over all partitions of (end offset - beginning offset).
Map<TopicPartition, Long> endOffsets = consumer.endOffsets(partitions);
Map<TopicPartition, Long> beginningOffsets = consumer.beginningOffsets(partitions);
long totalMessageCnt = 0;
for (TopicPartition tp : endOffsets.keySet()) {
    totalMessageCnt += endOffsets.get(tp) - beginningOffsets.get(tp);
}

If you need to calculate the result for all the consumers in a consumer group (or for different consumer groups), another option is to use the admin client and subtract the consumer-group offsets from the topic/partition offsets. Code example in Kotlin:

val topicName = "someTopic"
val groupId = "theGroupId"
val admin = Admin.create(kafkaProps.buildAdminProperties()) // Spring KafkaProperties
val parts = admin.describeTopics(listOf(topicName)).values()[topicName]!!.get().partitions()
val topicPartitionOffsets = admin.listOffsets(parts.associate { TopicPartition(topicName, it.partition()) to OffsetSpec.latest() }).all().get()
val consumerGroupOffsets = admin.listConsumerGroupOffsets(groupId)
    .partitionsToOffsetAndMetadata().get()
val highWaterMark = topicPartitionOffsets.map { it.value.offset() }.sum()
val consumerPos = consumerGroupOffsets.map { it.value.offset() }.sum()
val unProcessedMessages = highWaterMark - consumerPos

Also, here is a working version of LeYAUable's example code, which only uses a regular (non-admin) client:

val partitions = consumer.partitionsFor("topicName")
        .map { TopicPartition(it.topic(), it.partition()) }
val highWaterMark = consumer.endOffsets(partitions).values.sum()
val consumerPosition = consumer.beginningOffsets(partitions).values.sum()
val msgCount = highWaterMark - consumerPosition

This will only give you the offsets visible to this particular consumer, though! The usual caveat applies: this is imprecise when a topic is compacted.

I haven't tried this myself, but it seems to make sense.

You can also use kafka.tools.ConsumerOffsetChecker (source).

The simplest way I've found is to use the Kafdrop REST API /topic/topicName and specify an Accept: application/json header in order to get back a JSON response.

This is documented here.
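A minimal sketch of calling that endpoint with Java 11's HttpClient; the Kafdrop address and topic name below are placeholders:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class KafdropTopicInfo {
    public static void main(String[] args) throws Exception {
        // Assumed Kafdrop address and topic name; adjust to your environment.
        HttpRequest request = HttpRequest.newBuilder(
                        URI.create("http://localhost:9000/topic/my-topic"))
                .header("Accept", "application/json") // ask Kafdrop for JSON instead of HTML
                .GET()
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // The JSON body describes the topic (see the Kafdrop docs linked above);
        // inspect it for per-partition and total message information.
        System.out.println(response.body());
    }
}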

You may use Kafka Tool. Please check this link: http://www.kafkatool.com/download.html

Kafka Tool is a GUI application for managing and using Apache Kafka clusters. It provides an intuitive UI that lets you quickly view objects within a Kafka cluster as well as the messages stored in the cluster's topics.
