
How to delete data which has already been consumed by a consumer? Kafka

I am doing data replication in Kafka, but the size of the Kafka log file increases very quickly, reaching 5 GB in a day. As a solution to this problem, I want to delete processed data immediately. I am using the delete records method in AdminClient to delete up to an offset. But when I look at the log file, the data corresponding to that offset is not deleted.

RecordsToDelete recordsToDelete = RecordsToDelete.beforeOffset(offset);
TopicPartition topicPartition = new TopicPartition(topicName, partition);
Map<TopicPartition, RecordsToDelete> deleteConf = new HashMap<>();
deleteConf.put(topicPartition, recordsToDelete);
adminClient.deleteRecords(deleteConf);

I don't want suggestions like log.retention.hours, log.retention.bytes, log.segment.bytes, or log.cleanup.policy=delete.

Because I just want to delete the data consumed by the consumer. With those settings, data that has not yet been consumed would also be deleted.

What are your suggestions?

You didn't do anything wrong. The code you provided works and I've tested it. Just in case I've overlooked something in your code, mine is:

public void deleteMessages(String topicName, int partitionIndex, int beforeIndex) {
    TopicPartition topicPartition = new TopicPartition(topicName, partitionIndex);
    Map<TopicPartition, RecordsToDelete> deleteMap = new HashMap<>();
    deleteMap.put(topicPartition, RecordsToDelete.beforeOffset(beforeIndex));
    kafkaAdminClient.deleteRecords(deleteMap);
}

I've used `group: 'org.apache.kafka', name: 'kafka-clients', version: '2.0.0'`
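For reference, the same dependency expressed as Maven coordinates (taken directly from the Gradle line above):

```xml
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>2.0.0</version>
</dependency>
```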

So check if you are targeting the right partition (0 for the first one).

Check your broker version: https://kafka.apache.org/20/javadoc/index.html?org/apache/kafka/clients/admin/AdminClient.html says:

This operation is supported by brokers with version 0.11.0.0

Produce the messages from the same application, to be sure you're connected properly.

There is one more option you can consider: using cleanup.policy=compact. If your message keys repeat, you could benefit from it, not just because older messages for a key will be automatically deleted, but because you can use the fact that a message with a null payload deletes all the messages for that key. Just don't forget to set delete.retention.ms and min.compaction.lag.ms to small enough values. In that case you can consume a message and then produce a null payload for the same key (but be cautious with this approach, since this way you can delete messages with that key that you didn't consume).
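As a sketch of the compaction settings mentioned above, the topic-level configuration might look like this (the concrete values are placeholders for illustration, not recommendations):

```properties
# Compact the log instead of deleting by retention time/size
cleanup.policy=compact
# How long tombstones (null-payload messages) are retained after compaction
delete.retention.ms=1000
# Minimum time a message must stay uncompacted in the log
min.compaction.lag.ms=0
```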

Try this:

DeleteRecordsResult result = adminClient.deleteRecords(recordsToDelete);
Map<TopicPartition, KafkaFuture<DeletedRecords>> lowWatermarks = result.lowWatermarks();
try {
    for (Map.Entry<TopicPartition, KafkaFuture<DeletedRecords>> entry : lowWatermarks.entrySet()) {
        System.out.println(entry.getKey().topic() + " " + entry.getKey().partition() + " " + entry.getValue().get().lowWatermark());
    }
} catch (InterruptedException | ExecutionException e) {
    e.printStackTrace();
}
adminClient.close();

In this code, you need to call entry.getValue().get().lowWatermark(), because adminClient.deleteRecords(recordsToDelete) returns a map of Futures; you need to wait for each Future to complete by calling get().
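To illustrate why the blocking get() call matters, here is a minimal self-contained sketch using the standard library's CompletableFuture as a stand-in for KafkaFuture (the topic-partition key "my-topic-0" and the watermark value are made up for the illustration):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

public class LowWatermarkDemo {

    // Stand-in for the Map<TopicPartition, KafkaFuture<DeletedRecords>> that
    // deleteRecords() returns: each value completes asynchronously, only after
    // the broker has actually performed the deletion.
    static Map<String, CompletableFuture<Long>> deleteRecordsStub() {
        Map<String, CompletableFuture<Long>> lowWatermarks = new HashMap<>();
        lowWatermarks.put("my-topic-0", CompletableFuture.supplyAsync(() -> 42L));
        return lowWatermarks;
    }

    public static void main(String[] args) throws InterruptedException, ExecutionException {
        for (Map.Entry<String, CompletableFuture<Long>> entry : deleteRecordsStub().entrySet()) {
            // get() blocks until the asynchronous result is available; if you
            // skip it, the deletion may not have happened yet when you inspect the log.
            System.out.println(entry.getKey() + " low watermark: " + entry.getValue().get());
        }
    }
}
```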
