How to delete data which has already been consumed by a consumer? Kafka
I am doing data replication in Kafka. But the size of the Kafka log file increases very quickly. The size reaches 5 GB in a day. As a solution to this problem, I want to delete processed data immediately. I am using the deleteRecords method in AdminClient to delete the records before an offset. But when I look at the log file, the data corresponding to that offset is not deleted.
RecordsToDelete recordsToDelete = RecordsToDelete.beforeOffset(offset);
TopicPartition topicPartition = new TopicPartition(topicName,partition);
Map<TopicPartition,RecordsToDelete> deleteConf = new HashMap<>();
deleteConf.put(topicPartition,recordsToDelete);
adminClient.deleteRecords(deleteConf);
I don't want suggestions like (log.retention.hours, log.retention.bytes, log.segment.bytes, log.cleanup.policy=delete), because I just want to delete data consumed by the consumer. With that solution, I would also delete data that has not been consumed.
What are your suggestions?
You didn't do anything wrong. The code you provided works and I've tested it. Just in case I've overlooked something in your code, mine is:
public void deleteMessages(String topicName, int partitionIndex, int beforeIndex) {
TopicPartition topicPartition = new TopicPartition(topicName, partitionIndex);
Map<TopicPartition, RecordsToDelete> deleteMap = new HashMap<>();
deleteMap.put(topicPartition, RecordsToDelete.beforeOffset(beforeIndex));
kafkaAdminClient.deleteRecords(deleteMap);
}
I've used group: 'org.apache.kafka', name: 'kafka-clients', version: '2.0.0'
So check that you are targeting the right partition (0 for the first one).
Check your broker version: https://kafka.apache.org/20/javadoc/index.html?org/apache/kafka/clients/admin/AdminClient.html says:
This operation is supported by brokers with version 0.11.0.0 or higher.
Produce the messages from the same application, to be sure you're connected properly.
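Since the goal is to delete only what has been consumed, the offset passed to RecordsToDelete.beforeOffset could be derived from the committed offsets of the consumer groups reading that partition. Below is a minimal sketch of that calculation only; SafeDeleteOffset and safeBeforeOffset are hypothetical names, and the map of committed offsets is assumed to have been fetched elsewhere (it is not read from Kafka here).

```java
import java.util.Map;
import java.util.OptionalLong;

/** Hypothetical helper: picks an offset that every consumer group has already passed. */
final class SafeDeleteOffset {

    /**
     * @param committedByGroup committed offset per consumer group for ONE partition
     *                         (a committed offset is the next offset the group will read)
     * @return the smallest committed offset, or empty if no group has committed yet
     */
    static OptionalLong safeBeforeOffset(Map<String, Long> committedByGroup) {
        return committedByGroup.values().stream()
                .mapToLong(Long::longValue)
                .min();
    }

    public static void main(String[] args) {
        // Assumed example values, not real measurements.
        Map<String, Long> committed = Map.of("replicator", 120L, "audit", 95L);
        // Offsets below the smallest committed offset have been read by both groups.
        System.out.println(safeBeforeOffset(committed));
    }
}
```

Taking the minimum matters when more than one group reads the topic: deleting up to the fastest group's offset would remove records that the slower group has not seen yet.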
There is one more option you can consider: using cleanup.policy=compact. If your message keys are repeating you could benefit from it, not just because older messages for a key will be automatically deleted, but also because a message with a null payload deletes all the messages for that key. Just don't forget to set delete.retention.ms and min.compaction.lag.ms to values small enough. In that case you can consume a message and then produce a null payload for the same key (but be cautious with this approach, since this way you can delete messages with that key that you didn't consume).
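To make the caution about null payloads concrete, here is a toy model of what survives compaction. It is only a sketch of the idea (latest value per key, null removes the key), not Kafka's implementation; CompactionSketch and Rec are hypothetical names and the records are made up.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of log compaction; NOT how the broker implements it. */
final class CompactionSketch {

    /** A record with a key and a possibly-null value (null stands for a tombstone). */
    static final class Rec {
        final String key;
        final String value;
        Rec(String key, String value) { this.key = key; this.value = value; }
    }

    /** Returns the value left per key once compaction and tombstone cleanup have run. */
    static Map<String, String> compact(List<Rec> log) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (Rec r : log) {
            if (r.value == null) {
                latest.remove(r.key);        // tombstone: every earlier record for this key goes
            } else {
                latest.put(r.key, r.value);  // a newer value replaces the older one
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        List<Rec> log = List.of(
                new Rec("k1", "consumed"),
                new Rec("k1", "not yet consumed"),
                new Rec("k1", null));        // tombstone produced after reading the first record
        // The second record disappears too, although no consumer has read it.
        System.out.println(compact(log).containsKey("k1"));
    }
}
```

So the tombstone approach is only safe when a key is never reused for a message that still has to be consumed.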
Try this
DeleteRecordsResult result = adminClient.deleteRecords(recordsToDelete);
Map<TopicPartition, KafkaFuture<DeletedRecords>> lowWatermarks = result.lowWatermarks();
try {
for (Map.Entry<TopicPartition, KafkaFuture<DeletedRecords>> entry : lowWatermarks.entrySet()) {
System.out.println(entry.getKey().topic() + " " + entry.getKey().partition() + " " + entry.getValue().get().lowWatermark());
}
} catch (InterruptedException | ExecutionException e) {
e.printStackTrace();
}
adminClient.close();
In this code, you need to call entry.getValue().get().lowWatermark(), because adminClient.deleteRecords(recordsToDelete) returns a result holding a map of KafkaFutures; you need to wait for each Future to complete by calling get().
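As for the log file not shrinking: as far as I understand broker behaviour (this is my assumption, it is not stated in the posts above), deleteRecords only moves the partition's log start offset, which is the low watermark printed here. Files on disk are removed later, one whole segment at a time, and only once every record in the segment is below that offset. A toy sketch of that rule, with hypothetical names and made-up segment offsets:

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of segment-level deletion; names and numbers are illustrative only. */
final class SegmentDeletionSketch {

    /**
     * @param segmentBaseOffsets base offset of each segment, ascending; the last one is the active segment
     * @param logStartOffset     the low watermark reported after deleteRecords
     * @return base offsets of the segments whose files could be removed
     */
    static List<Long> deletableSegments(List<Long> segmentBaseOffsets, long logStartOffset) {
        List<Long> deletable = new ArrayList<>();
        // A segment qualifies only if the NEXT segment starts at or below the log start offset,
        // which means every record in it is already below the low watermark.
        // The last (active) segment has no successor, so it never qualifies.
        for (int i = 0; i + 1 < segmentBaseOffsets.size(); i++) {
            if (segmentBaseOffsets.get(i + 1) <= logStartOffset) {
                deletable.add(segmentBaseOffsets.get(i));
            }
        }
        return deletable;
    }

    public static void main(String[] args) {
        List<Long> segments = List.of(0L, 1000L, 2000L);
        // With a low watermark of 1500, the segment starting at 1000 still holds live records.
        System.out.println(deletableSegments(segments, 1500L));
    }
}
```

If that is what happens in your case, the records are no longer readable after deleteRecords even though the .log file keeps its size until a full segment falls below the low watermark.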