
Kafka Node - How to retrieve all messages on a compacted topic

I am attempting to use kafka-node to read compacted messages from a kafka topic.

The problem is that recently inserted messages are left above the EOL and are not reachable until additional messages are inserted. Effectively there is a gap between the EOL and the high water offset which prevents reading of the latest messages. It's not clear why this is.

A topic has been created with:

kafka-topics.sh --zookeeper ${KAFKA_HOST}:2181 --create --topic atopic --config "cleanup.policy=compact" --config "delete.retention.ms=100" --config "segment.ms=100" --config "min.cleanable.dirty.ratio=0" --partitions 1 --replication-factor 1

A number of key/value pairs were produced into the topic. Some of the keys were the same.

var kafka = require('kafka-node');
var HighLevelProducer = kafka.HighLevelProducer;
var debug = require('debug')('kafka-producer'); // logging helper used below

var client = new kafka.KafkaClient({ kafkaHost: "<host:port>", autoConnect: true });
var producer = new HighLevelProducer(client);

// payload is the array of produce requests built for each key/value pair below;
// the send is wrapped in a Promise whose resolve callback is `res` (implied by the original snippet).
new Promise(function (res, reject) {
  producer.send(payload, function (error, result) {
    debug('Sent payload to Kafka: ', payload);
    if (error) {
      console.error(error);
      reject(error);
    } else {
      res(true);
    }
    client.close();
  });
});

Here are the keys and values that were inserted:

key - 1
key2 - 1
key3 - 1
key - 2
key2 - 2
key3 - 2
key1 - 3
key - 3
key2 - 3
key3 - 3
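
For reference, a rough sketch (not the exact code from the question) of how these pairs could be built into kafka-node payloads; KeyedMessage is what attaches the record key that compaction later deduplicates on, and the value is serialized the same way it appears in the consumer output below:

var kafka = require('kafka-node');
var KeyedMessage = kafka.KeyedMessage;

// The key/value pairs above, in insertion order.
var pairs = [
  ['key', '1'], ['key2', '1'], ['key3', '1'],
  ['key', '2'], ['key2', '2'], ['key3', '2'],
  ['key1', '3'], ['key', '3'], ['key2', '3'], ['key3', '3']
];

// One produce request per pair, targeting the compacted topic created above.
var payloads = pairs.map(function (p) {
  return {
    topic: 'atopic',
    messages: new KeyedMessage(p[0], JSON.stringify({ name: p[0], url: p[1] }))
  };
});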

Then the set of topic keys was requested.

var kafka = require('kafka-node');
var ConsumerGroup = kafka.ConsumerGroup;

var topic = 'atopic'; // the compacted topic created above

var options = {
  id: 'consumer1',
  kafkaHost: "<host:port>",
  groupId: "consumergroup1",
  sessionTimeout: 15000,
  protocol: ['roundrobin'],
  fromOffset: 'earliest'
};

var consumerGroup = new ConsumerGroup(options, topic);
consumerGroup.on('error', onError);
consumerGroup.on('message', onMessage);
consumerGroup.on('done', function (message) {
  consumerGroup.close(true, function () {});
});

function onError(error) {
  console.error(error);
}

function onMessage(message) {
  console.log('%s read msg Topic="%s" Partition=%s Offset=%d highWaterOffset=%d Key=%s value=%s',
    this.client.clientId, message.topic, message.partition, message.offset,
    message.highWaterOffset, message.key, message.value);
}
The results are surprising:

consumer1 read msg Topic="atopic" Partition=0 Offset=4 highWaterOffset=10 Key=key2 value={"name":"key2","url":"2"}
consumer1 read msg Topic="atopic" Partition=0 Offset=5 highWaterOffset=10 Key=key3 value={"name":"key3","url":"2"}
consumer1 read msg Topic="atopic" Partition=0 Offset=6 highWaterOffset=10 Key=key1 value={"name":"key1","url":"3"}
consumer1 read msg Topic="atopic" Partition=0 Offset=7 highWaterOffset=10 Key=key value={"name":"key","url":"3"}
consumer1 read msg Topic="atopic" Partition=0 Offset=0 highWaterOffset=10 Key= value=
consumer1 read msg Topic="atopic" Partition=0 Offset=0 highWaterOffset=10 Key= value=
consumer1 read msg Topic="atopic" Partition=0 Offset=0 highWaterOffset=10 Key= value=
consumer1 read msg Topic="atopic" Partition=0 Offset=0 highWaterOffset=10 Key= value=

The high water offset reports that the latest offset is 10. However, the highest offset the consumer actually sees is 7. Somehow the compaction prevents the consumer from seeing the latest messages.

It's not clear how to avoid this constraint and allow the consumer to see the latest messages.

Any suggestions appreciated. Thanks.

Somehow the compaction prevents the consumer from seeing the latest messages.

Yes, you are missing a few messages, but you are also seeing others.

Compaction is removing the earlier keys.

Notice how there are no url - 1 values at all:

Key=key2 value={"name":"key2","url":"2"}
Key=key3 value={"name":"key3","url":"2"}
Key=key1 value={"name":"key1","url":"3"}
Key=key value={"name":"key","url":"3"}

That is because you sent new values for the same key.

And you sent 10 messages, so the high water offset for the topic is 10.

Your code doesn't necessarily look wrong, but you should also be seeing the two remaining "3" values (key2 - 3 and key3 - 3). The offsets that get printed correspond to this logic:

key  - 1 | 0
key2 - 1 | 1
key3 - 1 | 2
key  - 2 | 3
key2 - 2 | 4
key3 - 2 | 5
key1 - 3 | 6
key  - 3 | 7
key2 - 3 | 8
key3 - 3 | 9
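
To make that concrete, here is a small standalone sketch (plain Node.js, no Kafka involved) of the rule the log cleaner applies: only the highest-offset record per key survives a cleaned segment, while the active segment and any not-yet-cleaned tail can still hold older duplicates.

// Records in insertion order: key, url, offset.
var records = [
  { key: 'key',  url: '1', offset: 0 }, { key: 'key2', url: '1', offset: 1 },
  { key: 'key3', url: '1', offset: 2 }, { key: 'key',  url: '2', offset: 3 },
  { key: 'key2', url: '2', offset: 4 }, { key: 'key3', url: '2', offset: 5 },
  { key: 'key1', url: '3', offset: 6 }, { key: 'key',  url: '3', offset: 7 },
  { key: 'key2', url: '3', offset: 8 }, { key: 'key3', url: '3', offset: 9 }
];

// Keep only the last record seen for each key, then restore offset order.
function compact(recs) {
  var latest = {};
  recs.forEach(function (r) { latest[r.key] = r; });
  return Object.keys(latest)
    .map(function (k) { return latest[k]; })
    .sort(function (a, b) { return a.offset - b.offset; });
}

console.log(compact(records));
// -> offsets 6, 7, 8, 9 (key1, key, key2 and key3, all with url "3"),
//    which is why the consumer should eventually see the two missing "3" values too.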

Generally, I would suggest not having Kafka try to compact the topic and write log segments 10 times a second, and I would also suggest using a different library such as node-rdkafka.
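
If you do switch libraries, a rough node-rdkafka equivalent of the consumer might look like the following (configuration keys follow librdkafka naming; the broker address, group id and topic name are carried over from the question as placeholders):

var Kafka = require('node-rdkafka');

var consumer = new Kafka.KafkaConsumer({
  'group.id': 'consumergroup1',
  'metadata.broker.list': '<host:port>'
}, {
  // topic-level config: start from the beginning when there is no committed offset
  'auto.offset.reset': 'earliest'
});

consumer.connect();

consumer.on('ready', function () {
  consumer.subscribe(['atopic']);
  consumer.consume(); // flowing mode: every message is emitted as a 'data' event
});

consumer.on('data', function (m) {
  console.log('offset=%d key=%s value=%s',
    m.offset, m.key ? m.key.toString() : null, m.value.toString());
});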

After working a bit more with Kafka, it seems that the kafka-node API has the following behaviour (which I think actually derives from Kafka itself).

When messages are queried below the highWaterOffset, only messages up to the highWaterOffset are returned to the ConsumerGroup. This makes sense if the messages have not been replicated, because another consumer in the group would not necessarily see those messages.

It is still possible to request and receive messages beyond the highWaterOffset using a Consumer rather than a ConsumerGroup and by querying a specific partition.
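
For example, a minimal sketch of that approach with kafka-node's plain Consumer, reading partition 0 of atopic from offset 0 (fromOffset: true makes the explicit offset take effect, and autoCommit is off so the read can be repeated):

var kafka = require('kafka-node');

var client = new kafka.KafkaClient({ kafkaHost: '<host:port>' });
var consumer = new kafka.Consumer(
  client,
  [{ topic: 'atopic', partition: 0, offset: 0 }],
  { autoCommit: false, fromOffset: true }
);

consumer.on('message', function (message) {
  console.log('Offset=%d highWaterOffset=%d Key=%s value=%s',
    message.offset, message.highWaterOffset, message.key, message.value);
});

consumer.on('error', console.error);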

Also, the 'done' event seems to get fired when the offset is not necessarily at the latestOffset. In this case it is necessary to submit a further query at message.offset + 1. If you continue to do this you can get all messages up to the latestOffset.
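
A rough sketch of one way to implement that catch-up check, using kafka-node's Offset API alongside a plain Consumer (fetchLatestOffsets and setOffset are existing kafka-node calls; the polling loop and variable names are only illustrative):

var kafka = require('kafka-node');

var client = new kafka.KafkaClient({ kafkaHost: '<host:port>' });
var offsetApi = new kafka.Offset(client);
var consumer = new kafka.Consumer(client,
  [{ topic: 'atopic', partition: 0, offset: 0 }],
  { autoCommit: false, fromOffset: true });

var lastSeen = -1;
consumer.on('message', function (message) { lastSeen = message.offset; });
consumer.on('error', console.error);

// If consumption stalls short of the latest offset, re-request from lastSeen + 1.
setInterval(function () {
  offsetApi.fetchLatestOffsets(['atopic'], function (err, offsets) {
    if (err) return console.error(err);
    var latest = offsets['atopic'][0]; // next offset to be written on partition 0
    if (lastSeen + 1 < latest) {
      consumer.setOffset('atopic', 0, lastSeen + 1);
    } else {
      console.log('caught up to latest offset %d', latest);
    }
  });
}, 1000);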

It is not clear to me why Kafka has this behaviour, but there is probably some lower-level detail that surfaces this emergent behaviour.
