Kafka/Flink timestamps: event time and watermarks
I am reading the book Stream Processing with Apache Flink, which states: "As of version 0.10.0, Kafka supports message timestamps. When reading from Kafka version 0.10 or later, the consumer will automatically extract the message timestamp as an event-time timestamp if the application runs in event-time mode."
So inside a processElement function, will the call context.timestamp() return the Kafka message timestamp by default?
Could you please provide a simple example of how to implement an AssignerWithPeriodicWatermarks/AssignerWithPunctuatedWatermarks that extracts timestamps (and builds watermarks) based on the consumed Kafka message timestamp?
If I am using TimeCharacteristic.ProcessingTime, would ctx.timestamp() return the processing time, and in that case would it be equivalent to context.timerService().currentProcessingTime()?
Thank you.
The Flink Kafka consumer takes care of this for you: it attaches each Kafka record's timestamp to the record as its event-time timestamp, so in event-time mode ctx.timestamp() in processElement returns the Kafka message timestamp. In Flink 1.11 you can simply rely on this, though you still need to provide a WatermarkStrategy that specifies the bounded out-of-orderness (or, with WatermarkStrategy.forMonotonousTimestamps(), asserts that the timestamps are in order):
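import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

FlinkKafkaConsumer<String> myConsumer = new FlinkKafkaConsumer<>(...);
// Timestamps come from the Kafka records; this only configures watermarks (20 s out-of-orderness).
myConsumer.assignTimestampsAndWatermarks(
    WatermarkStrategy.<String>forBoundedOutOfOrderness(Duration.ofSeconds(20)));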
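To make the 20-second bound concrete, here is a minimal plain-Java sketch of the bookkeeping that, as far as I understand Flink 1.11's bounded-out-of-orderness generator, happens behind forBoundedOutOfOrderness: remember the largest timestamp seen and periodically emit that maximum minus the bound minus 1 ms. The class BoundedWatermarkSketch is hypothetical and has no Flink dependency; in a real job Flink does this for you.

```java
import java.time.Duration;

// Hypothetical stand-in (no Flink dependency) modelling the logic behind
// WatermarkStrategy.forBoundedOutOfOrderness: track the highest event timestamp
// seen so far and emit (max - bound - 1) as the watermark.
class BoundedWatermarkSketch {
    private final long outOfOrdernessMillis;
    private long maxTimestamp;

    BoundedWatermarkSketch(Duration maxOutOfOrderness) {
        this.outOfOrdernessMillis = maxOutOfOrderness.toMillis();
        // Start low enough that the first computed watermark cannot underflow.
        this.maxTimestamp = Long.MIN_VALUE + outOfOrdernessMillis + 1;
    }

    // Called for every record with its event timestamp (here: the Kafka message timestamp).
    void onEvent(long eventTimestamp) {
        maxTimestamp = Math.max(maxTimestamp, eventTimestamp);
    }

    // Called periodically; everything at or below this value is considered complete.
    long currentWatermark() {
        return maxTimestamp - outOfOrdernessMillis - 1;
    }

    public static void main(String[] args) {
        BoundedWatermarkSketch wm = new BoundedWatermarkSketch(Duration.ofSeconds(20));
        // Kafka timestamps in milliseconds, slightly out of order.
        long[] kafkaTimestamps = {100_000L, 95_000L, 130_000L, 120_000L};
        for (long ts : kafkaTimestamps) {
            wm.onEvent(ts);
        }
        System.out.println(wm.currentWatermark()); // 109999
    }
}
```

The late records (95 000 and 120 000) do not move the watermark backwards; only the maximum matters, which is why the bound has to cover the worst lateness you expect.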
In earlier versions of Flink you had to provide an implementation of a timestamp assigner, which would look like this:
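// Inside an AssignerWithPeriodicWatermarks<Long> (or AssignerWithPunctuatedWatermarks<Long>) implementation:
@Override
public long extractTimestamp(Long element, long previousElementTimestamp) {
    // previousElementTimestamp already holds the Kafka record timestamp set by the consumer
    return previousElementTimestamp;
}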
This version of the extractTimestamp method is passed the current value of the timestamp present in the StreamRecord as previousElementTimestamp, which in this case will be the timestamp put there by the Flink Kafka consumer.
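Since the question asks for both flavours, here is a plain-Java sketch of what a periodic and a punctuated assigner could look like when they simply reuse the Kafka timestamp. To keep it runnable without Flink, the two interfaces are local stand-ins that copy the method names of Flink's legacy (pre-1.11) AssignerWithPeriodicWatermarks and AssignerWithPunctuatedWatermarks, and watermarks are modelled as Long values instead of Flink's Watermark class. The 20-second bound and the "MARKER" rule for punctuation are hypothetical.

```java
// Plain-Java sketch: the two interfaces are local stand-ins that copy the method
// names of Flink's legacy AssignerWithPeriodicWatermarks / AssignerWithPunctuatedWatermarks.
// Watermarks are modelled as Long values; null means "no watermark right now".
class KafkaAssignerSketch {

    interface PeriodicAssigner<T> {
        long extractTimestamp(T element, long previousElementTimestamp);
        Long getCurrentWatermark();
    }

    interface PunctuatedAssigner<T> {
        long extractTimestamp(T element, long previousElementTimestamp);
        Long checkAndGetNextWatermark(T lastElement, long extractedTimestamp);
    }

    // Periodic flavour: reuse the Kafka timestamp and lag the watermark by a
    // hypothetical 20 s out-of-orderness bound.
    static class KafkaTimestampPeriodicAssigner implements PeriodicAssigner<String> {
        private static final long MAX_OUT_OF_ORDERNESS_MS = 20_000L;
        // Start low enough that the first watermark cannot underflow.
        private long currentMaxTimestamp = Long.MIN_VALUE + MAX_OUT_OF_ORDERNESS_MS;

        @Override
        public long extractTimestamp(String element, long previousElementTimestamp) {
            // previousElementTimestamp is the Kafka record timestamp attached by the consumer.
            currentMaxTimestamp = Math.max(currentMaxTimestamp, previousElementTimestamp);
            return previousElementTimestamp;
        }

        @Override
        public Long getCurrentWatermark() {
            // Polled periodically, independent of incoming records.
            return currentMaxTimestamp - MAX_OUT_OF_ORDERNESS_MS;
        }
    }

    // Punctuated flavour: emit a watermark only when a (hypothetical) marker record arrives.
    static class KafkaTimestampPunctuatedAssigner implements PunctuatedAssigner<String> {
        @Override
        public long extractTimestamp(String element, long previousElementTimestamp) {
            return previousElementTimestamp;
        }

        @Override
        public Long checkAndGetNextWatermark(String lastElement, long extractedTimestamp) {
            // Checked after every record; null means "no watermark for this record".
            return lastElement.startsWith("MARKER") ? extractedTimestamp : null;
        }
    }

    public static void main(String[] args) {
        KafkaTimestampPeriodicAssigner periodic = new KafkaTimestampPeriodicAssigner();
        periodic.extractTimestamp("a", 50_000L);
        periodic.extractTimestamp("b", 45_000L); // 5 s late, still within the bound
        System.out.println(periodic.getCurrentWatermark()); // 30000

        KafkaTimestampPunctuatedAssigner punctuated = new KafkaTimestampPunctuatedAssigner();
        System.out.println(punctuated.checkAndGetNextWatermark("a", 50_000L));      // null
        System.out.println(punctuated.checkAndGetNextWatermark("MARKER", 60_000L)); // 60000
    }
}
```

In an actual Flink 1.10 job you would put the same method bodies into classes implementing the real Flink interfaces, return new Watermark(...) instead of a Long, and pass an instance to myConsumer.assignTimestampsAndWatermarks(...), which lets the consumer track watermarks per Kafka partition.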
Flink 1.11 docs
Flink 1.10 docs
As for what ctx.timestamp() returns when using TimeCharacteristic.ProcessingTime: in that case it returns null. (Semantically, yes, it is as though the timestamp were the current processing time, but that's not how it's implemented.) If you need the current processing time, call ctx.timerService().currentProcessingTime() instead.
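Because Context.timestamp() returns a boxed Long that can be null, code meant to run under either time characteristic should check for null before unboxing. A minimal sketch, assuming a hypothetical helper named timestampOrProcessingTime and using a LongSupplier to stand in for ctx.timerService()::currentProcessingTime so it runs without Flink:

```java
import java.util.function.LongSupplier;

class TimestampFallbackSketch {
    // Hypothetical helper: prefer the record's event-time timestamp (what ctx.timestamp()
    // returns), fall back to processing time when it is null, e.g. under
    // TimeCharacteristic.ProcessingTime.
    static long timestampOrProcessingTime(Long ctxTimestamp, LongSupplier currentProcessingTime) {
        // The supplier is only invoked when there is no event-time timestamp.
        return ctxTimestamp != null ? ctxTimestamp : currentProcessingTime.getAsLong();
    }

    public static void main(String[] args) {
        // Event time: the Kafka message timestamp is used as-is.
        System.out.println(timestampOrProcessingTime(1_600_000_000_000L, System::currentTimeMillis)); // 1600000000000
        // Processing time: ctx.timestamp() is null, so the clock is consulted.
        System.out.println(timestampOrProcessingTime(null, () -> 42L)); // 42
    }
}
```

Calling the unboxed value directly (for example long ts = ctx.timestamp();) would throw a NullPointerException in processing-time mode, which is the main reason to keep the check explicit.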