
How does Flink deal with late events when a side-output DataStream is used?

It looks to me that Flink handles late events in three ways:

  1. Dropping late events once the window has expired (the default).
  2. Updating the window by including late events, via the "allowed lateness" mechanism.
  3. Redirecting late events into another DataStream, via the "side output" mechanism.
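Mechanisms 2 and 3 can be combined on the same window operator. A minimal sketch in the Java DataStream API, assuming a hypothetical `Event` POJO with a `getKey()` accessor and a hypothetical `MyWindowFunction` (both stand-ins, not from the original question):

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.OutputTag;

// The OutputTag must be an anonymous subclass so Flink can capture
// the element type despite erasure.
final OutputTag<Event> lateTag = new OutputTag<Event>("late-events") {};

SingleOutputStreamOperator<Result> results = events
    .keyBy(e -> e.getKey())
    .window(TumblingEventTimeWindows.of(Time.minutes(5)))
    .allowedLateness(Time.minutes(1))   // mechanism 2: late events within 1 min re-fire the window
    .sideOutputLateData(lateTag)        // mechanism 3: anything later goes to the side output
    .process(new MyWindowFunction());

// Events arriving after watermark + allowed lateness land here instead of being dropped.
DataStream<Event> lateStream = results.getSideOutput(lateTag);
```

With this setup, an event is only dropped if it misses both the window end and the extra allowed-lateness grace period, and even then it is recoverable from `lateStream`.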

Let's assume that I have an event-time job that consumes data from Kafka and processes a window every 5 minutes. Now, suppose that I redirect late events into another DataStream.

  • Is this new DataStream independent?
  • Is it possible to assign a window to this stream in order to process these late events, let's say every hour?
  • If that is possible, is the memory freed after this window fires?

Thank you all!

The stream of late events coming from the window operator is an independent stream; it contains only events that were so late that the allowed lateness wasn't enough to accommodate them.

You can do whatever you want with this stream, including sending it through another window. But these events will still be late relative to the current watermarks, so you'll need to either re-generate watermarks using a more relaxed strategy that keeps them from still being late, or extend the allowed lateness of this new window.
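The first option might look like the following sketch, assuming the side-output stream from the question and a hypothetical `Event` with an epoch-millisecond `getTimestamp()` accessor and a hypothetical `LateWindowFunction` (illustrative names, not from the answer):

```java
import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

DataStream<Result> lateResults = lateStream
    // Re-generate watermarks with a much more generous out-of-orderness
    // bound, so these events are no longer late relative to the new watermark.
    .assignTimestampsAndWatermarks(
        WatermarkStrategy
            .<Event>forBoundedOutOfOrderness(Duration.ofHours(2))
            .withTimestampAssigner((event, previous) -> event.getTimestamp()))
    .keyBy(e -> e.getKey())
    // Process the stragglers once an hour, as the question suggests.
    .window(TumblingEventTimeWindows.of(Time.hours(1)))
    .process(new LateWindowFunction());
```

The 2-hour bound is an arbitrary assumption; it just needs to be large enough to cover how late these events can realistically be, otherwise they will be late a second time.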

Windows always clean up after themselves, so yes: once a window has fired and its allowed lateness has passed, its state is freed.
