Apache Beam windowing: consider late data but emit only one pane
I would like to emit a single pane when the watermark reaches x minutes past the end of the window. This lets me ensure I handle some late data, but still only emit one pane. I am currently working in Java.
At the moment I can't find a proper solution to this problem. I could emit a single pane when the watermark reaches the end of the window, but then any late data is dropped. I could emit the pane at the end of the window and then again when I receive late data; however, in that case I am not emitting a single pane.
I currently have code similar to this:
.triggering(
    // This is going to emit the pane, but I don't want to emit the pane yet!
    AfterWatermark.pastEndOfWindow()
        // This is going to emit panes each time I receive late data, however
        // I would like to only emit one pane at the end of the allowedLateness
        .withLateFirings(AfterPane.elementCountAtLeast(1))
).withAllowedLateness(allowedLateness).accumulatingFiredPanes())
In case there is still confusion: I would like to emit only a single pane, when the watermark passes the end of the window plus allowedLateness.
Thanks Guillem, in the end I used your answer to find this very useful link with lots of Apache Beam examples. From this I came up with the following solution:
// We first specify that the trigger itself never emits any panes
.triggering(Never.ever())
// We then specify to always fire when closing the window. This emits a single,
// final pane once the watermark passes the end of the window plus allowedLateness
.withAllowedLateness(allowedLateness, Window.ClosingBehavior.FIRE_ALWAYS)
.discardingFiredPanes())
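To make the firing behaviour of this configuration concrete, here is a plain-JDK toy model. It is not Beam code and does not call any Beam API; the class SinglePaneWindowModel and its methods are hypothetical. It assumes a single window, integer timestamps, and the simplification that the window closes as soon as the input watermark reaches the end of the window plus the allowed lateness. Late elements that arrive before that point are kept, nothing is emitted earlier (as with Never.ever()), one pane is emitted at close, and anything arriving afterwards is dropped.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (NOT Beam): one window under Never.ever() plus ClosingBehavior.FIRE_ALWAYS.
// Hypothetical class; times are plain longs (e.g. seconds).
public class SinglePaneWindowModel {
    private final long windowEnd;
    private final long allowedLateness;
    private final List<String> buffer = new ArrayList<>();
    private final List<List<String>> emittedPanes = new ArrayList<>();
    private boolean closed = false;

    public SinglePaneWindowModel(long windowEnd, long allowedLateness) {
        this.windowEnd = windowEnd;
        this.allowedLateness = allowedLateness;
    }

    // An element for this window arrives; it is buffered unless the window is already closed.
    public void onElement(String element) {
        if (!closed) {
            buffer.add(element);
        }
    }

    // The input watermark advances. The trigger never fires by itself, so the only output
    // is the final pane emitted when the window closes at end + allowedLateness.
    public void onWatermark(long watermark) {
        if (!closed && watermark >= windowEnd + allowedLateness) {
            closed = true;
            emittedPanes.add(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    public List<List<String>> emittedPanes() {
        return emittedPanes;
    }

    public static void main(String[] args) {
        // Window ends at t=60 with 30 units of allowed lateness.
        SinglePaneWindowModel w = new SinglePaneWindowModel(60, 30);
        w.onElement("on-time");
        w.onWatermark(60);       // end of window: nothing is emitted yet
        w.onElement("late");     // late, but within allowed lateness: kept
        w.onWatermark(90);       // end + allowedLateness: the single pane is emitted
        w.onElement("too-late"); // window already closed: dropped
        w.onWatermark(120);
        System.out.println(w.emittedPanes()); // [[on-time, late]]
    }
}
```

Because nothing fires before the window closes, accumulating and discarding modes would produce the same single pane in this setup, which is why discardingFiredPanes() in the solution above does no harm.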
What I would do is, first, set Window.ClosingBehavior to FIRE_ALWAYS. This way, when the window is permanently closed, it will send a final pane (even if there are no late records since the last pane) with PaneInfo.isLast set to true.
Then, I would proceed with the second option:
"I could emit the pane at the end of the window and then again when I receive late data, however in this case I am not emitting a single pane."
But discard, downstream, the panes that are not final, with something like:
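// Inside a DoFn<T, T>: forward only the final pane of each window.
@ProcessElement
public void processElement(ProcessContext c) {
  // isLast() is a method on PaneInfo, not a field.
  if (c.pane().isLast()) {
    c.output(c.element());
  }
}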
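The following plain-JDK toy model (not Beam; the class LastPaneFilterModel and its members are hypothetical) sketches this second approach under simplifying assumptions: one on-time pane, one late pane per late element (a simplification of late firings such as AfterPane.elementCountAtLeast(1)), and a final pane flagged isLast when the window closes, as FIRE_ALWAYS guarantees. A downstream filter equivalent to the processElement method above then keeps only the final pane.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model (NOT Beam) of "fire on time and on late data, keep only the isLast pane".
public class LastPaneFilterModel {

    // Simplified stand-in for a fired pane: its elements plus the PaneInfo.isLast() flag.
    public static final class Pane {
        public final List<String> elements;
        public final boolean isLast;

        public Pane(List<String> elements, boolean isLast) {
            this.elements = elements;
            this.isLast = isLast;
        }
    }

    // Fires the panes for one window: on-time pane, one pane per late element,
    // then the final pane emitted when the window closes.
    public static List<Pane> firePanes(List<String> onTime, List<String> late, boolean accumulating) {
        List<Pane> panes = new ArrayList<>();
        List<String> seen = new ArrayList<>(onTime);
        panes.add(new Pane(new ArrayList<>(onTime), false));
        for (String element : late) {
            seen.add(element);
            // Accumulating panes repeat everything seen so far; discarding panes hold only new data.
            List<String> contents = accumulating ? new ArrayList<>(seen) : Arrays.asList(element);
            panes.add(new Pane(contents, false));
        }
        // Nothing new arrived since the last late firing, so a discarding final pane is empty.
        List<String> finalContents = accumulating ? new ArrayList<>(seen) : new ArrayList<>();
        panes.add(new Pane(finalContents, true));
        return panes;
    }

    // Downstream filter: keep only panes whose isLast flag is set.
    public static List<List<String>> keepFinal(List<Pane> panes) {
        List<List<String>> out = new ArrayList<>();
        for (Pane p : panes) {
            if (p.isLast) {
                out.add(p.elements);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> onTime = Arrays.asList("a", "b");
        List<String> late = Arrays.asList("c");
        System.out.println(keepFinal(firePanes(onTime, late, true)));  // [[a, b, c]]
        System.out.println(keepFinal(firePanes(onTime, late, false))); // [[]]
    }
}
```

The point the model illustrates: filtering on isLast only yields the complete result when the window uses accumulatingFiredPanes(), as in the question's original code. With discardingFiredPanes(), the final pane carries only what arrived since the previous firing.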