Windowing with Apache Beam - Fixed Windows Don't Seem to be Closing?
We are attempting to use fixed windows on an Apache Beam pipeline (using DirectRunner). Our flow is as follows:
Using a custom CombineFn, combine each window of Events into a List<Event>, then log the resulting List<Event>.
Pipeline code:
pipeline
    // Read from pubsub topic to create unbounded PCollection
    .apply(PubsubIO
        .<String>read()
        .topic(options.getTopic())
        .withCoder(StringUtf8Coder.of())
    )
    // Deserialize JSON into Event object
    .apply("ParseEvent", ParDo
        .of(new ParseEventFn())
    )
    // Window events with a fixed window size of 5 seconds
    .apply("Window", Window
        .<Event>into(FixedWindows
            .of(Duration.standardSeconds(5))
        )
    )
    // Group events by window
    .apply("CombineEvents", Combine
        .globally(new CombineEventsFn())
        .withoutDefaults()
    )
    // Log grouped events
    .apply("LogEvent", ParDo
        .of(new LogEventFn())
    );
The result we are seeing is that the final step never runs: we don't get any logging.
We have also added System.out.println("***") to each method of our custom CombineFn class to track when they are run, and it seems they don't run either.
Is windowing set up incorrectly here? We followed an example found at https://beam.apache.org/documentation/programming-guide/#windowing and it seems fairly straightforward, but clearly there is something fundamental missing.
Any insight is appreciated - thanks in advance!
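As background for the "Window" step in the pipeline above: fixed windows bucket each element by its timestamp (event time), not by the moment it happens to be processed. Below is a minimal plain-Java sketch of that bucketing, assuming 5-second windows aligned to the epoch. The class and method names are hypothetical, and this is an illustration of the arithmetic, not Beam's own implementation.

```java
public class FixedWindowSketch {

    /** Start (inclusive) of the fixed window that contains timestampMillis. */
    public static long windowStart(long timestampMillis, long sizeMillis) {
        // floorMod keeps the result non-negative, so timestamps before the
        // epoch also land in the correct window.
        return timestampMillis - Math.floorMod(timestampMillis, sizeMillis);
    }

    /** End (exclusive) of the fixed window that contains timestampMillis. */
    public static long windowEnd(long timestampMillis, long sizeMillis) {
        return windowStart(timestampMillis, sizeMillis) + sizeMillis;
    }

    public static void main(String[] args) {
        long size = 5_000L; // 5-second windows, as in the question
        for (long ts : new long[] {0L, 4_999L, 5_000L, 12_345L}) {
            System.out.println(ts + " -> ["
                + windowStart(ts, size) + ", " + windowEnd(ts, size) + ")");
        }
    }
}
```

An element stamped 12,345 ms falls into the window [10000, 15000) regardless of when the pipeline gets around to processing it.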
Looks like the main issue was indeed a missing trigger - the window was opening and there was nothing telling it when to emit results. We wanted to simply window based on processing time (not event time) and so did the following:
.apply("Window", Window
    .<Event>into(new GlobalWindows())
    .triggering(Repeatedly
        .forever(AfterProcessingTime
            .pastFirstElementInPane()
            .plusDelayOf(Duration.standardSeconds(5))
        )
    )
    .withAllowedLateness(Duration.ZERO).discardingFiredPanes()
)
Essentially this creates a global window, which is triggered to emit events 5 seconds after the first element is processed. Every time the window is closed, another is opened once it receives an element. Beam complained when we didn't have the withAllowedLateness piece - as far as I know this just tells it to ignore any late data.
My understanding may be a bit off the mark here, but the above snippet has solved our problem!
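To make the grouping behaviour described above concrete, here is a rough plain-Java simulation of "fire 5 seconds after the first element, discard what was emitted, start again on the next element". It does not use Beam at all. The class and method names are hypothetical, arrival times are assumed to be sorted, and treating an element that arrives exactly at the firing instant as the start of the next batch is an assumption of this sketch, not documented Beam behaviour.

```java
import java.util.ArrayList;
import java.util.List;

public class ProcessingTimePaneSketch {

    /**
     * Groups arrival times (processing time, in milliseconds) into batches.
     * A batch opens at its first element and fires delayMillis later;
     * elements arriving before the firing time join the batch, and the
     * fired contents are discarded before the next batch starts.
     */
    public static List<List<Long>> panes(List<Long> arrivalsMillis, long delayMillis) {
        List<List<Long>> fired = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long fireAt = 0L;
        for (long arrival : arrivalsMillis) { // assumed sorted ascending
            if (!current.isEmpty() && arrival >= fireAt) {
                fired.add(current);           // the timer elapsed: emit the batch
                current = new ArrayList<>();  // discard what was already emitted
            }
            if (current.isEmpty()) {
                fireAt = arrival + delayMillis; // first element starts the timer
            }
            current.add(arrival);
        }
        if (!current.isEmpty()) {
            fired.add(current); // the last batch fires once its delay elapses
        }
        return fired;
    }

    public static void main(String[] args) {
        List<Long> arrivals = List.of(0L, 1_000L, 4_999L, 6_000L, 12_000L, 13_500L);
        // prints [[0, 1000, 4999], [6000], [12000, 13500]]
        System.out.println(panes(arrivals, 5_000L));
    }
}
```

Note that the batches are not aligned to wall-clock boundaries: each 5-second timer starts at whatever moment the first element of that batch arrives, which is the main difference from the fixed windows in the question.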