Apache Beam/Java: how to set a window/trigger that sends the data only once for each window

I have a pipeline as below:

Window<String> fixedWindow = Window.<String>into(FixedWindows.of(Duration.standardSeconds(options.getWindowDuration())))
      .triggering(
        AfterWatermark.pastEndOfWindow()
          .withEarlyFirings(AfterProcessingTime
            .pastFirstElementInPane().plusDelayOf(Duration.standardSeconds(options.getWindowDuration()))))
      .withAllowedLateness(Duration.ZERO)
      .discardingFiredPanes();

PCollectionTuple productProcessorPT = pipeline
  .apply(READ_PRODUCT_FROM_PUBSUB.getName(), PubsubIO.readStrings()
    .fromSubscription(options.getProductSubscription()))
  .apply(PRODUCT_WINDOW.getName(), fixedWindow)
  .apply(PROCESS_PRODUCT.getName(), ParDo.of(new ProductProcessor()))
  .apply(GROUP_PRODUCT_DATA.getName(), GroupByKey.create())
  .apply(COMBINE_PRODUCT_DATA.getName(), ParDo.of(new ProductCombiner())
    .withOutputTags(KV_STRING_OBJECTNODE, TupleTagList.of(PIPELINE_ERROR)));

What I want to achieve is a window/trigger that gathers the data every 60 seconds and then sends it to the next transform. How can I do that? I don't care about the event timestamp.

The code above sends data to the next transform every 60 seconds, but it also keeps triggering and sending the same data even when no new data comes into the pipeline. I'm not sure why that happens.

You can remove the triggering and just use FixedWindows, as below, to emit records every 60 seconds:

Window<String> fixedWindow = Window.<String>into(FixedWindows.of(Duration.standardSeconds(options.getWindowDuration())));

This uses the default trigger and the default handling of late events (the default allowed lateness is zero), which basically means that the data is emitted once at the end of each window and all late events are dropped.
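To make that behavior concrete, here is a minimal plain-Java sketch of the idea. It is not the Beam API: the class `FixedWindowSketch` and its methods are hypothetical names invented for illustration, and it assumes non-negative millisecond timestamps, a zero window offset, and a watermark that is advanced by hand. It assigns each element to a fixed window, fires each window exactly once when the watermark reaches the window end, and drops elements that arrive for a window that has already fired.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only (hypothetical class, not part of Apache Beam).
class FixedWindowSketch {

    // Start of the fixed window containing the timestamp.
    // Assumes non-negative timestamps and a zero window offset.
    static long windowStart(long timestampMillis, long sizeMillis) {
        return timestampMillis - (timestampMillis % sizeMillis);
    }

    private final long sizeMillis;
    // Open (not yet fired) windows, keyed by window start.
    private final TreeMap<Long, List<String>> open = new TreeMap<>();
    private long watermark = Long.MIN_VALUE;

    FixedWindowSketch(long sizeMillis) {
        this.sizeMillis = sizeMillis;
    }

    // Buffers an element; returns false if it is late and therefore dropped.
    boolean add(String value, long timestampMillis) {
        long start = windowStart(timestampMillis, sizeMillis);
        if (start + sizeMillis <= watermark) {
            return false; // window already fired, zero allowed lateness
        }
        open.computeIfAbsent(start, k -> new ArrayList<>()).add(value);
        return true;
    }

    // Advances the watermark and returns every window whose end has been
    // reached. A fired window is removed, so it can never fire again.
    TreeMap<Long, List<String>> advanceWatermark(long newWatermark) {
        watermark = Math.max(watermark, newWatermark);
        TreeMap<Long, List<String>> fired = new TreeMap<>();
        while (!open.isEmpty() && open.firstKey() + sizeMillis <= watermark) {
            Map.Entry<Long, List<String>> entry = open.pollFirstEntry();
            fired.put(entry.getKey(), entry.getValue());
        }
        return fired;
    }

    public static void main(String[] args) {
        FixedWindowSketch windows = new FixedWindowSketch(60_000L);
        windows.add("a", 5_000L);
        windows.add("b", 59_999L);
        windows.add("c", 60_000L);
        System.out.println(windows.advanceWatermark(60_000L)); // first window fires
        System.out.println(windows.advanceWatermark(60_000L)); // nothing left to fire
        System.out.println(windows.add("late", 10_000L));      // late element is dropped
    }
}
```

In this sketch a window with no elements never fires, and advancing the watermark a second time to the same point emits nothing, which is the "once per window" behavior the question asks for.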
