
Apache Beam/Java, how to set window/trigger that sends the data only once for each window

I have a pipeline as below:

Window<String> fixedWindow = Window.<String>into(FixedWindows.of(Duration.standardSeconds(options.getWindowDuration())))
      .triggering(
        AfterWatermark.pastEndOfWindow()
          .withEarlyFirings(AfterProcessingTime
            .pastFirstElementInPane().plusDelayOf(Duration.standardSeconds(options.getWindowDuration()))))
      .withAllowedLateness(Duration.ZERO)
      .discardingFiredPanes();

PCollectionTuple productProcessorPT = pipeline
  .apply(READ_PRODUCT_FROM_PUBSUB.getName(), PubsubIO.readStrings()
    .fromSubscription(options.getProductSubscription()))
  .apply(PRODUCT_WINDOW.getName(), fixedWindow)
  .apply(PROCESS_PRODUCT.getName(), ParDo.of(new ProductProcessor()))
  .apply(GROUP_PRODUCT_DATA.getName(), GroupByKey.create())
  .apply(COMBINE_PRODUCT_DATA.getName(), ParDo.of(new ProductCombiner())
    .withOutputTags(KV_STRING_OBJECTNODE, TupleTagList.of(PIPELINE_ERROR)));

What I want is a window/trigger that gathers data for 60 seconds and then sends it to the next transform. How can I do that? I don't care about the event timestamp.

The code above sends data to the next transform every 60 seconds, but it also keeps triggering and sending the same data even when no new data comes into the pipeline. I'm not sure why that happens.

You can remove the triggering and use just FixedWindows, as below, to emit records every 60 seconds:

Window<String> fixedWindow = Window.<String>into(FixedWindows.of(Duration.standardSeconds(options.getWindowDuration())));

This uses the default trigger and the default handling of late data, which means each window's data is emitted once, when the watermark passes the end of the window, and any late data is dropped.
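
To make "the end of the window" concrete, here is a minimal plain-Java sketch (no Beam dependency) of how a timestamp maps to a 60-second fixed window. It assumes windows aligned to the epoch with no offset and half-open intervals [start, end), which is my understanding of how FixedWindows behaves by default. The class and method names are hypothetical and are not part of the Beam API.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not part of Apache Beam: illustrates fixed-window arithmetic only.
public class FixedWindowSketch {

    // Start of the fixed window containing the timestamp.
    // Assumes windows are aligned to the epoch (offset zero).
    static Instant windowStart(Instant timestamp, Duration size) {
        long sizeMs = size.toMillis();
        long ts = timestamp.toEpochMilli();
        // floorMod keeps the result correct for timestamps before the epoch as well.
        return Instant.ofEpochMilli(ts - Math.floorMod(ts, sizeMs));
    }

    // Exclusive end of the same window: start plus the window size.
    static Instant windowEnd(Instant timestamp, Duration size) {
        return windowStart(timestamp, size).plus(size);
    }

    public static void main(String[] args) {
        Duration size = Duration.ofSeconds(60);
        String[] samples = {
            "2024-01-01T00:00:05Z",
            "2024-01-01T00:00:59.999Z",
            "2024-01-01T00:01:00Z"
        };
        for (String s : samples) {
            Instant ts = Instant.parse(s);
            System.out.println(ts + " -> [" + windowStart(ts, size)
                + ", " + windowEnd(ts, size) + ")");
        }
    }
}
```

The first two sample timestamps fall into the same window and the third starts the next one. With the default trigger, each such window produces a single pane once the watermark passes the window's end.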
