简体   繁体   中英

How do I trigger Apache Beam side inputs periodically?

I have a Dataflow Pipeline with streaming data, and I am using an Apache Beam Side Input of a bounded data source, which may have updates. How do I trigger a periodic update of this side input? Eg The side input should be refreshed once every 12 hours.

With reference to https://beam.apache.org/documentation/patterns/side-inputs/ , this is how I implemented the pipeline with side input:

PCollectionView<Map<Integer, Map<String, Double>>> sideInput = pipeline
        // We can think of it as generating "fake" events every 5 minutes
        .apply("Use GenerateSequence source transform to periodically emit a value",
            GenerateSequence.from(0).withRate(1, Duration.standardMinutes(WINDOW_SIZE)))
        .apply(Window.into(FixedWindows.of(Duration.standardMinutes(WINDOW_SIZE))))
        .apply(Sum.longsGlobally().withoutDefaults()) // what does this do?
        .apply("DoFn periodically pulls data from a bounded source", ParDo.of(new FetchData()))
        .apply("Build new Window whenever side input is called",
            Window.<Map<Integer, Map<String, Double>>>into(new GlobalWindows())
                .triggering(Repeatedly.forever(AfterProcessingTime.pastFirstElementInPane()))
                .discardingFiredPanes())
        .apply(View.asSingleton());


pipeline
 .apply(...)
 .apply("Add location to Event",
            ParDo.of(new DoFn<>).withSideInputs(sideInput))
 .apply(...)

Is this the correct way of implementation?

Did you follow the "Slowly updating side input using windowing" part of the mentioned link ?

It suggests using PeriodicImpulse , which better fits your description.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM