How to test Flink Global Window with Trigger And Evictor

I have a pipeline that uses Flink's GlobalWindows with a custom Trigger based on event time (taken from the timestamp of each arriving element) and an Evictor that removes unnecessary elements from the window before it is passed to the ProcessFunction.

Something like:

    public SingleOutputStreamOperator<Results> processElements(DataStream<Elements> inputStream) {
        return inputStream
                .keyBy(Elements::getId)
                .window(GlobalWindows.create())
                .trigger(new CustomTrigger())
                .evictor(new CustomEvictor())
                .process(new MyWindowProcessFunction())
                .name("Process")
                .uid("process-elements")
                .returns(Results.class);
    }

    public void executePipelineFlow(StreamExecutionEnvironment env) throws Exception {
        DataStream<Elements> inputStream = getInputStream(env);
        DataStream<Results> processedInput = processElements(inputStream);
        applySink(processedInput);
    }

I know I can test MyWindowProcessFunction with a test harness, which lets me manipulate watermarks, but I need to test the whole flow: Trigger + Evictor + ProcessFunction.

I also tried a timed SourceFunction that uses Thread.sleep(), but my pipeline works in event time, so this won't work if I have 1000 elements in the test stream (the test would take a couple of hours).

My question is: how can I unit test my whole processElements method?

I can't find any test examples for my case.

Thanks

As an example, you might look at how the end-to-end integration tests for the windowing exercise in the Flink training are implemented. That exercise doesn't use GlobalWindows, custom triggers, etc., but you can use the same overall approach to test any pipeline.

The one thing that's maybe less than ideal about this approach is how it handles watermarking. The applications being tested use the default periodic watermarking strategy, wherein watermarks are generated every 200 msec. Since the tests don't run that long, the only watermark that's actually generated is the one that comes at the end of every job with bounded inputs. This works, but isn't quite the same as what will happen in production. (Is this why you were thinking of having your test source sleep between events?)
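To make the bounded-input idea concrete, here is a minimal sketch of a test fixture whose event timestamps are hours apart in event time but are produced instantly, with no Thread.sleep(). The names TestElement and EventTimeFixture are hypothetical stand-ins (the question's Elements class isn't shown), and the only fields assumed are an id and a timestamp. In a Flink test, a list like this could presumably be handed to env.fromCollection(...) with a timestamp assigner that reads the timestamp field; the watermarking caveat above would still apply.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the question's Elements type.
// Assumption: an element carries a key (id) and an event timestamp in milliseconds.
final class TestElement {
    final String id;
    final long timestampMillis;

    TestElement(String id, long timestampMillis) {
        this.id = id;
        this.timestampMillis = timestampMillis;
    }
}

// Hypothetical helper that builds deterministic, bounded test input.
final class EventTimeFixture {

    // Returns `count` elements for one key, with event timestamps
    // `gapMillis` apart, starting at `startMillis`. No wall-clock
    // waiting is involved: event time lives in the data itself.
    static List<TestElement> evenlySpaced(String id, long startMillis, long gapMillis, int count) {
        List<TestElement> out = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            out.add(new TestElement(id, startMillis + i * gapMillis));
        }
        return out;
    }
}
```

For instance, EventTimeFixture.evenlySpaced("a", 0L, 10_000L, 1000) yields 1000 elements covering almost three hours of event time, and building the list takes no measurable wall-clock time.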

BTW, these tests in the Flink training repo are made slightly more complex than is ordinarily necessary, because these tests are used to provide coverage for the Java and the Scala implementations of both the exercises and solutions.

