
How to sessionize stream with Apache Flink?

I want to sessionize this stream: 1,1,1,2,2,2,2,2,3,3,3,3,3,3,3,0,3,3,3,5, ... into these sessions:

1,1,1
2,2,2,2,2
3,3,3,3,3,3,3
0
3,3,3
5

I've written a CustomTrigger that detects when the stream's value changes (from 1 to 2, 2 to 3, 3 to 0, and so on) and then fires. But this doesn't solve the problem: the trigger fires while the first 2 is being processed, so the window is [1,1,1,2], whereas I need it to fire on the last 1.

Here is pseudocode for the onElement function in my custom trigger class:

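override def onElement(element: Session, timestamp: Long, window: W, ctx: TriggerContext): TriggerResult = {
    // Fire as soon as the value differs from the previous element's value.
    // By the time onElement is called, `element` (the first element of the
    // new run) is already in the window, so the fired window is e.g. [1,1,1,2].
    val changed = prevState != element.value
    prevState = element.value
    if (changed) TriggerResult.FIRE else TriggerResult.CONTINUE
}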

How can I solve this problem?

I think a FlatMapFunction with a ListState is the easiest way to implement this use-case.

When a new element arrives (i.e., the flatMap() method is called), you check whether the value changed. If it did not change, you append the element to the state. If it changed, you emit the current list state as a session, clear the list, and insert the new element as the first element of the list state.
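The per-element logic described above can be sketched in plain Scala. This is a minimal sketch under stated assumptions, not Flink code: a `ListBuffer` stands in for Flink's `ListState`, the `emit` callback stands in for the `Collector` passed to `flatMap()`, and the names `Sessionizer` and `flush()` are hypothetical, introduced only for illustration.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical helper (not Flink API). `current` plays the role of the
// ListState and `emit` plays the role of the Collector in flatMap().
final class Sessionizer[T](emit: List[T] => Unit) {
  // Elements of the session that is currently being built.
  private val current = ListBuffer.empty[T]

  // Called once per incoming element, like FlatMapFunction.flatMap().
  def flatMap(element: T): Unit = {
    if (current.nonEmpty && current.last != element) {
      // Value changed: emit the buffered session and start a new one.
      emit(current.toList)
      current.clear()
    }
    // Same value (or first element): append to the current session.
    current += element
  }

  // The last session is only emitted when a different value arrives,
  // so a finite input needs an explicit flush at the end.
  def flush(): Unit =
    if (current.nonEmpty) {
      emit(current.toList)
      current.clear()
    }
}
```

Feeding the example stream through `flatMap` emits 1,1,1 / 2,2,2,2,2 / 3,3,3,3,3,3,3 / 0 / 3,3,3, while the final 5 stays buffered until another value arrives or `flush()` is called. In a Flink job the buffer would be kept in Flink-managed `ListState` rather than a plain field, so that it is included in checkpoints.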

However, keep in mind that this assumes the order of elements is preserved. Flink guarantees order only within a partition, i.e., as long as elements are not shuffled and all operators run with the same parallelism.
