
Apache Flink - Event time windows

I want to create keyed windows in Apache Flink such that the window for each key is evaluated n minutes after the first event for that key arrives. Can this be done using the event time characteristic (processing time depends on the system clock, and it is uncertain when the first event will arrive)? If so, please explain how to assign event timestamps and watermarks to the events, and how to invoke the process window function after n minutes.

Below is part of my code, to give you an idea of what I am doing currently:

            //Make keyed events so as to start a window for a key
            KeyedStream<SourceData, Tuple> keyedEvents = 
                    env.addSource(new MySource(configData),"JSON Source")
                    .assignTimestampsAndWatermarks(new MyTimeStamps())
                    .setParallelism(1)
                    .keyBy("service");


            //Start a window for windowTime time
            DataStream<ResultData> resultData=
                    keyedEvents
                    .timeWindow(Time.minutes(winTime))
                    .process(new ProcessEventWindow(configData))
                    .name("Event Collection Window")
                    .setParallelism(25);

So, how would I assign event timestamps and watermarks so that the window takes the event time of the first event as its starting point and is evaluated 10 minutes later (the time of the first event can differ between keys)? Any help would be really appreciated.

          /------------ ( window of 10 minutes )
Streams   |------------ ( window of 10 minutes )
          \------------ ( window of 10 minutes )

Edit: the class I use for assigning timestamps and watermarks:

public class MyTimeStamps implements AssignerWithPeriodicWatermarks<SourceData> {

    @Override
    public long extractTimestamp(SourceData element, long previousElementTimestamp) {

          //Will return epoch of currentTime
        return GlobalUtilities.getCurrentEpoch();
    }

    @Override
    public Watermark getCurrentWatermark() {
        // TODO Auto-generated method stub
        //Will return epoch of currentTime + 10 minutes
        return new Watermark(GlobalUtilities.getTimeShiftNMinutesEpoch(10));
    }

}

I think for your use case it would be best to use a ProcessFunction. You could register an event-time timer when the first event for a key arrives, then emit the results in the onTimer method.

Something like:

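import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

public class ProcessFunctionImpl extends ProcessFunction<SourceData, ResultData> {

    // per-key aggregate; keyed state requires applying this function to a KeyedStream
    private transient ValueState<ResultData> state;

    @Override
    public void open(Configuration parameters) throws Exception {
        state = getRuntimeContext().getState(
            new ValueStateDescriptor<>("aggregate", ResultData.class));
    }

    @Override
    public void processElement(SourceData value, Context ctx, Collector<ResultData> out)
        throws Exception {

        // retrieve the current aggregate
        ResultData current = state.value();
        if (current == null) {
            // first event arrived
            current = new ResultData();
            // register end of window
            ctx.timerService().registerEventTimeTimer(ctx.timestamp() + 10 * 60 * 1000 /* 10 minutes */);
        }

        // update the state's aggregate; add(...) is a hypothetical method that
        // ResultData would need to provide (originally pseudocode: current += value)
        current.add(value);

        // write the state back
        state.update(current);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<ResultData> out)
        throws Exception {

        // get the state for the key that scheduled the timer
        ResultData result = state.value();

        out.collect(result);

        // reset the window state
        state.clear();
    }
}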
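To make the timer semantics concrete without a cluster, here is a minimal plain-Java simulation of the same idea. It is only a sketch and does not use Flink: the class FirstEventTimerSim and its methods onEvent and onWatermark are hypothetical names, the "aggregate" is just an event count, and the only Flink behaviour it assumes is that an event-time timer fires once the watermark reaches the timer's timestamp.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java simulation (no Flink) of "emit n minutes after the first event per key". */
class FirstEventTimerSim {
    static final long WINDOW_MS = 10 * 60 * 1000L; // 10 minutes

    // Per-key state: running event count and the timestamp at which the key's timer fires.
    private final Map<String, Integer> counts = new HashMap<>();
    private final Map<String, Long> timers = new HashMap<>();
    final List<String> emitted = new ArrayList<>();

    /** Mirrors processElement: only the first event for a key registers a timer. */
    void onEvent(String key, long eventTimeMs) {
        if (!counts.containsKey(key)) {
            timers.put(key, eventTimeMs + WINDOW_MS);
        }
        counts.merge(key, 1, Integer::sum);
    }

    /** Mirrors onTimer: every timer whose timestamp is <= the watermark fires. */
    void onWatermark(long watermarkMs) {
        List<String> due = new ArrayList<>();
        for (Map.Entry<String, Long> e : timers.entrySet()) {
            if (e.getValue() <= watermarkMs) {
                due.add(e.getKey());
            }
        }
        Collections.sort(due); // deterministic order for the demo only
        for (String key : due) {
            emitted.add(key + "=" + counts.remove(key)); // emit and clear the key's state
            timers.remove(key);
        }
    }

    public static void main(String[] args) {
        FirstEventTimerSim sim = new FirstEventTimerSim();
        sim.onEvent("a", 0L);
        sim.onEvent("b", 120_000L);  // key b starts 2 minutes after key a
        sim.onEvent("a", 300_000L);
        sim.onWatermark(600_000L);   // 10 min: only a's timer is due
        sim.onWatermark(720_000L);   // 12 min: b's timer is due
        System.out.println(sim.emitted); // prints [a=2, b=1]
    }
}
```

Each key gets its own 10-minute span starting at its own first event, which is the behaviour asked for in the question. Note that nothing fires until the watermark advances, so the timestamps and watermarks have to come from the events themselves.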

I had a similar question a while ago regarding event time windows. Here's what my stream looks like:

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

//Consumer Setup

val stream = env.addSource(consumer)
  .assignTimestampsAndWatermarks(new WMAssigner)

// Additional Setup here

stream
  .keyBy { data => data.findValue("service") }
  .window(TumblingEventTimeWindows.of(Time.minutes(10)))
  .process { new WindowProcessor }

  //Sinks go here
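One thing worth keeping in mind about the pipeline above: as far as I know, tumbling event-time windows are aligned to the epoch by default (offset 0), not to the first event of each key. The plain-Java sketch below illustrates that alignment arithmetic. WindowAlignment and windowStart are hypothetical names for illustration, not Flink API, and the formula is my reading of the epoch-aligned rule rather than a copy of Flink's source.

```java
/** Illustrates epoch-aligned tumbling windows; not Flink code. */
class WindowAlignment {

    /** Start of the tumbling window of length sizeMs (shifted by offsetMs) that contains timestampMs. */
    static long windowStart(long timestampMs, long offsetMs, long sizeMs) {
        return timestampMs - Math.floorMod(timestampMs - offsetMs, sizeMs);
    }

    public static void main(String[] args) {
        long size = 10 * 60 * 1000L;                 // 10-minute windows
        long firstEvent = 7 * 60 * 1000L + 30_000L;  // first event at 00:07:30 event time
        long start = windowStart(firstEvent, 0L, size);
        long end = start + size;
        // The window covering this event is [00:00:00, 00:10:00), so it closes
        // 2.5 minutes after the first event rather than 10 minutes after it.
        System.out.println(start + " " + end);       // prints 0 600000
    }
}
```

So a 10-minute tumbling window gives "a result every 10 minutes of event time", which may or may not be close enough to "10 minutes after the first event for each key".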

My WMAssigner class looked like this (note: it allows events to arrive up to 1 minute out of order; you can extend a different timestamp extractor if you don't want to allow for lateness):

class WMAssigner extends BoundedOutOfOrdernessTimestampExtractor[ObjectNode] (Time.seconds(60)) {
  override def extractTimestamp(element: ObjectNode): Long = {
    val tsStr = element.findValue("data").findValue("ts").toString replaceAll("\"", "")
    tsStr.toLong
  }
}

The timestamp I wanted to use for watermarks was the data.ts field.

My WindowProcessor:
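For intuition about what the 60-second bound does, here is a rough plain-Java sketch of a bounded out-of-orderness watermark: the watermark trails the largest timestamp seen so far by the configured bound. BoundedLatenessWatermark and its methods are hypothetical names and this is a simplified model, not the actual BoundedOutOfOrdernessTimestampExtractor implementation.

```java
/** Simplified model of a bounded out-of-orderness watermark; not Flink code. */
class BoundedLatenessWatermark {
    private final long maxOutOfOrdernessMs;
    private long maxTimestampMs;

    BoundedLatenessWatermark(long maxOutOfOrdernessMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
        // Start low enough that the subtraction below cannot overflow before any event arrives.
        this.maxTimestampMs = Long.MIN_VALUE + maxOutOfOrdernessMs;
    }

    /** Track the largest event timestamp seen so far. */
    void onEvent(long eventTimeMs) {
        maxTimestampMs = Math.max(maxTimestampMs, eventTimeMs);
    }

    /** The watermark lags the largest timestamp by the allowed out-of-orderness. */
    long currentWatermark() {
        return maxTimestampMs - maxOutOfOrdernessMs;
    }

    public static void main(String[] args) {
        BoundedLatenessWatermark wm = new BoundedLatenessWatermark(60_000L);
        wm.onEvent(100_000L);
        System.out.println(wm.currentWatermark()); // prints 40000
        wm.onEvent(90_000L);                       // out-of-order event: watermark does not move back
        System.out.println(wm.currentWatermark()); // prints 40000
        wm.onEvent(200_000L);
        System.out.println(wm.currentWatermark()); // prints 140000
    }
}
```

In this model, an event that is up to 60 seconds older than the newest one still arrives ahead of the watermark, so it can still be counted in its window.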

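class WindowProcessor extends ProcessWindowFunction[ObjectNode, String, String, TimeWindow] {
  override def process(key: String, context: Context, elements: Iterable[ObjectNode], out: Collector[String]): Unit = {
    // Keep the outData field of the last element in the window.
    // A separate var is needed: the original reused the name "out", shadowing the Collector.
    var result = ""
    elements.foreach { value =>
      result = value.findValue("data").findValue("outData").toString
    }
    out.collect(result)
  }
}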

Let me know if anything is unclear.


 