
Keyed state store behaviour within ProcessWindowFunction (Apache Flink Java)

I have a ProcessWindowFunction for processing TumblingEventTimeWindows, in which I use a state store to preserve some values across multiple tumbling windows. My problem is that this state store is not being preserved across tumbling windows, i.e. if I first store something in window [0,999] and then access the store from window [1000,1999], the store is empty. I am aware of the difference between global state and per-window state, and I want to use global state. I also tried creating a minimal working example to investigate this:

import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.watermark.Watermark;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

import javax.annotation.Nullable;


public class twStateStoreTest {


    public static void main(String[] args) throws Exception {
        // set up the streaming execution environment
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        env.getConfig().setAutoWatermarkInterval(1000L);

        final DataStream<Element> elements = env.fromElements(
                Element.from(1, 500),
                Element.from(1, 1000),
                Element.from(1, 1500),
                Element.from(1, 2000),

                Element.from(99, 9999)
                ).
                assignTimestampsAndWatermarks(new AssignerWithPeriodicWatermarks<Element>() {
                    long w;
                    @Nullable
                    @Override
                    public Watermark getCurrentWatermark() {
                        return new Watermark(w);
                    }

                    @Override
                    public long extractTimestamp(Element element, long previousElementTimestamp) {
                        w = element.getTimestamp();
                        return w;
                    }
                });

        elements
                .keyBy(new KeySelector<Element, Integer>() {
                    @Override
                    public Integer getKey(Element element) throws Exception {
                        return element.value;
                    }
                })
                .window(TumblingEventTimeWindows.of(Time.milliseconds(1000L)))
                .process(new MyProcessWindowFn()).
                print();

        // execute program
        env.execute("Flink Streaming Java API Skeleton");
    }

    static class MyProcessWindowFn extends ProcessWindowFunction<Element, String, Integer, TimeWindow> {
        MapState<Integer, Integer> stateStore;

        @Override
        public void open(Configuration parameters) throws Exception {
            stateStore = getRuntimeContext().getMapState(new MapStateDescriptor<Integer, Integer>("stateStore", Integer.class, Integer.class));
        }

        @Override
        public void process(Integer key, Context context, Iterable<Element> elements, Collector<String> out) throws Exception {

            if (stateStore.get(key) == null) {
                stateStore.put(key, 1);
            }else {
                int previous = stateStore.get(key);
                stateStore.put(key, previous+1);
            }
            out.collect("State store for " + elements.toString() + " is " + stateStore.entries().toString()
                    + " for window : " + context.window());
        }
    }




    static class Element {
        private final long timestamp;
        private final int value;

        public Element(long timestamp, int value) {
            this.timestamp = timestamp;
            this.value = value;
        }

        public long getTimestamp() {
            return timestamp;
        }

        public int getValue() {
            return value;
        }

        public static Element from(int value, long timestamp) {
            return new Element(timestamp, value);
        }
    }


}

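Here I am trying to count the number of times the process() function is called. This example works, and the state is indeed preserved across tumbling windows. I have made sure that this example mirrors the actual ProcessWindowFunction exactly, with the other, unnecessary code stripped out.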
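To make it easier to see how many times process() should fire for key 1 in the example above, here is a small plain-Java sketch of how timestamps map onto 1000 ms tumbling windows. The class name TumblingWindowDemo and the helper windowStart are made up for illustration; the formula assumes non-negative timestamps and a window offset of zero, and it is not a call into Flink itself.

```java
/** Illustrative only: assigns timestamps to tumbling windows without using Flink. */
final class TumblingWindowDemo {

    /**
     * Start of the tumbling window that contains the timestamp.
     * Assumes timestamp >= 0 and no window offset.
     */
    static long windowStart(long timestamp, long size) {
        return timestamp - (timestamp % size);
    }

    public static void main(String[] args) {
        long size = 1000L;
        // Timestamps of the elements with key 1 in the minimal example.
        long[] timestamps = {500L, 1000L, 1500L, 2000L};
        for (long ts : timestamps) {
            long start = windowStart(ts, size);
            // Windows are half-open: [start, start + size).
            System.out.println("timestamp " + ts + " -> window [" + start + ", " + (start + size) + ")");
        }
    }
}
```

Under these assumptions the four elements of key 1 fall into three windows, [0, 1000), [1000, 2000) and [2000, 3000), so one would expect process() to run three times for that key and the counter to reach 3 if the state survives across windows.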

However, the state is not preserved across windows in the actual ProcessWindowFunction!

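Are there any obvious gotchas that I am missing? Is there any other reason why state would not be preserved across tumbling event-time windows for a ProcessWindowFunction that uses a MapState defined as follows: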

private MapState<UserDefinedEnum, Boolean> activeSessionStore;

@Override
public void open(Configuration parameters) throws Exception {
    // The descriptor's key type must match the MapState's key type.
    activeSessionStore = getRuntimeContext().getMapState(new MapStateDescriptor<UserDefinedEnum, Boolean>(
            "name", UserDefinedEnum.class, Boolean.class));
}

Here is the actual class, with the bloat removed as suggested by @David and @ShemTov:

public class IUFeatureStateCombiner extends ProcessWindowFunction<IUSessionMessage, IUSessionMessage, IUMonitorFeatureKey, TimeWindow> {

    private final static MapStateDescriptor<IUEventType, Boolean> desc =  new MapStateDescriptor<IUEventType, Boolean>(
            "store", IUEventType.class, Boolean.class);
    private final Logger LOGGER = LoggerFactory.getLogger(IUFeatureStateCombiner.class);

    @Override
    public void process(IUMonitorFeatureKey iuMonitorFeatureKey, Context context, Iterable<IUSessionMessage> elements, Collector<IUSessionMessage> out) throws Exception {
        ...

        MapState<IUEventType, Boolean> activeSessionStore = context.globalState().getMapState(desc);

        Iterable<Entry<IUEventType, Boolean>> lastFeatureStates = activeSessionStore.entries(); // <-------- This returns an empty iterable
        // even though I populated activeSessionStore with some values in the previous invocation of process()

        ... do something based on lastFeatureStates....

        activeSessionStore.put(...);
    }

    @Override
    public void clear(Context context) throws Exception {
        context.globalState().getMapState(desc).clear();
    }
}

I invoke it as follows:

inputStream.keyBy(IUSessionMessage::getMonitorFeatureKey).
window(TumblingEventTimeWindows.of(Time.milliseconds(1000L))).
            process(new IUFeatureStateCombiner())

This still has the problem: I get an empty iterable on the second invocation of process(), even though I populated the state in the previous invocation.

Edit: Problem solved. The clear() method should not be used to clear this state, because it is global state.

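You want to do something more like this. Keep in mind that these are per-key state stores (there is a separate map for every key), so calling stateStore.get(key) does not really make sense. If you only need to store one Integer per key, a ValueState is probably all you need.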

static class MyProcessWindowFn extends ProcessWindowFunction<Element, String, Integer, TimeWindow> {
    // Typed descriptor, shared by all invocations of process().
    private final static MapStateDescriptor<Integer, Integer> mapDesc =
            new MapStateDescriptor<Integer, Integer>("stateStore", Integer.class, Integer.class);

    @Override
    public void process(Integer key, Context context, Iterable<Element> elements, Collector<String> out) throws Exception {

        // globalState() is a method on the context; it returns the per-key global state store.
        MapState<Integer, Integer> stateStore = context.globalState().getMapState(mapDesc);

        ...
    }
}

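Note that the global state store is never cleared. So if you have an unbounded key space, you will eventually run into problems. You can configure state TTL on the state descriptor to deal with this.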
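As a rough sketch of the TTL suggestion, the descriptor could be configured along these lines. The one-hour lifetime, the class name TtlDescriptorSketch and the method name descriptorWithTtl are illustrative assumptions, not values from the question. The snippet also assumes a Flink 1.x release in which StateTtlConfig.newBuilder accepts org.apache.flink.api.common.time.Time; newer releases take a java.time.Duration instead. Flink state TTL is evaluated against processing time, not event time.

```java
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.time.Time;

/** Illustrative only: builds a MapStateDescriptor whose entries expire after a TTL. */
final class TtlDescriptorSketch {

    static MapStateDescriptor<Integer, Integer> descriptorWithTtl() {
        StateTtlConfig ttlConfig = StateTtlConfig
                // Assumed lifetime of one hour; pick a value that suits the key space.
                .newBuilder(Time.hours(1))
                // Restart the TTL clock whenever an entry is created or written.
                .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
                // Never hand back an entry that has already expired.
                .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
                .build();

        MapStateDescriptor<Integer, Integer> descriptor =
                new MapStateDescriptor<>("stateStore", Integer.class, Integer.class);
        descriptor.enableTimeToLive(ttlConfig);
        return descriptor;
    }
}
```

The resulting descriptor would then be passed to context.globalState().getMapState(...) in place of a descriptor without TTL.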

As far as I know, you cannot get the global state from the overridden open() method.

You need to obtain it inside the process() method of the ProcessWindowFunction:

context.globalState().getMapState(<your_Map_State_Descriptor>)

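My mistake was that I was using the clear() method incorrectly. Flink calls clear() when a window expires, so clearing the global state there wipes it as soon as each tumbling window expires, before the next window can read it. As David pointed out, global state is otherwise never cleared, so a TTL has to be defined for an unbounded key stream.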
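The effect described above can be reproduced with a toy model in plain Java. This is not the Flink API: the class GlobalStateClearDemo and its methods process and onWindowExpired are hypothetical stand-ins for process(), clear() and context.globalState(), written only to show the order of events.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (plain Java, not Flink) of per-key global state shared by successive windows. */
final class GlobalStateClearDemo {

    // Stands in for context.globalState(): one map per key, shared by all windows of that key.
    private final Map<String, Map<String, Boolean>> globalStateByKey = new HashMap<>();
    private final boolean clearGlobalStateOnExpiry;

    public GlobalStateClearDemo(boolean clearGlobalStateOnExpiry) {
        this.clearGlobalStateOnExpiry = clearGlobalStateOnExpiry;
    }

    /** Models process(): returns how many entries were visible, then records one entry. */
    public int process(String key, long windowStart) {
        Map<String, Boolean> state = globalStateByKey.computeIfAbsent(key, k -> new HashMap<>());
        int visibleEntries = state.size();
        state.put("event@" + windowStart, true);
        return visibleEntries;
    }

    /** Models clear(): invoked when a window of the given key expires. */
    public void onWindowExpired(String key) {
        Map<String, Boolean> state = globalStateByKey.get(key);
        if (clearGlobalStateOnExpiry && state != null) {
            state.clear(); // the mistake: wiping global state from the per-window cleanup hook
        }
    }

    public static void main(String[] args) {
        for (boolean clearing : new boolean[] {true, false}) {
            GlobalStateClearDemo fn = new GlobalStateClearDemo(clearing);
            fn.process("k", 0L);        // first window writes one entry
            fn.onWindowExpired("k");    // first window expires
            int seen = fn.process("k", 1000L);
            System.out.println("clearing=" + clearing + ": second window sees " + seen + " entries");
        }
    }
}
```

In this model the second window sees 0 entries when the cleanup hook clears the global map and 1 entry when it does not, which matches the empty iterable observed in the question.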
