
Why do I get a NullPointerException when using initializeState() in Apache Flink?

I am using operator state with CheckpointedFunction; however, I get a NullPointerException while initializing a MapState:

public void initializeState(FunctionInitializationContext context) throws Exception {
    MapStateDescriptor<Long, Long> descriptor
        = new MapStateDescriptor<>(
            "state",
            TypeInformation.of(new TypeHint<Long>() {}),
            TypeInformation.of(new TypeHint<Long>() {})
        );
    state = context.getKeyedStateStore().getMapState(descriptor);
}

The NullPointerException is thrown on the line where I pass "descriptor" to getMapState().

Here is the stacktrace:

java.lang.NullPointerException
at fyp.Buffer.initializeState(Iteration.java:51)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:178)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:160)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:96)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:259)
at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:694)
at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:682)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:253)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
at java.lang.Thread.run(Thread.java:748)

I guess you're running into the NPE because you're trying to access the KeyedStateStore documented here; however, since your stream is not keyed, no keyed state store is available to your function. context.getKeyedStateStore() therefore returns null, and calling getMapState() on it throws the NPE.

Gets a handle to the system's key/value state. The key/value state is only accessible if the function is executed on a KeyedStream. On each access, the state exposes the value for the key of the element currently processed by the function. Each function may have multiple partitioned states, addressed with different names.

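So if you implement CheckpointedFunction (documented here) on a non-keyed stream (and you don't want to key it), you should access the operator state store instead. Note that operator state is list-based, so getUnionListState() takes a ListStateDescriptor rather than a MapStateDescriptor: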

snapshotMetadata = context.getOperatorStateStore().getUnionListState(descriptor);

Operator state gives you one state per parallel instance of your operator, whereas with keyed state each state value is scoped to a key produced by a keyed stream.
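To make the difference in scope concrete, here is a small plain-Java model (it does not use Flink at all). The class and method names (StateScopeDemo, countPerInstance, countPerKey) are hypothetical, and routing a key to an instance with a simple modulo is an assumption made only to keep the sketch short; Flink itself assigns keys through key groups and hashing.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model, NOT Flink's API: contrasts the scope of operator state
// (one value per parallel instance) with keyed state (one value per key).
public class StateScopeDemo {

    // Simplified routing assumed for illustration; Flink actually assigns
    // keys to instances via key groups and hashing, not a plain modulo.
    static int route(long key, int parallelism) {
        return (int) Math.floorMod(key, (long) parallelism);
    }

    // "Operator state": each parallel instance keeps one counter of the
    // elements it processed, regardless of their keys.
    static long[] countPerInstance(long[] keys, int parallelism) {
        long[] counts = new long[parallelism];
        for (long key : keys) {
            counts[route(key, parallelism)]++;
        }
        return counts;
    }

    // "Keyed state": one counter per key; a given key always lands on the
    // same instance, so its counter lives in exactly one place.
    static Map<Long, Long> countPerKey(long[] keys) {
        Map<Long, Long> counts = new TreeMap<>();
        for (long key : keys) {
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] keys = {1L, 2L, 1L, 3L, 1L, 2L};
        long[] perInstance = countPerInstance(keys, 2);
        System.out.println("per instance: " + perInstance[0] + ", " + perInstance[1]);
        System.out.println("per key: " + countPerKey(keys));
    }
}
```

With these inputs the two instances end up with counters 2 and 4, while the per-key view holds {1=3, 2=2, 3=1}: the same stream, but the state is partitioned along different axes.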

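Note that the example above calls getUnionListState(): on restore, every parallel instance receives the union of the list state of all parallel instances of your operator, as a single list. With plain getListState(), the elements are instead split across the instances.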
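The difference between the two restore modes can be sketched in plain Java (again, no Flink dependency). RedistributionDemo, evenSplit and union are hypothetical names, and handing out contiguous, roughly equal chunks in the even-split case is an assumption of this sketch; Flink's exact assignment of elements to instances may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, NOT Flink's API: how list-style operator state could be
// redistributed on restore when the parallelism changes.
public class RedistributionDemo {

    // Concatenate the lists snapshotted by all old parallel instances.
    static List<String> flatten(List<List<String>> oldStates) {
        List<String> all = new ArrayList<>();
        for (List<String> state : oldStates) {
            all.addAll(state);
        }
        return all;
    }

    // Even-split (as with getListState): each new instance gets a disjoint,
    // roughly equal chunk. Contiguous chunking is an assumption of this sketch.
    static List<List<String>> evenSplit(List<List<String>> oldStates, int newParallelism) {
        List<String> all = flatten(oldStates);
        int base = all.size() / newParallelism;
        int remainder = all.size() % newParallelism;
        List<List<String>> result = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < newParallelism; i++) {
            int size = base + (i < remainder ? 1 : 0);
            result.add(new ArrayList<>(all.subList(start, start + size)));
            start += size;
        }
        return result;
    }

    // Union (as with getUnionListState): every new instance gets the full list.
    static List<List<String>> union(List<List<String>> oldStates, int newParallelism) {
        List<String> all = flatten(oldStates);
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < newParallelism; i++) {
            result.add(new ArrayList<>(all));
        }
        return result;
    }

    public static void main(String[] args) {
        // Three old instances restored with a parallelism of two.
        List<List<String>> old = List.of(List.of("a", "b"), List.of("c"), List.of("d", "e"));
        System.out.println("even-split: " + evenSplit(old, 2));
        System.out.println("union:      " + union(old, 2));
    }
}
```

Running main prints [[a, b, c], [d, e]] for the even split and [[a, b, c, d, e], [a, b, c, d, e]] for the union, which is why union list state suits metadata that every instance needs to see in full, at the cost of each instance having to pick out what is relevant to it.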

If you are looking for a concrete example, have a look at this source: it is an operator that implements operator state.

Finally, if you actually need a keyed stream, consider keying your stream (keyBy) and moving your solution to Flink's keyed state instead.
