
How to use non-keyed state with Kafka Consumer in Flink?

I'm trying to implement (I'm just starting to work with Java and Flink) non-keyed state in the KafkaConsumer object, since at this stage no keyBy() is called. This object is the front end and the first module to handle messages from Kafka. SourceOutput is a proto file representing the message.

I have the KafkaConsumer object:

import java.io.Serializable;

import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

public class KafkaSourceFunction extends ProcessFunction<byte[], SourceOutput> implements Serializable
{
    @Override
    public void processElement(byte[] bytes,
                               ProcessFunction<byte[], SourceOutput>.Context context,
                               Collector<SourceOutput> collector) throws Exception
    {
        // Here I want to call the sorting method and emit its result.
        collector.collect(output);
    }
}

I have an object (KafkaSourceSort) that does all the sorting. It should keep the out-of-order messages in a PriorityQueue in the state, and it is also responsible for delivering a message through the collector once it arrives in the right order.

import java.io.Serializable;
import java.util.PriorityQueue;
import java.util.TreeMap;

class SessionInfo
{
    public PriorityQueue<SourceOutput> orderedMessages = null;

    public void putMessage(SourceOutput msg)
    {
        if (orderedMessages == null)
            orderedMessages = new PriorityQueue<SourceOutput>(new SequenceComparator());

        orderedMessages.add(msg);
    }
}

public class KafkaSourceState implements Serializable
{
    public TreeMap<String, SessionInfo> sessions = new TreeMap<>();
}

I read that I need to use non-keyed state (ListState), which should contain a map of sessions, where each session holds a PriorityQueue of all messages related to that session.

I found an example, so I implemented this:

import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;

public class KafkaSourceSort implements SinkFunction<KafkaSourceState>,
        CheckpointedFunction
{
    private transient ListState<KafkaSourceState> checkpointedState;
    private KafkaSourceState state;

    @Override
    public void snapshotState(FunctionSnapshotContext functionSnapshotContext) throws Exception
    {
        checkpointedState.clear();
        checkpointedState.add(state);
    }

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception
    {
        ListStateDescriptor<KafkaSourceState> descriptor =
                new ListStateDescriptor<KafkaSourceState>(
                        "KafkaSourceState",
                        TypeInformation.of(new TypeHint<KafkaSourceState>() {}));

        checkpointedState = context.getOperatorStateStore().getListState(descriptor);

        if (context.isRestored())
        {
            // ListState.get() returns an Iterable, so it cannot be cast to the state type directly.
            for (KafkaSourceState restored : checkpointedState.get())
            {
                state = restored;
            }
        }
    }

    @Override
    public void invoke(KafkaSourceState value, SinkFunction.Context context) throws Exception
    {
        state = value;

        // ...
    }
}

I see that I need to implement an invoke method, which will probably be called from processElement(), but the signature of invoke() doesn't contain the collector, and I don't understand how to do this, or even whether what I have done so far is OK.

Any help will be appreciated. Thanks.

A SinkFunction is a terminal node in the DAG that is your job graph. It doesn't have a Collector in its interface because it cannot emit anything downstream. It is expected to connect to an external service or data store and send data there.
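
If you need both non-keyed (operator) state and the ability to emit downstream, one way is to keep the sorting in an operator that is not a sink, e.g. a ProcessFunction that also implements CheckpointedFunction. Below is a minimal sketch of that idea, reusing the KafkaSourceState / SessionInfo classes from the question; the names SortingProcessFunction, getSessionId() and tryEmitInOrder() are placeholders made up for illustration, not part of any Flink API:

import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

// Sketch: a ProcessFunction can buffer out-of-order messages in operator (non-keyed)
// state and still emit in-order messages through the Collector, which a SinkFunction cannot.
public class SortingProcessFunction extends ProcessFunction<SourceOutput, SourceOutput>
        implements CheckpointedFunction
{
    private transient ListState<KafkaSourceState> checkpointedState;
    private KafkaSourceState state = new KafkaSourceState();

    @Override
    public void processElement(SourceOutput message,
                               ProcessFunction<SourceOutput, SourceOutput>.Context ctx,
                               Collector<SourceOutput> out) throws Exception
    {
        // Buffer the message in its session's priority queue.
        // getSessionId() is assumed to exist on your proto; use whatever field identifies the session.
        state.sessions
             .computeIfAbsent(message.getSessionId(), id -> new SessionInfo())
             .putMessage(message);

        // Release whatever is now in order (tryEmitInOrder is a placeholder for your ordering logic).
        for (SourceOutput ready : tryEmitInOrder(message.getSessionId()))
        {
            out.collect(ready);
        }
    }

    @Override
    public void snapshotState(FunctionSnapshotContext ctx) throws Exception
    {
        checkpointedState.clear();
        checkpointedState.add(state);
    }

    @Override
    public void initializeState(FunctionInitializationContext ctx) throws Exception
    {
        ListStateDescriptor<KafkaSourceState> descriptor =
                new ListStateDescriptor<>(
                        "KafkaSourceState",
                        TypeInformation.of(new TypeHint<KafkaSourceState>() {}));

        checkpointedState = ctx.getOperatorStateStore().getListState(descriptor);

        if (ctx.isRestored())
        {
            for (KafkaSourceState restored : checkpointedState.get())
            {
                state = restored;
            }
        }
    }

    private Iterable<SourceOutput> tryEmitInOrder(String sessionId)
    {
        // Placeholder: pop messages from the session's PriorityQueue while the head
        // of the queue carries the next expected sequence number.
        return java.util.Collections.emptyList();
    }
}

Wired as env.addSource(...).process(new KafkaSourceFunction()).process(new SortingProcessFunction()).addSink(...), the sorting step still has a Collector and the buffered messages survive checkpoints. Keep in mind that operator (non-keyed) state is scoped to each parallel subtask, so unless the parallelism is 1 or the stream is keyed by session, messages of the same session may land on different subtasks; that is one reason a keyBy() plus keyed state is often the simpler route.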

If you share more about what you are trying to accomplish, perhaps we can offer more assistance. There may be an easier way to go about this.
