
Flink Consumer with DataStream API for Batch Processing - How do we know when to stop, and how do we stop processing? [two-fold]

I am trying to run the same Flink pipeline of transformations in both batch and real-time modes, with different input parameters to distinguish between the two. I want to use the DataStream API, as most of my transformations depend on it.

My producer is Kafka, and the real-time pipeline works just fine. Now I want to build a batch pipeline with the exact same code, using different topics for batch and real-time mode. How does my batch processor know when to stop processing?

One way I thought of was to add an extra field to the producer record marking it as the last record. However, with a multi-partitioned topic, delivery order is not guaranteed across partitions (it is guaranteed within a single partition), so a single "last record" marker is not reliable.

What is the best practice to design this?

PS: I don't want to use the DataSet API.

You can use the DataStream API for batch processing without any issue. When the input is bounded, Flink injects a marker that signals the end of the stream, so your application works on finite streams instead of infinite ones.
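As a side note, newer Flink versions (1.12+) make this concrete without any end-marker tricks: the KafkaSource connector can be declared bounded, and the job can run with batch execution semantics. This is a minimal sketch of that approach, not part of the original answer; the broker address, topic, and group id are placeholders:

```java
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BatchKafkaJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Run the same DataStream topology with batch scheduling and shuffles.
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);

        // Bounded source: read from the earliest offsets up to the offsets
        // present when the job starts, then finish the source.
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092")   // placeholder
                .setTopics("batch-topic")                // placeholder
                .setGroupId("batch-job")                 // placeholder
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setBounded(OffsetsInitializer.latest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        DataStream<String> records =
                env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-batch-source");

        // The same transformations as the real-time pipeline would go here.
        records.print();

        env.execute("batch-mode-run");
    }
}
```

With `setBounded`, the job stops once every partition reaches the snapshot offsets, so no special "last record" is needed at all.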

To be completely honest, I am not sure Kafka is the best fit for this problem.

Generally, when implementing KafkaDeserializationSchema you have the method isEndOfStream(), which lets you signal that the stream has finished. You could inject an end marker into each partition and finish the stream once a marker has been read from every partition, but this requires you to know the number of partitions beforehand. A sketch of this idea follows.
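Here is a rough sketch of that per-partition marker idea, assuming parallelism 1 (or that each source subtask knows how many partitions it reads); the marker value `__END__` and the class name are hypothetical:

```java
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.connectors.kafka.KafkaDeserializationSchema;
import org.apache.kafka.clients.consumer.ConsumerRecord;

import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

/**
 * Sketch: ends the stream once an end marker has been seen on every
 * partition this subtask reads. Assumes the producer writes exactly one
 * "__END__" record per partition after the real data.
 */
public class EndMarkerSchema implements KafkaDeserializationSchema<String> {

    private static final String END_MARKER = "__END__"; // hypothetical marker value

    private final int expectedPartitions; // known up front, per the caveat above
    private final Set<Integer> finishedPartitions = new HashSet<>();

    public EndMarkerSchema(int expectedPartitions) {
        this.expectedPartitions = expectedPartitions;
    }

    @Override
    public String deserialize(ConsumerRecord<byte[], byte[]> record) {
        String value = new String(record.value(), StandardCharsets.UTF_8);
        if (END_MARKER.equals(value)) {
            // Remember which partition has delivered its marker.
            finishedPartitions.add(record.partition());
        }
        return value;
    }

    @Override
    public boolean isEndOfStream(String nextElement) {
        // Stop only when the marker just read completes the full set.
        return END_MARKER.equals(nextElement)
                && finishedPartitions.size() >= expectedPartitions;
    }

    @Override
    public TypeInformation<String> getProducedType() {
        return Types.STRING;
    }
}
```

Note that the producer would have to target each partition explicitly when writing the markers (e.g. `new ProducerRecord<>(topic, partitionId, null, "__END__")`), and markers read before the final one still flow into the pipeline, so downstream operators should filter them out.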
