
How to trigger a window if one of multiple Kafka topics is idle

I'm consuming multiple Kafka topics, windowing them hourly, and writing each topic into its own Parquet files. However, if one of the topics is idle, the window is not triggered and nothing is written to the file system. In this example I'm consuming 2 topics with a single partition each, with taskmanager.numberOfTaskSlots: 2 and parallelism.default: 1. What is the proper way to solve this problem in Apache Beam with the Flink Runner?

pipeline
    .apply(
    "ReadKafka",
    KafkaIO
        .read[String, String]
        .withBootstrapServers(bootstrapServers)
        .withTopics(topics)
        .withCreateTime(Duration.standardSeconds(0))
        .withReadCommitted
        .withKeyDeserializer(classOf[StringDeserializer])
        .withValueDeserializer(classOf[StringDeserializer])
        .withoutMetadata()
    )
    .apply("ConvertToMyEvent", MapElements.via(new KVToMyEvent()))
    .apply(
    "WindowHourly",
    Window.into[MyEvent](FixedWindows.of(Duration.standardHours(1)))
    )
    .apply(
    "WriteParquet",
    FileIO
        .writeDynamic[String, MyEvent]()
        .by(new BucketByEventName())
    //...
    )

A time window needs data. If a topic is idle, there is no data to close the window, so the window stays open until data arrives. If you want to window data based on processing time instead of the actual event time, try a simple process function (a Flink KeyedProcessFunction) that registers a processing-time timer:

 import java.io.IOException;
 import org.apache.flink.api.common.state.ValueState;
 import org.apache.flink.api.common.state.ValueStateDescriptor;
 import org.apache.flink.configuration.Configuration;
 import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
 import org.apache.flink.util.Collector;

 // KeyDataType, InputDataType and OutputDataType are placeholders;
 // each can be a simple type like String or your custom class.
 public class MyProcessFunction extends
     KeyedProcessFunction<KeyDataType, InputDataType, OutputDataType> {

     // Keyed state holding the time at which the pending timer fires.
     private transient ValueState<Long> windowTime;

     @Override
     public void open(final Configuration conf) {
         // A normal field does not work for keyed state. Obtain the state from
         // the runtime context here and use it inside the other methods.
         final ValueStateDescriptor<Long> windowDesc =
             new ValueStateDescriptor<>("windowDesc", Long.class);
         this.windowTime = this.getRuntimeContext().getState(windowDesc);
     }

     @Override
     public void processElement(InputDataType input, Context context,
             Collector<OutputDataType> collector) throws IOException {

         // Delete any existing timer if you want to reset the timer.
         if (this.windowTime.value() != null) {
             context.timerService().deleteProcessingTimeTimer(this.windowTime.value());
         }

         // Register a timer that fires <window interval> after the current
         // processing time; milliseconds are recommended.
         this.windowTime.update(
             context.timerService().currentProcessingTime() + <window interval>);
         context.timerService().registerProcessingTimeTimer(this.windowTime.value());
         // ...
     }

     @Override
     public void onTimer(long timestamp, OnTimerContext context,
             Collector<OutputDataType> collector) throws IOException {
         // This method is executed when the timer fires.
         collector.collect(<whatever you want to stream out>); // this data will be available in the pipeline
     }
 }
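To make the timer idea concrete without a cluster, here is a minimal plain-Java sketch of the same pattern: buffer elements per key and flush them when a processing-time timer comes due, whether or not other keys receive data. It is a toy model, not Flink or Beam code. The class `ProcessingTimeBuffer`, its methods, and the use of the topic name as the key are all hypothetical names chosen for illustration, and the clock is passed in by hand so the behavior is deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a per-key processing-time timer (plain Java, not the Flink API). */
class ProcessingTimeBuffer {
    private final long intervalMs;
    // Elements buffered per key (here: per topic) since the last flush.
    private final Map<String, List<String>> buffers = new HashMap<>();
    // Pending flush time per key; each key has at most one timer.
    private final Map<String, Long> timers = new HashMap<>();

    ProcessingTimeBuffer(long intervalMs) {
        this.intervalMs = intervalMs;
    }

    /** Buffers a value; the first value after a flush arms a timer at now + interval. */
    void processElement(String key, String value, long nowMs) {
        buffers.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        timers.putIfAbsent(key, nowMs + intervalMs);
    }

    /** Advances the wall clock and flushes every key whose timer is due. */
    Map<String, List<String>> advanceTo(long nowMs) {
        Map<String, List<String>> fired = new HashMap<>();
        for (String key : new ArrayList<>(timers.keySet())) {
            if (timers.get(key) <= nowMs) {
                timers.remove(key);
                fired.put(key, buffers.remove(key));
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        ProcessingTimeBuffer buffer = new ProcessingTimeBuffer(3_600_000L); // one hour
        buffer.processElement("topicA", "a1", 0L);
        buffer.processElement("topicA", "a2", 1_000L);
        // topicB stays idle; topicA is still flushed once its hour has elapsed.
        System.out.println(buffer.advanceTo(3_600_000L)); // prints {topicA=[a1, a2]}
    }
}
```

Unlike the process function above, which resets the timer on every element, this sketch keeps the first timer, so a busy key is flushed at a fixed interval after its first buffered element. Which behavior is preferable depends on whether you want a fixed-length interval or an inactivity timeout.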

