
How to trigger a window if one of multiple Kafka topics is idle

I'm consuming multiple Kafka topics, windowing them hourly, and writing them into separate Parquet files for each topic. However, if one of the topics is idle, the window is not triggered and nothing is written to the file system. For this example, I'm consuming 2 topics with a single partition, with taskmanager.numberOfTaskSlots: 2 and parallelism.default: 1. What is the proper way to solve this problem in Apache Beam with the Flink runner?

pipeline
    .apply(
        "ReadKafka",
        KafkaIO
            .read[String, String]
            .withBootstrapServers(bootstrapServers)
            .withTopics(topics)
            .withCreateTime(Duration.standardSeconds(0))
            .withReadCommitted
            .withKeyDeserializer(classOf[StringDeserializer])
            .withValueDeserializer(classOf[StringDeserializer])
            .withoutMetadata()
    )
    .apply("ConvertToMyEvent", MapElements.via(new KVToMyEvent()))
    .apply(
        "WindowHourly",
        Window.into[MyEvent](FixedWindows.of(Duration.standardHours(1)))
    )
    .apply(
        "WriteParquet",
        FileIO
            .writeDynamic[String, MyEvent]()
            .by(new BucketByEventName())
            //...
    )

An event-time window needs data to make progress. If a topic is idle, no new data arrives to close the window, so the window stays open until data arrives again. If you are fine with windowing the data by processing time instead of actual event time, try a simple Flink process function that uses a processing-time timer:

import java.io.IOException;

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class MyProcessFunction
        extends KeyedProcessFunction<KeyDataType, InputDataType, OutputDataType> {
    // The type parameters can be simple types such as String or your own classes.

    // Window interval in milliseconds. One hour is an assumed value that
    // matches the hourly window in the question; adjust as needed.
    private static final long WINDOW_INTERVAL_MS = 60 * 60 * 1000L;

    // Keyed state cannot be a plain field: obtain it from the runtime
    // context in open() and use it inside the other methods.
    private transient ValueState<Long> windowTime;

    @Override
    public void open(final Configuration conf) {
        final ValueStateDescriptor<Long> windowDesc =
                new ValueStateDescriptor<>("windowTime", Long.class);
        this.windowTime = getRuntimeContext().getState(windowDesc);
    }

    @Override
    public void processElement(InputDataType input, Context context,
            Collector<OutputDataType> collector) throws IOException {
        final Long pending = this.windowTime.value();
        if (pending != null) {
            // Delete the existing timer if you want every element to reset it.
            context.timerService().deleteProcessingTimeTimer(pending);
        }

        // Register a timer that fires WINDOW_INTERVAL_MS after the current processing time.
        final long fireAt =
                context.timerService().currentProcessingTime() + WINDOW_INTERVAL_MS;
        this.windowTime.update(fireAt);
        context.timerService().registerProcessingTimeTimer(fireAt);

        // ...
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext context,
            Collector<OutputDataType> collector) throws IOException {
        // This method runs when the timer fires.
        collector.collect(/* whatever you want to stream out */); // available downstream in the pipeline
        this.windowTime.clear(); // the stored timer has fired, so forget it
    }
}
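To make the timer behaviour concrete without a running cluster, here is a minimal plain-Java model of the same idea. It is only a sketch: `ProcessingTimeFlushModel` and its methods are hypothetical names, it uses no Flink or Beam API, and the clock is passed in by hand. Unlike the reset-on-every-element variant above, it arms one timer per key and leaves it alone until it fires, which is closer to a fixed processing-time window.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of per-key processing-time timers; not Flink or Beam code. */
public class ProcessingTimeFlushModel {
    private final long intervalMs;
    // Elements buffered per key (here: per topic) since the last flush.
    private final Map<String, List<String>> buffers = new HashMap<>();
    // Pending timer per key: the clock time at which the buffer is flushed.
    private final Map<String, Long> timers = new HashMap<>();

    public ProcessingTimeFlushModel(long intervalMs) {
        this.intervalMs = intervalMs;
    }

    /** Buffers an element and arms a timer for the key if none is pending. */
    public void processElement(String key, String value, long nowMs) {
        buffers.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        timers.putIfAbsent(key, nowMs + intervalMs);
    }

    /** Advances the clock and returns the buffers of every key whose timer is due. */
    public Map<String, List<String>> advanceTo(long nowMs) {
        Map<String, List<String>> flushed = new TreeMap<>();
        // Iterate over a copy of the keys because due timers are removed in the loop.
        for (String key : new ArrayList<>(timers.keySet())) {
            if (timers.get(key) <= nowMs) {
                flushed.put(key, buffers.remove(key));
                timers.remove(key);
            }
        }
        return flushed;
    }

    public static void main(String[] args) {
        ProcessingTimeFlushModel model = new ProcessingTimeFlushModel(1000);
        model.processElement("topicA", "a1", 0);
        model.processElement("topicB", "b1", 100);
        model.processElement("topicA", "a2", 500);
        System.out.println(model.advanceTo(1000)); // prints {topicA=[a1, a2]}
        // topicB has been idle since t=100, yet its buffer is still flushed.
        System.out.println(model.advanceTo(1100)); // prints {topicB=[b1]}
    }
}
```

The point of the model is that the flush depends only on the clock, so an idle key cannot hold its own output back, and it has no influence on other keys.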
