Calculate average using Flink DataStream for a window duration

I am using the Flink DataStream API. My stream contains temperature events from several racks, and I want to calculate the average temperature grouped by rack ID. My window is 40 seconds long and slides every 10 seconds. Following is my code, which calculates the sum of the temperatures every 10 seconds for every rackId, but now I want to calculate the average temperature:

static Properties properties = new Properties();

public static Properties getProperties()
{
    properties.setProperty("bootstrap.servers", "54.164.200.104:9092");
    // zookeeper.connect is only required by the Kafka 0.8 consumer; FlinkKafkaConsumer09 does not use it
    properties.setProperty("zookeeper.connect", "54.164.200.104:2181");
    //properties.setProperty("deserializer.class", "kafka.serializer.StringEncoder");
    //properties.setProperty("group.id", "akshay");
    properties.setProperty("auto.offset.reset", "earliest");
    return properties;
}

@SuppressWarnings("rawtypes")
public static void main(String[] args) throws Exception
{
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
    Properties props = Program.getProperties();

    // Read TemperatureEvents from Kafka and stamp them with their ingestion time
    DataStream<TemperatureEvent> dstream = env
        .addSource(new FlinkKafkaConsumer09<TemperatureEvent>("TemperatureEvent", new TemperatureEventSchema(), props))
        .assignTimestampsAndWatermarks(new IngestionTimeExtractor<>());

    // 40-second window sliding every 10 seconds, summed per rackId
    DataStream<TemperatureEvent> ds1 = dstream
        .keyBy("rackId")
        .timeWindow(Time.seconds(40), Time.seconds(10))
        .sum("temperature");

    ds1.print(); // sink that writes the windowed results to stdout
    env.execute("Temperature Consumer");
}
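For reference, here is a Flink-free sketch of what timeWindow(Time.seconds(40), Time.seconds(10)) computes per rack: each event belongs to 40 / 10 = 4 overlapping windows, and every (rackId, window) pair gets its own average. The window-start arithmetic is modeled on how Flink's sliding event-time assigner works with a zero offset. The Reading class and the "rackId@windowStart" key format are hypothetical stand-ins, since the TemperatureEvent class is not shown in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Offline illustration only (no Flink): averages per rack for a 40 s window sliding every 10 s.
public class SlidingAverageSketch {

    static final long SIZE_MS = 40_000;
    static final long SLIDE_MS = 10_000;

    // Hypothetical stand-in for TemperatureEvent.
    public static final class Reading {
        final long rackId;
        final long timestampMs;
        final double temperature;

        public Reading(long rackId, long timestampMs, double temperature) {
            this.rackId = rackId;
            this.timestampMs = timestampMs;
            this.temperature = temperature;
        }
    }

    // Start times of every sliding window that contains the timestamp (assumes timestamp >= 0).
    public static List<Long> windowStarts(long timestampMs) {
        List<Long> starts = new ArrayList<>();
        long lastStart = timestampMs - (timestampMs % SLIDE_MS);
        for (long start = lastStart; start > timestampMs - SIZE_MS; start -= SLIDE_MS) {
            starts.add(start);
        }
        return starts;
    }

    // Average temperature keyed by "rackId@windowStart".
    public static Map<String, Double> averages(List<Reading> readings) {
        Map<String, double[]> sumAndCount = new TreeMap<>();
        for (Reading r : readings) {
            for (long start : windowStarts(r.timestampMs)) {
                double[] acc = sumAndCount.computeIfAbsent(r.rackId + "@" + start, k -> new double[2]);
                acc[0] += r.temperature; // running sum
                acc[1] += 1;             // running count
            }
        }
        Map<String, Double> result = new TreeMap<>();
        sumAndCount.forEach((k, acc) -> result.put(k, acc[0] / acc[1]));
        return result;
    }

    public static void main(String[] args) {
        List<Reading> readings = List.of(
                new Reading(1, 5_000, 20.0),
                new Reading(1, 15_000, 30.0),
                new Reading(2, 15_000, 50.0));
        averages(readings).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

For the sample readings in main, rack 1 averages 25.0 in the window starting at 0 (both readings) but 30.0 in the window starting at 10000, which only contains the second reading. As in Flink, the earliest events also fall into windows with negative start times.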

How can I calculate the average temperature for the above example?

As far as I can tell, you need to write the average function yourself. You can find an example here:

https://github.com/apache/flink/blob/master/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/windowing/GroupedProcessingTimeWindowExample.java

In your case, you would probably replace .sum("temperature") with something like .apply(new Avg()) and implement the Avg class:

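import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

// keyBy("rackId") yields a Tuple key and timeWindow(...) yields TimeWindow windows,
// so those are the key and window type parameters here.
// Assumes TemperatureEvent has double-valued getTemperature()/setTemperature().
public class Avg implements WindowFunction<TemperatureEvent, TemperatureEvent, Tuple, TimeWindow> {

  @Override
  public void apply(Tuple key, TimeWindow window, Iterable<TemperatureEvent> values, Collector<TemperatureEvent> out) {
    double sum = 0.0;   // double avoids truncating fractional temperatures
    int count = 0;
    TemperatureEvent first = null;
    for (TemperatureEvent value : values) {
      if (first == null) {
        first = value;  // reuse the first event (rackId etc.) as the result
      }
      sum += value.getTemperature();
      count++;
    }
    if (first == null) {
      return;           // empty window: nothing to average
    }
    first.setTemperature(sum / count);
    out.collect(first);
  }
}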

Note: If there is any chance that your function will be called on an empty window (e.g. when using custom triggers), you need a check before calling values.iterator().next().
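A window function like Avg receives all events of a window at once, so Flink has to buffer every event until the window fires, and with a 40-second window sliding every 10 seconds each event sits in four windows. An alternative worth considering (not part of the answer above) is incremental aggregation: more recent Flink versions let you pass an AggregateFunction to WindowedStream.aggregate(...), which keeps only an accumulator per key and window. The sketch below is plain Java with no Flink dependency; the class and method names are hypothetical, chosen to mirror the createAccumulator / add / getResult / merge methods of Flink's AggregateFunction, and it assumes temperatures are doubles.

```java
// Flink-free sketch of an incremental (sum, count) average accumulator.
public class AverageAccumulatorSketch {

    // State kept per key and window: two numbers instead of every event.
    public static final class SumCount {
        public double sum;
        public long count;
    }

    public static SumCount createAccumulator() {
        return new SumCount();
    }

    // Folds one temperature into the running state.
    public static SumCount add(double temperature, SumCount acc) {
        acc.sum += temperature;
        acc.count += 1;
        return acc;
    }

    // Returns NaN for an empty accumulator instead of dividing by zero.
    public static double getResult(SumCount acc) {
        return acc.count == 0 ? Double.NaN : acc.sum / acc.count;
    }

    // Combines two partial accumulators, e.g. for merging windows.
    public static SumCount merge(SumCount a, SumCount b) {
        SumCount merged = new SumCount();
        merged.sum = a.sum + b.sum;
        merged.count = a.count + b.count;
        return merged;
    }

    public static void main(String[] args) {
        SumCount acc = createAccumulator();
        for (double t : new double[] {20.0, 30.0, 40.0}) {
            add(t, acc);
        }
        System.out.println(getResult(acc)); // prints 30.0
    }
}
```

Because only the sum and count are stored, the state per window stays constant no matter how many events arrive, and merge makes the accumulator usable when partial results have to be combined.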

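Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.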

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM