
Flink Scala - Extending WindowFunction

I am trying to write my own WindowFunction, but the code does not compile and I cannot figure out why. The problem is the apply call: it does not accept MyWindowFunction as a valid argument. The data I am streaming contains (timestamp, x, y), where x and y take the values 0 and 1 for testing. extractTupleWithoutTs simply returns the tuple (x, y). The same pipeline runs successfully with simple sum and reduce functions. Grateful for any help :) Using Flink 1.3.

Imports:

import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.scala.function.WindowFunction
import org.apache.flink.streaming.api.watermark.Watermark
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

Rest of the code:

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
val text = env.socketTextStream("localhost", 9999).assignTimestampsAndWatermarks(new TsExtractor)
val tuple = text.map( str => extractTupleWithoutTs(str))
val counts = tuple.keyBy(0).timeWindow(Time.seconds(5)).apply(new MyWindowFunction())
counts.print()
env.execute("Window Stream")

MyWindowFunction, which is basically copied from the example with the types changed:

class MyWindowFunction extends WindowFunction[(Int, Int), Int, Int, TimeWindow] {
  // Emits the number of elements in the window for the given key.
  override def apply(key: Int, window: TimeWindow, input: Iterable[(Int, Int)], out: Collector[Int]): Unit = {
    var count = 0
    for (in <- input) {
      count = count + 1
    }
    out.collect(count)
  }
}

The problem is the third type parameter of the WindowFunction, i.e., the type of the key. The key is declared by index in the keyBy method (keyBy(0)), so its type cannot be determined at compile time, and a WindowFunction declared with an Int key does not match. The same problem arises if you declare the key as a string, i.e., keyBy("f0").
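To make the type mismatch concrete, here is a minimal sketch of what the two styles of keyBy return in the Scala DataStream API. The fromElements source and the object name are stand-ins for illustration, not part of the original question.

```scala
import org.apache.flink.api.java.tuple.Tuple
import org.apache.flink.streaming.api.scala._

object KeyTypeSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Hypothetical stand-in for the mapped socket stream in the question.
    val tuple: DataStream[(Int, Int)] = env.fromElements((0, 1), (1, 0), (0, 0))

    // Index-based key: the key type is the generic Java Tuple, not Int.
    val byIndex: KeyedStream[(Int, Int), Tuple] = tuple.keyBy(0)

    // Function-based key: the key type is inferred from the function as Int.
    val bySelector: KeyedStream[(Int, Int), Int] = tuple.keyBy(_._1)
  }
}
```

The explicit type annotations are only there to show the key type; a WindowFunction passed to apply has to use the same key type as the KeyedStream it is applied to.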

There are two options to resolve this:

  1. Use a KeySelector function in keyBy to extract the key (something like keyBy(_._1)). The return type of the KeySelector function is known at compile time, so you can use a correctly typed WindowFunction with an Int key.
  2. Change the third type parameter of the WindowFunction to org.apache.flink.api.java.tuple.Tuple, i.e., WindowFunction[(Int, Int), Int, org.apache.flink.api.java.tuple.Tuple, TimeWindow]. Tuple is a generic holder for the keys extracted by keyBy. In your case it will be an org.apache.flink.api.java.tuple.Tuple1. In WindowFunction.apply() you can cast Tuple to Tuple1 and access the key field via Tuple1.f0.
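Both options can be sketched as follows. This is an illustration under assumptions rather than a drop-in fix: the class names CountPerIntKey and CountPerTupleKey, the JTuple/JTuple1 import aliases, and the fromElements source are made up for the example, and the counting loop from the question is shortened to input.size.

```scala
import org.apache.flink.api.java.tuple.{Tuple => JTuple, Tuple1 => JTuple1}
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.scala.function.WindowFunction
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

// Option 1: the key type is Int, which matches keyBy(_._1).
class CountPerIntKey extends WindowFunction[(Int, Int), Int, Int, TimeWindow] {
  override def apply(key: Int, window: TimeWindow,
                     input: Iterable[(Int, Int)], out: Collector[Int]): Unit =
    out.collect(input.size)
}

// Option 2: the key type is the generic Java Tuple, which matches keyBy(0).
class CountPerTupleKey extends WindowFunction[(Int, Int), Int, JTuple, TimeWindow] {
  override def apply(key: JTuple, window: TimeWindow,
                     input: Iterable[(Int, Int)], out: Collector[Int]): Unit = {
    // With a single key field the Tuple is a Tuple1; cast it to read the key.
    // The key is not used further here; it only shows how to access it.
    val k: Int = key.asInstanceOf[JTuple1[Int]].f0
    out.collect(input.size)
  }
}

object WindowKeySketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Hypothetical stand-in for the mapped socket stream in the question.
    val tuple: DataStream[(Int, Int)] = env.fromElements((0, 1), (1, 0), (0, 0))

    // Wiring only: the point is that both lines type-check.
    val viaSelector: DataStream[Int] =
      tuple.keyBy(_._1).timeWindow(Time.seconds(5)).apply(new CountPerIntKey)
    val viaIndex: DataStream[Int] =
      tuple.keyBy(0).timeWindow(Time.seconds(5)).apply(new CountPerTupleKey)
  }
}
```

The aliases avoid confusion between Flink's Java Tuple1 and Scala's own Tuple1. Option 1 is usually the simpler choice because it needs no cast.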


 