Define return value in Spark Scala UDF
Imagine the following code:
def myUdf(arg: Int) = udf((vector: MyData) => {
// complex logic that returns a Double
})
How can I define the return type for myUdf so that people looking at the code will know immediately that it returns a Double?
There is nothing special about UDFs built from lambda functions; they behave just like any Scala lambda function (see "Specifying the lambda return type in Scala"), so you could do:
def myUdf(arg: Int) = udf(((vector: MyData) => {
// complex logic that returns a Double
}): (MyData => Double))
or instead explicitly define your function:
def myFuncWithArg(arg: Int): MyData => Double = {
def myFunc(vector: MyData): Double = {
// complex logic that returns a Double. Use arg here
}
myFunc _
}
def myUdf(arg: Int) = udf(myFuncWithArg(arg))
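Both variants above rely on plain Scala rules, so they can be tried without Spark. A minimal sketch, assuming a hypothetical MyData case class and a made-up computation in place of the "complex logic" (neither appears in the original post):

```scala
object ReturnTypeSketch {
  // Hypothetical stand-in for the MyData type used in the question.
  final case class MyData(values: Seq[Int])

  // Variant 1: type ascription on the lambda itself.
  // The compiler rejects the body if it does not produce a Double.
  val ascribed = ((vector: MyData) => {
    vector.values.sum.toDouble
  }): (MyData => Double)

  // Variant 2: a factory method with a declared result type.
  // With procedure syntax (no result type and no "=") it would return Unit
  // and the function value on the last line would be discarded.
  def myFuncWithArg(arg: Int): MyData => Double = {
    def myFunc(vector: MyData): Double = vector.values.sum.toDouble * arg
    myFunc _
  }
}
```

Either value can then be handed to udf in place of the inline lambda.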
I see two ways to do it: either define a method first and then lift it to a function
def myMethod(vector: MyData): Double = {
// complex logic that returns a Double
}
val myUdf = udf(myMethod _)
or define a function first with an explicit type:
val myFunction: Function1[MyData, Double] = (vector: MyData) => {
// complex logic that returns a Double
}
val myUdf = udf(myFunction)
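To see that the two approaches produce the same kind of value, here is a small Spark-free sketch. The MyData case class and the max-based body are assumptions made for illustration only:

```scala
object LiftingSketch {
  // Hypothetical input type; the original MyData definition is not shown.
  final case class MyData(values: Seq[Int])

  // Approach 1: a method with an explicit result type...
  def myMethod(vector: MyData): Double = vector.values.max.toDouble

  // ...lifted to a function value (eta-expansion); the Double carries over.
  val lifted: MyData => Double = myMethod _

  // Approach 2: a function value whose type is spelled out up front.
  // Function1[MyData, Double] is the same type as MyData => Double.
  val myFunction: Function1[MyData, Double] =
    (vector: MyData) => vector.values.max.toDouble
}
```

In both cases the Double is visible in the source and checked by the compiler before the function ever reaches udf.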
I normally use the first approach for my UDFs.
You can pass type parameters to udf, but, somewhat counter-intuitively, you need to pass the return type first, followed by the input types, as in [ReturnType, ArgTypes...], at least as of Spark 2.3.x. Using the original example (which seems to be a curried function based on arg):
def myUdf(arg: Int) = udf[Double, Seq[Int]]((vector: Seq[Int]) => {
13.37 // whatever
})
Spark's functions object defines several udf methods that have the following modifier/type, with the return type RT listed first: static <RT, A1, ..., A10> UserDefinedFunction
You can specify the input/output data types in square brackets as follows:
def myUdf(arg: Int) = udf[Double, MyData]((vector: MyData) => {
// complex logic that returns a Double
})
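For an end-to-end picture, the sketch below applies a typed UDF to a DataFrame. It assumes spark-sql is on the classpath and uses Seq[Int] as the input type, as in the earlier answer; the names scaledSum, vector and score, the local SparkSession, and the computation itself are illustrative and not from the original post:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, udf}

object TypedUdfExample {
  // The method's declared result type says "this is a UDF";
  // the [Double, Seq[Int]] type parameters say "returns Double, takes Seq[Int]".
  def scaledSum(factor: Int): UserDefinedFunction =
    udf[Double, Seq[Int]]((vector: Seq[Int]) => vector.sum.toDouble * factor)

  def main(args: Array[String]): Unit = {
    // Local session for demonstration purposes only.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("typed-udf-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(Seq(1, 2, 3), Seq(4, 5)).toDF("vector")
    df.withColumn("score", scaledSum(2)(col("vector"))).show()

    spark.stop()
  }
}
```

If the lambda's body stops returning a Double, the mismatch with the declared type parameters is reported at compile time rather than when the job runs.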