Define a UDF with Generic Type and Extra Parameter
I want to define a UDF in Scala Spark like the pseudo code below:
def transformUDF[T](size: Int): UserDefinedFunction = udf((input: Seq[T]) => {
  if (input != null)
    Vectors.dense(input.map(_.toDouble).toArray)
  else
    Vectors.dense(Array.fill[Double](size)(0.0))
})
If input is not null, cast every element to Double. If input is null, return an all-zero vector.
And I want T to be limited to numeric types, like java.lang.Number in Java. But it seems that Seq[java.lang.Number] cannot work with toDouble. Is there any appropriate way?
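The failure comes from the fact that java.lang.Number does not define toDouble (that method belongs to Scala's numeric value types); it does define doubleValue(), which every boxed numeric type (Integer, Long, Float, Double, ...) implements. A minimal sketch of just the conversion logic, outside Spark (the helper name toDenseArray is my own, for illustration):

```scala
// Plain-Scala sketch of the null-safe numeric conversion.
// `doubleValue()` is declared on java.lang.Number itself, so any
// boxed numeric element converts uniformly to Double.
def toDenseArray(input: Seq[java.lang.Number], size: Int): Array[Double] =
  if (input != null) input.map(_.doubleValue()).toArray
  else Array.fill[Double](size)(0.0)

// Mixed boxed numeric types all convert through the same call:
val mixed: Seq[java.lang.Number] = Seq(Int.box(1), Long.box(2L), Double.box(3.5))
println(toDenseArray(mixed, 3).mkString(","))  // 1.0,2.0,3.5
println(toDenseArray(null, 3).mkString(","))   // 0.0,0.0,0.0
```

Inside the udf body this same pattern applies; the array is then wrapped in Vectors.dense as in the answer below.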
As mentioned in my comment, define the udf as:
def transformUDF: UserDefinedFunction = udf((size: Int, input: Seq[java.lang.Number]) => {
  if (input != null)
    Vectors.dense(input.map(_.doubleValue()).toArray)
  else
    Vectors.dense(Array.fill[Double](size)(0.0))
})
You don't need to create a new column; you can just pass the size to the udf function as a literal:
dataframe.withColumn("newCol", transformUDF(lit(the size you want), dataframe("the column you want to transform")))