
Define a UDF with Generic Type and Extra Parameter

I want to define a UDF in Scala Spark like the pseudocode below:

def transformUDF(size: Int): UserDefinedFunction = udf((input: Seq[T]) => {
  if (input != null)
    Vectors.dense(input.map(_.toDouble).toArray)
  else
    Vectors.dense(Array.fill[Double](size)(0.0))
})

If input is not null, cast every element to Double.
If input is null, return an all-zero vector.

And I want T to be limited to numeric types, like java.lang.Number in Java. But it seems that Seq[java.lang.Number] cannot work with toDouble.

Is there an appropriate way to do this?

As mentioned in my comment, the following works. java.lang.Number has no toDouble method, but its doubleValue() returns the same result:

import org.apache.spark.ml.linalg.Vectors  // or org.apache.spark.mllib.linalg.Vectors, depending on which API you use
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.udf

def transformUDF: UserDefinedFunction = udf((size: Int, input: Seq[java.lang.Number]) => {
  if (input != null)
    Vectors.dense(input.map(_.doubleValue()).toArray)  // cast every element to Double
  else
    Vectors.dense(Array.fill[Double](size)(0.0))       // all-zero vector of the given size
})

You don't need to create a new column; you can just pass it to the udf function:

dataframe.withColumn("newCol", transformUDF(lit(the size you want), dataframe("the column you want to transform")))
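
For context, here is a minimal end-to-end sketch. The SparkSession setup, the column name "features", and the size 3 are hypothetical, chosen only for illustration:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

val spark = SparkSession.builder().master("local[*]").appName("transform-udf-demo").getOrCreate()
import spark.implicits._

// One non-null row and one null row, to exercise both branches of the UDF.
val df = Seq(Some(Seq(1, 2, 3)), None).toDF("features")

df.withColumn("newCol", transformUDF(lit(3), df("features"))).show(false)
// non-null row -> [1.0,2.0,3.0]; null row -> [0.0,0.0,0.0]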
