From Column to Array Scala Spark
I am trying to apply a function to a Column in Scala, but I am running into some difficulties.
I get this error:
found : org.apache.spark.sql.Column
required: Array[Double]
Is there a way to convert a Column to an Array?
Thank you
Update:
Thank you very much for your answer. I think I am getting closer to what I am trying to achieve. Here is a bit more context.
Here is the code:
object Targa_Indicators_Full {
  def get_quantile(variable: Array[Double], perc: Double): Double = {
    val sorted_vec: Array[Double] = variable.sorted
    val pos: Double = Math.round(perc * variable.length) - 1
    val quant: Double = sorted_vec(pos.toInt)
    quant
  }
  def main(args: Array[String]): Unit = {
    val get_quantileUDF = udf(get_quantile _)
    val plate_speed =
      trips_df.groupBy($"plate").agg(sum($"time_elapsed").alias("time"), sum($"space").alias("distance"),
        stddev_samp($"distance" / $"time_elapsed").alias("sd_speed"),
        get_quantileUDF($"distance" / $"time_elapsed", .75).alias("Quant_speed")).
        withColumn("speed", $"distance" / $"time")
  }
}
Now I get this error:
type mismatch;
[error] found : Double(0.75)
[error] required: org.apache.spark.sql.Column
[error] get_quantileUDF($"distanza"/$"tempo_intermedio",.75).alias("IQR_speed")
^
[error] one error found
What can I do? Thanks.
You cannot apply an ordinary Scala function directly to a DataFrame column. A Column is an expression that Spark evaluates over the rows, not a local collection of values, so you first have to wrap your existing function as a user-defined function (UDF), which Spark lets you define.
For example, suppose you have a DataFrame with an array column:
scala> val df=sc.parallelize((1 to 100).toList.grouped(5).toList).toDF("value")
df: org.apache.spark.sql.DataFrame = [value: array<int>]
Suppose you have defined a function to apply to the array column:
def convert(arr: Seq[Int]): String = {
  arr.mkString(",")
}
You have to convert it to a UDF before applying it to the column:
import org.apache.spark.sql.functions.{col, udf}
val convertUDF = udf(convert _)
Then you can apply the UDF to the column:
df.withColumn("new_col", convertUDF(col("value")))
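For the quantile case in the update, the same idea might look like the sketch below. It rests on several assumptions that are not in the question: a local SparkSession, a small made-up `tripsDf` standing in for `trips_df`, per-trip speed taken as `space / time_elapsed`, and the names `QuantileSketch`, `quantile` and `quantileUDF`. Two things change compared with the code in the update: every argument of a UDF call has to be a Column, so the constant is wrapped in `lit(0.75)`, and the UDF receives one array per group, built here with `collect_list`. The function takes `Seq[Double]` rather than `Array[Double]`, because that is how array columns are usually handed to Scala UDFs. The index is also clamped so that a small `perc` cannot produce position -1.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{collect_list, lit, sum, udf}

object QuantileSketch {
  // Same rule as get_quantile in the question, but on Seq[Double] and with
  // the index clamped to the valid range. Returns NaN for an empty group.
  def quantile(values: Seq[Double], perc: Double): Double = {
    if (values.isEmpty) Double.NaN
    else {
      val sorted = values.sorted
      val pos = (math.round(perc * sorted.length) - 1).toInt
      sorted(math.max(0, math.min(pos, sorted.length - 1)))
    }
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, only for trying the sketch out
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("quantile-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for trips_df (made-up values)
    val tripsDf = Seq(
      ("A", 10.0, 100.0),
      ("A", 20.0, 100.0),
      ("A", 5.0, 100.0),
      ("B", 10.0, 50.0)
    ).toDF("plate", "time_elapsed", "space")

    val quantileUDF = udf(quantile _)

    val plateSpeed = tripsDf
      .groupBy($"plate")
      .agg(
        sum($"time_elapsed").alias("time"),
        sum($"space").alias("distance"),
        // collect_list gathers the per-trip speeds of each plate into one array;
        // lit turns the Scala Double into a Column
        quantileUDF(collect_list($"space" / $"time_elapsed"), lit(0.75)).alias("Quant_speed")
      )
      .withColumn("speed", $"distance" / $"time")

    plateSpeed.show()
    spark.stop()
  }
}
```

Note that `collect_list` pulls all values of a group into memory on one executor, which is fine for modest group sizes. For very large groups, Spark SQL's built-in `percentile_approx` aggregate may be a better fit than a UDF.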