Spark Scala: How to pass column name in UDF with DataFrame.Select
I have a code snippet like this:
case class Purchase(cid: Int, pid: String, num: String)
val x = sc.parallelize(Array(
  Purchase(123, "234", "1"),
  Purchase(123, "247", "2"),
  Purchase(189, "254", "3"),
  Purchase(187, "299", "4")
))
// I have a dataframe structure: [cid: int, pid: string, num: string]
val df = sqlContext.createDataFrame(x)
// Defining a column name which I need to transform. Its value can change, like pid
val colName = "num"
// Defining a UDF. The definition of the UDF can change
val toIntUdf = udf((myString: String) => myString.toInt )
// This works
df.select( toIntUdf($"num") ).collect
I'm looking for a way to avoid using "num" directly. Any ideas?
If you want to use colName instead of the literal $"num", here is how:
import org.apache.spark.sql.functions._
df.select(toIntUdf(col(colName))).collect
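The same pattern extends to several dynamically chosen columns: map each name to a Column and splat the result into select. A sketch, assuming the df and toIntUdf defined above; colNames is a hypothetical sequence of column names chosen at runtime:

```scala
import org.apache.spark.sql.functions.col

// Hypothetical: the column names are only known at runtime
val colNames = Seq("num", "pid")

// Turn each name into a Column, apply the UDF, and expand the
// resulting Seq[Column] into select's varargs with `: _*`
df.select(colNames.map(name => toIntUdf(col(name))): _*).collect
```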
You can select the column this way; see the documentation for Spark's DataFrame for more details:
df.select(toIntUdf(df(colName)))
Or:
df.select(toIntUdf(df.col(colName)))
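As a side note, if the goal is only a string-to-int conversion, Spark's built-in Column.cast avoids the UDF entirely. A sketch using the same df and colName as above:

```scala
import org.apache.spark.sql.functions.col

// Unlike the toInt UDF (which throws on bad input), cast yields
// null for strings that cannot be parsed as integers
df.select(col(colName).cast("int")).collect
```

This also lets Catalyst optimize the expression, which an opaque UDF prevents.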