Scala Spark: how to add a value to a column
My goal is to add a configurable constant value to a given column of a DataFrame.
val df = Seq(("A", 1), ("B", 2), ("C", 3)).toDF("col1", "col2")
+----+----+
|col1|col2|
+----+----+
| A| 1|
| B| 2|
| C| 3|
+----+----+
To do so, I can define a UDF with a hard-coded number, as follows:
val add100 = udf( (x: Int) => x + 100)
df.withColumn("col3", add100($"col2")).show()
+----+----+----+
|col1|col2|col3|
+----+----+----+
| A| 1| 101|
| B| 2| 102|
| C| 3| 103|
+----+----+----+
My question is: what's the best way to make the number (100 above) configurable?
I have tried the following way and it seems to work. But I was wondering, is there a better way to achieve the same result?
val addP = udf( (x: Int, p: Int) => x + p )
df.withColumn("col4", addP($"col2", lit(100))).show()
+----+----+----+
|col1|col2|col4|
+----+----+----+
| A| 1| 101|
| B| 2| 102|
| C| 3| 103|
+----+----+----+
We don't need a UDF here:
df.withColumn("col3", df("col2") + 100).show
+----+----+----+
|col1|col2|col3|
+----+----+----+
| A| 1| 101|
| B| 2| 102|
| C| 3| 103|
+----+----+----+
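Since the built-in `Column` arithmetic accepts any plain Scala `Int`, the constant can come from whatever configuration source you already use. A minimal sketch, assuming a simple `Map`-based config (the `conf` map and the `increment` key below are hypothetical stand-ins for your real configuration):

```scala
// Hypothetical config source; in practice this could be a properties file,
// a spark-submit argument, or a typesafe-config entry.
val conf = Map("increment" -> "100")
val increment: Int = conf.getOrElse("increment", "0").toInt

// With the value in a plain Int, the built-in Column arithmetic above
// becomes configurable with no UDF at all:
// df.withColumn("col3", df("col2") + increment).show()
```

This also keeps the expression visible to Catalyst, so it can be optimized, unlike a UDF, which is opaque to the query planner.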
You may define a curried function: pull the extra parameter out and return a udf that takes only columns as parameters:
val addP = (p: Int) => udf( (x: Int) => x + p )
// addP: Int => org.apache.spark.sql.expressions.UserDefinedFunction = <function1>
df.withColumn("col3", addP(100)($"col2")).show
+----+----+----+
|col1|col2|col3|
+----+----+----+
| A| 1| 101|
| B| 2| 102|
| C| 3| 103|
+----+----+----+
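The pattern above is ordinary Scala currying: the outer function captures `p` in a closure, and `udf` wraps the inner one-argument function. A plain-Scala sketch of the same mechanics, with no Spark required:

```scala
// Outer function fixes p; the inner function is what udf() would wrap.
val addP: Int => Int => Int = (p: Int) => (x: Int) => x + p

val add100 = addP(100) // partially applied; p = 100 is captured in the closure
assert(add100(1) == 101)
assert(add100(2) == 102)

val add5 = addP(5)     // a different constant from the same factory
assert(add5(3) == 8)
```

Because each call to the Spark version of `addP` returns a fresh `UserDefinedFunction`, you can derive several columns with different constants from the same factory, e.g. `df.withColumn("plus100", addP(100)($"col2")).withColumn("plus5", addP(5)($"col2"))`.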