
Scala Spark: how to add a value to a column

My goal is to add a configurable constant value to a given column of a DataFrame.

val df = Seq(("A", 1), ("B", 2), ("C", 3)).toDF("col1", "col2")

+----+----+
|col1|col2|
+----+----+
|   A|   1|
|   B|   2|
|   C|   3|
+----+----+

To do so, I can define a UDF with a hard-coded number, as follows:

import org.apache.spark.sql.functions.udf

// UDF with the constant 100 hard-coded in its body
val add100 = udf((x: Int) => x + 100)
df.withColumn("col3", add100($"col2")).show()

+----+----+----+
|col1|col2|col3|
+----+----+----+
|   A|   1| 101|
|   B|   2| 102|
|   C|   3| 103|
+----+----+----+    

My question is: what is the best way to make the number (100 above) configurable?

I have tried the following approach and it seems to work, but I was wondering whether there is a better way to achieve the same result.

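import org.apache.spark.sql.functions.{lit, udf}

// The constant is passed as a literal column, so the UDF takes two arguments
val addP = udf((x: Int, p: Int) => x + p)
df.withColumn("col4", addP($"col2", lit(100))).show()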

+----+----+----+
|col1|col2|col4|
+----+----+----+
|   A|   1| 101|
|   B|   2| 102|
|   C|   3| 103|
+----+----+----+

We don't need a UDF here:

df.withColumn("col3", df("col2") + 100).show
+----+----+----+
|col1|col2|col3|
+----+----+----+
|   A|   1| 101|
|   B|   2| 102|
|   C|   3| 103|
+----+----+----+
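To make the number configurable with this approach, one option is to keep it in an ordinary value and use it in the column expression. The following is a minimal sketch, not a definitive implementation: the object `AddConstant`, the helper `addConstant`, and the local `SparkSession` setup are hypothetical names introduced for illustration, and the constant is assumed to arrive as the first program argument.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

object AddConstant {
  // Hypothetical helper: adds `increment` to the integer column `source`
  // and stores the result in a new column `target`. No UDF is involved.
  def addConstant(df: DataFrame, source: String, target: String, increment: Int): DataFrame =
    df.withColumn(target, col(source) + lit(increment))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption).
    val spark = SparkSession.builder().master("local[1]").appName("add-constant").getOrCreate()
    import spark.implicits._

    val df = Seq(("A", 1), ("B", 2), ("C", 3)).toDF("col1", "col2")

    // Assumption: the constant comes from the first argument, defaulting to 100.
    val increment = args.headOption.map(_.toInt).getOrElse(100)

    addConstant(df, "col2", "col3", increment).show()
    spark.stop()
  }
}
```

Because the addition is a built-in column expression, Spark's optimizer can see it, whereas the body of a UDF is opaque to the optimizer.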

You can define a curried function that takes the extra parameter first and returns a UDF that takes only columns as parameters:

val addP = (p: Int) => udf( (x: Int) => x + p ) 
// addP: Int => org.apache.spark.sql.expressions.UserDefinedFunction = <function1>

df.withColumn("col3", addP(100)($"col2")).show
+----+----+----+
|col1|col2|col3|
+----+----+----+
|   A|   1| 101|
|   B|   2| 102|
|   C|   3| 103|
+----+----+----+
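The same idea can be written as a small factory that is called once with the configured value, so the resulting UDF can be reused across columns. This is a sketch under stated assumptions: the object name `CurriedUdf`, the local `SparkSession`, and reading the constant from the first program argument are all hypothetical choices made for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.udf

object CurriedUdf {
  // Takes the constant first and returns a UDF over a single Int column.
  // The value of `p` is captured in the closure when the UDF is built.
  def addP(p: Int): UserDefinedFunction = udf((x: Int) => x + p)

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption).
    val spark = SparkSession.builder().master("local[1]").appName("curried-udf").getOrCreate()
    import spark.implicits._

    val df = Seq(("A", 1), ("B", 2), ("C", 3)).toDF("col1", "col2")

    // Assumption: the constant comes from the first argument, defaulting to 100.
    val p = args.headOption.map(_.toInt).getOrElse(100)

    // Build the UDF once, then apply it to whichever columns need it.
    val add = addP(p)
    df.withColumn("col3", add($"col2")).show()

    spark.stop()
  }
}
```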

