Spark Scala: Creating and using a column in a DataFrame in the same statement

Question

When I need a new column in a DataFrame that I then use in a further calculation, my code looks something like this:

var df: DataFrame = ...
df = df.withColumn("new_col", df.col("a") / 2)
println(df.withColumn("res", df.col("b") + df.col("new_col")).head())

How can I combine this into a single statement (and avoid using var)?

The problem is df.col(): I can't simply do the following, because new_col does not exist in df yet:

df.withColumn("new_col", df.col("a"))
  .withColumn("res", df.col("b") + df.col("new_col"))
  .head()

Am I missing some API?

Answer 1

You can use $ instead of df.col to create the columns; the former is resolved against the new DataFrame rather than against df:

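// $"..." needs the implicits of your SparkSession (assumed here to be named spark)
import spark.implicits._
df.withColumn("new_col", $"a")
  .withColumn("res", $"b" + $"new_col")
  .head()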

Or:

import org.apache.spark.sql.functions.col
df.withColumn("new_col", col("a"))
  .withColumn("res", col("b") + col("new_col"))
  .head()
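
For reference, here is a minimal self-contained sketch of the col(...) variant, applied to the a / 2 calculation from the question. The object name SameLineColumns, the helper addRes, the local master setting and the sample values are all assumptions made for illustration; none of them come from the original post. The idea is that col("new_col") is only a column name at the point where it is written, and Spark resolves it against whichever DataFrame the expression is applied to, so it can refer to a column added earlier in the same chain. df.col("new_col"), by contrast, is looked up in df immediately and fails because that column is not there.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical wrapper object, used only to make the sketch runnable.
object SameLineColumns {

  // Adds new_col and res in one chained expression, without var.
  // col("new_col") is resolved against the result of the first withColumn.
  def addRes(df: DataFrame): DataFrame =
    df.withColumn("new_col", col("a") / 2)
      .withColumn("res", col("b") + col("new_col"))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, just for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("same-line-columns")
      .getOrCreate()
    import spark.implicits._ // needed for toDF on a local Seq

    // Made-up sample data with the two columns from the question.
    val df = Seq((4.0, 1.0), (10.0, 2.0)).toDF("a", "b")

    println(addRes(df).head())

    spark.stop()
  }
}
```

The $"..." form from the first snippet should behave the same way, since it also produces a column that is resolved only when it is used.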