Spark Scala: create and use column in the dataframe on the same line
When I need a new column in a dataframe to then use in a different computation, my code looks similar to:
var df: DataFrame = ...
df = df.withColumn("new_col", df.col("a") / 2)
println(df.withColumn("res", df.col("b") + df.col("new_col")).head())
How can I combine this into a single expression (and avoid using var)?
The problem is df.col(): I cannot simply do the following, because new_col does not exist in df yet:
df.withColumn("new_col", df.col("a"))
.withColumn("res", df.col("b") + df.col("new_col"))
.head()
Is there some API I am missing?
You can use $ instead of df.col to refer to a column; the $ syntax resolves the column against the data frame produced by the previous transformation rather than against the original df:
df.withColumn("new_col", $"a")
.withColumn("res", $"b" + $"new_col")
.head()
Or:
import org.apache.spark.sql.functions.col
df.withColumn("new_col", col("a"))
.withColumn("res", col("b") + col("new_col"))
.head()
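Put together, a minimal runnable sketch might look like the following. The SparkSession setup and the sample data (columns a and b) are assumptions for illustration; note that the $ syntax additionally requires importing spark.implicits._, while col does not:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object NewColDemo extends App {
  // Local session for demonstration purposes only
  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("new-col-demo")
    .getOrCreate()
  import spark.implicits._ // enables $"..." column syntax and toDF

  // Hypothetical sample data with the columns used in the question
  val df = Seq((10, 3), (20, 5)).toDF("a", "b")

  // col("new_col") in the second withColumn resolves against the frame
  // produced by the first withColumn, not against the original df,
  // so no intermediate var is needed
  val res = df
    .withColumn("new_col", col("a") / 2)
    .withColumn("res", col("b") + col("new_col"))

  println(res.head())
  spark.stop()
}
```

Because the unresolved column reference is looked up lazily at analysis time, the whole pipeline can be written as one expression without ever rebinding a var.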