

Spark Scala: create and use a column in a DataFrame on the same line

When I need a new column in a DataFrame that I then use in another computation, my code looks similar to this:

import org.apache.spark.sql.DataFrame

var df: DataFrame = ...
// reassign df so that the next line can resolve new_col
df = df.withColumn("new_col", df.col("a") / 2)
println(df.withColumn("res", df.col("b") + df.col("new_col")).head())
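A self-contained version of this setup may help for experimenting. It is a minimal sketch under assumptions that are not in the question: Spark 2.x or 3.x on Scala 2.12/2.13, a local SparkSession, made-up double values in columns a and b, and hypothetical names (ValInsteadOfVar, run). It already drops the var by giving the intermediate DataFrame its own val, although it still needs two statements:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical example object; data and names are illustrative only.
object ValInsteadOfVar {
  def run(): Array[Double] = {
    val spark = SparkSession.builder().master("local[1]").appName("val-instead-of-var").getOrCreate()
    import spark.implicits._ // needed for toDF and .as[Double]
    try {
      val df: DataFrame = Seq((4.0, 1.0), (10.0, 2.0)).toDF("a", "b")
      // Bind the intermediate DataFrame to its own val instead of reassigning a var...
      val withNew = df.withColumn("new_col", df.col("a") / 2)
      // ...so withNew.col("new_col") resolves against a DataFrame that has the column.
      val result = withNew.withColumn("res", withNew.col("b") + withNew.col("new_col"))
      result.select("res").as[Double].collect()
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run().mkString(", "))
}
```

With this data, main should print 3.0, 7.0 after Spark's log output.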

How can I combine these into a single line (and avoid using var)?

The problem is df.col(): I cannot simply write the following, because new_col does not exist in df yet:

// fails: df.col("new_col") is looked up in df, which has no new_col column
df.withColumn("new_col", df.col("a"))
  .withColumn("res", df.col("b") + df.col("new_col"))
  .head()

Is there some API I am missing?
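The failure can be reproduced in isolation. This is a minimal sketch assuming a local SparkSession, made-up double data, and a hypothetical object name (EagerColResolution). In classic Spark, df.col resolves the name against df's own schema as soon as it is called, so the chain fails while it is being built (in Spark 2.x/3.x with an AnalysisException), before head() runs. head() is kept inside the Try so the check still holds if resolution is deferred, as may be the case with Spark Connect:

```scala
import org.apache.spark.sql.SparkSession
import scala.util.Try

// Hypothetical example object reproducing the failing chain from the question.
object EagerColResolution {
  // Returns true when building or running the chained expression throws.
  def chainedWithDfColFails(): Boolean = {
    val spark = SparkSession.builder().master("local[1]").appName("eager-col").getOrCreate()
    import spark.implicits._ // needed for toDF
    try {
      val df = Seq((4.0, 1.0)).toDF("a", "b")
      Try {
        df.withColumn("new_col", df.col("a"))
          .withColumn("res", df.col("b") + df.col("new_col")) // df only has a and b
          .head()
      }.isFailure
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(s"chain with df.col fails: ${chainedWithDfColFails()}")
}
```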

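You can use $"colName" to create a column instead of df.col. Unlike df.col, which resolves the name against df immediately, $"colName" is an unresolved column reference that Spark resolves against the DataFrame it is applied to, here the output of the previous withColumn: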

import spark.implicits._ // enables $"..."; spark is your SparkSession

df.withColumn("new_col", $"a")
  .withColumn("res", $"b" + $"new_col")
  .head()
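A runnable sketch of the $ approach, assuming Spark 2.x or 3.x on Scala 2.12/2.13, a local SparkSession, made-up double data, and a hypothetical object name (DollarSyntax). Here new_col is a / 2, mirroring the question rather than the plain $"a" above. Note that $"..." is not available by default; it comes from import spark.implicits._, where spark must be a stable SparkSession value:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; data and names are illustrative only.
object DollarSyntax {
  def run(): Array[Double] = {
    val spark = SparkSession.builder().master("local[1]").appName("dollar-syntax").getOrCreate()
    import spark.implicits._ // provides $"...", toDF and the Double encoder
    try {
      val df = Seq((4.0, 1.0), (10.0, 2.0)).toDF("a", "b")
      df.withColumn("new_col", $"a" / 2)
        .withColumn("res", $"b" + $"new_col") // new_col is resolved against the previous step's output
        .select($"res")
        .as[Double]
        .collect()
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run().mkString(", "))
}
```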

Or, equivalently, with functions.col:

import org.apache.spark.sql.functions.col
df.withColumn("new_col", col("a"))
  .withColumn("res", col("b") + col("new_col"))
  .head()
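Because functions.col is not tied to any DataFrame, the column expressions can even be defined once, before any data exists, and applied later. A minimal sketch, assuming Spark 2.x or 3.x on Scala 2.12/2.13, a local SparkSession, made-up double data, and hypothetical names (UnresolvedColumns, newCol, resCol):

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical example object; data and names are illustrative only.
object UnresolvedColumns {
  // Plain column expressions: nothing is resolved until withColumn applies them.
  val newCol: Column = col("a") / 2
  val resCol: Column = col("b") + col("new_col")

  def run(): Array[Double] = {
    val spark = SparkSession.builder().master("local[1]").appName("unresolved-cols").getOrCreate()
    import spark.implicits._ // needed for toDF and .as[Double]
    try {
      val df = Seq((4.0, 1.0), (10.0, 2.0)).toDF("a", "b")
      df.withColumn("new_col", newCol)
        .withColumn("res", resCol)
        .select(col("res"))
        .as[Double]
        .collect()
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run().mkString(", "))
}
```

This works because each name is resolved when withColumn analyzes the expression against its input; defining the same values with df.col would tie them to one specific DataFrame.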
