Updating a DataFrame column in Scala

Question

Given a dataframe with columns A, B, and C, created with "val x = ", I want to update a column like this:

x.withColumn("A", when ($"B" === "apple", "fruit").otherwise(col("C")))

This doesn't actually change x, which I believe is expected. I think most people would create a new dataframe:

val y = x.withColumn("A", when ($"B" === "apple", "fruit").otherwise(col("B")))

And y has the update. But how do you change x without creating a new dataframe? I realize val x is immutable, but even when I declare "var x", the behavior is the same: the change isn't actually saved.

Is it Scala best practice to always create a new DF?

Answer 1

In Spark's architecture, a DataFrame is built on top of RDDs, which are immutable, so DataFrames are immutable as well.

withColumn, like any other operation on a DataFrame, generates a new DataFrame instead of updating the existing one.

val y = x.withColumn("A", when ($"B" === "apple", "fruit").otherwise(col("B")))

You are just storing the result in val y.
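To keep using the name x, one option is to declare it as a var and reassign it to the result of withColumn. The sketch below is only an illustration: the sample rows, the local SparkSession settings, and the names before and after are assumptions that do not come from the question, and it is meant to be run as a script (for example pasted into spark-shell) with Spark on the classpath.

```scala
// Illustrative sketch; assumes Spark SQL is on the classpath and the code runs as a script.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, when}

// Local session for the example only (hypothetical settings).
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("with-column-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data with the columns A, B and C from the question.
var x = Seq(("a1", "apple", "c1"), ("a2", "pear", "c2")).toDF("A", "B", "C")

// Calling withColumn and discarding the result leaves x untouched.
x.withColumn("A", when($"B" === "apple", "fruit").otherwise(col("B")))
val before = x.select("A").as[String].collect().toList

// Reassigning the var makes the name x refer to the new DataFrame.
x = x.withColumn("A", when($"B" === "apple", "fruit").otherwise(col("B")))
val after = x.select("A").as[String].collect().toList

println(before) // values of A from the original DataFrame
println(after)  // values of A from the reassigned DataFrame

spark.stop()
```

Note that the reassignment still creates a new DataFrame. Only the name x is rebound, and the original DataFrame itself is never modified, which is consistent with the immutability described above.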

Answer 1: accepted, 2020-02-14 18:33:18
