Spark-scala: withColumn is not a member of Unit

Question

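I am trying to read a CSV file into a Spark DataFrame. The file has no header row, but I want the columns to have names. How do I do that? I'm not sure this is right, but I wrote this command: val df = spark.read.format("csv").load("/path/genchan1.txt").show()

This gives me columns named _c0 and _c1. I then tried to rename them to the names I want with val df1 = df.withColumnRenamed("_c0","Series"), but I get the error "withColumnRenamed is not a member of Unit".

PS: I have already imported spark.implicits._ and spark.sql.functions.

Please tell me whether there is a way to add column headers to the dataset, and why I am getting this error.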

Answer 1

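The return type of show is Unit, so chaining it at the end makes df hold Unit instead of a DataFrame. Remove show from the end and call it separately: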

val df = spark.read.format("csv").load("/path/genchan1.txt")
df.show()

Then you can use all of the DataFrame functions:

val df1 = df.withColumnRenamed("_c0","Series")
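A minimal sketch of the corrected flow, assuming the file has exactly two columns (as _c0 and _c1 in the question suggest). To stay self-contained it parses CSV lines from an in-memory Dataset[String] instead of /path/genchan1.txt; the object name, helper names, and the second column name Value are hypothetical. It also shows toDF, which replaces all column names in a single call.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object RenameCsvColumns {
  // Reads header-less CSV lines; Spark names the columns _c0, _c1, ...
  // In-memory stand-in for spark.read.format("csv").load("/path/genchan1.txt").
  def readRaw(spark: SparkSession, lines: Seq[String]): DataFrame = {
    import spark.implicits._
    spark.read.csv(lines.toDS())
  }

  // Option 1: rename one column at a time; each call returns a new DataFrame.
  def renameOneByOne(df: DataFrame): DataFrame =
    df.withColumnRenamed("_c0", "Series")
      .withColumnRenamed("_c1", "Value") // "Value" is a hypothetical name

  // Option 2: replace every column name at once.
  // The number of names must equal the number of columns.
  def renameAll(df: DataFrame): DataFrame =
    df.toDF("Series", "Value")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RenameCsvColumns")
      .master("local[*]")
      .getOrCreate()

    // The explicit DataFrame type makes the compiler reject a trailing .show()
    // here, because show() returns Unit, not a DataFrame.
    val df: DataFrame = readRaw(spark, Seq("a,1.0", "b,2.0"))

    renameOneByOne(df).show() // call show() only as a separate, final step
    renameAll(df).printSchema()

    spark.stop()
  }
}
```

With the real file, readRaw would be replaced by the answer's spark.read.format("csv").load("/path/genchan1.txt"); the renaming steps stay the same.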

Answer 2

If you know the structure of the CSV file in advance, a better solution is to define a schema and attach it to the DataFrame while loading the data.

Sample code for quick reference:

import org.apache.spark.sql.types._

val customSchema = StructType(Array(
  StructField("Series", StringType, true),
  StructField("Column2", StringType, true),
  StructField("Column3", IntegerType, true),
  StructField("Column4", DoubleType, true))
)

val df = spark.read.format("csv")
  .option("header", "false") // since your file does not have a header
  .schema(customSchema)
  .load("/path/genchan1.txt")

df.show()
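As a variant of the code above, DataFrameReader.schema also accepts a DDL-formatted string (Spark 2.3 and later), which is shorter than building a StructType by hand. This is a sketch under stated assumptions: it reads in-memory lines instead of the file, and the object name and the column names and types (Series STRING, Value DOUBLE) are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CsvWithDdlSchema {
  // Hypothetical two-column schema in DDL form; adjust to the real file.
  val ddlSchema = "Series STRING, Value DOUBLE"

  def read(spark: SparkSession, lines: Seq[String]): DataFrame = {
    import spark.implicits._
    spark.read
      .option("header", "false") // the input has no header row
      .schema(ddlSchema)         // column names and types come from the schema
      .csv(lines.toDS())         // with a file: .csv("/path/genchan1.txt")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CsvWithDdlSchema")
      .master("local[*]")
      .getOrCreate()

    val df = read(spark, Seq("a,1.5", "b,2.5"))
    df.printSchema() // Series: string, Value: double
    df.show()

    spark.stop()
  }
}
```

Because the names come from the schema, no withColumnRenamed step is needed afterwards.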

2 solutions

Solution 1: 2 votes, accepted, 2020-06-25 08:44:36

Solution 2: 1 vote, 2020-06-25 12:09:58
