Not able to map records in csv into objects of a class in Scala / Spark
I have a Jupyter notebook running the spylon kernel (Scala/Spark).
Currently, I am trying to load records from a CSV into an RDD and then map each record into an object of a "Weather" class, as follows:
val lines = scala.io.Source.fromFile("/path/to/nycweather.csv").mkString
println(lines)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
//Next, you need to import a library for creating a SchemaRDD. Type this:
import sqlContext.implicits._
//Create a case class in Scala that defines the schema of the table. Type in:
case class Weather(date: String, temp: Int, precipitation: Double)
//Create the RDD of the Weather object:
val weather = sc.textFile("/path/to/nycweather.csv").map(_.split(",")).map(w => Weather(w(0), w(1).trim.toInt, w(2).trim.toDouble)).toDF()
//It all works fine until the last line above.
//But when I run this line of code:
weather.first()
It all blows up with the following error message.
The message has several more lines, but I have omitted the more obvious parts.
Can someone point out why I am getting this error and suggest a change to the code to fix it?
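Since the error message was omitted, one common failure mode worth ruling out is the parsing step itself: if the file has a header row or a malformed line, `w(1).trim.toInt` throws a `NumberFormatException` inside the `map`. The split-and-convert logic can be checked in plain Scala, outside Spark, on a couple of sample lines (the sample data below is taken from the answer; the filtering of a hypothetical header line is an assumption for illustration):

```scala
// Case class from the question, defining the schema of one record.
case class Weather(date: String, temp: Int, precipitation: Double)

// Hypothetical in-memory stand-in for the CSV contents, with an assumed
// header line to show why it must be filtered out before conversion.
val rawLines = Seq(
  "date,temp,precipitation", // assumed header -- toInt on "temp" would throw
  "1/1/2010,30,35.0",
  "2/4/2015,35,27.9"
)

// Drop the header, then apply the same split-and-convert logic the
// question uses inside the RDD map().
val parsed = rawLines
  .filterNot(_.startsWith("date,"))
  .map(_.split(","))
  .map(w => Weather(w(0), w(1).trim.toInt, w(2).trim.toDouble))

parsed.foreach(println)
```

If this runs cleanly on your real lines, the parsing is not the problem and the error likely comes from how the case class is defined in the notebook kernel instead.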
You are using the old RDD syntax to read the CSV. There is a simpler way to read a CSV:
val weather1 = spark.read.csv("path to nycweather.csv").toDF("date","temp","precipitation")
weather1.show()
The input file contains the following data:
1/1/2010,30,35.0
2/4/2015,35,27.9
Result:
+--------+----+-------------+
| date|temp|precipitation|
+--------+----+-------------+
|1/1/2010| 30| 35.0|
|2/4/2015| 35| 27.9|
+--------+----+-------------+
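One caveat with this approach: `spark.read.csv` without options gives every column the string type, whereas the original `Weather` case class had `temp: Int` and `precipitation: Double`. A sketch of recovering the numeric types with explicit casts (assuming Spark 2.x and the `weather1` DataFrame from above):

```scala
// Sketch: cast the all-string columns produced by spark.read.csv
// to the numeric types the original case class expected.
import org.apache.spark.sql.functions.col

val typed = weather1.select(
  col("date"),
  col("temp").cast("int"),
  col("precipitation").cast("double")
)
typed.printSchema()
```

Alternatively, passing `.option("inferSchema", "true")` to the reader lets Spark detect the numeric columns at the cost of an extra pass over the file.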