Not able to map records in csv into objects of a class in Scala / Spark
I have a Jupyter notebook running the spylon kernel (Scala/Spark).
Currently, I am trying to load records from a CSV into an RDD and then map each record into an object of a "Weather" class, as follows:
val lines = scala.io.Source.fromFile("/path/to/nycweather.csv").mkString
println(lines)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
//Next, you need to import a library for creating a SchemaRDD. Type this:
import sqlContext.implicits._
//Create a case class in Scala that defines the schema of the table. Type in:
case class Weather(date: String, temp: Int, precipitation: Double)
//Create the RDD of the Weather object:
val weather = sc.textFile("/path/to/nycweather.csv").map(_.split(",")).map(w => Weather(w(0), w(1).trim.toInt, w(2).trim.toDouble)).toDF()
//It all works fine until the last line above.
//But when I run this line of code:
weather.first()
It all blows up with the following error message.
The message has several more lines, but I have omitted the more obvious parts.
Can someone point out why I am getting this error and suggest a change to the code to fix it?
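Since the error message was omitted, one common failure mode worth ruling out is the parsing step itself: if the file has a header row or a malformed line, `w(1).trim.toInt` throws a `NumberFormatException` inside the `map`. The split-and-convert logic can be checked in plain Scala, outside Spark, on a couple of sample lines (the sample data below is taken from the answer; the filtering of a hypothetical header line is an assumption for illustration):

```scala
// Case class from the question, defining the schema of one record.
case class Weather(date: String, temp: Int, precipitation: Double)

// Hypothetical in-memory stand-in for the CSV contents, with an assumed
// header line to show why it must be filtered out before conversion.
val rawLines = Seq(
  "date,temp,precipitation", // assumed header -- toInt on "temp" would throw
  "1/1/2010,30,35.0",
  "2/4/2015,35,27.9"
)

// Drop the header, then apply the same split-and-convert logic the
// question uses inside the RDD map().
val parsed = rawLines
  .filterNot(_.startsWith("date,"))
  .map(_.split(","))
  .map(w => Weather(w(0), w(1).trim.toInt, w(2).trim.toDouble))

parsed.foreach(println)
```

If this runs cleanly on your real lines, the parsing is not the problem and the error likely comes from how the case class is defined in the notebook kernel instead.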
You are using the old RDD syntax to read the CSV. There is a simpler way to read a CSV:
val weather1 = spark.read.csv("path to nycweather.csv").toDF("date","temp","precipitation")
weather1.show()
The input file contains the following data:
1/1/2010,30,35.0
2/4/2015,35,27.9
Result:
+--------+----+-------------+
| date|temp|precipitation|
+--------+----+-------------+
|1/1/2010| 30| 35.0|
|2/4/2015| 35| 27.9|
+--------+----+-------------+
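One caveat with this approach: `spark.read.csv` without options gives every column the string type, whereas the original `Weather` case class had `temp: Int` and `precipitation: Double`. A sketch of recovering the numeric types with explicit casts (assuming Spark 2.x and the `weather1` DataFrame from above):

```scala
// Sketch: cast the all-string columns produced by spark.read.csv
// to the numeric types the original case class expected.
import org.apache.spark.sql.functions.col

val typed = weather1.select(
  col("date"),
  col("temp").cast("int"),
  col("precipitation").cast("double")
)
typed.printSchema()
```

Alternatively, passing `.option("inferSchema", "true")` to the reader lets Spark detect the numeric columns at the cost of an extra pass over the file.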