I have a jupyter notebook running a spylon-kernel (Scala / Spark).
Currently, I am trying to load records from a CSV into an RDD and then map each record into objects of the "Weather" class as follows:
val lines = scala.io.Source.fromFile("/path/to/nycweather.csv").mkString
println(lines)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
//Next, you need to import a library for creating a SchemaRDD. Type this:
import sqlContext.implicits._
//Create a case class in Scala that defines the schema of the table. Type in:
case class Weather(date: String, temp: Int, precipitation: Double)
//Create the RDD of the Weather object:
val weather = sc.textFile("/path/to/nycweather.csv")
  .map(_.split(","))
  .map(w => Weather(w(0), w(1).trim.toInt, w(2).trim.toDouble))
  .toDF()
//It all works fine until the last line above.
//But when I run this line of code:
weather.first()
It bursts out with the following error message
(the message has a couple more lines, but I omitted them for readability).
Could someone explain why I am getting this error and suggest code changes to solve it?
You are using the older RDD syntax for reading a CSV. There is an easier way with the DataFrameReader (note that `spark.read.csv` reads every column as a string by default; add `.option("inferSchema", "true")` if you want numeric types):
val weather1 = spark.read.csv("/path/to/nycweather.csv").toDF("date","temp","precipitation")
weather1.show()
The input file contains the following data:
1/1/2010,30,35.0
2/4/2015,35,27.9
Result
+--------+----+-------------+
| date|temp|precipitation|
+--------+----+-------------+
|1/1/2010| 30| 35.0|
|2/4/2015| 35| 27.9|
+--------+----+-------------+
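For completeness, the per-record mapping from the question can be exercised in plain Scala (no Spark needed) against the two sample rows above. This is a minimal sketch assuming well-formed rows; a header line or an empty row in the real file would make `toInt`/`toDouble` throw a `NumberFormatException`, which is a common cause of failures like the one in the question:

```scala
// Same parsing logic as the question's RDD map, run on the sample rows.
case class Weather(date: String, temp: Int, precipitation: Double)

val sampleLines = Seq(
  "1/1/2010,30,35.0",
  "2/4/2015,35,27.9"
)

val records = sampleLines
  .map(_.split(","))
  .map(w => Weather(w(0), w(1).trim.toInt, w(2).trim.toDouble))

records.foreach(println)
```

If the real CSV has a header line, either filter it out before mapping or let Spark handle it with `spark.read.option("header", "true")`.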