
Not able to map records in csv into objects of a class in Scala / Spark

I have a Jupyter notebook running the spylon-kernel (Scala / Spark).

I am trying to load records from a CSV file into an RDD and then map each record to an object of the "Weather" class, as follows:

val lines = scala.io.Source.fromFile("/path/to/nycweather.csv").mkString
println(lines)

val sqlContext = new org.apache.spark.sql.SQLContext(sc)

//Next, you need to import a library for creating a SchemaRDD. Type this:
import sqlContext.implicits._

//Create a case class in Scala that defines the schema of the table. Type in:
case class Weather(date: String, temp: Int, precipitation: Double)

//Create the RDD of the Weather object:
val weather = sc.textFile("/path/to/nycweather.csv").map(_.split(",")).map(w => Weather(w(0), w(1).trim.toInt, w(2).trim.toDouble)).toDF()

//It all works fine until the last line above.
//But when I run this line of code:
weather.first()

It fails with the following error message:

(Image: error obtained after weather.first())

The message has a few more lines, which I omitted for readability.

Could someone explain why I am getting this error and suggest code changes to fix it?

You are using the older RDD syntax to read the CSV. There is an easier way to read a CSV, using the DataFrame reader:

    val weather1 = spark.read.csv("path to nycweather.csv").toDF("date","temp","precipitation")
    weather1.show()

The input file contains the following data:

1/1/2010,30,35.0
2/4/2015,35,27.9

Result

+--------+----+-------------+
|    date|temp|precipitation|
+--------+----+-------------+
|1/1/2010|  30|         35.0|
|2/4/2015|  35|         27.9|
+--------+----+-------------+
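Note that without a schema, `spark.read.csv` reads every column as a string, so `temp` and `precipitation` above are not yet numeric. If typed columns matching the `Weather` case class from the question are needed, one option is to pass an explicit schema. The following is a minimal sketch, assuming Spark 2.2 or later and a file with no header row; the names `schema`, `sampleLines` and `weatherDs` are illustrative, and the two sample rows are held in memory only to keep the sketch reproducible.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{DoubleType, IntegerType, StringType, StructField, StructType}

// Case class from the question.
case class Weather(date: String, temp: Int, precipitation: Double)

// In a notebook, getOrCreate() returns the session the kernel already created.
val spark = SparkSession.builder().master("local[*]").appName("weather-sketch").getOrCreate()
import spark.implicits._

// Explicit schema so that temp is an Int and precipitation a Double.
val schema = StructType(Seq(
  StructField("date", StringType, nullable = true),
  StructField("temp", IntegerType, nullable = true),
  StructField("precipitation", DoubleType, nullable = true)
))

// The two sample rows shown above (assumption: the real file has the same layout).
// For a real file, pass its path to .csv(...) instead of this Dataset[String].
val sampleLines = Seq("1/1/2010,30,35.0", "2/4/2015,35,27.9").toDS()

// Parse with the schema, then convert the rows to typed Weather objects.
val weatherDs = spark.read.schema(schema).csv(sampleLines).as[Weather]

weatherDs.printSchema()
println(weatherDs.first())
```

If the file does have a header row, adding `.option("header", "true")` before `.csv(...)` keeps that row from being parsed as data.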
