Not able to map records in csv into objects of a class in Scala / Spark
I have a jupyter notebook running a spylon-kernel (Scala / Spark).
Currently, I try to load records from a csv into an RDD and then map each record into objects of the "Weather" class as follows:
val lines = scala.io.Source.fromFile("/path/to/nycweather.csv").mkString
println(lines)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
//Next, you need to import a library for creating a SchemaRDD. Type this:
import sqlContext.implicits._
//Create a case class in Scala that defines the schema of the table. Type in:
case class Weather(date: String, temp: Int, precipitation: Double)
//Create the RDD of the Weather object:
val weather = sc.textFile("/path/to/nycweather.csv")
  .map(_.split(","))
  .map(w => Weather(w(0), w(1).trim.toInt, w(2).trim.toDouble))
  .toDF()
//It all works fine until the last line above.
//But when I run this line of code:
weather.first()
It fails with the following error message:
The message has a few more lines, but I omitted them for readability.
Could someone indicate why I am getting this error and suggest code changes to solve it?
You are using the older RDD syntax for reading a CSV. There is an easier way to read a CSV with the DataFrame API:
val weather1 = spark.read.csv("path to nycweather.csv").toDF("date","temp","precipitation")
weather1.show()
The input file contains the following data:
1/1/2010,30,35.0
2/4/2015,35,27.9
Result:
+--------+----+-------------+
| date|temp|precipitation|
+--------+----+-------------+
|1/1/2010| 30| 35.0|
|2/4/2015| 35| 27.9|
+--------+----+-------------+
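If you still want the typed `Weather` case class from the question rather than an untyped DataFrame, you can cast the columns and convert to a `Dataset[Weather]`. A minimal sketch, assuming the same file layout (three columns, no header row); the `SparkSession` builder is shown for completeness, but in a notebook `spark` is usually already available:

```scala
import org.apache.spark.sql.SparkSession

// In spylon-kernel / spark-shell, define the case class at the top level
// (not inside a method) so Spark can derive an encoder for it.
case class Weather(date: String, temp: Int, precipitation: Double)

val spark = SparkSession.builder().appName("weather").getOrCreate()
import spark.implicits._

val weatherDS = spark.read
  .csv("/path/to/nycweather.csv")                 // all columns read as strings
  .toDF("date", "temp", "precipitation")          // name the columns
  .select(
    $"date",
    $"temp".cast("int"),                          // match the case class types
    $"precipitation".cast("double")
  )
  .as[Weather]                                    // typed Dataset[Weather]

weatherDS.show()
```

With `.as[Weather]` you keep the convenience of `spark.read.csv` while getting compile-time field access (`weatherDS.map(_.temp)` etc.), which is what the original RDD-based code was trying to achieve.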