
Not able to map records in a CSV into objects of a class in Scala / Spark

I have a Jupyter notebook running a spylon-kernel (Scala / Spark).

Currently, I am trying to load records from a CSV file into an RDD and then map each record into objects of the `Weather` class, as follows:

val lines = scala.io.Source.fromFile("/path/to/nycweather.csv").mkString
println(lines)

val sqlContext = new org.apache.spark.sql.SQLContext(sc)

//Next, you need to import a library for creating a SchemaRDD. Type this:
import sqlContext.implicits._

//Create a case class in Scala that defines the schema of the table. Type in:
case class Weather(date: String, temp: Int, precipitation: Double)

//Create the RDD of the Weather object:
val weather = sc.textFile("/path/to/nycweather.csv").map(_.split(",")).map(w => Weather(w(0), w(1).trim.toInt, w(2).trim.toDouble)).toDF()

//It all works fine until the last line above.
//But when I run this line of code:
weather.first()

It fails with the following error message:

[screenshot: the error obtained after weather.first()]

The message has a few more lines, but I omitted them for readability.

Could someone indicate why I am getting this error and suggest code changes to solve it?

You are using the older RDD syntax for reading a CSV. There is an easier way to read a CSV:

    val weather1 = spark.read.csv("path to nycweather.csv").toDF("date","temp","precipitation")
    weather1.show()

The input file contains the following data:

1/1/2010,30,35.0
2/4/2015,35,27.9

Result:

+--------+----+-------------+
|    date|temp|precipitation|
+--------+----+-------------+
|1/1/2010|  30|         35.0|
|2/4/2015|  35|         27.9|
+--------+----+-------------+
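Since the actual error is only visible in the omitted screenshot, one common cause worth ruling out is a header row (or a malformed line) in the CSV: `w(1).trim.toInt` throws a `NumberFormatException` when it hits a value like `"temp"`. A minimal plain-Scala sketch of the same split-and-map logic, using hypothetical sample lines that include a header, shows the fix of filtering it out before mapping:

```scala
// Hypothetical CSV content; the header row would break .toInt if not filtered
val lines = Seq(
  "date,temp,precipitation",
  "1/1/2010,30,35.0",
  "2/4/2015,35,27.9"
)

case class Weather(date: String, temp: Int, precipitation: Double)

// Skip the header before mapping; otherwise the first record becomes
// Weather("date", "temp".toInt, ...), which throws NumberFormatException.
val weather = lines
  .filter(l => !l.startsWith("date"))
  .map(_.split(","))
  .map(w => Weather(w(0), w(1).trim.toInt, w(2).trim.toDouble))

weather.foreach(println)
```

The same `filter` works on the Spark RDD (`sc.textFile(...).filter(...)` before the `map`), and with the DataFrame API you can let Spark handle the header directly via `spark.read.option("header", "true").csv(...)`.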
