
Not able to map records in a CSV into objects of a class in Scala / Spark

I have a Jupyter notebook running a spylon-kernel (Scala / Spark).

Currently, I am trying to load records from a CSV file into an RDD and then map each record into objects of the `Weather` class, as follows:

val lines = scala.io.Source.fromFile("/path/to/nycweather.csv").mkString
println(lines)

val sqlContext = new org.apache.spark.sql.SQLContext(sc)

//Next, you need to import a library for creating a SchemaRDD. Type this:
import sqlContext.implicits._

//Create a case class in Scala that defines the schema of the table. Type in:
case class Weather(date: String, temp: Int, precipitation: Double)

//Create the RDD of the Weather object:
val weather = sc.textFile("/path/to/nycweather.csv").map(_.split(",")).map(w => Weather(w(0), w(1).trim.toInt, w(2).trim.toDouble)).toDF()

//It all works fine until the last line above.
//But when I run this line of code:
weather.first()

It fails with the following error message:

[screenshot: the error obtained after weather.first()]

The message has a few more lines, but I omitted them for readability.

Could someone indicate why I am getting this error and suggest code changes to solve it?

You are using the older RDD syntax for reading a CSV. There is an easier way to read a CSV:

    val weather1 = spark.read.csv("path to nycweather.csv").toDF("date","temp","precipitation")
    weather1.show()

The input file contains the following data:

1/1/2010,30,35.0
2/4/2015,35,27.9

Result:

+--------+----+-------------+
|    date|temp|precipitation|
+--------+----+-------------+
|1/1/2010|  30|         35.0|
|2/4/2015|  35|         27.9|
+--------+----+-------------+
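Since the actual error is only visible in the omitted screenshot, one common cause worth ruling out is a header row (or a malformed line) in the CSV: `w(1).trim.toInt` throws a `NumberFormatException` when it hits a value like `"temp"`. A minimal plain-Scala sketch of the same split-and-map logic, using hypothetical sample lines that include a header, shows the fix of filtering it out before mapping:

```scala
// Hypothetical CSV content; the header row would break .toInt if not filtered
val lines = Seq(
  "date,temp,precipitation",
  "1/1/2010,30,35.0",
  "2/4/2015,35,27.9"
)

case class Weather(date: String, temp: Int, precipitation: Double)

// Skip the header before mapping; otherwise the first record becomes
// Weather("date", "temp".toInt, ...), which throws NumberFormatException.
val weather = lines
  .filter(l => !l.startsWith("date"))
  .map(_.split(","))
  .map(w => Weather(w(0), w(1).trim.toInt, w(2).trim.toDouble))

weather.foreach(println)
```

The same `filter` works on the Spark RDD (`sc.textFile(...).filter(...)` before the `map`), and with the DataFrame API you can let Spark handle the header directly via `spark.read.option("header", "true").csv(...)`.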
