Spark: split is not a member of org.apache.spark.sql.Row

Question

Below is my code from Spark 1.6. I am trying to convert it to Spark 2.3, but I get an error when calling split.

Spark 1.6 code:

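// Spark 1.6: `spark` here is a SparkContext, so textFile returns an RDD[String]
val file = spark.textFile(args(0))
val mapping = file.map(_.split('\t')).map(a => a(1)) // '\t' is the tab character
mapping.saveAsTextFile(args(1))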

Spark 2.3 code:

// Spark 2.3: `spark` is a SparkSession, and read.text returns a DataFrame
val file = spark.read.text(args(0))
val mapping = file.map(_.split('\t')).map(a => a(1)) // Getting error here
mapping.write.text(args(1))

Error message:

value split is not a member of org.apache.spark.sql.Row

Answer 1

Unlike spark.textFile, which returns an RDD[String], spark.read.text returns a DataFrame, which is a Dataset[Row]. Each element is a Row, not a String, so there is no split method to call on it. You can pull the string out of each Row by pattern matching inside map, as shown in the following example (import spark.implicits._ supplies the encoders that Dataset.map needs):

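// /path/to/textfile (tab-separated):
// a    b   c
// d    e   f

import org.apache.spark.sql.Row
import spark.implicits._ // encoders for Dataset.map (already in scope in spark-shell)

val df = spark.read.text("/path/to/textfile")

// Each Row has a single String column ("value"): destructure it, then split on tab
df.map{ case Row(s: String) => s.split("\\t") }.map(_(1)).show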
// +-----+
// |value|
// +-----+
// |    b|
// |    e|
// +-----+
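If you want a port that stays closer to the 1.6 RDD code, the sketch below shows two other options. It assumes Spark 2.3 (spark-sql) is on the classpath; the object TabFields and its method names are hypothetical and not part of the answer above. The first method uses spark.read.textFile, which returns a Dataset[String], so each element is already a String and split works as it did on the RDD. The second is a swapped-in alternative that stays in the untyped DataFrame API and uses the built-in split function from org.apache.spark.sql.functions. Passing "\t" (a literal tab) works because String.split and functions.split both treat their argument as a regex, and a tab matches itself, just like "\\t".

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.{col, split}

// Hypothetical helper object; names are illustrative, not from the answer.
object TabFields {
  // Closest port of the 1.6 RDD code: read.textFile yields Dataset[String],
  // so each element is a String and String.split works as before.
  // Note: (1) throws ArrayIndexOutOfBoundsException on a line without a tab.
  def secondFieldTyped(spark: SparkSession, path: String): Dataset[String] = {
    import spark.implicits._ // Encoder[String] for Dataset.map
    spark.read.textFile(path).map(line => line.split("\t")(1))
  }

  // Untyped alternative: keep the DataFrame and use functions.split.
  // getItem(1) yields null (instead of throwing) when the field is missing.
  def secondFieldUntyped(spark: SparkSession, path: String): DataFrame =
    spark.read.text(path)
      .select(split(col("value"), "\t").getItem(1).as("value"))

  def main(args: Array[String]): Unit = {
    // args(0) = input path, args(1) = output directory, as in the question
    val spark = SparkSession.builder().appName("TabFields").getOrCreate()
    // A Dataset[String] has one string column ("value"), so write.text accepts it
    secondFieldTyped(spark, args(0)).write.text(args(1))
    spark.stop()
  }
}
```

The typed version fails fast on malformed lines, while the untyped version silently produces nulls; which one is preferable depends on whether bad input should stop the job.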

Answer 1: 3 votes, accepted, 2019-08-04 14:07:00