Spark: Split is not a member of org.apache.spark.sql.Row
Below is my code from Spark 1.6. I am trying to convert it to Spark 2.3, but I get an error when using split.
Spark 1.6 code:
val file = spark.textFile(args(0))
val mapping = file.map(_.split('\t')).map(a => a(1))
mapping.saveAsTextFile(args(1))
Spark 2.3 code:
val file = spark.read.text(args(0))
val mapping = file.map(_.split('\t')).map(a => a(1)) // Getting error here
mapping.write.text(args(1))
Error message:
value split is not a member of org.apache.spark.sql.Row
Unlike spark.textFile, which returns an RDD, spark.read.text returns a DataFrame, which is essentially an RDD[Row]. Each element is therefore a Row rather than a String, which is why split is not available on it. You could perform map with a partial function, as shown in the following example:
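// /path/to/textfile (fields are tab-separated):
// a b c
// d e f
import org.apache.spark.sql.Row
import spark.implicits._ // encoders needed by map; already in scope in spark-shell
val df = spark.read.text("/path/to/textfile")
// Match each Row's single String column, split it on tabs, keep the second field
df.map { case Row(s: String) => s.split("\\t") }.map(_(1)).show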
// +-----+
// |value|
// +-----+
// |    b|
// |    e|
// +-----+
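If pattern matching on Row feels heavy, two other routes may be worth trying. The sketch below is not from the original answer: it assumes a standalone application (the object name SecondFieldExample, the local[*] master, and the "-columns" output suffix are placeholders for illustration) and that every input line contains at least two tab-separated fields. The first option uses spark.read.textFile, which returns a Dataset[String], so the Spark 1.6 logic carries over almost unchanged. The second keeps the DataFrame and uses the built-in split column function.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions.{col, split}

object SecondFieldExample {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session for illustration; spark-shell already provides `spark`,
    // and with spark-submit the master is normally set on the command line instead.
    val spark = SparkSession.builder()
      .appName("second-field")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // encoders required by Dataset.map

    // Option 1: read.textFile yields Dataset[String], so String.split applies directly.
    // a(1) throws if a line has fewer than two fields.
    val lines: Dataset[String] = spark.read.textFile(args(0))
    val mapping: Dataset[String] = lines.map(_.split("\t")).map(a => a(1))
    mapping.write.text(args(1))

    // Option 2: stay with the DataFrame and split the "value" column.
    // getItem(1) is zero-based, so it selects the second field.
    val df = spark.read.text(args(0))
    val second = df.select(split(col("value"), "\t").getItem(1).as("value"))
    second.write.text(args(1) + "-columns") // hypothetical second output path

    spark.stop()
  }
}
```

Both results have a single string column, which is what write.text expects. The column-based version avoids deserializing each row into a JVM object, so it may be the better fit for larger inputs, though that is worth measuring on your own data.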