

Saving double RDD into file - Scala

I am trying to save a double RDD to a file. What I mean by a double RDD is that I have this variable:

res: org.apache.spark.rdd.RDD[org.apache.spark.rdd.RDD[((String,String), Int)]] = MapPartitionsRDD[19] 

I tried to store it with

res.saveAsTextFile(path)

But it doesn't work: an exception is thrown because Spark does not support nested RDDs. Here is a sample of the code:

val res = Listword.map { x =>
  Listword.map { y =>
    ((x._1, y._1), x._2 + y._2)
  }
}
res.saveAsTextFile("C:/Users/Administrator/Documents/spark/spark-1.6.0-bin-hadoop2.6")

Spark does not allow nested RDDs. In your specific case, you can use cartesian:

ListWord.cartesian(ListWord).map { case (x, y) =>
  ((x._1, y._1), x._2 + y._2)
}
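For completeness, here is a minimal, self-contained sketch of the cartesian approach, assuming a small sample dataset of (String, Int) pairs; the context setup, sample data, and output path below are illustrative, not taken from the original question:

import org.apache.spark.{SparkConf, SparkContext}

object CartesianPairs {
  def main(args: Array[String]): Unit = {
    // Local Spark context for the example (illustrative configuration)
    val sc = new SparkContext(new SparkConf().setAppName("cartesian-pairs").setMaster("local[*]"))

    // Hypothetical sample data standing in for ListWord: an RDD[(String, Int)]
    val listWord = sc.parallelize(Seq(("foo", 1), ("bar", 2), ("baz", 3)))

    // cartesian pairs every element with every element, producing a flat
    // RDD[((String, String), Int)] instead of a nested RDD of RDDs
    val res = listWord.cartesian(listWord).map { case (x, y) =>
      ((x._1, y._1), x._2 + y._2)
    }

    // A flat pair RDD can be written out directly; saveAsTextFile creates the
    // output directory and writes one part file per partition (it fails if the
    // directory already exists)
    res.saveAsTextFile("output/cartesian-pairs")

    sc.stop()
  }
}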
