Save an RDD of arrays of arrays to a text file in Spark

I have an RDD called tmp with this type:

"org.apache.spark.rdd.RDD[(String, List[(String, String, Double)])]" 

Its values look like this:

Array[(String, List[(String, String, Double)])] = Array((1076486,List((1076486,1076486,0.0), (1076486,431000,0.7438727490345501), (1076486,351632,3.139055446043724), (1076486,431611,6.173095256463185))), (430067,List((430067,430067,0.0), (430067,1037380,4.0390818750047535), (430067,431611,6.396930255172381), (430067,824889,7.265222659014164))))

My output should contain only the inner contents of the lists, one tuple per line, like this:

1076486,1076486,0.0
1076486,431000,0.7438727490345501
.
.
430067,1037380,4.0390818750047535

I tried this:

.mapValues(_.toList).saveAsTextFile

The file then contains:

(1076486,List((1076486,1076486,0.0), (1076486,431000,0.7438727490345501), (1076486,351632,3.139055446043724), (1076486,431611,6.173095256463185)))
(430067,List((430067,430067,0.0), (430067,1037380,4.0390818750047535), (430067,431611,6.396930255172381), (430067,824889,7.265222659014164)))
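This happens because saveAsTextFile writes each RDD element as one line of text using the element's toString, and mapValues(_.toList) still leaves every element as a (key, list) pair. The sketch below reproduces that on a plain Scala List instead of an RDD, so it runs without Spark; the object name ToStringDemo and the shortened sample data are assumptions made for illustration.

```scala
object ToStringDemo {
  // Local stand-in for a few elements of tmp: (key, List of (String, String, Double))
  val data: List[(String, List[(String, String, Double)])] = List(
    ("1076486", List(("1076486", "1076486", 0.0), ("1076486", "431000", 0.7438727490345501))),
    ("430067", List(("430067", "430067", 0.0)))
  )

  // mapValues(_.toList) keeps each element as a (key, list) pair,
  // so each line written to the file is just that pair's toString.
  val fileLines: List[String] =
    data.map { case (k, v) => (k, v.toList) }.map(_.toString)

  def main(args: Array[String]): Unit = fileLines.foreach(println)
}
```

Each printed line has the same shape as the lines in the output file: the key, followed by the whole List rendering.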

I can print the desired data with the following code:

tmp.collect().foreach(a => a._2.foreach(e => print(e + " ")))

but I cannot save it to a file.

How can I get the desired result?

Just build the output strings yourself:

// outputPath is a placeholder for your output directory
tmp.values.flatMap(_.map { case (x, y, z) => s"$x,$y,$z" }).saveAsTextFile(outputPath)
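To see what the flatMap expression above produces, here is a minimal sketch that applies the same logic to a local Scala List, assuming that on a pair RDD `values` behaves like `map(_._2)` here (it drops the keys). The object name FlattenDemo and the sample data are illustrative only. It also shows a variant using productIterator.mkString(","), which joins the fields of a tuple of any arity with commas.

```scala
object FlattenDemo {
  // Local stand-in for tmp: (key, List of (String, String, Double))
  val data: List[(String, List[(String, String, Double)])] = List(
    ("1076486", List(("1076486", "1076486", 0.0), ("1076486", "431000", 0.7438727490345501))),
    ("430067", List(("430067", "430067", 0.0), ("430067", "1037380", 4.0390818750047535)))
  )

  // Drop the keys, then turn every inner tuple into one "x,y,z" string;
  // flatMap concatenates the per-key lists into a single sequence of lines.
  val lines: List[String] =
    data.map(_._2).flatMap(_.map { case (x, y, z) => s"$x,$y,$z" })

  // Arity-independent variant: join the tuple's fields with commas.
  val linesGeneric: List[String] =
    data.map(_._2).flatMap(_.map(_.productIterator.mkString(",")))

  def main(args: Array[String]): Unit = lines.foreach(println)
}
```

On the real RDD, each string becomes one line of the saved text file once saveAsTextFile is called on the flattened result. The pattern-match version is explicit about the three fields; the productIterator version avoids editing the pattern if the tuple shape changes.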
