
Convert StringBuilder to RDD[String]

I am trying to convert a StringBuilder object to an RDD[String] and am having some trouble. I can get the StringBuilder into an RDD[Char], but I need an RDD[String]. When the RDD[Char] is written to the file system, it puts one character on each line, which is not acceptable. I am using Spark 1.2 with Java 7. My code is below:

val sc = new SparkContext
val sb: StringBuilder = new StringBuilder()
sb.append("#").append("\n")
sb.append("# Version 1").append("\n")
// sb is treated as a Seq[Char] here, so headerFile is an RDD[Char]
val headerFile = sc.parallelize(sb, 1)
headerFile.saveAsTextFile(path)
sc.stop()

Any ideas on how to convert sb into RDD[String]?

parallelize expects a Seq. When you pass in a String (or a StringBuilder), it is treated as a Seq[Char], which is why you end up with an RDD[Char].
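You can see the same behaviour without Spark. The snippet below is only a small sketch in plain Scala (the name asChars is made up for illustration) showing that a StringBuilder, viewed as a sequence, hands back individual characters:

```scala
// Plain Scala, no Spark required; paste into the REPL or spark-shell.
val sb = new StringBuilder
sb.append("#").append("\n")
sb.append("# Version 1").append("\n")

// A Scala StringBuilder is itself a sequence of Char, so a method that
// asks for a Seq sees one element per character, newlines included.
val asChars: scala.collection.Seq[Char] = sb

println(asChars.take(3).toList) // the first three elements: '#', '\n', '#'
println(asChars.length)         // one element per character in the builder
```

Each of those elements becomes one record in the RDD, and saveAsTextFile writes one record per line.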

You have to create the Seq of Strings yourself. For example, if you want one String per line, simply use sc.parallelize(Seq("#", "# Version 1")).

saveAsTextFile writes one part file per partition, so to reduce the output to a single file, use headerFile.coalesce(1).saveAsTextFile(path).
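If you would rather keep building the header with a StringBuilder, one option is to split its contents on newlines before handing them to parallelize. The following is a rough sketch rather than a definitive implementation: the object name HeaderFile, the helper toLines, the local[1] master, the app name, and the output path /tmp/header-out are all assumptions made for the example.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object HeaderFile {
  // Hypothetical helper: turn the accumulated text into one String per line.
  // String.split drops trailing empty strings, so the final "\n" adds no empty line.
  def toLines(sb: StringBuilder): Seq[String] =
    sb.toString.split("\n").toSeq

  def main(args: Array[String]): Unit = {
    // Assumed output location; saveAsTextFile fails if it already exists.
    val path = "/tmp/header-out"
    // Assumed local configuration, for illustration only.
    val conf = new SparkConf().setAppName("header-file").setMaster("local[1]")
    val sc = new SparkContext(conf)
    try {
      val sb = new StringBuilder
      sb.append("#").append("\n")
      sb.append("# Version 1").append("\n")

      // Seq[String] in, RDD[String] out: one record (and one output line) per String.
      val headerFile = sc.parallelize(toLines(sb))

      // One partition means one part file in the output directory.
      headerFile.coalesce(1).saveAsTextFile(path)
    } finally {
      sc.stop()
    }
  }
}
```

Passing 1 as the second argument to parallelize, as in the question, also gives a single partition, in which case the coalesce(1) call is not needed.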
