I am trying to convert a StringBuilder object to an RDD[String] and am having some trouble. I can get the StringBuilder into an RDD[Char], but I need an RDD[String]. When it is written out to the file system as an RDD[Char], each character ends up on its own line, which is not acceptable. I am using Spark 1.2 with Java 7. My code is below:
```scala
val sc = new SparkContext
val sb:StringBuilder = new StringBuilder();
sb.append("#").append("\n");
sb.append("# Version 1").append("\n");
val headerFile = sc.parallelize(sb, 1)
headerFile.saveAsTextFile(path)
sc.stop
```
Any ideas on how to convert sb into RDD[String]?
parallelize expects a Seq. When you pass in a String (or a StringBuilder), it is treated as a Seq[Char], so every character becomes a separate element of the RDD.
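You can see this without Spark. The minimal sketch below uses only the Scala standard library and a hypothetical helper, elements, that mimics how a method taking a Seq[T] (like parallelize) consumes its argument. Both a StringBuilder and a String are inferred as sequences of Char, so each character becomes one element.

```scala
object CharSeqDemo {
  // Hypothetical stand-in for parallelize(seq: Seq[T], ...): whatever Seq[T] is passed
  // in becomes a collection of T elements. collection.Seq is used so that a mutable
  // StringBuilder is accepted on Scala 2.10 through 2.13.
  def elements[T](seq: scala.collection.Seq[T]): List[T] = seq.toList

  def main(args: Array[String]): Unit = {
    val sb = new StringBuilder
    sb.append("#").append("\n")
    sb.append("# Version 1").append("\n")

    // StringBuilder is itself a Seq[Char], so T is inferred as Char.
    val fromBuilder: List[Char] = elements(sb)
    // A String is implicitly wrapped into a Seq[Char] as well.
    val fromString: List[Char] = elements(sb.toString)

    println(fromBuilder.length)           // 14: one element per character
    println(fromString.count(_ == '\n'))  // 2: even the newlines are separate elements
  }
}
```

This is the same inference that turns sc.parallelize(sb, 1) into an RDD[Char].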
You have to create the Seq of Strings yourself. For example, if you want one String per line, simply use sc.parallelize(Seq("#", "# Version 1")).
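If the lines have already been accumulated in a StringBuilder, an alternative is to split its contents on newlines rather than retyping them. The sketch below shows this with the standard library only. The helper name toLines is hypothetical, and the Spark call appears only in a comment, assuming the sc from the question.

```scala
object BuilderToLines {
  // Split the accumulated text on "\n" so each header line becomes one String element.
  // Java's String.split drops trailing empty strings, so the final "\n" adds no empty line.
  def toLines(sb: StringBuilder): Seq[String] = sb.toString.split("\n").toSeq

  def main(args: Array[String]): Unit = {
    val sb = new StringBuilder
    sb.append("#").append("\n")
    sb.append("# Version 1").append("\n")

    val lines = toLines(sb)
    println(lines.mkString(" | "))  // prints: # | # Version 1
    // With Spark, this Seq[String] could then be passed as: sc.parallelize(lines, 1)
  }
}
```

Splitting is preferable to wrapping sb.toString in a one-element Seq. With a single element, the embedded newlines would be written and then followed by saveAsTextFile's own line terminator, which should leave a trailing blank line.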
To reduce the output to a single file, use headerFile.coalesce(1).saveAsTextFile(path).