
Convert StringBuilder to RDD[String]

I am trying to convert a StringBuilder object to an RDD[String] and am having some trouble. I can get the StringBuilder into an RDD[Char], but I need an RDD[String]: when an RDD[Char] is written to the file system, each character ends up on its own line, which is not acceptable. I am using Spark 1.2 with Java 7. My code is below:

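import org.apache.spark.SparkContext

val sc = new SparkContext
val sb: StringBuilder = new StringBuilder()
sb.append("#").append("\n")
sb.append("# Version 1").append("\n")
// sb is itself a Seq[Char], so this creates an RDD[Char] (one element per character)
val headerFile = sc.parallelize(sb, 1)
headerFile.saveAsTextFile(path) // path: output directory, defined elsewhere
sc.stop()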

Any ideas on how to convert sb into an RDD[String]?

parallelize expects a Seq. When you pass in a String (or a StringBuilder), it is viewed as a Seq[Char], so every character, including each newline, becomes a separate element of the RDD.
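To see this without starting Spark, the plain Scala sketch below (the object name `CharView` is hypothetical) rebuilds the question's StringBuilder and assigns it to a `scala.collection.Seq[Char]`. The assignment compiles with no conversion because Scala's `StringBuilder` is already an indexed sequence of characters; the elements of that sequence are exactly what `parallelize` would distribute.

```scala
object CharView {
  // Rebuild the header from the question.
  def header(): StringBuilder = {
    val sb = new StringBuilder()
    sb.append("#").append("\n")
    sb.append("# Version 1").append("\n")
    sb
  }

  // No conversion needed: a StringBuilder already is a sequence of Chars.
  def asChars(sb: StringBuilder): scala.collection.Seq[Char] = sb

  def main(args: Array[String]): Unit = {
    val chars = asChars(header())
    // 14 elements: 2 from "#\n" plus 12 from "# Version 1\n"
    println(chars.length)
  }
}
```

Each of those 14 characters would become one line in the output of `saveAsTextFile`, which matches the behavior described in the question.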

You have to build the Seq of Strings yourself. For example, if you want one String per line, simply use sc.parallelize(Seq("#", "# Version 1")).
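If the header already lives in a StringBuilder, one possible way to get that Seq of Strings is to split the builder's contents on newlines. This is a minimal sketch; the names `HeaderLines` and `toLines` are hypothetical. It relies on `String.split` discarding trailing empty strings, so the final `\n` does not produce an empty last element. The resulting `Seq[String]` could then be passed to `sc.parallelize(lines, 1)` to get an `RDD[String]` with one line per element.

```scala
object HeaderLines {
  // Split the builder's text into one String per line.
  // String.split drops trailing empty strings, so "a\nb\n" yields Seq("a", "b").
  def toLines(sb: StringBuilder): Seq[String] =
    sb.toString.split("\n").toSeq

  def main(args: Array[String]): Unit = {
    val sb = new StringBuilder()
    sb.append("#").append("\n")
    sb.append("# Version 1").append("\n")
    // These are the elements an RDD[String] built from toLines(sb) would hold.
    toLines(sb).foreach(println)
  }
}
```

Splitting keeps the existing StringBuilder-based code unchanged; building the Seq directly, as in the answer, avoids the intermediate string entirely.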

saveAsTextFile writes one part file per partition, so to get a single output file, reduce the RDD to one partition first: headerFile.coalesce(1).saveAsTextFile(path).

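Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.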

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM