
Convert Scala String to RDD[Seq[String]]

 // 4 workers
  val sc = new SparkContext("local[4]", "naivebayes")

  // Load documents (one per line).
  val documents: RDD[Seq[String]] = sc.textFile("/tmp/test.txt").map(_.split(" ").toSeq)

  documents.zipWithIndex.foreach {
    case (e, i) =>
      val collectedResult = Tokenizer.tokenize(e.mkString)
  }

  val hashingTF = new HashingTF()
  //pass collectedResult instead of document
  val tf: RDD[Vector] = hashingTF.transform(documents)

  tf.cache()
  val idf = new IDF().fit(tf)
  val tfidf: RDD[Vector] = idf.transform(tf)

In the above code snippet, I want to extract collectedResult so that I can reuse it in hashingTF.transform. How can this be achieved, given that the signature of the tokenize function is:

 def tokenize(content: String): Seq[String] = {
...
}

It looks like you want map rather than foreach. foreach runs only for its side effects and returns nothing, so collectedResult is thrown away on every iteration, whereas map returns a new RDD containing the results. I don't understand what you're using zipWithIndex for, nor why you're calling split on your lines only to join them straight back up again with mkString.

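// Read the raw lines; there is no need to split them before tokenizing
val lines: RDD[String] = sc.textFile("/tmp/test.txt")
// map keeps the tokenizer's output for each line
val tokenizedLines: RDD[Seq[String]] = lines.map(Tokenizer.tokenize)
// HashingTF is not a function, so call transform on the whole RDD
val hashes: RDD[Vector] = hashingTF.transform(tokenizedLines)
hashes.cache()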
...
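
To see why foreach loses the tokens while map keeps them, here is a minimal sketch using plain Scala collections instead of Spark, so it runs without a cluster. The object name MapVsForeach, the sample lines, and the whitespace-splitting body of tokenize are all assumptions made for illustration; the real Tokenizer.tokenize body is elided in the question.

```scala
object MapVsForeach {
  // Hypothetical stand-in for Tokenizer.tokenize (its real body is not shown
  // in the question); this one simply splits on whitespace.
  def tokenize(content: String): Seq[String] =
    content.split("\\s+").filter(_.nonEmpty).toSeq

  // Made-up sample input standing in for the lines of /tmp/test.txt.
  val lines: Seq[String] = Seq("spark makes rdds", "map returns values")

  // foreach returns Unit: collectedResult exists only inside the block
  // and is gone once each iteration finishes.
  lines.foreach { line =>
    val collectedResult = tokenize(line)
  }

  // map returns a new collection holding the tokens of every line,
  // so the result can be passed on to a later step.
  val tokenized: Seq[Seq[String]] = lines.map(tokenize)

  def main(args: Array[String]): Unit =
    tokenized.foreach(tokens => println(tokens.mkString(", ")))
}
```

The same distinction should carry over to RDDs: RDD.foreach is an action that returns Unit, while RDD.map is a transformation that returns a new RDD, which is what hashingTF.transform needs as input.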
