简体   繁体   中英

Scala not able to save as sequence file in RDD, as per doc it is allowed

I am using Spark 1.6, as per the official doc it is allowed to save a RDD to sequence file format, however I notice for my RDD textFile:

scala> textFile.saveAsSequenceFile("products_sequence")
<console>:30: error: value saveAsSequenceFile is not a member of org.apache.spark.rdd.RDD[String]

I googled and found similar discussions seem to suggest this works in pyspark. Is my understanding to the official doc wrong? Can saveAsSequenceFile() be used in Scala?

The saveAsSequenceFile is only available when you have key value pairs in the RDD. The reason for this is that it is defined in PairRDDFunctions

https://spark.apache.org/docs/2.1.1/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions

You can see that the API definition takes a K and a V.

if you change your code above to

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.rdd._

object SequeneFile extends App {
   val conf = new SparkConf().setAppName("sequenceFile").setMaster("local[1]")
   val sc = new SparkContext(conf)
   val rdd : RDD[(String, String)] = sc.parallelize(List(("foo", "foo1"), ("bar", "bar1"), ("baz", "baz1")))
   rdd.saveAsSequenceFile("foo.seq")
   sc.stop()
}

This works perfectly and you will get foo.seq file. The reason why the above works is because we have an RDD which is a key value pair and not just a RDD[String].

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM