
Does the transform operation produce a single RDD in a DStream?

I'm using Spark Streaming and I don't really understand the transform operation. Here is my code:

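import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("streaming").setMaster("local[4]")
// 1-second batch interval
val ssc = new StreamingContext(conf, Seconds(1))
val mDstream =
  ssc
    .socketTextStream(args(0), 9999)
    .flatMap(x => x.split(" "))
    .map((_, 1))
    // 10-second window, sliding every 3 seconds
    .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(10), Seconds(3))
    // sort each windowed RDD by count, descending
    .transform(rdd => rdd.sortBy(_._2, ascending = false))

mDstream.print() // start() fails unless at least one output operation is registered
ssc.start()
ssc.awaitTermination()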

How many RDDs are there in mDstream? I'd appreciate any help!

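The function you pass to transform runs on the driver side, which is how it is able to take an RDD as its input parameter. The sort itself still runs in parallel on the executors, across the partitions of the RDD. mDstream is not one RDD but a sequence of RDDs: each batch produces a single RDD, and each streaming job works on that one RDD. Because of the windowed reduce with a 3-second slide, mDstream produces a new RDD every 3 seconds.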
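To see this directly, the sketch below counts how often the transform function is called and records the batch time of every RDD that reaches foreachRDD. It is a minimal sketch under some assumptions: queueStream stands in for socketTextStream so no socket server is needed, and the names TransformPerBatch, run and runMillis are hypothetical, chosen for illustration only. The AtomicInteger is updated inside the transform function, which runs on the driver, so the driver can read the count afterwards.

```scala
import java.util.concurrent.atomic.AtomicInteger

import scala.collection.mutable

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext, Time}

// Hypothetical demo object: not part of the original question or of Spark itself.
object TransformPerBatch {
  /** Runs the pipeline for runMillis ms; returns (transform calls, output batch times in ms). */
  def run(runMillis: Long): (Int, Seq[Long]) = {
    val conf = new SparkConf().setAppName("transform-per-batch").setMaster("local[4]")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Assumption: queueStream replaces socketTextStream so no server is needed.
    // With oneAtATime = true (the default), one queued RDD is consumed per 1-second batch.
    val queue = mutable.Queue[RDD[String]]()
    for (_ <- 1 to 10) queue += ssc.sparkContext.parallelize(Seq("a b a", "b c"))

    val transformCalls = new AtomicInteger(0)
    val batchTimes = mutable.ArrayBuffer[Long]()

    val sorted = ssc.queueStream(queue)
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(10), Seconds(3))
      .transform { (rdd: RDD[(String, Int)]) =>
        // This function body executes on the driver, once per generated batch.
        transformCalls.incrementAndGet()
        rdd.sortBy(_._2, ascending = false) // the sort itself runs on the executors
      }

    // One RDD per output batch; for this windowed stream, one every 3 s (the slide interval).
    sorted.foreachRDD { (rdd: RDD[(String, Int)], time: Time) =>
      batchTimes.synchronized { batchTimes += time.milliseconds }
      println(s"$time -> RDD ${rdd.id}: ${rdd.take(3).mkString(", ")}")
    }

    ssc.start()
    ssc.awaitTerminationOrTimeout(runMillis)
    ssc.stop(stopSparkContext = true, stopGracefully = false)
    (transformCalls.get, batchTimes.synchronized(batchTimes.toList))
  }

  def main(args: Array[String]): Unit = {
    val (calls, times) = run(10000L)
    println(s"transform function called $calls times for ${times.size} output RDDs")
  }
}
```

The printed batch times should be 3 seconds apart, one RDD each, and the transform counter grows by one per output batch. Exact counts depend on how long the context runs, so no fixed output is claimed here.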

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please cite the site URL or the original address. For questions, contact: yoyou2525@163.com.
