When using Spark Streaming, I don't really understand the transform operation. Here is my code:
    val conf = new SparkConf().setAppName("streaming").setMaster("local[4]")
    val ssc = new StreamingContext(conf, Seconds(1))
    val mDstream =
      ssc
        .socketTextStream(args(0), 9999)
        .flatMap(x => x.split(" "))
        .map((_, 1))
        .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(10), Seconds(3))
        .transform(rdd => {
          rdd.sortBy(_._2, false)
        })
I want to know how many RDDs there are in mDstream. Any help is appreciated!
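transform is a method whose function argument is invoked on the driver, once for each RDD the DStream generates; that is how it is able to take an RDD as its input parameter. Note that the sort itself still runs in parallel on the executors, across the partitions of that RDD. Each batch of the stream yields a single RDD: with a slide interval of Seconds(3), mDstream produces one RDD every 3 seconds, covering the last 10 seconds of data, so transform sees exactly one RDD per invocation.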
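One way to check this yourself is to log something each time an RDD is generated. The following is a minimal sketch rather than a definitive implementation: it reuses the pipeline from the question, and the object name WindowedWordCount, the println messages, and the take(5) preview are assumptions added for illustration. The println inside transform runs on the driver, while sortBy is distributed across the RDD's partitions.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext, Time}

// Hypothetical wrapper object; the pipeline itself is the one from the question.
object WindowedWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("streaming").setMaster("local[4]")
    val ssc = new StreamingContext(conf, Seconds(1))

    val mDstream = ssc
      .socketTextStream(args(0), 9999)
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(10), Seconds(3))
      .transform { rdd =>
        // This closure body runs on the driver, once per generated RDD.
        println(s"transform invoked, partitions = ${rdd.getNumPartitions}")
        // The sort itself is executed in parallel over the partitions.
        rdd.sortBy(_._2, ascending = false)
      }

    // foreachRDD is called once per slide interval with that interval's single RDD.
    mDstream.foreachRDD { (rdd: RDD[(String, Int)], time: Time) =>
      println(s"time $time: one RDD with ${rdd.getNumPartitions} partitions")
      rdd.take(5).foreach(println) // preview the five most frequent words
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

If the behaviour described above holds, the two messages should appear in pairs roughly every 3 seconds, matching the slide interval rather than the 1-second batch interval.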