Here is my code:
In [10]: rdd = sc.mongoPairRDD("mongodb://localhost/stackoverflow.stack")
# A lot of INFO
In [11]: newrdd = rdd.flatMap(f)
# No INFO
In [12]: newrdd.collect()
# A lot of INFO
When I call a method of the RDD, say flatMap, the system doesn't seem to run the code of the function. But when I call, say, collect(), the system runs it and collects all the data from memory or disk? Am I right?
Yes, you are! This is the expected behavior for Spark. There are transformations (e.g. map, flatMap, filter) and actions (e.g. reduce, count, collect, saveAsTextFile) that you can apply to an RDD.
As you noted, when you call a transformation, no computation happens. Spark just adds the operation to the RDD's lineage, building a kind of recipe for producing the result. But as soon as you call an action, the RDD is actually evaluated. This is what happens when you call collect: the recorded transformations run and all the resulting elements are returned to the driver program.
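To make the "recipe" idea concrete, here is a minimal pure-Python sketch of lazy evaluation. It is only a toy model, not how Spark is implemented: the ToyRDD class, the calls list, and the sample data are all hypothetical names made up for illustration, and no Spark installation is needed to run it.

```python
from itertools import chain


class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not Spark's API):
    it stores a recipe and computes only when an action is called."""

    def __init__(self, source, steps=()):
        self._source = source        # callable returning the raw data
        self._steps = tuple(steps)   # pending transformations, in order

    def flatMap(self, f):
        # Transformation: record the step and return a new ToyRDD.
        # f is NOT called here.
        step = lambda it: chain.from_iterable(map(f, it))
        return ToyRDD(self._source, self._steps + (step,))

    def collect(self):
        # Action: replay the recorded steps over the source data.
        data = self._source()
        for step in self._steps:
            data = step(data)
        return list(data)


calls = []  # records each invocation of f so we can see when it runs


def f(line):
    calls.append(line)
    return line.split()


rdd = ToyRDD(lambda: ["a b", "c"])
newrdd = rdd.flatMap(f)
calls_before_action = len(calls)   # f has not run yet
result = newrdd.collect()          # f runs now, once per element
calls_after_action = len(calls)
print(calls_before_action, result, calls_after_action)
```

In this sketch, flatMap returns immediately without touching f, just like In [11] above produced no INFO output, and f is only invoked once collect() asks for the data.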