
What is the difference between transformations and RDD functions in Spark?

I am reading Spark textbooks that describe transformations and actions, but they also talk about RDD functions, so I am confused. Can anyone explain the basic difference between transformations and Spark RDD functions?

Both seem to change an RDD's contents and return a new RDD, but I would like a precise explanation.

Spark RDD functions include both transformations and actions. A transformation is a function that derives a new RDD from an existing one, and an action is a function that does not change the data but returns an output.
For example:
map, filter, union, etc. are all transformations, since each produces a new RDD from existing data. reduce, collect, count are all actions, since they return a result and do not change the data. For more information, see Spark and Jacek.

RDDs support only two types of operations: transformations, which create a new dataset from an existing one, and actions, which return a value to the driver program after running a computation on the dataset.

"RDD functions" is a generic term that textbooks use for these operations; it is not a third category.

For example, map is a transformation that passes each dataset element through a function and returns a new RDD representing the results. reduce is an action that aggregates all the elements of the RDD using some function and returns the final result to the driver program.
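To make the map/reduce distinction concrete without needing a Spark installation, here is a minimal pure-Python sketch. ToyRDD and everything in it are hypothetical names invented for illustration, not Spark's API; the only thing it tries to mirror is that a transformation returns a new dataset object and defers the work, while an action runs the computation and hands back a plain value.

```python
from functools import reduce as _reduce


class ToyRDD:
    """Hypothetical stand-in for an RDD (illustration only, not Spark's API)."""

    def __init__(self, compute):
        # compute: zero-argument function that yields the elements on demand
        self._compute = compute

    @classmethod
    def parallelize(cls, data):
        items = tuple(data)  # immutable snapshot of the source data
        return cls(lambda: iter(items))

    # Transformations: build a new ToyRDD, run nothing yet.
    def map(self, f):
        return ToyRDD(lambda: (f(x) for x in self._compute()))

    def filter(self, pred):
        return ToyRDD(lambda: (x for x in self._compute() if pred(x)))

    # Actions: run the recorded computation and return a plain value.
    def reduce(self, f):
        return _reduce(f, self._compute())

    def collect(self):
        return list(self._compute())


calls = []  # records every element that double() actually processes


def double(x):
    calls.append(x)
    return x * 2


numbers = ToyRDD.parallelize([1, 2, 3, 4])
doubled = numbers.map(double)               # transformation: returns a ToyRDD
calls_before_action = len(calls)            # still 0, nothing has executed
total = doubled.reduce(lambda a, b: a + b)  # action: 2 + 4 + 6 + 8 = 20
```

In this sketch, double is not called until reduce runs, which is the same deferred behaviour that Spark transformations have until an action is invoked.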

Since RDDs are immutable, the data cannot be changed once an RDD is created. Transformations are functions that apply to RDDs and produce other RDDs as output (e.g. map, flatMap, filter, join, groupBy, ...). Actions are functions that apply to RDDs and produce a non-RDD result: a value or collection returned to the driver, or a side effect such as writing to storage (e.g. count, saveAsTextFile, foreach, collect, ...).
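The immutability point can be illustrated the same way in plain Python. MiniRDD below is a hypothetical, eager stand-in (real Spark evaluates lazily and in a distributed fashion); it only shows that transformations hand back a new dataset object and leave the original untouched, while actions return ordinary values.

```python
class MiniRDD:
    """Hypothetical, eager stand-in for an RDD (illustration only, not Spark's API)."""

    def __init__(self, items):
        self._items = tuple(items)  # tuple: contents cannot be modified in place

    # Transformations return a new MiniRDD and leave self unchanged.
    def filter(self, pred):
        return MiniRDD(x for x in self._items if pred(x))

    def union(self, other):
        return MiniRDD(self._items + other._items)

    # Actions return plain Python values, not another MiniRDD.
    def count(self):
        return len(self._items)

    def collect(self):
        return list(self._items)


words = MiniRDD(["spark", "rdd", "action"])
short = words.filter(lambda w: len(w) <= 5)  # new MiniRDD holding "spark", "rdd"
both = short.union(MiniRDD(["map"]))         # another new MiniRDD

original_after = words.collect()  # the source dataset is unchanged
n_both = both.count()             # plain int
```

Each transformation call here yields a separate object, so words still holds all three elements after filter and union have been applied.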
