
Apache Spark: multiple jobs accessing a single method

Say I have 5 jobs that all want to call a single method that reads a big file and loads it into an RDD. Instead of reading the file multiple times (since all 5 jobs would run the same method), I want a "mother" class that checks whether some job has already called the method.

Assuming these 5 jobs run sequentially within the same Spark application, you can read the file in the first job and cache the result with `<RDD>.cache()` (note that `cache()` takes no arguments; it persists the RDD at the default `MEMORY_ONLY` storage level). The remaining jobs can then check whether the RDD is already cached and reuse it, falling back to reading the file again only if it is not.

For more information, refer to the RDD API.

