
Does Hadoop MapReduce open temporary files in HDFS?

When a MapReduce job runs, it must create a lot of temporary files to store the intermediate results of the various mappers and reducers. Are those temporary files written to HDFS?

If so, the NameNode's edit log could grow huge in a short time, given that it records each and every namespace transaction (file open, close, etc.). Can that be avoided by writing directly to the native filesystem instead of HDFS, or is that a bad idea?
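For context, my understanding is that even a trivial sequence of HDFS operations generates several edit-log transactions. A minimal sketch using the standard FileSystem API (the opcode names in the comments are my assumption about how these calls map to edit-log entries):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class EditLogDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Each namespace mutation below is recorded as a transaction
        // in the NameNode's edit log.
        Path p = new Path("/tmp/editlog-demo.txt");
        try (FSDataOutputStream out = fs.create(p)) { // OP_ADD
            out.writeUTF("hello");
        }                                             // OP_CLOSE
        fs.delete(p, false);                          // OP_DELETE
    }
}
```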

The intermediate results of MapReduce are written to the local filesystem, not HDFS, and they are automatically removed after the job completes.

I mean to say that the output from a mapper is written to the local filesystem. The specific location can be configured, but by default it goes under /tmp/hadoop-<username>.
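A quick way to check where these local scratch directories point on a given cluster is to read the relevant configuration properties. A minimal sketch (the property names are the standard Hadoop ones; which of them are actually set depends on your Hadoop version and on having the cluster's *-site.xml files on the classpath):

```java
import org.apache.hadoop.conf.Configuration;

public class ShowLocalDirs {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // hadoop.tmp.dir defaults to /tmp/hadoop-${user.name}; other
        // scratch locations are typically derived from it.
        System.out.println("hadoop.tmp.dir = " + conf.get("hadoop.tmp.dir"));

        // In MR2/YARN the node-local scratch dirs come from the NodeManager;
        // in classic MR1 the equivalent was mapred.local.dir.
        System.out.println("yarn.nodemanager.local-dirs = "
                + conf.get("yarn.nodemanager.local-dirs"));
        System.out.println("mapreduce.cluster.local.dir = "
                + conf.get("mapreduce.cluster.local.dir"));
    }
}
```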

Do you mean that the temporary files are created each time a mapper runs? If so, you can't avoid this, because a mapper's output is written to disk rather than held in memory. The TaskTracker takes care of setting up the MR job and creating temporary disk space for the mapper's intermediate output, and it also cleans up that temporary space once the MR job completes.
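You can't eliminate the spill to local disk, but you can reduce how often it happens by giving the map-side sort buffer more memory. A hedged sketch of the relevant job configuration (the property names are the standard MR2 ones; the buffer sizes, job name, and input/output paths here are placeholders, not recommendations):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SpillTuning {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Map output is buffered in memory first (default 100 MB) and only
        // spilled to local disk when the buffer fills. A larger buffer
        // means fewer spill files, not zero spills.
        conf.setInt("mapreduce.task.io.sort.mb", 256);

        // Start spilling at 90% full instead of the default 80%.
        conf.setFloat("mapreduce.map.sort.spill.percent", 0.90f);

        // Identity map/reduce job, just to show where the knobs go.
        Job job = Job.getInstance(conf, "spill-tuning-sketch");
        job.setJarByClass(SpillTuning.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```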

This is one of the bottlenecks of the MapReduce programming paradigm.

Any comments/feedback would be appreciated.
