
How does mapper output get written to HDFS in case of Sqoop?

From what I have learned about Hadoop MapReduce jobs, mapper output is written to local storage and not to HDFS, since it is ultimately throwaway data and there is no point in storing it in HDFS.

But in the case of Sqoop I can see that the mapper output file part-m-00000 is written into HDFS. So my question is: is there some setting in Hadoop that controls where mapper output gets written, and is it set to local storage by default?

If there are no reducers, the mapper output is the job output and is written to HDFS. Even then it does not land in its final location straight away: each map task writes through the job's OutputFormat into a temporary task-attempt directory under the output path, and the output committer moves it to the final part-m-NNNNN file when the task succeeds.
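
For illustration, here is a minimal sketch of a map-only job in Java (the class names, pass-through mapper, and HDFS paths are hypothetical). The relevant "setting" is the number of reduce tasks: setting it to zero makes the map output go through the job's OutputFormat to the configured output directory on HDFS as part-m-NNNNN files, instead of being treated as intermediate shuffle data on local disk.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Map-only job: with zero reduce tasks, each map task's output is handed to the
// job's OutputFormat and committed to the output directory (typically on HDFS)
// as part-m-00000, part-m-00001, ... -- the same naming seen with Sqoop imports.
public class MapOnlyJob {

    // Hypothetical pass-through mapper: emits each input line unchanged.
    public static class PassThroughMapper
            extends Mapper<LongWritable, Text, NullWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(NullWritable.get(), value);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "map-only-example");
        job.setJarByClass(MapOnlyJob.class);
        job.setMapperClass(PassThroughMapper.class);

        // The relevant "setting": zero reduce tasks makes this a map-only job,
        // so map output is written via the OutputFormat to the output path
        // below instead of being spilled to local disk as shuffle data.
        job.setNumReduceTasks(0);

        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);

        // Hypothetical HDFS paths.
        FileInputFormat.addInputPath(job, new Path("/user/example/input"));
        FileOutputFormat.setOutputPath(job, new Path("/user/example/output"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Sqoop imports are built on this same map-only pattern, which is why the files they leave in the target directory carry the part-m prefix.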

Sqoop is one such scenario: an import is typically a map-only job in which you want to pull data from a table in parallel, but there is no need to reduce (aggregate) the data on any key.

Check this link: Identity Reducer vs zero reducer
