
Reading Java properties files in Hadoop MapReduce applications

I was wondering what the standard practice is for reading Java properties files in MapReduce applications, and how to pass the file's location when submitting (starting) a job. In a regular Java application you can pass the location of the properties file as a JVM system property (-D) or as an argument to the main method. What is the best alternative (standard practice) to this for MapReduce jobs? Some good examples would be very helpful.
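For reference, a minimal sketch of the regular-Java baseline described above; the system property name `config.path` and the fallback file name are made up for illustration:

```java
import java.io.FileInputStream;
import java.util.Properties;

public class PlainJavaProps {
    public static void main(String[] args) throws Exception {
        // Location comes from -Dconfig.path=... or the first main() argument.
        String path = System.getProperty(
                "config.path", args.length > 0 ? args[0] : "app.properties");
        Properties props = new Properties();
        try (FileInputStream in = new FileInputStream(path)) {
            props.load(in);
        }
        System.out.println(props);
    }
}
```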

The best alternative is to use DistributedCache, though it may not be *the* standard way. There may be other ways, but I haven't seen any code that uses anything else so far.

The idea is to add the file to the cache when the job is submitted, read it inside the setup method of the map/reduce task, and load the values into a Properties object or a Map. If you need a snippet, I can add one.
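Here is a minimal sketch of that approach. It uses Job#addCacheFile, which replaced the deprecated DistributedCache static methods in Hadoop 2.x; the HDFS path and the property key `output.prefix` are placeholders:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CachedPropertiesJob {

    public static class PropsMapper
            extends Mapper<LongWritable, Text, Text, Text> {

        private final Properties props = new Properties();

        @Override
        protected void setup(Context context)
                throws IOException, InterruptedException {
            // The "#app.properties" fragment used when the file was cached
            // creates a symlink with that name in the task's working
            // directory, so the file can be opened like a local file.
            try (InputStream in = new FileInputStream("app.properties")) {
                props.load(in);
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // "output.prefix" is a made-up property key for illustration.
            String prefix = props.getProperty("output.prefix", "");
            context.write(new Text(prefix), value);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "cached-properties");
        job.setJarByClass(CachedPropertiesJob.class);
        job.setMapperClass(PropsMapper.class);
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // The properties file must already be on HDFS; this path is a placeholder.
        job.addCacheFile(new URI("/user/me/app.properties#app.properties"));
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```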

Oh, I remember now: my friend JtheRocker used another approach. He set the entire contents of the file against a key in the Configuration object, read that value back in setup, then parsed and loaded the pairs into a Map. In this case the file is read on the driver side, whereas in the previous approach it is read on the task side. While this is suitable for small files and looks cleaner, purists may not like polluting the conf at all. A sketch follows below.
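A minimal sketch of this second approach, assuming the properties file is available on the driver's local filesystem; the conf key `my.app.properties` and the property key `greeting` are arbitrary names chosen for this sketch:

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class ConfPropertiesJob {

    // Arbitrary configuration key chosen for this sketch.
    private static final String PROPS_KEY = "my.app.properties";

    public static class PropsMapper
            extends Mapper<LongWritable, Text, Text, Text> {

        private final Properties props = new Properties();

        @Override
        protected void setup(Context context) throws IOException {
            // Recover the file's contents from the Configuration and parse
            // them back into a Properties object (java.util.Properties can
            // load key=value pairs straight from a Reader).
            String raw = context.getConfiguration().get(PROPS_KEY, "");
            props.load(new StringReader(raw));
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Example use of a loaded property; "greeting" is made up.
            context.write(new Text(props.getProperty("greeting", "")), value);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Driver side: read the whole file (a local path here) and stash it
        // in the conf *before* the Job copies the Configuration.
        String contents = new String(
                Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        conf.set(PROPS_KEY, contents);

        Job job = Job.getInstance(conf, "conf-properties");
        job.setJarByClass(ConfPropertiesJob.class);
        job.setMapperClass(PropsMapper.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // ... set input/output formats and paths as in any other job ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Since Configuration values are shipped to every task anyway, this avoids a separate cache-file read per task, but stuffing very large values into the conf bloats every task's job configuration, which is why it suits small files only.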

I would like to see what other answers bring up.
