简体   繁体   中英

How do I log from a mapper? (hadoop with commoncrawl)

I'm using the commoncrawl example code from their " Mapreduce for the Masses " tutorial. I'm trying to make modifications to the mapper and I'd like to be able to log strings to some output. I'm considering setting up some noSQL db and just pushing my output to it, but it doesn't feel like a good solution. What's the standard way to do this kind of logging from java?

While there is no special solution for the logs aside of usual logger (at least one I am aware about) I can see about some solutions.
a) if logs are of debug purpose - indeed write usual debug logs. In case of the failed tasks you can find them via UI and analyze.
b) if this logs are some kind of output you want to get alongside some other output from you job - assign them some specail key and write to the context. Then in the reducer you will need some special logic to put them to the output.
c) You can create directory on HDFS and make mapper to write to there. It is not classic way for MR because it is side effect - in some cases it can be fine. Especially taking to account that after each mapper will create its own file - you can use command hadoop fs -getmerge ... to get all logs as one file.
c) If you want to be able to monitor the progress of your job, number of error etc - you can use counters.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM