
Hadoop mapper task detailed execution time

For a certain Hadoop MapReduce mapper task, I already have the task's complete execution time. In general, a mapper has three steps: (1) read input from HDFS or another source such as Amazon S3; (2) process the input data; (3) write the intermediate result to local disk. Now I am wondering whether it's possible to know the time spent in each step.

My goal is to find out: (1) how long it takes the mapper to read its input from HDFS or S3 — this indicates how fast a mapper can read, i.e. the mapper's I/O performance; (2) how long it takes the mapper to process that data — this reflects the computing cost of the task.

Does anyone have an idea of how to obtain these measurements?

Thanks.

Just implement a read-only mapper that does not emit anything. This will then give an indication of how long it takes for each split to be read (but not processed).
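A minimal self-contained sketch of that idea (plain Java, no Hadoop dependency — in a real job the framework's RecordReader would supply the records, and the class name and helper here are illustrative): read every record of a split, discard it without processing or emitting, and time the loop. The elapsed time then approximates pure read cost.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class ReadOnlyTimer {
    // Consume every line of the split without processing or emitting anything,
    // and report { recordCount, elapsedNanos }. The elapsed time is then an
    // approximation of pure read cost for this split.
    public static long[] timeReadOnly(String split) throws IOException {
        BufferedReader reader = new BufferedReader(new StringReader(split));
        long start = System.nanoTime();
        long records = 0;
        while (reader.readLine() != null) {
            records++; // read and discard: no processing, no emit
        }
        long elapsedNanos = System.nanoTime() - start;
        return new long[] { records, elapsedNanos };
    }

    public static void main(String[] args) throws IOException {
        long[] result = timeReadOnly("line1\nline2\nline3\n");
        System.out.println("records=" + result[0] + " nanos=" + result[1]);
    }
}
```

In a real mapper you would put the timing around the `map()` calls (or log from `setup()`/`cleanup()`) and leave the body empty, then compare the task's total time against a normal run to separate read time from processing time.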

As a further step, you can define a variable that is passed to the job at runtime (via the job properties) and selects one of the following behaviors (e.g. by parsing the variable against an Enum and switching on its values):

  • just read
  • just read and process (but not write/emit anything)
  • do it all

This of course assumes that you have access to the mapper code.
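The mode switch above can be sketched like this (self-contained Java; in a real job the mode string would come from the configuration, e.g. `conf.get(...)` in `setup()` — the property name, class name, and `handle` helper here are all hypothetical):

```java
public class MapperModeSwitch {
    // One value per profiling mode suggested in the answer.
    enum Mode { READ_ONLY, READ_PROCESS, FULL }

    // Decide what the mapper does with a record, based on a mode string that
    // in a real job would be read from the job properties.
    static String handle(String mode, String record) {
        switch (Mode.valueOf(mode)) {
            case READ_ONLY:
                return "read";                          // step (1) only
            case READ_PROCESS:
                return "processed:" + record.length();  // steps (1)+(2), emit nothing
            case FULL:
            default:
                return "emit:" + record;                // steps (1)+(2)+(3)
        }
    }

    public static void main(String[] args) {
        System.out.println(handle("READ_ONLY", "hello"));
        System.out.println(handle("READ_PROCESS", "hello"));
        System.out.println(handle("FULL", "hello"));
    }
}
```

Running the same input once per mode and subtracting the elapsed times gives a rough breakdown: read time, processing time, and write/emit time.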
