
Hadoop mapper task detailed execution time

For a certain Hadoop MapReduce mapper task, I already have the task's complete execution time. In general, a mapper has three steps: (1) read input from HDFS or another source such as Amazon S3; (2) process the input data; (3) write the intermediate result to local disk. Now I am wondering whether it's possible to know the time spent in each step.

My goal is to find out: (1) how long it takes the mapper to read its input from HDFS or S3 — this indicates how fast a mapper can read, i.e. the mapper's I/O performance; (2) how long it takes the mapper to process that data — this reflects the computing cost of the task.

Does anyone have an idea of how to obtain these measurements?

Thanks.

Just implement a read-only mapper that does not emit anything. This will then give an indication of how long it takes for each split to be read (but not processed).
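A minimal self-contained sketch of that idea (plain Java, no Hadoop dependency — in a real job the framework's RecordReader would supply the records, and the class name and helper here are illustrative): read every record of a split, discard it without processing or emitting, and time the loop. The elapsed time then approximates pure read cost.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class ReadOnlyTimer {
    // Consume every line of the split without processing or emitting anything,
    // and report { recordCount, elapsedNanos }. The elapsed time is then an
    // approximation of pure read cost for this split.
    public static long[] timeReadOnly(String split) throws IOException {
        BufferedReader reader = new BufferedReader(new StringReader(split));
        long start = System.nanoTime();
        long records = 0;
        while (reader.readLine() != null) {
            records++; // read and discard: no processing, no emit
        }
        long elapsedNanos = System.nanoTime() - start;
        return new long[] { records, elapsedNanos };
    }

    public static void main(String[] args) throws IOException {
        long[] result = timeReadOnly("line1\nline2\nline3\n");
        System.out.println("records=" + result[0] + " nanos=" + result[1]);
    }
}
```

In a real mapper you would put the timing around the `map()` calls (or log from `setup()`/`cleanup()`) and leave the body empty, then compare the task's total time against a normal run to separate read time from processing time.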

As a further step, you can define a variable that is passed to the job at runtime (via the job properties) and selects one of the following behaviors (e.g. by parsing the variable against an Enum and switching on its values):

  • just read
  • just read and process (but not write/emit anything)
  • do it all

This of course assumes that you have access to the mapper code.
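The mode switch above can be sketched like this (self-contained Java; in a real job the mode string would come from the configuration, e.g. `conf.get(...)` in `setup()` — the property name, class name, and `handle` helper here are all hypothetical):

```java
public class MapperModeSwitch {
    // One value per profiling mode suggested in the answer.
    enum Mode { READ_ONLY, READ_PROCESS, FULL }

    // Decide what the mapper does with a record, based on a mode string that
    // in a real job would be read from the job properties.
    static String handle(String mode, String record) {
        switch (Mode.valueOf(mode)) {
            case READ_ONLY:
                return "read";                          // step (1) only
            case READ_PROCESS:
                return "processed:" + record.length();  // steps (1)+(2), emit nothing
            case FULL:
            default:
                return "emit:" + record;                // steps (1)+(2)+(3)
        }
    }

    public static void main(String[] args) {
        System.out.println(handle("READ_ONLY", "hello"));
        System.out.println(handle("READ_PROCESS", "hello"));
        System.out.println(handle("FULL", "hello"));
    }
}
```

Running the same input once per mode and subtracting the elapsed times gives a rough breakdown: read time, processing time, and write/emit time.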
