
Time spent by a Hadoop MapReduce mapper task to read input files from HDFS or S3

I am running a Hadoop MapReduce job that takes its input files from HDFS or Amazon S3. I am wondering whether it is possible to measure how long a mapper task takes to read its file from HDFS or S3 into the mapper. I want the time spent purely on reading the data, not including the mapper's processing time for that data. The result I am looking for is something like MB/second for a given mapper task, indicating how fast the mapper can read from HDFS or S3. In other words, an I/O throughput measurement.

Thanks.

Maybe you can just use an identity mapper and set the number of reducers to zero. Then the only work done in your simulation is I/O; there is no sorting or shuffling. If you specifically want to focus on reading, you can replace the identity mapper with one that produces no output at all. Next, I would set mapred.job.reuse.jvm.num.tasks=-1 to remove the JVM startup overhead. It isn't perfect, but it is probably the easiest way to get a quick idea. If you want to do it precisely, I would consider implementing your own Hadoop counters (see the sketch below), but I currently have no experience with that.
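As a rough sketch of the no-output-mapper idea combined with custom counters (using the org.apache.hadoop.mapreduce API; the class name ReadThroughputJob and the READ_BENCH counter group are names invented for this example, not built-in Hadoop identifiers):

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class ReadThroughputJob {

        // Mapper that reads every record but emits nothing, so task time
        // is dominated by input I/O. Bytes and elapsed time are
        // accumulated in custom counters under the invented "READ_BENCH"
        // counter group.
        public static class ReadOnlyMapper
                extends Mapper<LongWritable, Text, NullWritable, NullWritable> {

            private long bytes = 0;
            private long startNanos;

            @Override
            protected void setup(Context context) {
                startNanos = System.nanoTime();
            }

            @Override
            protected void map(LongWritable key, Text value, Context context) {
                bytes += value.getLength();  // payload bytes seen by this mapper
            }

            @Override
            protected void cleanup(Context context) {
                long elapsedMillis = (System.nanoTime() - startNanos) / 1_000_000;
                context.getCounter("READ_BENCH", "BYTES_READ").increment(bytes);
                context.getCounter("READ_BENCH", "READ_MILLIS").increment(elapsedMillis);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "hdfs-read-throughput");
            job.setJarByClass(ReadThroughputJob.class);
            job.setMapperClass(ReadOnlyMapper.class);
            job.setNumReduceTasks(0);                         // map-only: no sort/shuffle
            job.setOutputFormatClass(NullOutputFormat.class); // write nothing
            FileInputFormat.addInputPath(job, new Path(args[0]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

For a per-mapper MB/second figure, look at each map task's counters in the web UI or task logs and compute BYTES_READ / READ_MILLIS per task; the job-level counters are aggregated across all tasks. The built-in FileSystemCounters (e.g. HDFS_BYTES_READ) report the bytes actually pulled from the filesystem. Note that the measured time still includes record parsing by the input format (TextInputFormat here), so the result is a slight underestimate of raw I/O throughput.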
