During an HBase scan with MapReduce, the number of reducers is always one

I do an HBase scan in the Mapper, and the Reducer then writes the results to HDFS.
The number of records output by the mapper is roughly 1,000,000,000.

The problem is that the number of reducers is always one, even though I have set -Dmapred.reduce.tasks=100. As a result, the reduce phase is very slow.

// edit at 2016-12-04 by 祝方泽
The code of my main class:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class GetUrlNotSent2SpiderFromHbase extends Configured implements Tool {

    public int run(String[] arg0) throws Exception {

        Configuration conf = getConf();
        Job job = new Job(conf, conf.get("mapred.job.name"));
        String input_table = conf.get("input.table");

        job.setJarByClass(GetUrlNotSent2SpiderFromHbase.class);

        Scan scan = new Scan();
        scan.setCaching(500);
        scan.setCacheBlocks(false);
        scan.addColumn(Bytes.toBytes("info"), Bytes.toBytes("sitemap_type"));
        scan.addColumn(Bytes.toBytes("info"), Bytes.toBytes("is_send_to_spider"));

        TableMapReduceUtil.initTableMapperJob(
                input_table,
                scan,
                GetUrlNotSent2SpiderFromHbaseMapper.class,
                Text.class,
                Text.class,
                job);

        /*job.setMapperClass(GetUrlNotSent2SpiderFromHbaseMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);*/

        job.setReducerClass(GetUrlNotSent2SpiderFromHbaseReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        if (job.waitForCompletion(true) && job.isSuccessful()) {
            return 0;
        }
        return -1;
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        int res = ToolRunner.run(conf, new GetUrlNotSent2SpiderFromHbase(), args);
        System.exit(res);
    }

}

Here is the script used to run this MapReduce job:

table="xxx"
output="yyy"
sitemap_type="zzz"

JOBCONF=""
JOBCONF="${JOBCONF} -Dmapred.job.name=test_for_scan_hbase"
JOBCONF="${JOBCONF} -Dinput.table=$table"
JOBCONF="${JOBCONF} -Dmapred.output.dir=$output"
JOBCONF="${JOBCONF} -Ddemand.sitemap.type=$sitemap_type"
JOBCONF="${JOBCONF} -Dyarn.app.mapreduce.am.command-opts='-Xmx8192m'"
JOBCONF="${JOBCONF} -Dyarn.app.mapreduce.am.resource.mb=9216"
JOBCONF="${JOBCONF} -Dmapreduce.map.java.opts='-Xmx1536m'"
JOBCONF="${JOBCONF} -Dmapreduce.map.memory.mb=2048"
JOBCONF="${JOBCONF} -Dmapreduce.reduce.java.opts='-Xmx1536m'"
JOBCONF="${JOBCONF} -Dmapreduce.reduce.memory.mb=2048"
JOBCONF="${JOBCONF} -Dmapred.reduce.tasks=100"
JOBCONF="${JOBCONF} -Dmapred.job.priority=VERY_HIGH"

hadoop fs -rmr $output
hadoop jar get_url_not_sent_2_spider_from_hbase_hourly.jar hourly.GetUrlNotSent2SpiderFromHbase $JOBCONF
echo "===== scan HBase finished ====="

When I set job.setNumReduceTasks(100); in the code, it worked.
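For reference, a minimal sketch of where that call goes in the run() method above (the surrounding lines are taken from the driver in the question; the only requirement is that it runs before the job is submitted):

    job.setReducerClass(GetUrlNotSent2SpiderFromHbaseReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    // explicitly request 100 reduce tasks before submitting the job
    job.setNumReduceTasks(100);

    if (job.waitForCompletion(true) && job.isSuccessful()) {
        return 0;
    }
    return -1;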

Since you mentioned that only one reducer is running, that is the obvious reason why the reduce phase is so slow.

A unified way to see the configuration properties applied to a Job (you can call this for every job you execute to verify that parameters are being passed correctly):

Add the method below to the job driver shown above; it prints the configuration entries applied from all possible sources (from -D or anywhere else). Call it in the driver program before the job is submitted:

public static void printConfigApplied(Configuration conf) {
    try {
        // dump the fully resolved configuration as XML to stdout
        conf.writeXml(System.out);
    } catch (final IOException e) {
        e.printStackTrace();
    }
}
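A possible call site, assuming the driver from the question (the placement is illustrative; anywhere before job submission works):

    // dump the effective configuration so you can check whether
    // -Dmapred.reduce.tasks=100 actually reached the job
    printConfigApplied(job.getConfiguration());

    if (job.waitForCompletion(true) && job.isSuccessful()) {
        return 0;
    }
    return -1;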

If the -D values do not show up in that output, it proves your system properties are not being applied from the command line (i.e. -Dxxx), so the way you are passing them is not correct, since setting the value programmatically does work.

Since job.setNumReduceTasks() works, I strongly suspect the lines below, where your system properties are not being passed correctly to the driver:

    Configuration conf = getConf();
    Job job = new Job(conf, conf.get("mapred.job.name"));

Change this to follow the standard ToolRunner / GenericOptionsParser driver example.
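A minimal sketch of that pattern, assuming Job.getInstance(getConf()) is used so that the configuration ToolRunner builds from the -D generic options is the one the job actually runs with (class names are taken from the question; the job setup body is abbreviated):

public class GetUrlNotSent2SpiderFromHbase extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() returns the Configuration that ToolRunner populated
        // from the -D options on the command line
        Configuration conf = getConf();
        Job job = Job.getInstance(conf, conf.get("mapred.job.name"));
        job.setJarByClass(GetUrlNotSent2SpiderFromHbase.class);

        // ... same Scan / TableMapReduceUtil / reducer setup as in the question ...

        return job.waitForCompletion(true) ? 0 : -1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic options (-D, -files, ...) and applies
        // them to the Configuration before calling run()
        int res = ToolRunner.run(HBaseConfiguration.create(),
                new GetUrlNotSent2SpiderFromHbase(), args);
        System.exit(res);
    }
}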
