
Hadoop Custom Partitioner not behaving according to the logic

A custom partitioner works in the example here, so I tried the same approach on my dataset.

Sample Dataset:

OBSERVATION;2474472;137176;
OBSERVATION;2474473;137176;
OBSERVATION;2474474;137176;
OBSERVATION;2474475;137177;

Treating each line as a string split on ";", my Mapper output is:

key -> string[2] (the third field, e.g. 137176), value -> the whole line.
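For illustration, the key extraction described above can be sketched in plain Java without any Hadoop classes. The class and method names (KeyExtraction, extractKey) are hypothetical and not from the original post; the sketch assumes every record has at least three ";"-separated fields.

```java
class KeyExtraction {
    // Hypothetical helper: returns the third ';'-separated field of a record.
    // Assumes the line has at least three fields; no validation is done.
    static String extractKey(String line) {
        String[] fields = line.split(";");
        return fields[2];
    }

    public static void main(String[] args) {
        // Sample record taken from the dataset above.
        System.out.println(extractKey("OBSERVATION;2474472;137176;")); // prints 137176
    }
}
```

In a real Mapper the returned value would be wrapped in a Text and written as the output key.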

My Partitioner code:

@Override
public int getPartition(Text key, Text value, int reducersDefined) {

    String keyStr = key.toString();
    if(keyStr == "137176") {
        return 0;
    } else {
        return 1 % reducersDefined;
    }
}

In my dataset most IDs are 137176. I declared 2 reducers. I expect two output files, one for 137176 and a second for the remaining IDs. I do get two output files, but the IDs are evenly distributed across both. What's going wrong in my program?

  1. In the Driver, explicitly register your custom Partitioner with job.setPartitionerClass(YourPartitioner.class); . If you don't, the default HashPartitioner is used.

  2. Change the String comparison from == to .equals() , i.e., change if(keyStr == "137176") { to if(keyStr.equals("137176")) { . In Java, == on Strings compares object references rather than contents, so it is false here even when the characters match.
    To save some time, it may be faster to declare a Text constant once in the partitioner, like this: Text KEY = new Text("137176"); and then compare the incoming key with KEY directly (again using equals()), instead of converting the key to a String on every call. The two approaches may well perform the same. So, what I suggest is:

    Text KEY = new Text("137176");

    @Override
    public int getPartition(Text key, Text value, int reducersDefined) {
        return key.equals(KEY) ? 0 : 1 % reducersDefined;
    }
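To see the difference between == and equals() in isolation, here is a minimal sketch in plain Java with no Hadoop dependency. The class and method names are hypothetical, and new String(...) is used as a stand-in for Text.toString(), on the assumption that toString() builds a fresh String object on each call.

```java
class PartitionLogicDemo {
    static final String TARGET = "137176";

    // Mirrors the question's logic: == compares references, not contents.
    static int buggyPartition(String key, int reducers) {
        return key == TARGET ? 0 : 1 % reducers;
    }

    // Mirrors the suggested fix: equals() compares contents.
    static int fixedPartition(String key, int reducers) {
        return key.equals(TARGET) ? 0 : 1 % reducers;
    }

    public static void main(String[] args) {
        // A freshly constructed String is a different object from the literal.
        String key = new String("137176");
        System.out.println(buggyPartition(key, 2));      // prints 1
        System.out.println(fixedPartition(key, 2));      // prints 0
        System.out.println(fixedPartition("137177", 2)); // prints 1
        System.out.println(fixedPartition("137177", 1)); // prints 0 (single reducer)
    }
}
```

Note that the buggy version alone would send every key to partition 1 rather than spread keys evenly, so an even split across both files suggests the custom Partitioner was not registered at all and the default one was in effect.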

Another suggestion: if the network load is heavy, emit the map output key as a VIntWritable instead of Text and change the Partitioner accordingly.

