
Hadoop MapReduce: using MapWritable as a key

I want to pass a Map<String, String> from my Mapper to the Reducer.

So the tuple I want is: <(Sorted)MapWritable,IntWritable>

Currently, I work around this with a poor man's serialization: I build a plain Text key using Guava's MapJoiner and MapSplitter classes, which produce a String that can then be used to initialize the Text object to write. So I am transferring the key-value pairs inside a String, which is then split back into a Map on the other side.
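For context, a minimal sketch of that workaround (the separator characters and class name are arbitrary choices here, not anything from the original code):

import java.util.Map;

import org.apache.hadoop.io.Text;

import com.google.common.base.Joiner;
import com.google.common.base.Splitter;
import com.google.common.collect.ImmutableMap;

public class PoorMansMapKey {
    public static void main(String[] args) {
        Map<String, String> map = ImmutableMap.of("k1", "v1", "k2", "v2");

        // Mapper side: join the map into one String, e.g. "k1=v1;k2=v2",
        // and wrap it in a Text object to use as the key.
        Text key = new Text(Joiner.on(';').withKeyValueSeparator("=").join(map));

        // Reducer side: split the String back into a Map.
        Map<String, String> restored =
                Splitter.on(';').withKeyValueSeparator("=").split(key.toString());

        System.out.println(restored); // {k1=v1, k2=v2}
    }
}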

But I want to drop this hack.

I know that if mapred.output.key.comparator.class is not set, then the key class used must implement WritableComparable. The problem is that MapWritable and SortedMapWritable lack this interface.

I checked the WritableComparable interface, but I'm a bit confused, because it seems you have to re-implement the write/read methods (serialization), not just compareTo().
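For reference, WritableComparable declares no methods of its own; it just combines the two contracts, and the write/read pair comes from Writable (both in org.apache.hadoop.io, shown here trimmed to the essentials):

public interface WritableComparable<T> extends Writable, Comparable<T> {
    // No methods of its own: write()/readFields() come from Writable,
    // compareTo() comes from java.lang.Comparable.
}

public interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}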

So my question: can you help me find a WORKING example, some code, a guideline, or any other valuable info? Thanks in advance.

You can extend MapWritable (or SortedMapWritable) and implement WritableComparable. You do not need to re-implement the write/read methods, since MapWritable (or SortedMapWritable) already provides them. For example:

import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.WritableComparable;

public class MyMapWritable extends MapWritable implements
        WritableComparable<MyMapWritable> {

    @Override
    public int compareTo(MyMapWritable o) {
        // Implement your compare logic; write()/readFields()
        // are inherited from MapWritable.
        return 0;
    }
}

import org.apache.hadoop.io.SortedMapWritable;
import org.apache.hadoop.io.WritableComparable;

public class MySortedWritable extends SortedMapWritable implements
        WritableComparable<MySortedWritable> {

    @Override
    public int compareTo(MySortedWritable o) {
        // Implement your compare logic; serialization is
        // inherited from SortedMapWritable.
        return 0;
    }
}
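
A minimal driver sketch for wiring this up, assuming the Hadoop 2.x org.apache.hadoop.mapreduce API and the MyMapWritable subclass above (the job and class names are hypothetical); declaring the subclass as the map output key class is what lets the framework sort map output by its compareTo():

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;

public class MapKeyDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "map-writable-as-key");

        // The subclass, not MapWritable itself, is declared as the key class,
        // so the shuffle can compare keys via compareTo().
        job.setMapOutputKeyClass(MyMapWritable.class);
        job.setMapOutputValueClass(IntWritable.class);

        // ... set mapper/reducer classes and input/output paths as usual ...
    }
}

Note that compareTo() must define a consistent total order (and agree with equals()) for the shuffle sort and grouping to behave sensibly; returning 0 for everything, as in the stubs above, would group all maps together.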
