
Hadoop MapReduce: using MapWritable as a key

I want to pass a Map<String, String> from my Mapper to the Reducer.

So the tuple I want is: <(Sorted)MapWritable,IntWritable>

Currently, I work around this with a poor man's serialization: I build a plain Text key using Guava's MapJoiner and MapSplitter classes, which produce a String that can then be used to initialize the Text object to write. So I am transferring the key-value pairs inside a String, which is then split back into a Map on the other side.
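For context, a minimal sketch of that workaround (the separator characters and class name are arbitrary choices here, not anything from the original code):

import java.util.Map;

import org.apache.hadoop.io.Text;

import com.google.common.base.Joiner;
import com.google.common.base.Splitter;
import com.google.common.collect.ImmutableMap;

public class PoorMansMapKey {
    public static void main(String[] args) {
        Map<String, String> map = ImmutableMap.of("k1", "v1", "k2", "v2");

        // Mapper side: join the map into one String, e.g. "k1=v1;k2=v2",
        // and wrap it in a Text object to use as the key.
        Text key = new Text(Joiner.on(';').withKeyValueSeparator("=").join(map));

        // Reducer side: split the String back into a Map.
        Map<String, String> restored =
                Splitter.on(';').withKeyValueSeparator("=").split(key.toString());

        System.out.println(restored); // {k1=v1, k2=v2}
    }
}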

But I want to drop this hack.

I know that if mapred.output.key.comparator.class is not set, then the key class used must implement WritableComparable. The problem is that MapWritable and SortedMapWritable lack this interface.

I checked the WritableComparable interface, but I'm a bit confused, because it seems you have to re-implement the write/read methods (serialization), not just compareTo().
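For reference, WritableComparable declares no methods of its own; it just combines the two contracts, and the write/read pair comes from Writable (both in org.apache.hadoop.io, shown here trimmed to the essentials):

public interface WritableComparable<T> extends Writable, Comparable<T> {
    // No methods of its own: write()/readFields() come from Writable,
    // compareTo() comes from java.lang.Comparable.
}

public interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}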

So my question: can you help me find a WORKING example, some code, a guideline, or any other valuable info? Thanks in advance.

You can extend MapWritable (or SortedMapWritable) and implement WritableComparable. You do not need to re-implement the write/read methods, since MapWritable (or SortedMapWritable) already provides them. For example:

import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.WritableComparable;

public class MyMapWritable extends MapWritable implements
        WritableComparable<MyMapWritable> {

    @Override
    public int compareTo(MyMapWritable o) {
        // Implement your compare logic; write()/readFields()
        // are inherited from MapWritable.
        return 0;
    }
}

import org.apache.hadoop.io.SortedMapWritable;
import org.apache.hadoop.io.WritableComparable;

public class MySortedWritable extends SortedMapWritable implements
        WritableComparable<MySortedWritable> {

    @Override
    public int compareTo(MySortedWritable o) {
        // Implement your compare logic; serialization is
        // inherited from SortedMapWritable.
        return 0;
    }
}
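
A minimal driver sketch for wiring this up, assuming the Hadoop 2.x org.apache.hadoop.mapreduce API and the MyMapWritable subclass above (the job and class names are hypothetical); declaring the subclass as the map output key class is what lets the framework sort map output by its compareTo():

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;

public class MapKeyDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "map-writable-as-key");

        // The subclass, not MapWritable itself, is declared as the key class,
        // so the shuffle can compare keys via compareTo().
        job.setMapOutputKeyClass(MyMapWritable.class);
        job.setMapOutputValueClass(IntWritable.class);

        // ... set mapper/reducer classes and input/output paths as usual ...
    }
}

Note that compareTo() must define a consistent total order (and agree with equals()) for the shuffle sort and grouping to behave sensibly; returning 0 for everything, as in the stubs above, would group all maps together.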
