
How to provide a subclass in Mapper and Reducer of Hadoop?

I have a subclass (child) that extends a superclass (parent). I want to declare a general type for the Mapper's input value, so that both the child and the parent are accepted as valid values, like this:

public static class MyMapper extends Mapper<..., MyParentClass , ..., ...>

I want MyChildClass, which extends MyParentClass, to be valid as well.

However, when I run the program and the value is an instance of the child class, I get an exception:

type mismatch in value from map: expected MyParentClass, received MyChildClass

How can I enable both the child and the parent classes to be a valid input/output value to/from the mapper?

Update:

package hipi.examples.dumphib;

import hipi.image.FloatImage;
import hipi.image.ImageHeader;
import hipi.imagebundle.mapreduce.ImageBundleInputFormat;
import hipi.util.ByteUtils;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import java.io.IOException;
import java.util.Iterator;

public class DumpHib extends Configured implements Tool {

  public static class DumpHibMapper extends Mapper<ImageHeader, FloatImage, IntWritable, Text> {

    @Override
    public void map(ImageHeader key, FloatImage value, Context context) throws IOException, InterruptedException  {

      int imageWidth = value.getWidth();
      int imageHeight = value.getHeight();

      String outputStr = null;

      if (key == null) {
        outputStr = "Failed to read image header.";
      } else if (value == null) {
        outputStr = "Failed to decode image data.";
      } else {
        String camera = key.getEXIFInformation("Model");
        String hexHash = ByteUtils.asHex(ByteUtils.FloatArraytoByteArray(value.getData()));
        outputStr = imageWidth + "x" + imageHeight + "\t(" + hexHash + ")\t  " + camera;
      }

      context.write(new IntWritable(1), new Text(outputStr));
    }

  }

  public static class DumpHibReducer extends Reducer<IntWritable, Text, IntWritable, Text> {

    @Override
    public void reduce(IntWritable key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
      for (Text value : values) {
        context.write(key, value);
      }
    }

  }

  public int run(String[] args) throws Exception {

    if (args.length < 2) {
      System.out.println("Usage: dumphib <input HIB> <output directory>");
      System.exit(0);
    }

    Configuration conf = new Configuration();

    Job job = Job.getInstance(conf, "dumphib");

    job.setJarByClass(DumpHib.class);
    job.setMapperClass(DumpHibMapper.class);
    job.setReducerClass(DumpHibReducer.class);

    job.setInputFormatClass(ImageBundleInputFormat.class);
    job.setOutputKeyClass(IntWritable.class);
    job.setOutputValueClass(Text.class);

    String inputPath = args[0];
    String outputPath = args[1];

    removeDir(outputPath, conf);

    FileInputFormat.setInputPaths(job, new Path(inputPath));
    FileOutputFormat.setOutputPath(job, new Path(outputPath));

    job.setNumReduceTasks(1);

    return job.waitForCompletion(true) ? 0 : 1;

  }

  private static void removeDir(String path, Configuration conf) throws IOException {
    Path output_path = new Path(path);
    FileSystem fs = FileSystem.get(conf);
    if (fs.exists(output_path)) {
      fs.delete(output_path, true);
    }
  }

  public static void main(String[] args) throws Exception {
    int res = ToolRunner.run(new DumpHib(), args);
    System.exit(res);
  }

}

FloatImage is a superclass, and I have a ChildFloatImage class that extends it. When a ChildFloatImage is returned from the RecordReader, it throws the exception above.

Background

The reason for this is that type erasure makes it impossible for Java to check at runtime that your MyMapper actually matches the expected type (in terms of the generic type parameters on Mapper).

Java basically compiles:

List<String> list = new ArrayList<String>();
list.add("Hi");
String x = list.get(0);

into

List list = new ArrayList();
list.add("Hi");
String x = (String) list.get(0);

Credits for this example go here.

So you are passing in MyMapper where Java wants to see a Mapper<A, B, C, D> with specific A, B, C, and D; that cannot be verified at runtime, so the check has to be forced at compile time.
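Concretely, Hadoop's map-output collector compares runtime classes for exact equality rather than with an instanceof-style check. A minimal, self-contained sketch of that behavior (Parent, Child, and collect are illustrative names, not Hadoop's actual code):

```java
import java.io.IOException;

class Parent {}

class Child extends Parent {}

public class ExactClassCheck {
    // Simplified stand-in for the check Hadoop applies to map output values.
    static void collect(Object value, Class<?> expectedValueClass) throws IOException {
        // The comparison uses != on the runtime class, not isAssignableFrom,
        // so a Child instance is rejected even though it "is a" Parent.
        if (value.getClass() != expectedValueClass) {
            throw new IOException("type mismatch in value from map: expected "
                    + expectedValueClass.getName()
                    + ", received " + value.getClass().getName());
        }
    }

    public static void main(String[] args) throws IOException {
        collect(new Parent(), Parent.class); // passes: exact class match
        try {
            collect(new Child(), Parent.class); // rejected: Child != Parent
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This is why declaring the value type as the parent class is not enough: the runtime class of each emitted value must match the configured value class exactly.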

Solution

You can do the following for all your custom subclasses. Instead of:

job.setMapperClass(DumpHibMapper.class);

use java.lang.Class#asSubclass and write this instead:

job.setMapperClass(DumpHibMapper.class.asSubclass(Mapper.class));
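To illustrate what asSubclass buys you (plain JDK, no Hadoop required): it performs a checked cast on the Class object itself, producing a properly parameterized Class reference and failing fast if the subclass relationship does not hold:

```java
public class AsSubclassDemo {
    public static void main(String[] args) {
        // Succeeds: Integer extends Number, and the result is typed as
        // Class<? extends Number> rather than a raw Class.
        Class<? extends Number> ok = Integer.class.asSubclass(Number.class);
        System.out.println(ok.getName()); // java.lang.Integer

        // Fails immediately with ClassCastException: String is not a Number.
        try {
            String.class.asSubclass(Number.class);
        } catch (ClassCastException e) {
            System.out.println("String is not a subclass of Number");
        }
    }
}
```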

The solution I followed is to create a container/wrapper class that delegates all the required functions to the original object, as follows:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.BinaryComparable;
import org.apache.hadoop.io.RawComparator;
import org.apache.hadoop.io.Writable;

import hipi.image.FloatImage;

public class FloatImageContainer implements Writable, RawComparator<BinaryComparable> {

    private FloatImage floatImage;

    public FloatImage getFloatImage() {
        return floatImage;
    }

    public void setFloatImage(FloatImage floatImage) {
        this.floatImage = floatImage;
    }

    public FloatImageContainer() {
        this.floatImage = new FloatImage();
    }

    public FloatImageContainer(FloatImage floatImage) {
        this.floatImage = floatImage;
    }

    @Override
    public int compare(BinaryComparable o1, BinaryComparable o2) {
        // Delegate comparison to the wrapped FloatImage.
        return floatImage.compare(o1, o2);
    }

    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        // Delegate raw-bytes comparison to the wrapped FloatImage.
        return floatImage.compare(b1, s1, l1, b2, s2, l2);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // Serialize by delegating to the wrapped FloatImage.
        floatImage.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Deserialize by delegating to the wrapped FloatImage.
        floatImage.readFields(in);
    }

}

And in the Mapper:

public static class MyMapper extends Mapper<..., FloatImageContainer, ..., ...> {

In this case, both FloatImage and ChildFloatImage can be encapsulated in a FloatImageContainer, and you get rid of the inheritance problem in Hadoop, because only one class, FloatImageContainer, is used directly, and it is neither a parent nor a child of anything.
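The effect of the wrapper can be seen with a minimal stand-in (Parent, Child, and Container here are illustrative names, not the HIPI classes): whatever you put inside, the runtime class that Hadoop's exact-class check sees is always the container's own class.

```java
public class ContainerDemo {
    static class Parent {}

    static class Child extends Parent {}

    // Minimal wrapper in the spirit of FloatImageContainer: it holds
    // a Parent reference, so it accepts Parent and any subclass.
    static class Container {
        private final Parent payload;
        Container(Parent payload) { this.payload = payload; }
        Parent get() { return payload; }
    }

    public static void main(String[] args) {
        Container a = new Container(new Parent());
        Container b = new Container(new Child());
        // Both wrappers have the exact same runtime class, so an
        // exact-class check on the value always passes.
        System.out.println(a.getClass() == b.getClass()); // true
        // The wrapped (possibly subclassed) object is still reachable.
        System.out.println(b.get() instanceof Child); // true
    }
}
```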
