
How do I import and use a class in Mapper in Hadoop?

I have a class PorterStemmer which I would like to use in my Mapper. My Driver class contains the Mapper and Reducer as nested classes. I tried putting the PorterStemmer class inside the Driver class, but Hadoop threw a ClassNotFoundException at runtime. I also tried putting PorterStemmer in a JAR and adding it to the distributed cache, but then, obviously, I got a compile-time error, since PorterStemmer was no longer visible to the Driver class. Is there any way I can get around this problem?

Here is my Driver class

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class InvertedIndex {

public static class IndexMapper extends Mapper<Object, Text, Text, Text>{
    private Text word = new Text();
    private Text filename = new Text();
    private boolean caseSensitive = false;
    public static PorterStemmer stemmer = new PorterStemmer();

    String token;
    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException { 
        String filenameStr = ((FileSplit) context.getInputSplit()).getPath().getName();
        filename.set(filenameStr);

        String line = value.toString();

        if (!caseSensitive) {
            line = line.toLowerCase();
        }

        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            token = tokenizer.nextToken();

            // Stem the token in place with the shared PorterStemmer instance
            stemmer.add(token.toCharArray(), token.length());
            stemmer.stem();
            token = stemmer.toString();

            word.set(token);
            context.write(word, filename);
        }
    }
}

public static class IndexReducer extends Reducer<Text,Text,Text,Text> {


    public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        StringBuilder stringBuilder = new StringBuilder();

        // Use one explicit iterator: Hadoop's value Iterable returns the same
        // underlying iterator on every iterator() call, so looking ahead with
        // it.hasNext() is the safe way to decide where to place separators.
        Iterator<Text> it = values.iterator();
        while (it.hasNext()) {
            stringBuilder.append(it.next().toString());

            if (it.hasNext()) {
                stringBuilder.append(" -> ");
            }
        }

        context.write(key, new Text(stringBuilder.toString()));
    }
}

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();


    Job job = Job.getInstance(conf, "inverted index");

    // This only ships the jar to the nodes via the distributed cache;
    // it does not put it on the task classpath (see the note below).
    job.addCacheFile(new Path("/invertedindex/lib/stemmer.jar").toUri());

    job.setJarByClass(InvertedIndex.class);

    /* Field separator for reducer output*/
    job.getConfiguration().set("mapreduce.output.textoutputformat.separator", " | ");

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    job.setMapperClass(IndexMapper.class);
    job.setCombinerClass(IndexReducer.class);
    job.setReducerClass(IndexReducer.class); 

    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);


    Path inputFilePath = new Path(args[0]);
    Path outputFilePath = new Path(args[1]);
    FileInputFormat.addInputPath(job, inputFilePath);
    FileOutputFormat.setOutputPath(job, outputFilePath);

    /* Delete output filepath if already exists */
    FileSystem fs = FileSystem.newInstance(conf);

    if (fs.exists(outputFilePath)) {
        fs.delete(outputFilePath, true);
    }

    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
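
Note on the driver above: job.addCacheFile only distributes the file to the worker nodes; it does not add the jar to the task classpath. If you want to stay with the distributed-cache route, the Job API also has addFileToClassPath, which does both. A minimal sketch of the line that would replace the addCacheFile call, assuming the jar already sits at the same HDFS path:

// Ships stemmer.jar through the distributed cache AND appends it to the
// classpath of every map/reduce task, unlike addCacheFile, which only ships it.
job.addFileToClassPath(new Path("/invertedindex/lib/stemmer.jar"));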

Either build a fat jar that bundles all the dependencies, or ship the jar to the nodes using the process below.

You need to use -libjars to distribute the jar to all nodes. The jar is then added to the classpath of each task node and picked up by the mapper or reducer. Note that -libjars is a generic Hadoop option: it is only honored if your driver parses the generic options, typically by implementing Tool and launching through ToolRunner (see the sketch below).

hadoop jar yourJar.jar com.JobClass -libjars /path/of/stemmer.jar
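
A minimal sketch of a driver that makes -libjars take effect, assuming the same job setup as in the question (the class name InvertedIndexDriver is illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class InvertedIndexDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already reflects the generic options (-libjars, -D, -files)
        // that ToolRunner/GenericOptionsParser stripped from the command line.
        Job job = Job.getInstance(getConf(), "inverted index");
        job.setJarByClass(InvertedIndexDriver.class);
        // ... same mapper/reducer/input/output setup as in the question ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner parses the generic options before calling run(),
        // so -libjars actually takes effect with this entry point.
        System.exit(ToolRunner.run(new Configuration(), new InvertedIndexDriver(), args));
    }
}

With this entry point, the command above distributes stemmer.jar and puts it on the classpath of every map and reduce task.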
