How do I import and use a class in a Mapper in Hadoop?
I have a PorterStemmer class that I want to use in my Mapper. My driver class also contains the Mapper and Reducer. I tried putting the PorterStemmer class inside the driver class, but Hadoop throws a ClassNotFoundException at runtime. I also tried packaging PorterStemmer into a JAR and adding it to the distributed cache, but then I get a compiler error because PorterStemmer is not visible from the driver class. Is there any way to solve this?
Here is my driver class:
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class InvertedIndex {

    public static class IndexMapper extends Mapper<Object, Text, Text, Text> {
        private Text word = new Text();
        private Text filename = new Text();
        private boolean caseSensitive = false;
        public static PorterStemmer stemmer = new PorterStemmer();
        String token;

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            String filenameStr = ((FileSplit) context.getInputSplit()).getPath().getName();
            filename = new Text(filenameStr);
            String line = value.toString();
            if (!caseSensitive) {
                line = line.toLowerCase();
            }
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                token = tokenizer.nextToken();
                stemmer.add(token.toCharArray(), token.length());
                stemmer.stem();
                token = stemmer.toString();
                word.set(token);
                context.write(word, filename);
            }
        }
    }

    public static class IndexReducer extends Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            StringBuilder stringBuilder = new StringBuilder();
            // Use a single explicit iterator; calling values.iterator() repeatedly
            // inside the loop relies on Hadoop returning the same iterator instance.
            Iterator<Text> it = values.iterator();
            while (it.hasNext()) {
                stringBuilder.append(it.next().toString());
                if (it.hasNext()) {
                    stringBuilder.append(" -> ");
                }
            }
            context.write(key, new Text(stringBuilder.toString()));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "inverted index");
        job.addCacheFile(new Path("/invertedindex/lib/stemmer.jar").toUri());
        job.setJarByClass(InvertedIndex.class);
        /* Field separator for reducer output */
        job.getConfiguration().set("mapreduce.output.textoutputformat.separator", " | ");
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setMapperClass(IndexMapper.class);
        job.setCombinerClass(IndexReducer.class);
        job.setReducerClass(IndexReducer.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        Path inputFilePath = new Path(args[0]);
        Path outputFilePath = new Path(args[1]);
        FileInputFormat.addInputPath(job, inputFilePath);
        FileOutputFormat.setOutputPath(job, outputFilePath);
        /* Delete output path if it already exists */
        FileSystem fs = FileSystem.newInstance(conf);
        if (fs.exists(outputFilePath)) {
            fs.delete(outputFilePath, true);
        }
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
You have two options: build a fat jar that bundles all dependencies into your job jar, or distribute the dependency jar to the nodes using the procedure below.
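For the fat-jar route, one common option (a sketch, assuming a Maven build; the plugin version shown is illustrative) is the Maven Shade plugin, which merges the stemmer classes into the job jar at package time so nothing extra needs to be shipped to the cluster:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.4.1</version>
  <executions>
    <execution>
      <!-- Run shade when "mvn package" is invoked -->
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```

With this in place, `hadoop jar target/yourJar.jar ...` runs with PorterStemmer already on the classpath of every task.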
Pass the jar you want to use via -libjars so that it is distributed to all nodes. The jar is then added to the classpath of the task JVMs and picked up by the mapper and reducer:
hadoop jar yourJar.jar com.JobClass -libjars /path/of/stemmer.jar
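Note that -libjars is a Hadoop generic option: it only takes effect if the driver parses generic options, which is usually done by implementing Tool and launching through ToolRunner. A minimal sketch (the class name InvertedIndexDriver is illustrative, and the job setup elided here is the same as in the driver above; this requires the Hadoop libraries on the compile classpath):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class InvertedIndexDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already reflects the generic options (-libjars, -D, ...)
        Job job = Job.getInstance(getConf(), "inverted index");
        job.setJarByClass(InvertedIndexDriver.class);
        // ... same mapper/reducer/input/output setup as in the driver above ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic options before run(args) is invoked
        System.exit(ToolRunner.run(new Configuration(), new InvertedIndexDriver(), args));
    }
}
```

Without ToolRunner (or an explicit GenericOptionsParser), -libjars is passed straight through to your main method as an ordinary argument and silently ignored.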