
How do I import and use a class in Mapper in Hadoop?

I have a PorterStemmer class that I want to use in my Mapper. My driver class contains both the Mapper and the Reducer. I tried placing the PorterStemmer class inside the driver class, but Hadoop throws a ClassNotFoundException at runtime. I also tried packaging PorterStemmer into a JAR and adding it to the distributed cache, but then I get a compiler error, apparently because PorterStemmer is no longer visible from the driver class. Is there any way around this?

Here is my driver class:

public class InvertedIndex {

public static class IndexMapper extends Mapper<Object, Text, Text, Text>{
    private Text word = new Text();
    private Text filename = new Text();
    private boolean caseSensitive = false;
    public static PorterStemmer stemmer = new PorterStemmer();

    String token;
    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException { 
        String filenameStr = ((FileSplit) context.getInputSplit()).getPath().getName();
        filename = new Text(filenameStr);

        String line = value.toString();

        if (!caseSensitive) {
            line = line.toLowerCase();
        }

        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            token = tokenizer.nextToken();

            stemmer.add(token.toCharArray(), token.length());
            stemmer.stem();
            token = stemmer.toString();

            word.set(token);
            context.write(word, filename);
        }
    }
}

public static class IndexReducer extends Reducer<Text,Text,Text,Text> {

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        StringBuilder stringBuilder = new StringBuilder();

        for (Text value : values) {
            // Hadoop reuses a single underlying iterator for `values`, so
            // calling values.iterator() again mid-loop is unreliable; decide
            // on the separator from what has been appended so far instead.
            if (stringBuilder.length() > 0) {
                stringBuilder.append(" -> ");
            }
            stringBuilder.append(value.toString());
        }

        context.write(key, new Text(stringBuilder.toString()));
    }
}

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();


    Job job = Job.getInstance(conf, "inverted index");

    job.addCacheFile(new Path("/invertedindex/lib/stemmer.jar").toUri());

    job.setJarByClass(InvertedIndex.class);

    /* Field separator for reducer output*/
    job.getConfiguration().set("mapreduce.output.textoutputformat.separator", " | ");

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    job.setMapperClass(IndexMapper.class);
    job.setCombinerClass(IndexReducer.class);
    job.setReducerClass(IndexReducer.class); 

    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);


    Path inputFilePath = new Path(args[0]);
    Path outputFilePath = new Path(args[1]);
    FileInputFormat.addInputPath(job, inputFilePath);
    FileOutputFormat.setOutputPath(job, outputFilePath);

    /* Delete output filepath if already exists */
    FileSystem fs = FileSystem.newInstance(conf);

    if (fs.exists(outputFilePath)) {
        fs.delete(outputFilePath, true);
    }

    System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

You can either build a fat JAR that bundles all of the dependencies, or ship the JAR to the worker nodes using the procedure below.
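If you take the fat-JAR route, one common way to produce it (assuming a Maven build; the plugin version shown is illustrative) is the Maven Shade plugin, which merges your classes and the stemmer JAR's classes into a single artifact so nothing extra needs to be distributed:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.4.1</version>
  <executions>
    <execution>
      <!-- run at package time so `mvn package` emits the shaded JAR -->
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```

With this in place, `hadoop jar` can be pointed at the shaded JAR and no separate stemmer JAR is needed on the nodes.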

You need to distribute the JAR you want to use to all nodes with -libjars. Hadoop then adds this JAR to the classpath of the task nodes, where it is picked up by the mapper and reducer. Note that -libjars is a generic option: it only takes effect if your driver parses generic options, for example by running through ToolRunner.

hadoop jar yourJar.jar com.JobClass -libjars /path/of/stemmer.jar
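A minimal sketch of a driver wired through ToolRunner, so that -libjars (and other generic options such as -D) are actually parsed. This is a hedged outline, not the asker's full driver; the class name InvertedIndexDriver is illustrative, and the job setup from the original main method would go where the comment indicates:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class InvertedIndexDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already reflects the generic options (-libjars, -D, ...),
        // so the job built from it will see stemmer.jar on the task classpath.
        Job job = Job.getInstance(getConf(), "inverted index");
        job.setJarByClass(InvertedIndexDriver.class);
        // ... set mapper, combiner, reducer, input and output paths
        //     exactly as in the original driver ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic options from args before run() sees them.
        System.exit(ToolRunner.run(new Configuration(), new InvertedIndexDriver(), args));
    }
}
```

When invoking it, the generic options must come before the application arguments, e.g. `hadoop jar yourJar.jar InvertedIndexDriver -libjars /path/of/stemmer.jar <input> <output>`.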
