How do I import and use a class in a Mapper in Hadoop?
I have a PorterStemmer class that I want to use in my Mapper. My driver class also contains the Mapper and Reducer. I tried putting the PorterStemmer class inside the driver class, but Hadoop throws a ClassNotFoundException at runtime. I also tried packaging PorterStemmer into a JAR and adding it to the distributed cache, but then I get a compiler error because PorterStemmer is not visible from the driver class. Is there any way to solve this?
Here is my driver class:
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class InvertedIndex {

    public static class IndexMapper extends Mapper<Object, Text, Text, Text> {
        private Text word = new Text();
        private Text filename = new Text();
        private boolean caseSensitive = false;
        public static PorterStemmer stemmer = new PorterStemmer();
        String token;

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            String filenameStr = ((FileSplit) context.getInputSplit()).getPath().getName();
            filename = new Text(filenameStr);
            String line = value.toString();
            if (!caseSensitive) {
                line = line.toLowerCase();
            }
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                token = tokenizer.nextToken();
                stemmer.add(token.toCharArray(), token.length());
                stemmer.stem();
                token = stemmer.toString();
                word.set(token);
                context.write(word, filename);
            }
        }
    }

    public static class IndexReducer extends Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            StringBuilder stringBuilder = new StringBuilder();
            // Use a single explicit iterator; calling values.iterator() repeatedly
            // inside the loop relies on Hadoop returning the same iterator instance.
            Iterator<Text> it = values.iterator();
            while (it.hasNext()) {
                stringBuilder.append(it.next().toString());
                if (it.hasNext()) {
                    stringBuilder.append(" -> ");
                }
            }
            context.write(key, new Text(stringBuilder.toString()));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "inverted index");
        job.addCacheFile(new Path("/invertedindex/lib/stemmer.jar").toUri());
        job.setJarByClass(InvertedIndex.class);
        /* Field separator for reducer output */
        job.getConfiguration().set("mapreduce.output.textoutputformat.separator", " | ");
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setMapperClass(IndexMapper.class);
        job.setCombinerClass(IndexReducer.class);
        job.setReducerClass(IndexReducer.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        Path inputFilePath = new Path(args[0]);
        Path outputFilePath = new Path(args[1]);
        FileInputFormat.addInputPath(job, inputFilePath);
        FileOutputFormat.setOutputPath(job, outputFilePath);
        /* Delete output path if it already exists */
        FileSystem fs = FileSystem.newInstance(conf);
        if (fs.exists(outputFilePath)) {
            fs.delete(outputFilePath, true);
        }
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
You have two options: build a fat jar that bundles all dependencies into your job jar, or distribute the dependency jar to the nodes using the procedure below.
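For the fat-jar route, one common option (a sketch, assuming a Maven build; the plugin version shown is illustrative) is the Maven Shade plugin, which merges the stemmer classes into the job jar at package time so nothing extra needs to be shipped to the cluster:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.4.1</version>
  <executions>
    <execution>
      <!-- Run shade when "mvn package" is invoked -->
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```

With this in place, `hadoop jar target/yourJar.jar ...` runs with PorterStemmer already on the classpath of every task.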
Pass the jar you want to use via -libjars so that it is distributed to all nodes. The jar is then added to the classpath of the task JVMs and picked up by the mapper and reducer:
hadoop jar yourJar.jar com.JobClass -libjars /path/of/stemmer.jar
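Note that -libjars is a Hadoop generic option: it only takes effect if the driver parses generic options, which is usually done by implementing Tool and launching through ToolRunner. A minimal sketch (the class name InvertedIndexDriver is illustrative, and the job setup elided here is the same as in the driver above; this requires the Hadoop libraries on the compile classpath):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class InvertedIndexDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already reflects the generic options (-libjars, -D, ...)
        Job job = Job.getInstance(getConf(), "inverted index");
        job.setJarByClass(InvertedIndexDriver.class);
        // ... same mapper/reducer/input/output setup as in the driver above ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic options before run(args) is invoked
        System.exit(ToolRunner.run(new Configuration(), new InvertedIndexDriver(), args));
    }
}
```

Without ToolRunner (or an explicit GenericOptionsParser), -libjars is passed straight through to your main method as an ordinary argument and silently ignored.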