How to submit WordCount.jar to Hadoop through a servlet

Question

I have a WordCount.jar stored on the local Linux file system and a file containing a set of words stored in HDFS. How can I run this WordCount.jar through a servlet and specify the input and output paths in the servlet?

package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  public static class TokenizerMapper 
       extends Mapper<Object, Text, Text, IntWritable>{

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer 
       extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, 
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length < 2) {
      System.err.println("Usage: wordcount <in> [<in>...] <out>");
      System.exit(2);
    }
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    for (int i = 0; i < otherArgs.length - 1; ++i) {
      FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
    }
    FileOutputFormat.setOutputPath(job,
      new Path(otherArgs[otherArgs.length - 1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Answer 1

In general, follow these steps:

Put the jar containing your WordCount class, along with the necessary Hadoop client jar files, on your server's classpath.
Pass the input and output directories as HTTP request parameters.
Parse the directories from the HTTP request in your servlet's doGet() method.
Use JobClient to submit your job.
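
The parameter-handling steps above can be sketched in plain Java. This is a minimal sketch, not a complete servlet. The class name WordCountRequest and the parameter names "input" and "output" are assumptions made for illustration. The map has the same shape as the one returned by HttpServletRequest.getParameterMap(). The submission step itself is only described in comments, because it needs the Hadoop client jars on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: turns HTTP request parameters into WordCount arguments. */
final class WordCountRequest {

    private WordCountRequest() {
    }

    /**
     * Builds the argument array in the order WordCount expects:
     * one or more input paths followed by a single output path.
     * The parameter names "input" and "output" are assumptions.
     */
    static String[] toJobArgs(Map<String, String[]> params) {
        String[] inputs = params.get("input");
        String[] outputs = params.get("output");
        if (inputs == null || inputs.length == 0) {
            throw new IllegalArgumentException("missing 'input' parameter");
        }
        if (outputs == null || outputs.length != 1) {
            throw new IllegalArgumentException("exactly one 'output' parameter is required");
        }
        List<String> args = new ArrayList<>();
        for (String input : inputs) {
            args.add(requireNonBlank(input, "input"));
        }
        args.add(requireNonBlank(outputs[0], "output"));
        return args.toArray(new String[0]);
    }

    /** Rejects null or whitespace-only values and returns the trimmed path. */
    private static String requireNonBlank(String value, String name) {
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("empty '" + name + "' parameter");
        }
        return value.trim();
    }

    // Inside a servlet, doGet() could call:
    //   String[] jobArgs = WordCountRequest.toJobArgs(request.getParameterMap());
    // and then configure the Job from jobArgs the same way main() does.
    // Do not call WordCount.main() directly: it ends with System.exit(),
    // which would shut down the servlet container's JVM.
}
```

With the job setup from main() moved into a method that returns the result of job.waitForCompletion(true) instead of exiting, the servlet can report success or failure in its response. Note that the WordCount code in the question already uses org.apache.hadoop.mapreduce.Job, whereas JobClient belongs to the older org.apache.hadoop.mapred API. waitForCompletion(true) blocks until the job finishes, so for long jobs Job.submit(), which returns immediately, may suit a request handler better.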

Solution 1
0 · Accepted · 2017-03-08 03:50:21