
Hadoop MapReduce Word Count Program

I am new to Hadoop. Below is my code. I am getting the following error message when I run the JAR.

Input file (wordcount.txt) => this file is stored at the path "/home/cloudera/SK_JAR/jsonFile/wordcount.txt"

Hello Hadoop, Goodbye Hadoop.

package com.main;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every whitespace-delimited token in a line.
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sums the counts emitted for each word.
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws IllegalArgumentException, IOException,
            ClassNotFoundException, InterruptedException {

        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "wordcount");

        // Tell Hadoop which JAR contains the job classes.
        job.setJarByClass(WordCount.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Following is the error message. Can someone please help me with this?

Let me know if you need more details.

hadoop jar Wordcount.jar WordCount '/home/cloudera/SK_JAR/jsonFile/wordcount.txt'  output

Error in laoding args file.java.io.FileNotFoundException: WordCount (No such file or directory)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:146)
    at java.io.FileInputStream.<init>(FileInputStream.java:101)
    at com.main.mainClass.main(mainClass.java:28)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

hadoop jar Wordcount.jar WordCount

Your main class is part of the package com.main, therefore com.main.WordCount is needed to start your class.
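For example, the command from the question would then look like this, with the same input path and output directory as before:

hadoop jar Wordcount.jar com.main.WordCount '/home/cloudera/SK_JAR/jsonFile/wordcount.txt' output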

You can open your JAR file as a ZIP file to verify whether you can find com/main/WordCount$Map.class within it. If it is not there, then Eclipse is building your JAR wrong.
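As a quick check without unzipping anything, the JDK's jar tool can list the entries (this assumes the jar command is on your PATH):

jar tf Wordcount.jar | grep WordCount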

I might suggest you learn Maven, Gradle, or SBT to build the JAR rather than your IDE. In production, these are the tools commonly used to bundle Hadoop JAR files.
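As a rough sketch of what a Maven-based flow could look like, assuming the standard src/main/java layout and a hypothetical artifact name (the actual name depends on your pom.xml):

# build the JAR from the project root
mvn clean package
# submit the job using the fully qualified main class
hadoop jar target/wordcount-1.0.jar com.main.WordCount '/home/cloudera/SK_JAR/jsonFile/wordcount.txt' output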

It seems the main class was not specified for your runnable jar file.

If you are using Eclipse to create the jar file, then follow the steps below to create Wordcount.jar.

Right click on the Project -> Export JAR file -> Click Next -> Uncheck all other resources. Then provide the path for exporting the .jar file -> Click Next -> Keep the default options selected -> Click Next and Finish. (reference)
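If the export was meant to produce a runnable JAR, one way to confirm that a Main-Class entry made it into the manifest is to print it (this assumes unzip is installed; the class name shown should be com.main.WordCount):

unzip -p Wordcount.jar META-INF/MANIFEST.MF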
