
running hadoop wordcount.java example - error

I'm trying to run the wordcount example but I'm getting errors: java.io.IOException: hdfs:///wordcount_data/input/Document1.txt not a SequenceFile, or java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text,

depending on what I have in: job.setInputFormatClass(SequenceFileInputFormat.class); job.setOutputFormatClass(SequenceFileOutputFormat.class); or job.setInputFormatClass(TextInputFormat.class); job.setOutputFormatClass(SequenceFileOutputFormat.class);

My input is just a simple txt file containing nothing but "Hello World Welcome".

Can you please let me know what I might be doing wrong? Here's the code:

import java.io.IOException;
import java.util.Iterator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

/**
 * Word count example for Hadoop Map Reduce.
 * 
 * Adapted from the {@link http://wiki.apache.org/hadoop/WordCount Hadoop wiki}.
 */
public class WordCount {

    /** Mapper for word count.
     *
     * The base class Mapper is parameterized by
     * <in key type, in value type, out key type, out value type>.
     *
     * Thus, this mapper takes (Text key, Text value) pairs and outputs
     * (Text key, LongWritable value) pairs. The input keys are assumed
     * to be identifiers for documents, which are ignored, and the values
     * to be the content of documents. The output keys are words found
     * within each document, and the output values are the number of times
     * a word appeared within a document.
     *
     * To support efficient serialization (conversion of data to and from
     * formats suitable for transport), Hadoop typically does not use the
     * built-in Java classes like "String" and "Long" as key or value types. The
     * wrappers Text and LongWritable implement Hadoop's serialization
     * interface (called Writable) and, unlike Java's String and Long, are
     * mutable.
     */
    public static class WordCountMap extends Mapper<Text, Text, Text, LongWritable> {
        /** Regex pattern to find words (alphanumeric + _). */
        final static Pattern WORD_PATTERN = Pattern.compile("\\w+");

        /** Constant 1 as a LongWritable value. */
        private final static LongWritable ONE = new LongWritable(1L);

        /** Text object to store a word to write to output. */
        private Text word = new Text();

        /** Actual map function. Takes one document's text and emits key-value
         * pairs for each word found in the document.
         *
         * @param key Document identifier (ignored).
         * @param value Text of the current document.
         * @param context MapperContext object for accessing output, 
         *                configuration information, etc.
         */
        public void map(Text key, Text value, Context context)
                throws IOException, InterruptedException {
            Matcher matcher = WORD_PATTERN.matcher(value.toString());


            while (matcher.find()) {

                word.set(matcher.group());
                context.write(word, ONE);
            }

        }
    }

    /** Reducer for word count.
     *
     * Like the Mapper base class, the base class Reducer is parameterized by 
     * <in key type, in value type, out key type, out value type>.
     *
     * For each Text key, which represents a word, this reducer gets a list of
     * LongWritable values, computes the sum of those values, and the key-value
     * pair (word, sum).
     */
    public static class SumReduce extends Reducer<Text, LongWritable, Text, LongWritable> {
        /** Actual reduce function.
         * 
         * @param key Word.
         * @param values Iterator over the values for this key.
         * @param context ReducerContext object for accessing output,
         *                configuration information, etc.
         */
        public void reduce(Text key, Iterator<LongWritable> values,
                Context context) throws IOException, InterruptedException {
            long sum = 0L;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            context.write(key, new LongWritable(sum));
        }
    }

    /** Entry-point for our program. Constructs a Job object representing a single
     * Map-Reduce job and asks Hadoop to run it. When running on a cluster, the
     * final "waitForCompletion" call will distribute the code for this job across
     * the cluster.
     *
     * @param rawArgs command-line arguments
     */
    public static void main(String[] rawArgs) throws Exception {
        /* Use Hadoop's GenericOptionsParser, so our MapReduce program can accept
         * common Hadoop options.
         */
        GenericOptionsParser parser = new GenericOptionsParser(rawArgs);
        Configuration conf = parser.getConfiguration();
        String[] args = parser.getRemainingArgs();

        /* Create an object to represent a Job. */
        Job job = new Job(conf, "wordcount");

        /* Tell Hadoop where to locate the code that must be shipped if this
         * job is to be run across a cluster. Unless the location of code
         * is specified in some other way (e.g. the -libjars command line
         * option), all non-Hadoop code required to run this job must be
         * contained in the JAR containing the specified class (WordCountMap 
         * in this case).
         */
        job.setJarByClass(WordCountMap.class);

        /* Set the datatypes of the keys and values outputted by the maps and reduces.
         * These must agree with the types used by the Mapper and Reducer. Mismatches
         * will not be caught until runtime.
         */
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        /* Set the mapper and reducer to use. These reference the classes defined above. */
        job.setMapperClass(WordCountMap.class);
        job.setReducerClass(SumReduce.class);

        /* Set the format to expect input in and write output in. The input files we have
         * provided are in Hadoop's "sequence file" format, which allows for keys and
         * values of arbitrary Hadoop-supported types and supports compression.
         *
         * The output format TextOutputFormat outputs each key-value pair as a line
         * "key<tab>value".
         */
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);

        /* Specify the input and output locations to use for this job. */
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        /* Submit the job and wait for it to finish. The argument specifies whether
         * to print progress information to output. (true means to do so.)
         */

        job.waitForCompletion(true);
    }
}

If you are using the older MapReduce API then do this:

  conf.setMapOutputKeyClass(Text.class); 
  conf.setMapOutputValueClass(IntWritable.class); 
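
For reference, here is a minimal sketch of a complete old-API (org.apache.hadoop.mapred) word-count mapper that matches those settings; the class name OldApiWordCountMapper is illustrative and not part of the original answer:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

/** Old-API mapper: TextInputFormat delivers (LongWritable offset, Text line) pairs. */
public class OldApiWordCountMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        // The key is the line's byte offset and is ignored; split the line and emit (word, 1).
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            output.collect(word, ONE);
        }
    }
}

In the old API these classes are wired up through a JobConf rather than a Job, which is where the conf.setMapOutputKeyClass(...) and conf.setMapOutputValueClass(...) calls above go.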

If you are using the new MapReduce API then do this:

job.setMapOutputKeyClass(Text.class);     
job.setMapOutputValueClass(IntWritable.class);

REASON: The reason for this is that your MapReduce application might be using TextInputFormat as the InputFormat class, and this class generates keys of type LongWritable and values of type Text by default. But your application might be expecting keys of type Text. That's why you get this error.

Source: ClouFront blog

Change Text to LongWritable in your Mapper class. You are using TextInputFormat as your InputFormat, which generates keys of type LongWritable, but you have specified Text in your Mapper.
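
Applied to the question's code, that change would look roughly like this (a sketch, not the original answer's code; only the key type in the class declaration and in the map signature changes, and the offset key is ignored):

public static class WordCountMap extends Mapper<LongWritable, Text, Text, LongWritable> {
    /** Regex pattern to find words (alphanumeric + _). */
    final static Pattern WORD_PATTERN = Pattern.compile("\\w+");

    /** Constant 1 as a LongWritable value. */
    private final static LongWritable ONE = new LongWritable(1L);

    /** Text object to store a word to write to output. */
    private Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // With TextInputFormat the key is the line's byte offset, so it is simply ignored.
        Matcher matcher = WORD_PATTERN.matcher(value.toString());
        while (matcher.find()) {
            word.set(matcher.group());
            context.write(word, ONE);
        }
    }
}

The @Override annotation also helps here: if the method signature does not match what the framework will actually call, the compiler flags it instead of the job silently falling back to the identity mapper.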

HTH
