
Hadoop map-reduce programming

I am new to Hadoop MapReduce. My input is many text files, and I want to write a MapReduce program that writes every file name and its associated sentences to a single output file. The mapper should emit just the file name (key) and an associated sentence (value); the reducer should collect each key with all of its values and write the file name followed by its sentences to the output.

Mapper and reducer:

public void map(Text key, Text value,
                OutputCollector<Text, Text> output,
                Reporter reporter) throws IOException {
    StringTokenizer itr = new StringTokenizer(value.toString(), ",");
    FileSplit filesplit = (FileSplit) reporter.getInputSplit();
    String filename = filesplit.getPath().getName();
    while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        output.collect(new Text(filename), word);
    }
}

public void reduce(Text key, Iterator<Text> values,
                   OutputCollector<Text, Text> output,
                   Reporter reporter) throws IOException {
    // int sum = 0;
    String translation = "";
    while (values.hasNext()) {
        translation += "|" + values.toString() + "|";
    }

    results.set(translation);
    output.collect(key, results);
}

When I run the above mapper and reducer with the same input format configuration (KeyValueTextInputFormat.class), nothing is written to the output.

What should I change to achieve my goal?

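In your reduce method you declare values as an Iterator. If your class extends the new-API org.apache.hadoop.mapreduce.Reducer, it must be declared as an Iterable instead (the old org.apache.hadoop.mapred interface, which uses OutputCollector and Reporter, does take an Iterator):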

public void reduce(Text key, Iterable<Text> values, ....

instead of

public void reduce(Text key, Iterator<Text> values, ....

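Once you've done that, you can iterate as follows. Note the call to iter.next(): your original loop calls values.toString() and never advances the iterator, so it would never terminate.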

Iterator<Text> iter = values.iterator();
while(iter.hasNext())
{
    translation += "|" + iter.next().toString() + "|";
}
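
The loop above can be tried outside Hadoop. The following is a minimal sketch, assuming plain String values instead of Text; the class ValueJoiner and its join method are hypothetical names used only for illustration. It also swaps the repeated string concatenation for a StringBuilder, which is a common choice when a key has many values.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not part of Hadoop: builds the "|value|" string
// that the reducer in the question is trying to produce.
class ValueJoiner {
    static String join(Iterable<String> values) {
        StringBuilder sb = new StringBuilder();
        Iterator<String> iter = values.iterator();
        while (iter.hasNext()) {
            // next() returns the current element AND advances the iterator;
            // without it, hasNext() stays true and the loop never ends.
            sb.append('|').append(iter.next()).append('|');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList("first sentence", "second sentence");
        System.out.println(join(sentences));
    }
}
```

In a real reducer the same loop would run over the Text values and the result would be written with the key.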

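Because the parameter type is wrong, the method does not override the reduce method that the framework actually calls; it is only an unrelated overload, so your code never runs. That's why you don't get the output you expect. Annotating the method with @Override makes the compiler report this kind of mismatch.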
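
The difference between overloading and overriding can be seen in a small self-contained sketch. The classes below (BaseReducer, OverloadingReducer, OverridingReducer) are hypothetical stand-ins and not Hadoop's own Reducer; the base class here passes values through unchanged, which is an assumption modelled on the identity behaviour of the default reduce in org.apache.hadoop.mapreduce.Reducer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a framework base class (not Hadoop's Reducer).
class BaseReducer {
    final List<String> out = new ArrayList<>();

    // Default behaviour in this sketch: emit every value unchanged.
    void reduce(String key, Iterable<String> values) {
        for (String v : values) {
            out.add(key + "\t" + v);
        }
    }

    // The "framework" only ever calls the Iterable version.
    void run(String key, List<String> values) {
        reduce(key, values);
    }
}

class OverloadingReducer extends BaseReducer {
    // Iterator instead of Iterable: this is a new overload, not an override.
    // Putting @Override on it would be a compile-time error.
    void reduce(String key, Iterator<String> values) {
        out.add(key + "\tCUSTOM");
    }
}

class OverridingReducer extends BaseReducer {
    @Override
    void reduce(String key, Iterable<String> values) {
        StringBuilder sb = new StringBuilder();
        for (String v : values) {
            sb.append('|').append(v).append('|');
        }
        out.add(key + "\t" + sb);
    }
}

class OverrideDemo {
    public static void main(String[] args) {
        List<String> values = Arrays.asList("a", "b");

        BaseReducer wrong = new OverloadingReducer();
        wrong.run("file1.txt", values);   // base-class reduce runs
        System.out.println(wrong.out);

        BaseReducer right = new OverridingReducer();
        right.run("file1.txt", values);   // custom reduce runs
        System.out.println(right.out);
    }
}
```

With the Iterator signature the custom method is never reached; with the matching Iterable signature it is.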

I also don't see where you declare the variable results.
