I have two files as input:
fileA.txt:
learn hadoop
learn java
fileB.txt:
hadoop java
eclipse eclipse
Desired Output:
learn fileA.txt:2
hadoop fileA.txt:1 , fileB.txt:1
java fileA.txt:1 , fileB.txt:1
eclipse fileB.txt:2
My reduce method:
public void reduce(Text key, Iterator<Text> values,
        OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {
    Set<Text> outputValues = new HashSet<Text>();
    while (values.hasNext()) {
        Text value = new Text(values.next());
        // delete duplicates
        outputValues.add(value);
    }
    boolean isfirst = true;
    StringBuilder toReturn = new StringBuilder();
    Iterator<Text> outputIter = outputValues.iterator();
    while (outputIter.hasNext()) {
        if (!isfirst) {
            toReturn.append("/");
        }
        isfirst = false;
        toReturn.append(outputIter.next().toString());
    }
    output.collect(key, new Text(toReturn.toString()));
}
I need help with the counter (counting the occurrences of each word per file).
I managed to print:
learn fileA.txt
hadoop fileA.txt / fileB.txt
java fileA.txt / fileB.txt
eclipse fileB.txt
but I cannot print the count per file.
Any help will be much appreciated.
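As I understand it, this should print what you want, assuming the mapper emits one (word, file name) pair for every occurrence. Note that the HashSet in your version drops the duplicate file names, which is exactly the information needed for the count: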
// Needs java.util.HashMap and java.util.Map in addition to the Hadoop imports.
@Override
public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
    // file name -> number of times the word (key) occurred in that file
    Map<String, Integer> fileToCnt = new HashMap<String, Integer>();
    while (values.hasNext()) {
        String file = values.next().toString();
        Integer current = fileToCnt.get(file);
        if (current == null) {
            current = 0;
        }
        fileToCnt.put(file, current + 1);
    }
    // Join the entries as "file:count, file:count"; HashMap gives no ordering guarantee.
    boolean isfirst = true;
    StringBuilder toReturn = new StringBuilder();
    for (Map.Entry<String, Integer> entry : fileToCnt.entrySet()) {
        if (!isfirst) {
            toReturn.append(", ");
        }
        isfirst = false;
        toReturn.append(entry.getKey()).append(":").append(entry.getValue());
    }
    output.collect(key, new Text(toReturn.toString()));
}
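If you want to check the counting step without running a job, here is a minimal sketch of the same logic in plain Java, with no Hadoop dependency. The class name PerFileCount and the method format are made up for illustration. It assumes the reducer receives one file name per occurrence of the word. It uses a LinkedHashMap so the files appear in first-seen order, and " , " as the separator to match the desired output in the question.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone class for illustration; not part of the original job.
class PerFileCount {

    // Counts how often each file name occurs and joins the result
    // as "file:count , file:count", in first-seen order.
    static String format(Iterator<String> fileNames) {
        Map<String, Integer> fileToCnt = new LinkedHashMap<String, Integer>();
        while (fileNames.hasNext()) {
            String file = fileNames.next();
            Integer current = fileToCnt.get(file);
            fileToCnt.put(file, current == null ? 1 : current + 1);
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> entry : fileToCnt.entrySet()) {
            if (sb.length() > 0) {
                sb.append(" , ");
            }
            sb.append(entry.getKey()).append(":").append(entry.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Values the reducer would see for "learn" and "hadoop" in the example input.
        List<String> learn = Arrays.asList("fileA.txt", "fileA.txt");
        List<String> hadoop = Arrays.asList("fileA.txt", "fileB.txt");
        System.out.println("learn " + format(learn.iterator()));   // learn fileA.txt:2
        System.out.println("hadoop " + format(hadoop.iterator())); // hadoop fileA.txt:1 , fileB.txt:1
    }
}
```

In the real reducer the values are Text objects, so you would call toString() on each one before counting, as in the reduce method above.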