I am trying to remove duplicate strings from an ArrayList called outputList in a Hadoop reducer.
Here is my code:
```java
List<String> newList = new ArrayList<String>();
for (String item : outputList) {
    if (!newList.contains(item))
        newList.add(item);
    else
        newList.add("wrong"); // debugging marker: item was already in newList
}
```
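For comparison, a common order-preserving way to drop duplicates in plain Java is to copy the list through a LinkedHashSet, which discards repeats (by equals/hashCode) and keeps first-seen order. This is a minimal sketch, not the poster's code; the class name and sample values are hypothetical, and it avoids the diamond operator so it also compiles on Java 6:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class DistinctExample {
    // Returns the distinct strings of 'input' in first-seen order.
    // LinkedHashSet drops duplicates and preserves insertion order.
    static List<String> distinct(List<String> input) {
        return new ArrayList<String>(new LinkedHashSet<String>(input));
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for outputList.
        List<String> outputList = Arrays.asList("alice", "bob", "alice", "bob", "carol");
        System.out.println(distinct(outputList)); // prints [alice, bob, carol]
    }
}
```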
The problem is that every string in newList is "wrong".
Some facts:
1. The above code works well on my local machine.
2. I can write out the strings in outputList in Hadoop. Most strings in outputList are different (but duplicates exist).
3. I tried other ways to remove duplicates, such as a HashSet, but when I use outputList to initialize a HashSet, the resulting HashSet is empty.
4. The Java version in Hadoop is javac 1.6.0_18.
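The HashSet observation can be checked in isolation: the HashSet(Collection) constructor adds every element of the collection it is given, so the set can only be empty if that collection is empty at the moment of construction. A minimal local check, with a hypothetical class name and sample data:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HashSetInitCheck {
    // Number of distinct strings after copying the list into a HashSet.
    static int distinctCount(List<String> list) {
        Set<String> unique = new HashSet<String>(list); // copies every element
        return unique.size();
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for outputList.
        List<String> outputList = Arrays.asList("alice", "bob", "alice", "bob");
        System.out.println(distinctCount(outputList));              // prints 2
        // The set is empty only when the source list is empty.
        System.out.println(distinctCount(new ArrayList<String>())); // prints 0
    }
}
```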
Thanks.
The following is my reducer code:
```java
// Needs: java.io.IOException, java.util.ArrayList, java.util.Collections,
// java.util.List, org.apache.hadoop.io.Text, org.apache.hadoop.mapreduce.Reducer
public static class EditReducer
        extends Reducer<Text, Text, Text, Text> {

    private Text editor2 = new Text();

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Copy the content of the Iterable into an ArrayList of Strings.
        List<String> editorList = new ArrayList<String>();
        for (Text t : values) {
            editorList.add(t.toString());
        }

        // If a user appears more than once in the list, add it to outputList.
        int occ;
        List<String> outputList = new ArrayList<String>();
        for (int i = 0; i < editorList.size(); i++) {
            occ = Collections.frequency(editorList, editorList.get(i));
            if (occ > 1) {
                outputList.add(editorList.get(i));
            }
        }

        // Make outputList distinct ("wrong" marks items already seen).
        List<String> newList = new ArrayList<String>();
        for (String item : outputList) {
            if (!newList.contains(item))
                newList.add(item);
            else
                newList.add("wrong");
        }

        // Emit each remaining string as both key and value.
        for (String val : newList) {
            editor2.set(val);
            context.write(editor2, editor2);
        }
    }
}
```
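The reducer's filtering step (keep users that appear more than once, then report each of them once) can be exercised outside Hadoop. This is a minimal sketch, assuming the values have already been converted to Strings as in the loop over values; the class and method names are hypothetical. It swaps in a single counting pass with a LinkedHashMap instead of calling Collections.frequency for every element, which avoids the quadratic rescans and needs no separate de-duplication step:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RepeatedEditors {
    // Hypothetical helper: returns each string that occurs more than once,
    // listed once, in order of first appearance.
    static List<String> repeatedDistinct(List<String> editors) {
        // Count occurrences; LinkedHashMap keeps keys in first-seen order.
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String e : editors) {
            Integer c = counts.get(e);
            counts.put(e, c == null ? 1 : c + 1);
        }
        // Keep only the keys seen at least twice.
        List<String> result = new ArrayList<String>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 1) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample values for one reducer key.
        List<String> editors = Arrays.asList("alice", "bob", "alice", "carol", "bob", "alice");
        System.out.println(repeatedDistinct(editors)); // prints [alice, bob]
    }
}
```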
You can nest a second for loop inside your original for loop and compare the strings directly with equals():
```java
List<String> newList = new ArrayList<String>();
for (String item : outputList) {
    boolean contains = false;
    for (String str : newList) {
        if (str.equals(item)) {
            contains = true;
            break;
        }
    }
    if (!contains) {
        newList.add(item);
    }
    else {
        newList.add("wrong");
    }
}
```
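Note that ArrayList.contains is specified in terms of equals (it returns true when the list holds an element e such that o.equals(e), with a null check for null o), so the nested loop should build the same list as the original contains-based loop. A small local comparison, with hypothetical method names and sample data:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ContainsVsLoop {
    // The question's version: List.contains, which compares with equals().
    static List<String> withContains(List<String> outputList) {
        List<String> newList = new ArrayList<String>();
        for (String item : outputList) {
            if (!newList.contains(item)) newList.add(item);
            else newList.add("wrong");
        }
        return newList;
    }

    // The answer's version: an explicit nested loop using equals().
    static List<String> withNestedLoop(List<String> outputList) {
        List<String> newList = new ArrayList<String>();
        for (String item : outputList) {
            boolean contains = false;
            for (String str : newList) {
                if (str.equals(item)) {
                    contains = true;
                    break;
                }
            }
            newList.add(contains ? "wrong" : item);
        }
        return newList;
    }

    public static void main(String[] args) {
        List<String> outputList = Arrays.asList("alice", "bob", "alice", "bob");
        System.out.println(withContains(outputList));   // prints [alice, bob, wrong, wrong]
        System.out.println(withNestedLoop(outputList)); // prints [alice, bob, wrong, wrong]
    }
}
```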