
Java ArrayList<String>.contains() in Hadoop

I am trying to remove duplicate strings from an ArrayList called outputList in Hadoop.

Here is my code:

    List<String> newList = new ArrayList<String>();

    for (String item : outputList) {
      if (!newList.contains(item))
        newList.add(item);
      else
        newList.add("wrong");
    }

The problem is that the strings in newList are all "wrong".

Some facts:

  1. The above code works well on my local machine.

  2. I can write out the strings in outputList in Hadoop. Most of the strings in outputList are different, but duplicates exist.

  3. I tried other ways to remove the duplicate items, such as using a HashSet. But when I use outputList to initialize a HashSet, the resulting HashSet is empty.

  4. The Java version on the Hadoop machine is javac 1.6.0_18.

Thanks.

The following is my reducer code:

// Imports needed at the top of the enclosing file:
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public static class EditReducer
       extends Reducer<Text,Text,Text,Text> {

    private Text editor2 = new Text();

    @Override
    public void reduce(Text key, Iterable<Text> values,
                       Context context
                       ) throws IOException, InterruptedException {
      // Copy the contents of the iterable into an array list.
      List<String> editorList = new ArrayList<String>();
      for (Text t : values) {
        editorList.add(t.toString());
      }

      // If a user appears more than once in the list, add it to outputList.
      // Every occurrence is added, so outputList still contains duplicates.
      int occ;
      List<String> outputList = new ArrayList<String>();
      for (int i = 0; i < editorList.size(); i++) {
        occ = Collections.frequency(editorList, editorList.get(i));
        if (occ > 1) {
          outputList.add(editorList.get(i));
        }
      }

      // Make outputList distinct; repeated items are recorded as "wrong".
      List<String> newList = new ArrayList<String>();
      for (String item : outputList) {
        if (!newList.contains(item))
          newList.add(item);
        else
          newList.add("wrong");
      }

      // Write each entry as both key and value.
      for (String val : newList) {
        editor2.set(val);
        context.write(editor2, editor2);
      }
    }

  }
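
To check what the three steps in the reducer produce for a known input, the same logic can be run outside Hadoop. The following is only a sketch: the class name ReducerLogicSketch, the method name process, and the sample editor names are made up for illustration, and the Text values are replaced by plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ReducerLogicSketch {

    // Mirrors the reducer body, with plain strings instead of Hadoop Text values.
    static List<String> process(List<String> editorList) {
        // Keep every occurrence of any editor that appears more than once.
        List<String> outputList = new ArrayList<String>();
        for (String editor : editorList) {
            if (Collections.frequency(editorList, editor) > 1) {
                outputList.add(editor);
            }
        }

        // First occurrence is kept; each later occurrence is recorded as "wrong".
        List<String> newList = new ArrayList<String>();
        for (String item : outputList) {
            if (!newList.contains(item)) {
                newList.add(item);
            } else {
                newList.add("wrong");
            }
        }
        return newList;
    }

    public static void main(String[] args) {
        // Hypothetical sample input: "alice" appears twice, the others once.
        List<String> sample = Arrays.asList("alice", "bob", "alice", "carol");
        System.out.println(process(sample)); // prints [alice, wrong]
    }
}
```

Because outputList keeps every occurrence of a repeated editor, the else branch runs at least once for each of them, so some "wrong" entries are expected even when contains() behaves correctly.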

You can create a nested for loop inside your original for loop and compare the strings that way:

    List<String> newList = new ArrayList<String>();

    for (String item : outputList) {
        // Compare against every string already collected.
        boolean contains = false;
        for (String str : newList) {
            if (str.equals(item)) {
                contains = true;
                break;
            }
        }
        if (!contains) {
            newList.add(item);
        } else {
            newList.add("wrong");
        }
    }
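
If the "wrong" markers are not needed and the goal is only a list of distinct strings, a set-based approach is a common alternative to the nested loop. This is a sketch rather than part of the answer above: the class name DedupSketch and the method name distinct are hypothetical, and it assumes the list really does hold plain String values by the time it is deduplicated. It avoids the diamond operator so that it also compiles on Java 6.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class DedupSketch {

    // Returns the distinct elements of input, in first-seen order.
    // LinkedHashSet drops duplicates (via equals/hashCode) and keeps insertion order.
    static List<String> distinct(List<String> input) {
        return new ArrayList<String>(new LinkedHashSet<String>(input));
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for outputList.
        List<String> outputList = Arrays.asList("alice", "bob", "alice", "carol", "bob");
        System.out.println(distinct(outputList)); // prints [alice, bob, carol]
    }
}
```

If this still yields an empty result on the cluster, as the question reports for HashSet, that would suggest outputList itself is empty at that point rather than a problem with contains().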


 