Java - Remove duplicates in an ArrayList

Question

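I'm working on a program that uses an ArrayList to store Strings. The program prompts the user with a menu and lets the user choose an operation to perform, such as adding strings to the list, printing the entries, and so on. What I want to do is create a method called removeDuplicates(). This method will search the ArrayList and remove all duplicate values, keeping one instance of each duplicated value in the list. I also want this method to return the total number of duplicates removed.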

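I've been trying to use nested loops to accomplish this, but I keep running into trouble because when entries are removed, the indexes of the ArrayList change and things don't work as they should. I know conceptually what I need to do, but I'm having trouble implementing the idea in code.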

Here is some pseudocode:

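start with the first entry; check each subsequent entry in the list and see whether it matches the first entry; remove each subsequent entry in the list that matches the first entry;

after all entries have been checked, move on to the second entry; check each entry in the list and see whether it matches the second entry; remove each entry in the list that matches the second entry;

repeat for each entry in the list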

Here is my code so far:

public int removeDuplicates()
{
    int duplicates = 0;

    for ( int i = 0; i < strings.size(); i++ )
    {
        for ( int j = 0; j < strings.size(); j++ )
        {
            if ( i == j )
            {
                // i & j refer to same entry so do nothing
            }

            else if ( strings.get( j ).equals( strings.get( i ) ) )
            {
                strings.remove( j );
                duplicates++;
            }
        }
    }

    return duplicates;
}

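Update: it seems Will is looking for a homework solution that involves developing an algorithm to remove duplicates, rather than a practical solution using Sets. See his comment: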

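Thanks for the suggestions. This is part of an assignment, and I believe the teacher intended for the solution not to include sets. In other words, I'm supposed to come up with a solution that searches for and removes duplicates without implementing a HashSet. The teacher suggested using nested loops, which is what I'm trying to do, but I keep having problems with the ArrayList indexes after certain entries are removed.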

Answer 1

Why not use a collection such as Set (and an implementation such as HashSet), which naturally prevents duplicates?
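A minimal sketch of what "naturally prevents duplicates" buys the menu program, assuming entries are stored in a Set from the start instead of an ArrayList. `EntryStore`, `addEntry`, `rejectedCount` and `snapshot` are hypothetical names; the only library behaviour relied on is that `Set.add` returns `false` when an equal element is already present.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical store for the menu program: entries live in a Set from the start,
// so a duplicate is never stored and there is nothing to remove later.
class EntryStore {
    // LinkedHashSet keeps entries in the order they were added (handy for printing).
    private final Set<String> entries = new LinkedHashSet<>();
    private int rejected = 0;

    // Returns true if the entry was new; Set.add returns false for a duplicate.
    boolean addEntry(String s) {
        boolean added = entries.add(s);
        if (!added) {
            rejected++;
        }
        return added;
    }

    // How many duplicate entries were turned away so far.
    int rejectedCount() {
        return rejected;
    }

    // A copy of the current entries, in insertion order.
    List<String> snapshot() {
        return new ArrayList<>(entries);
    }

    public static void main(String[] args) {
        EntryStore store = new EntryStore();
        store.addEntry("one");
        store.addEntry("one");
        store.addEntry("two");
        System.out.println(store.snapshot() + " rejected=" + store.rejectedCount());
        // prints [one, two] rejected=1
    }
}
```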

Answer 2

You can use nested loops without any problem:

public static int removeDuplicates(ArrayList<String> strings) {

    int size = strings.size();
    int duplicates = 0;

    // caching size() in a local variable instead of calling a method
    // in the loop check also speeds up execution; the last element has
    // nothing after it to compare with, so i can stop at size - 2
    for (int i = 0; i < size - 1; i++) {
        // start from the next item after strings[i]
        // since the ones before are checked
        for (int j = i + 1; j < size; j++) {
            // no need for if ( i == j ) here
            if (!strings.get(j).equals(strings.get(i)))
                continue;
            duplicates++;
            strings.remove(j);
            // decrease j because the array got re-indexed
            j--;
            // decrease the size of the array
            size--;
        } // for j
    } // for i

    return duplicates;

}

Answer 3

You could try this one-liner to get a copy of the Strings that preserves their order.

List<String> list;
List<String> dedupped = new ArrayList<String>(new LinkedHashSet<String>(list));

This approach is also amortized O(n) instead of O(n^2).
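A small runnable version of the one-liner, wrapped in an illustrative helper named `dedup` (not part of the original answer). It shows `LinkedHashSet` keeping the first occurrence of each string in its original order, which a plain `HashSet` does not guarantee:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class OrderPreservingDedup {

    // Returns a new list without duplicates, keeping the first occurrence
    // of each string in the order it originally appeared.
    static List<String> dedup(List<String> list) {
        // LinkedHashSet iterates in insertion order, unlike HashSet
        return new ArrayList<>(new LinkedHashSet<>(list));
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("b", "a", "b", "c", "a");
        System.out.println(dedup(list)); // prints [b, a, c]
    }
}
```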

Answer 4

Just to clarify my comment on matt b's answer: if you really want to count the number of duplicates removed, use this code:

List<String> list = new ArrayList<String>();

// list gets populated from user input...

Set<String> set = new HashSet<String>(list);
int numDuplicates = list.size() - set.size();

Answer 5

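I've been trying to use nested loops to accomplish this, but I keep running into trouble because when entries are removed, the indexes of the ArrayList change and things don't work as expected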

Why not decrease the counter every time you remove an entry?

When you remove an entry, the remaining elements also shift:

e.g.:

String[] a = {"a", "a", "b", "c"};

Positions:

a[0] = "a";
a[1] = "a";    
a[2] = "b";
a[3] = "c";

After removing the first "a", the indexes are:

a[0] = "a";
a[1] = "b";
a[2] = "c";

So you should take this into account and decrease the value of j (j--) to avoid "skipping" a value.

It works (screenshot not included).
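To see the shift with a real `ArrayList` rather than an array, here is a small sketch; `ShiftDemo.removeAt` is a hypothetical helper that removes one index from a copy so the result can be inspected. After removing index 1 from `[a, a, b, c]`, the `"b"` that was at index 2 is now at index 1, which is exactly the element a plain `j++` would skip:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ShiftDemo {

    // Removes the element at `index` from a copy of `values` and returns the copy,
    // so the effect of ArrayList.remove(int) on later indexes can be inspected.
    static List<String> removeAt(List<String> values, int index) {
        List<String> copy = new ArrayList<>(values);
        copy.remove(index); // every later element moves one position to the left
        return copy;
    }

    public static void main(String[] args) {
        List<String> after = removeAt(Arrays.asList("a", "a", "b", "c"), 1);
        System.out.println(after);        // prints [a, b, c]
        System.out.println(after.get(1)); // prints b, which used to be at index 2
    }
}
```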

Answer 6

List<String> lst = new ArrayList<String>();

lst.add("one");
lst.add("one");
lst.add("two");
lst.add("three");
lst.add("three");
lst.add("three");
Set<String> se = new HashSet<String>(lst);
lst.clear();
lst = new ArrayList<String>(se);
for (String ls : lst){
    System.out.println("Resulting output---------" + ls);
}

Answer 7

public <T> Collection<T> removeDuplicates(Collection<T> c) {
// Returns a new collection with duplicates removed from passed collection.
    Collection<T> result = new ArrayList<T>();

    for(T o : c) {
        if (!result.contains(o)) {
            result.add(o);
        }
    }

    return result;
}

or

public <T extends Comparable<? super T>> void removeDuplicates(List<T> l) {
// Removes duplicates in place from an existing list (the list ends up sorted)
    T last = null;
    Collections.sort(l);

    Iterator<T> i = l.iterator();
    while(i.hasNext()) {
        T o = i.next();
        if (o.equals(last)) {
            i.remove();
        } else {
            last = o;
        }
    }
}

Both are untested.

Answer 8

A very simple way to remove duplicate strings from an ArrayList:

ArrayList<String> al = new ArrayList<String>();
// add elements to al, including duplicates
HashSet<String> hs = new HashSet<String>();
hs.addAll(al);
al.clear();
al.addAll(hs);

Answer 9

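Assuming you can't use a Set, as you said, the easiest way to solve the problem is to use a temporary list instead of trying to remove the duplicates in place: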

import java.util.ArrayList;
import java.util.List;

public class Duplicates {

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>();
        list.add("one");
        list.add("one");
        list.add("two");
        list.add("three");
        list.add("three");
        list.add("three");

        System.out.println("Prior to removal: " +list);
        System.out.println("There were " + removeDuplicates(list) + " duplicates.");
        System.out.println("After removal: " + list);
    }

    public static int removeDuplicates(List<String> list) {
        int removed = 0;
        List<String> temp = new ArrayList<String>();

        for(String s : list) {
            if(!temp.contains(s)) {
                temp.add(s);
            } else {
                //if the string is already in the list, then ignore it and increment the removed counter
                removed++;
            }
        }

        //put the contents of temp back in the main list
        list.clear();
        list.addAll(temp);

        return removed;
    }

}

Answer 10

You could do something like this. The answers above are alternatives, but here is one more option.

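for (int i = 0; i < strings.size(); i++) {
    // start after i: earlier pairs were already compared
    for (int j = i + 1; j < strings.size(); j++) {
        // compare contents with equals(); == only checks object identity
        if (strings.get(i).equals(strings.get(j))) {
            strings.remove(j);
            // step back so the element that shifted into j is checked too
            j--;
        }
    }
}

return strings;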

Answer 11

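I'm joining this question a bit late, but I have a better solution to the same problem that uses GENERIC types. The solutions given above each solve the problem, but they add complexity to the whole runtime thread.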

RemoveDuplicacy.java

We can minimize that by using a technique that does the required work at load time.

Example: suppose you are using an ArrayList of the following class type:

ArrayList<User> usersList = new ArrayList<User>();
usersList.clear();

User user = new User();
user.setName("A");
user.setId("1"); // duplicate
usersList.add(user);

user = new User();
user.setName("A");
user.setId("1"); // duplicate
usersList.add(user);

user = new User();
user.setName("AB");
user.setId("2"); // duplicate
usersList.add(user);

user = new User();
user.setName("C");
user.setId("4");
usersList.add(user);

user = new User();
user.setName("A");
user.setId("1"); // duplicate
usersList.add(user);

user = new User();
user.setName("A");
user.setId("2"); // duplicate
usersList.add(user);

This is the class the ArrayList above is built on, the User class:

class User {
    private String name;
    private String id;

    /**
     * @param name
     *            the name to set
     */
    public void setName(String name) {
        this.name = name;
    }

    /**
     * @return the name
     */
    public String getName() {
        return name;
    }

    /**
     * @param id
     *            the id to set
     */
    public void setId(String id) {
        this.id = id;
    }

    /**
     * @return the id
     */
    public String getId() {
        return id;
    }

}

Now, Java has two methods of the Object (parent) class that can be overridden and that serve our purpose better here. They are:

@Override
public int hashCode() {

    final int prime = 31;
    int result = 1;
    result = prime * result + ((id == null) ? 0 : id.hashCode());
    return result;

}

@Override
public boolean equals(Object obj) {

    if (this == obj)
        return true;

    if (obj == null)
        return false;

    if (getClass() != obj.getClass())
        return false;

    User other = (User) obj;

    if (id == null) {
        if (other.id != null)
            return false;

    } else if (!id.equals(other.id))
        return false;

    return true;

}

You have to override these methods in the User class.

Here is the complete code:

https://gist.github.com/4584310

Let me know if you have any questions.
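The linked gist is not reproduced here, so the following is only a sketch of how the overridden methods would be used, assuming "at load time" means putting each `User` into a `LinkedHashSet` as it is loaded. For brevity it uses a hypothetical two-argument constructor and `Objects.equals`/`Objects.hashCode` (Java 7+) instead of the setters and hand-written methods above; equality is still based on `id` only. `UserLoader.load` is a hypothetical name. With the six users from the example, three remain (ids 1, 2 and 4), each the first one loaded with that id:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Trimmed-down User (hypothetical constructor); equality depends on id only.
class User {
    private final String name;
    private final String id;

    User(String name, String id) {
        this.name = name;
        this.id = id;
    }

    String getName() { return name; }
    String getId() { return id; }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        return Objects.equals(id, ((User) obj).id);
    }

    // Must agree with equals(): users with the same id get the same hash code.
    @Override
    public int hashCode() {
        return Objects.hashCode(id);
    }
}

class UserLoader {

    // Deduplicates while loading: each user goes into a LinkedHashSet, which
    // ignores a User whose id is already present and keeps insertion order.
    static List<User> load(List<User> incoming) {
        Set<User> unique = new LinkedHashSet<>();
        for (User u : incoming) {
            unique.add(u);
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        List<User> raw = new ArrayList<>();
        raw.add(new User("A", "1"));
        raw.add(new User("A", "1"));
        raw.add(new User("AB", "2"));
        raw.add(new User("C", "4"));
        raw.add(new User("A", "1"));
        raw.add(new User("A", "2"));
        for (User u : load(raw)) {
            System.out.println(u.getName() + " " + u.getId());
        }
        // prints "A 1", "AB 2" and "C 4", one per line
    }
}
```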

Answer 12

You can add the list to a HashSet and then convert that HashSet back into a list to remove the duplicates.

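public static int removeDuplicates(List<String> duplicateList){
    List<String> correctedList = new ArrayList<String>();
    Set<String> a = new HashSet<String>();
    a.addAll(duplicateList);
    correctedList.addAll(a);
    int duplicates = duplicateList.size() - correctedList.size();
    // copy the unique values back so the caller's list no longer has duplicates
    // (HashSet does not keep the original order)
    duplicateList.clear();
    duplicateList.addAll(correctedList);
    return duplicates;
}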

Here it returns the number of duplicates. You can also use correctedList for all of the unique values.

Answer 13

Using a Set is the best option for removing duplicates:

If you have an ArrayList, you can remove the duplicates and still keep the ArrayList functionality:

 List<String> strings = new ArrayList<String>();
 //populate the array
 ...
 List<String> dedupped = new ArrayList<String>(new HashSet<String>(strings));
 int numdups = strings.size() - dedupped.size();

If you can't use a Set, sort the list (Collections.sort()) and walk through it, checking whether the current element equals the previous one; if it does, remove it.
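A sketch of the sort-then-compare approach that also returns the count the question asks for; the class name is illustrative. It walks the sorted list from the end, so a removal only shifts elements that have already been checked. As the answer implies, the list ends up sorted rather than in its original order:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SortThenScan {

    // Sorts the list so equal strings become neighbours, then removes every
    // element that equals the one just before it. Returns how many were removed.
    static int removeDuplicates(List<String> strings) {
        Collections.sort(strings);
        int removed = 0;
        // Walk from the end: removing index i only shifts elements after i,
        // and those have already been visited.
        for (int i = strings.size() - 1; i > 0; i--) {
            if (strings.get(i).equals(strings.get(i - 1))) {
                strings.remove(i);
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<>(
                Arrays.asList("three", "one", "two", "one", "three", "three"));
        int removed = removeDuplicates(strings);
        System.out.println(removed + " " + strings); // prints 3 [one, three, two]
    }
}
```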

Answer 14

Using a Set is the best option (as others have suggested).

If you want to compare every element in the list with every other one, you should adjust your for loops slightly:

for(int i = 0; i < max; i++)
    for(int j = i+1; j < max; j++)

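This way you compare each pair of elements only once instead of twice, because the second loop starts at the element after the one the first loop is on.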
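The saving can be checked by counting iterations. This sketch (hypothetical class and method names) counts the comparisons each loop shape makes for `n` elements: the question's loop makes n(n-1) comparisons, while starting `j` at `i + 1` makes n(n-1)/2, so for 5 elements that is 20 versus 10:

```java
class ComparisonCount {

    // Comparisons made when j visits every index except i (the question's loop).
    static int everyOtherIndex(int n) {
        int comparisons = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i != j) {
                    comparisons++;
                }
            }
        }
        return comparisons;
    }

    // Comparisons made when j starts at i + 1: each pair is visited exactly once.
    static int fromNextIndex(int n) {
        int comparisons = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                comparisons++;
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int n = 5;
        System.out.println(everyOtherIndex(n) + " vs " + fromNextIndex(n)); // prints 20 vs 10
    }
}
```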

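Also, when you remove from a list while iterating over it (even when you use a for loop instead of an iterator), keep in mind that the size of the list decreases. A common solution is to keep a second list of the items to remove, and once you've determined which items to remove, remove them from the original list.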
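A sketch of the "second list" idea, under the assumption that the second list holds indexes rather than values (removing by value with `removeAll` would also delete the first occurrence you meant to keep). Indexes are removed from the highest down so the ones still waiting stay valid; the class name is illustrative:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CollectThenRemove {

    // Pass 1: record the index of every element that repeats an earlier one.
    // Pass 2: remove those indexes from the highest down, so each removal only
    // shifts elements that have already been removed or are being kept.
    static int removeDuplicates(List<String> strings) {
        List<Integer> toRemove = new ArrayList<>();
        for (int j = 1; j < strings.size(); j++) {
            for (int i = 0; i < j; i++) {
                if (strings.get(i).equals(strings.get(j))) {
                    toRemove.add(j); // j repeats an earlier entry
                    break;           // record each index only once
                }
            }
        }
        // toRemove is in ascending order, so walk it backwards.
        for (int k = toRemove.size() - 1; k >= 0; k--) {
            // The (int) cast selects remove(int index), not remove(Object).
            strings.remove((int) toRemove.get(k));
        }
        return toRemove.size();
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<>(
                Arrays.asList("one", "one", "two", "three", "three", "three"));
        int removed = removeDuplicates(strings);
        System.out.println(removed + " " + strings); // prints 3 [one, two, three]
    }
}
```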

Answer 15

public ArrayList<String> removeDuplicates(ArrayList<String> inArray)
{
    ArrayList<String> outArray = new ArrayList<String>();
    boolean doAdd = true;
    for (int i = 0; i < inArray.size(); i++)
    {
        String testString = inArray.get(i);
        // only look at the entries before i; reaching i ends the check
        for (int j = 0; j < inArray.size(); j++)
        {
            if (i == j)
            {
                break;
            }
            else if (inArray.get(j).equals(testString))
            {
                // an earlier entry already has this value, so don't add it again
                doAdd = false;
                break;
            }

        }
        if (doAdd)
        {
            outArray.add(testString);
        }
        else
        {
            doAdd = true;
        }

    }
    return outArray;

}

Answer 16

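You could replace the duplicates with an empty string*, which keeps the indexes as they are. Then, once you're done, you can remove the empty strings.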

*But only if an empty string is not a valid entry in your implementation.
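A sketch of the placeholder idea, assuming (as the footnote says) that `""` never appears as a real entry. Duplicates are overwritten with `set` during the nested loop, so no index moves, and all placeholders are removed with one `removeAll` call at the end. The class and constant names are illustrative:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class PlaceholderDedup {

    // Assumption from the answer: "" is never a real entry, so it can mark duplicates.
    private static final String PLACEHOLDER = "";

    static int removeDuplicates(List<String> strings) {
        int duplicates = 0;
        for (int i = 0; i < strings.size(); i++) {
            String current = strings.get(i);
            if (current.equals(PLACEHOLDER)) {
                continue; // already marked as a duplicate of an earlier entry
            }
            for (int j = i + 1; j < strings.size(); j++) {
                if (strings.get(j).equals(current)) {
                    // overwrite instead of removing, so no index shifts
                    strings.set(j, PLACEHOLDER);
                    duplicates++;
                }
            }
        }
        // Once the scan is finished, drop every placeholder in one call.
        strings.removeAll(Collections.singleton(PLACEHOLDER));
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<>(
                Arrays.asList("one", "one", "two", "three", "three", "three"));
        int removed = removeDuplicates(strings);
        System.out.println(removed + " " + strings); // prints 3 [one, two, three]
    }
}
```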

Answer 17

public <Foo> Entry<Integer,List<Foo>> uniqueElementList(List<Foo> listWithPossibleDuplicates) {
  List<Foo> result = new ArrayList<Foo>();//...might want to pre-size here, if you have reliable info about the number of dupes
  Set<Foo> found = new HashSet<Foo>(); //...again with the pre-sizing
  for (Foo f : listWithPossibleDuplicates) if (found.add(f)) result.add(f);
  return entryFactory(listWithPossibleDuplicates.size()-found.size(), result);
}

Then you need some entryFactory(Integer key, List<Foo> value) method. If you'd rather modify the original list (probably not a good idea, but whatever), use this instead:

public <Foo> int removeDuplicates(List<Foo> listWithPossibleDuplicates) {
  int original = listWithPossibleDuplicates.size();
  Iterator<Foo> iter = listWithPossibleDuplicates.iterator();
  Set<Foo> found = new HashSet<Foo>();
  while (iter.hasNext()) if (!found.add(iter.next())) iter.remove();
  return original - found.size();
}

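For the specific case of Strings, you may need to handle some additional equality constraints (e.g., are the uppercase and lowercase versions the same or different?).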
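If upper- and lower-case versions should count as the same string, one option (an assumption about the requirement, not something the answer specifies) is to swap the `HashSet` for a `TreeSet` ordered by `String.CASE_INSENSITIVE_ORDER`, which decides membership with that comparator instead of `equals`. The sketch keeps the first spelling it sees; the class and method names are illustrative:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class CaseInsensitiveDedup {

    // Treats "Apple" and "apple" as the same entry and keeps the first one seen.
    // TreeSet uses its comparator, not equals(), to decide whether an element
    // is already present.
    static int removeDuplicatesIgnoreCase(List<String> strings) {
        Set<String> seen = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        int removed = 0;
        Iterator<String> it = strings.iterator();
        while (it.hasNext()) {
            if (!seen.add(it.next())) {
                it.remove(); // safe removal during iteration
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<>(
                Arrays.asList("Apple", "apple", "Banana", "APPLE", "banana", "cherry"));
        int removed = removeDuplicatesIgnoreCase(strings);
        System.out.println(removed + " " + strings); // prints 3 [Apple, Banana, cherry]
    }
}
```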

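Edit: Ah, this is homework. Look up Iterator/Iterable in the Java Collections framework, as well as Set, and see whether you reach the same conclusion I did. The generics part is just gravy.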

Answer 18

The problem you are seeing in your code is that you remove an entry during the iteration, which invalidates the iteration position.

For example:

{"a", "b", "c", "b", "b", "d"} 
       i         j

Now you remove strings[j].

{"a", "b", "c", "b", "d"} 
       i         j

The inner loop iteration ends and j is incremented.

{"a", "b", "c", "b", "d"} 
       i              j

Only one duplicate "b" was detected... oops.
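The walkthrough can be reproduced in code. This sketch (illustrative names) runs the nested loop on the same list twice: once without adjusting `j` after a removal, and once with `j--`. The first version misses the second `"b"`; the second removes both:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SkipDemo {

    // Nested loop WITHOUT stepping j back after a removal (the bug shown above).
    static int withoutStepBack(List<String> strings) {
        int removed = 0;
        for (int i = 0; i < strings.size(); i++) {
            for (int j = i + 1; j < strings.size(); j++) {
                if (strings.get(j).equals(strings.get(i))) {
                    strings.remove(j); // the next element slides into j and is skipped
                    removed++;
                }
            }
        }
        return removed;
    }

    // Same loop, but j-- makes the next iteration re-check position j.
    static int withStepBack(List<String> strings) {
        int removed = 0;
        for (int i = 0; i < strings.size(); i++) {
            for (int j = i + 1; j < strings.size(); j++) {
                if (strings.get(j).equals(strings.get(i))) {
                    strings.remove(j);
                    removed++;
                    j--; // the element that slid into j gets checked too
                }
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        List<String> a = new ArrayList<>(Arrays.asList("a", "b", "c", "b", "b", "d"));
        List<String> b = new ArrayList<>(a);
        System.out.println(withoutStepBack(a) + " " + a); // prints 1 [a, b, c, b, d]
        System.out.println(withStepBack(b) + " " + b);    // prints 2 [a, b, c, d]
    }
}
```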

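In cases like this, the best practice is to store the positions that have to be removed and remove them after you've finished iterating over the ArrayList. (A bonus is that you, or the compiler, can hoist the strings.size() call out of the loop.)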

Hint: you can start iterating j at i + 1; you've already checked 0 through i!

Answer 19

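The inner for loop is wrong. If you remove an element, you can't increment j, because j now points to the element after the removed one, and you still need to check that element.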

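In other words, you should use a while loop instead of a for loop, and increment j only if the elements at i and j don't match. If they do match, remove element j: size() decreases by 1 and j now points to the following element, so there's no need to increment j.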

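Also, there's no reason to check all of the elements in the inner loop, only the ones after i, since duplicates before i were already removed by earlier iterations.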
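A sketch of the while-loop version described above, with `j` starting just after `i` and advancing only when nothing was removed; the class name is illustrative:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WhileLoopDedup {

    // Removes later copies of each entry and returns how many were removed.
    static int removeDuplicates(List<String> strings) {
        int duplicates = 0;
        for (int i = 0; i < strings.size(); i++) {
            // Entries before i are already unique, so start just after i.
            int j = i + 1;
            while (j < strings.size()) {
                if (strings.get(j).equals(strings.get(i))) {
                    // size() shrinks by one and j now points at the next element
                    strings.remove(j);
                    duplicates++;
                } else {
                    j++; // advance only when nothing was removed
                }
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<>(
                Arrays.asList("a", "b", "c", "b", "b", "d"));
        int removed = removeDuplicates(strings);
        System.out.println(removed + " " + strings); // prints 2 [a, b, c, d]
    }
}
```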

Answer 20

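The code below removes duplicate elements from a list without changing the list's order, without using a temporary list and without any Set variable. Apart from the list itself, it only needs one array the size of the list, which keeps memory use low.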

This is a generic method that works with any type of list.

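This was a question I was asked in an interview. I searched many forums for a solution but couldn't find one, so I thought this was the right forum to post the code.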

public List<?> removeDuplicate(List<?> listWithDuplicates) {
    int size = listWithDuplicates.size();
    // isDuplicate[k] is true when element k repeats an earlier element;
    // one flag per index, so each duplicate position is recorded only once
    boolean[] isDuplicate = new boolean[size];

    for (int i = 0; i < size; i++) {
        if (isDuplicate[i])
            continue; // its later copies were already marked by an earlier i
        for (int j = i + 1; j < size; j++) {
            if (listWithDuplicates.get(j).equals(listWithDuplicates.get(i)))
                isDuplicate[j] = true; // Saving duplicate indexes
        }
    }

    // Remove from the highest index down so the remaining marked indexes stay valid.
    // Index 0 can never be a duplicate, so the loop stops at 1.
    for (int k = size - 1; k > 0; k--) {
        if (isDuplicate[k])
            listWithDuplicates.remove(k);
    }
    return listWithDuplicates;
}

Question description

20 solutions

Solution 1
37 2010-03-12 19:19:42

Solution 2
17 2010-03-13 10:06:51

Solution 3
14 2010-03-13 10:37:27

Solution 4
8 2010-03-12 19:27:03

Solution 5
4 2010-03-12 19:56:15

Solution 6
4 2011-05-30 09:40:19

Solution 7
3 2010-03-12 20:08:22

Solution 8
3 2015-01-22 09:23:26

Solution 9
1 2010-03-13 09:23:07

Solution 10
1 2015-08-27 03:36:59

Solution 11
0 2013-01-21 07:42:12

Solution 12
0 2013-08-03 22:46:44

Solution 13
0 2010-03-12 19:32:48

Solution 14
0 2010-03-12 19:51:19

Solution 15
0

Solution 16
0 2010-03-12 20:17:48

Solution 17
0 2010-03-12 20:33:47

Solution 18
0 2010-03-13 22:15:15

Solution 19
0 2010-03-14 12:56:12

Solution 20
0 2014-07-19 14:54:43
