
Using Iterators to remove elements from a Java Collection

There are many posts that suggest using Iterators to safely remove an element from a collection. Something like this:

Iterator<Book> i = books.iterator();
while(i.hasNext()){
    if(i.next().isbn().equals(isbn)){
        i.remove();
    }
}

According to the documentation, the benefit of using an Iterator is that it is "fail fast" in the sense that if any thread is modifying the collection (books in the above example), while the iterator is used, then the iterator would throw a ConcurrentModificationException. However, the documentation of this exception also says

Note that fail-fast behavior cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast operations throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: ConcurrentModificationException should be used only to detect bugs.

Does this mean that using iterators is not an option if 100% correctness has to be guaranteed? Do I need to design my code in such a way that removal while the collection is modified would always result in correct behavior? If so, can anyone give an example where using the remove() method of an iterator is useful outside of testing?

Iterator.remove will work as long as no other thread changes the Collection while you're iterating over it. Sometimes it's a handy feature.
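To make the single-threaded case concrete, here is a minimal sketch that contrasts removal through the iterator with removal through the list itself inside a for-each loop. The class and method names are made up for illustration, and the input is chosen so that ArrayList's best-effort check does fire; with other inputs the second variant may silently skip elements instead of throwing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

class IteratorRemoveDemo {

    // Removes matching elements through the iterator; works on a private copy.
    static List<String> removeWithIterator(List<String> source, String prefix) {
        List<String> copy = new ArrayList<>(source);
        Iterator<String> it = copy.iterator();
        while (it.hasNext()) {
            if (it.next().startsWith(prefix)) {
                it.remove();
            }
        }
        return copy;
    }

    // Removes through the list itself while a for-each loop is iterating over it.
    // Returns true if the hidden iterator detected the modification.
    static boolean removeInsideForEachFails(List<String> source, String prefix) {
        List<String> copy = new ArrayList<>(source);
        try {
            for (String s : copy) {
                if (s.startsWith(prefix)) {
                    copy.remove(s);
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "banana", "avocado", "cherry");
        System.out.println(removeWithIterator(words, "a"));       // [banana, cherry]
        System.out.println(removeInsideForEachFails(words, "a")); // true
    }
}
```

Note that the exception in the second method comes from a single thread: "concurrent" here only means that the list was changed outside the iterator while the iteration was in progress.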

When it comes to a multithreaded environment, it really depends on how you organize the code. For example, if you create a collection inside a web request and do not share it with other requests (say, it only gets passed to some methods via method parameters), you can still safely use this way of traversing the collection.

On the other hand, if you have, say, a 'global' queue of metrics snapshots shared among all the requests, where each request adds stats to the queue and some other thread reads the queue elements and deletes the metrics, this approach won't be appropriate. So it's all about the use case and how you organize the code.
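For a shared queue like that, one possible alternative is to skip the iterator and let the consumer take elements off a concurrent queue one at a time. The sketch below assumes the snapshots are plain strings held in a java.util.concurrent.ConcurrentLinkedQueue; the class name and the drain helper are hypothetical, not something from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class MetricsQueueDemo {

    // Takes snapshots off the queue until it is empty.
    // poll() retrieves and removes the head in one step, so no iterator is involved.
    static List<String> drain(Queue<String> queue) {
        List<String> drained = new ArrayList<>();
        String snapshot;
        while ((snapshot = queue.poll()) != null) {
            drained.add(snapshot);
        }
        return drained;
    }

    public static void main(String[] args) {
        // Hypothetical shared queue; in a real application producers would be other threads.
        Queue<String> metrics = new ConcurrentLinkedQueue<>();
        metrics.add("cpu=0.42");
        metrics.add("heap=512m");
        System.out.println(drain(metrics));    // [cpu=0.42, heap=512m]
        System.out.println(metrics.isEmpty()); // true
    }
}
```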

As for the example that you're asking for, say you have a collection of Strings and would like to remove all the strings that start with the letter 'a' by modifying the existing collection:

Iterator<String> i = strings.iterator();
while (i.hasNext()) {
    // String.startsWith takes a String, so the prefix is "a" rather than the char 'a'
    if (i.next().startsWith("a")) {
        i.remove();
    }
}

Of course in Java 8+ you can achieve almost the same with Streams:

List<String> filtered = strings.stream()
        .filter(s -> !s.startsWith("a"))
        .collect(Collectors.toList());

However, this method creates a new collection, rather than modifying the existing one (like in the case with iterators).
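If the goal in Java 8+ is to modify the existing collection rather than build a new one, Collection.removeIf is another option worth knowing. A minimal sketch, with a made-up class and helper name:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RemoveIfDemo {

    // Removes, in place, every string that starts with the given prefix.
    // Returns true if at least one element was removed.
    static boolean removeByPrefix(List<String> strings, String prefix) {
        return strings.removeIf(s -> s.startsWith(prefix));
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<>(Arrays.asList("apple", "banana", "avocado"));
        boolean changed = removeByPrefix(strings, "a");
        System.out.println(changed + " " + strings); // true [banana]
    }
}
```

The list passed in must be modifiable, which is why the sketch wraps the fixed-size result of Arrays.asList in an ArrayList.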

In the pre-Java 8 world (and iterators appeared long before Java 8 was available), there were no streams at all, so code like this was not a straightforward task to write.

Iterator#remove guarantees 100% correctness for single-threaded processing. In multi-threaded processing of data, it depends on how you process the data (synchronized or unsynchronized access, using a different list for collecting the elements to be removed, etc.).

If you do not want the collection to be modified while you are iterating over it, you can collect the elements to be removed into a separate List and then use List#removeAll(Collection<?> c), as shown below:

import java.util.ArrayList;
import java.util.List;

public class Main {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        list.add(1);
        list.add(2);
        list.add(3);
        list.add(4);

        List<Integer> elementsToBeRemoved = new ArrayList<>();

        for (Integer i : list) {
            if (i % 2 == 0) {
                elementsToBeRemoved.add(i);
            }
        }

        list.removeAll(elementsToBeRemoved);

        System.out.println(list);
    }
}

Output:

[1, 3]

In a loop, never remove elements using the index

For a beginner, it may be tempting to use List#remove(int index) to remove elements by index, but because every remove operation shrinks the List and shifts the remaining elements, it produces confusing results, e.g.

import java.util.Iterator;
import java.util.List;
import java.util.Vector;

public class Main {
    public static void main(String[] args) {
        List<Integer> list = new Vector<>();
        list.add(1);
        list.add(2);
        Iterator<Integer> i = list.iterator();
        while (i.hasNext()) {
            System.out.println("I'm inside the iterator loop.");
            i.next();
            list.remove(0);
        }

        System.out.println(list);
    }
}

Output:

I'm inside the iterator loop.
[2]

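The reason for this output is that list.remove(0) shrinks the list to one element right after the first call to i.next(). The iterator's position (1) then equals the list's size (1), so i.hasNext() returns false and the loop ends after a single pass. The second element is never visited, and no ConcurrentModificationException is thrown because i.next() is never called again.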

Here is an interesting piece of code (could be a good interview question). Would this program compile? And if so, would it run without exceptions?

List<Integer> list = new Vector<>();
list.add(1);
list.add(2);
Iterator<Integer> i = list.iterator();
while (i.hasNext()) {
    i.next();
    list.remove(0);
}

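Answer: yes. It would compile and run without exceptions, although not for the reason one might first guess. There are two remove methods for the list:

E remove(int index) Removes the element at the specified position in this list (optional operation).

boolean remove(Object o) Removes the first occurrence of the specified element from this list, if it is present (optional operation).

Because the argument 0 is an int literal, the one that gets called is E remove(int index); no boxing is needed, so that overload wins. The first element is removed, and the list is modified behind the iterator's back. There is no error only because the list then holds a single element: hasNext() compares the iterator's position (1) with the new size (1), returns false, and the loop ends before next() gets a chance to detect the change. The list ends up as [2], and the second element is never visited. This doesn't mean that there's something wrong with the concept of an iterator, but it shows that, even in a single-threaded situation, just because an iterator is used does not mean the developer cannot make mistakes.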
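The two overloads are easy to mix up on a List<Integer>, so here is a small sketch of which one the compiler picks for an int argument and which one for an Integer argument. The class and helper names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RemoveOverloadDemo {

    // An int argument selects remove(int index): the element at that position is removed.
    static List<Integer> removeByIndex(List<Integer> source, int index) {
        List<Integer> copy = new ArrayList<>(source);
        copy.remove(index);
        return copy;
    }

    // An Integer argument selects remove(Object o): the first equal element is removed.
    static List<Integer> removeByValue(List<Integer> source, int value) {
        List<Integer> copy = new ArrayList<>(source);
        copy.remove(Integer.valueOf(value));
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> source = Arrays.asList(10, 20, 30);
        System.out.println(removeByIndex(source, 1));  // [10, 30]
        System.out.println(removeByValue(source, 10)); // [20, 30]
    }
}
```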

Does this mean that using iterators is not an option if 100% correctness has to be guaranteed?

Not necessarily.

First of all, it depends on your criteria for correctness. Correctness can only be measured against specified requirements. Saying something is 100% correct is meaningless if you don't say what the requirements are.

There are also some generalizations that we can make.

  1. If a collection (and its iterator) is used by one thread only, 100% correctness can be guaranteed.

  2. Concurrent collection types can be safely accessed and updated via their iterators from any number of threads. There are some caveats though:

    • An iteration is not guaranteed to see structural changes made after the iteration starts.
    • An iterator is not designed to be shared by multiple threads.
    • Bulk operations on a ConcurrentHashMap are not atomic.

    If your correctness criteria do not depend on these things, then 100% correctness can be guaranteed.
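As a rough illustration of the second point, the sketch below removes entries from a ConcurrentHashMap through its iterator while the map is also changed directly during the same iteration. The class name and the sample keys are made up. The value put during the loop is deliberately odd, so the final contents are the same whether or not the iterator happens to see the new entry.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentIterationDemo {

    // Removes entries with even values via the iterator, while also
    // inserting a new entry into the map in the middle of the iteration.
    static Map<String, Integer> removeEvenValues() {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("one", 1);
        map.put("two", 2);
        map.put("three", 3);
        map.put("four", 4);

        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        boolean added = false;
        while (it.hasNext()) {
            Map.Entry<String, Integer> entry = it.next();
            if (!added) {
                // A structural change outside the iterator; with a HashMap this
                // would risk a ConcurrentModificationException.
                map.put("five", 5);
                added = true;
            }
            if (entry.getValue() % 2 == 0) {
                it.remove();
            }
        }
        return map;
    }

    public static void main(String[] args) {
        // Iteration order of the map is unspecified, so only the contents matter.
        System.out.println(removeEvenValues());
    }
}
```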

Note: I'm not saying that iterators guarantee correctness. I am saying that iterators can be part of a correct solution, assuming that you use them the right way.

Do I need to design my code in such a way that removal while the collection is modified would always result in correct behavior?

It depends how you use the collection. See above.

But as a general rule, you do need to design and implement your code to be correct. (Correctness won't happen by magic...)

If so, can anyone give an example where using the remove() method of an iterator is useful outside of testing?

In any example where only one thread can access the collection, using remove() is 100% safe, for all standard collection classes.

In many examples where the collection is a concurrent type, remove() is 100% safe. (But there is no guarantee that an element will stay removed if another thread is simultaneously trying to add it. Or that it will be added for that matter.)

The bottom line is that if your application is multi-threaded, then you have to understand how different threads may interact with shared collections. There is no way to avoid that.
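One way such an interaction can be handled, sketched under assumptions: the shared list is wrapped with Collections.synchronizedList, and every traversal holds the list's lock for its whole duration, as the documentation of that wrapper requires. The class name, the thread roles and the element counts are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

class SynchronizedIterationDemo {

    // Removes even numbers while holding the list's lock for the whole traversal,
    // so no other thread can add to the list in the middle of the iteration.
    static void removeEvens(List<Integer> shared) {
        synchronized (shared) {
            Iterator<Integer> it = shared.iterator();
            while (it.hasNext()) {
                if (it.next() % 2 == 0) {
                    it.remove();
                }
            }
        }
    }

    // One thread adds 1..count while another repeatedly removes even numbers.
    static List<Integer> run(int count) {
        List<Integer> shared = Collections.synchronizedList(new ArrayList<>());
        Thread writer = new Thread(() -> {
            for (int n = 1; n <= count; n++) {
                shared.add(n); // add() takes the same lock internally
            }
        });
        Thread cleaner = new Thread(() -> {
            for (int pass = 0; pass < 100; pass++) {
                removeEvens(shared);
            }
        });
        writer.start();
        cleaner.start();
        try {
            writer.join();
            cleaner.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        removeEvens(shared); // final pass once the writer has finished
        return shared;
    }

    public static void main(String[] args) {
        System.out.println(run(1000).size()); // 500
    }
}
```

Without the synchronized block in removeEvens, the writer could change the list between hasNext() and next(), which is exactly the situation the fail-fast check is meant to flag.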
