简体   繁体   中英

Sorting ArrayList can fail when using list in comparator. Is this documented?

ArrayLists seem to be sorted with TimSort where the underlying list is not always consistent during sorting. It can happen that list entries disappear or appear twice upon calling the comparator.

In our comparator we are comparing keys for which we are using a function to get a value to compare for this key. As this function is used in other contexts we have a test whether the key is actually present in the list (something that wouldn't be necessary in the sorting):

        if (keys.contains(itemId)) {
          ...

As keys is the list we are sorting it can happen in the comparator that a key is not found in the list due to the internal mechanics of TimSort.

Question: Is this mentioned somewhere in Javadoc (couldn't find it) that you are not supposed to access underlying list in Comparator? Is this a poor implementation of TimSort that should sort a copy? Or was it a stupid idea in the first place to access underlying list in comparator?


The program below, provided by TJ Crowder , demonstrates that the contents of the underlying list may not be consistent during calls to the Comparator. (This program demonstrates the phenomenon in question, but it isn't representative of the actual application that's affected by the issue.)

import java.util.*;

public class Example {
    private static String[] chars = {
        "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"
    };

    private List<String> list;
    private String[] entries;

    private Example() {
        this.entries = new String[1000];
        for (int n = 0; n < 1000; ++n) {
            this.entries[n] = chars[n % chars.length] + n;
        }
        // Ensure it's an ArrayList, specifically
        this.list = new ArrayList<String>(Arrays.asList(this.entries));
    }

    public static void main(String[] args) {
        (new Example()).run();
    }

    class ListComparator implements Comparator<String> {
        public int compare(String a, String b) {
            for (String s : entries) {
                int i1 = Example.this.list.indexOf(s);
                if (i1 == -1) {
                    System.out.println(s + ": Missing");
                } else {
                    int i2 = Example.this.list.lastIndexOf(s);
                    if (i2 != i1) {
                        System.out.println(s + ": Duplicated, at " + i1 + " and " + i2);
                    }
                }
            }
            return a.compareTo(b);
        }
    }

    private void run() {
        this.list.sort(new ListComparator());
    }
}

Here are the first few lines of output from a run:

b1: Missing
a52: Duplicated, at 2 and 32
b27: Missing
a52: Duplicated, at 2 and 32
c2: Missing
a52: Duplicated, at 2 and 32
c2: Missing
c28: Missing
a52: Duplicated, at 2 and 32
b53: Duplicated, at 5 and 33
c28: Missing
d29: Missing
a52: Duplicated, at 2 and 32
b53: Duplicated, at 5 and 33
d3: Missing
d29: Missing
a52: Duplicated, at 2 and 32
b53: Duplicated, at 5 and 33
d3: Missing
d29: Missing
e30: Missing

A bit of history here: in JDK 7, the TimSort algorithm replaced the previous "legacy merge sort" algorithm. In JDK 8, Collections.sort() delegates to the new default method List.sort() . This default method is overridden by ArrayList , which does the sort in-place. The previous Collections.sort() implementation would copy the list to a temporary array, perform the sort on that temporary array, and then copy the elements from the temporary array back to the original list.

If the sort comparator looks in the list being sorted, then its behavior will certainly be affected by the new in-place sorting behavior of ArrayList introduced in JDK 8. The change from the "legacy merge sort" to TimSort in JDK 7 probably has no impact in this case, since JDK 7 still did the sorting on a temporary copy.

The copy-sort-copyback behavior of List.sort() is described in an "Implementation Requirements" section which specifies the behavior of the default method implementation, but which isn't part of the interface's contract that is imposed on all implementations. Thus, ArrayList (and other subclasses) are free to change this behavior. I note that there is no documentation for the overriding implementation ArrayList.sort() . I suppose it would be a small improvement if some documentation were added that specified the in-place sorting behavior.

If the in-place sorting of ArrayList is a problem, you could copy the list before sorting it:

List<Key> list = ... ;
List<Key> newList = new ArrayList<>(list);
newList.sort(keyComparator); // uses the old list
list = newList;

Alternatively, perhaps you could give some more details about the way the comparator works, and we might be able to figure out a way to rewrite it so that it doesn't need to look in the list being sorted. (I'd suggest asking another question for this.)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM