
When does TimSort complain about broken comparator?

Java 7 changed the sorting algorithm such that it throws a

java.lang.IllegalArgumentException: "Comparison method violates its general contract!"

in some cases when the comparator used is buggy. Is it possible to tell what kind of bug in the comparator causes this? In my experiments it did not matter if x != x, nor did it matter if x < y and y < z but z < x, but it did matter if x = y and y = z but x < z for some values x, y, z. Is this generally so?

(If there were a general rule for this, it might be easier to look for the bug in the comparator. But of course it is better to fix all bugs. :-) )

In particular, the following two comparators did not make TimSort complain:

    final Random rnd = new Random(52);

    Comparator<Integer> brokenButNoProblem1 = new Comparator<Integer>() {
        @Override
        public int compare(Integer o1, Integer o2) {
            if (o1 < o2) {
                return Compare.LESSER;
            } else if (o1 > o2) {
                return Compare.GREATER;
            }
            return rnd.nextBoolean() ? Compare.LESSER : Compare.GREATER;
        }
    };

    Comparator<Integer> brokenButNoProblem2 = new Comparator<Integer>() {
        @Override
        public int compare(Integer o1, Integer o2) {
            if (o1.equals(o2)) { // value equality; == on Integer compares references
                return Compare.EQUAL;
            }
            return rnd.nextBoolean() ? Compare.LESSER : Compare.GREATER;
        }
    };

but the following comparator did make it throw:

    Comparator<Integer> brokenAndThrowsUp = new Comparator<Integer>() {
        @Override
        public int compare(Integer o1, Integer o2) {
            if (Math.abs(o1 - o2) < 10) {
                return Compare.EQUAL; // WRONG and does matter
            }
            return Ordering.natural().compare(o1, o2);
        }
    };
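
To make the violation in the last comparator concrete, here is a minimal sketch that evaluates it on the three values 0, 9 and 18. The class name `WithinTenDemo` and the constant `WITHIN_TEN` are made up for this illustration, and `Integer.compare` stands in for Guava's `Ordering.natural()` so that the snippet has no dependencies.

```java
import java.util.Comparator;

final class WithinTenDemo {
    // Same logic as brokenAndThrowsUp: values closer than 10 count as "equal".
    // Integer.compare replaces Ordering.natural() (an assumption made only to
    // keep the example dependency-free).
    static final Comparator<Integer> WITHIN_TEN = (o1, o2) ->
            Math.abs(o1 - o2) < 10 ? 0 : Integer.compare(o1, o2);

    public static void main(String[] args) {
        System.out.println(WITHIN_TEN.compare(0, 9));   // 0: treated as equal
        System.out.println(WITHIN_TEN.compare(9, 18));  // 0: treated as equal
        System.out.println(WITHIN_TEN.compare(0, 18));  // -1: yet 0 is less than 18
    }
}
```

So 0 = 9 and 9 = 18 according to the comparator, but 0 < 18, which is exactly the "x = y and y = z but x < z" case.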

UPDATE: in some real-life data we had a failure even though there were no x, y, z with x = y and y = z but x < z. So it seems my guess was wrong, and the exception is not limited to this specific kind of violation. Any better ideas?

After looking at the code of ComparableTimSort I am not quite sure. Let's analyze it. Here is the only method that throws the exception (there is a similar method that does the same with the roles exchanged, so analyzing one of them is enough).

    private void mergeLo(int base1, int len1, int base2, int len2) {
        assert len1 > 0 && len2 > 0 && base1 + len1 == base2;

        // Copy first run into temp array
        Object[] a = this.a; // For performance
        Object[] tmp = ensureCapacity(len1);

        int cursor1 = tmpBase; // Indexes into tmp array
        int cursor2 = base2;   // Indexes int a
        int dest = base1;      // Indexes int a
        System.arraycopy(a, base1, tmp, cursor1, len1);

        // Move first element of second run and deal with degenerate cases
        a[dest++] = a[cursor2++];
        if (--len2 == 0) {
            System.arraycopy(tmp, cursor1, a, dest, len1);
            return;
        }
        if (len1 == 1) {
            System.arraycopy(a, cursor2, a, dest, len2);
            a[dest + len2] = tmp[cursor1]; // Last elt of run 1 to end of merge
            return;
        }

        int minGallop = this.minGallop;  // Use local variable for performance
    outer:
        while (true) {
            int count1 = 0; // Number of times in a row that first run won
            int count2 = 0; // Number of times in a row that second run won

            /*
             * Do the straightforward thing until (if ever) one run starts
             * winning consistently.
             */
// ------------------ USUAL MERGE
            do {
                assert len1 > 1 && len2 > 0;
                if (((Comparable) a[cursor2]).compareTo(tmp[cursor1]) < 0) {
                    a[dest++] = a[cursor2++];
                    count2++;
                    count1 = 0;
                    if (--len2 == 0)
                        break outer;
                } else {
                    a[dest++] = tmp[cursor1++];
                    count1++;
                    count2 = 0;
                    if (--len1 == 1)
                        break outer;
                }
            } while ((count1 | count2) < minGallop);

// ------------------ GALLOP
            /*
             * One run is winning so consistently that galloping may be a
             * huge win. So try that, and continue galloping until (if ever)
             * neither run appears to be winning consistently anymore.
             */
            do {
                assert len1 > 1 && len2 > 0;
                count1 = gallopRight((Comparable) a[cursor2], tmp, cursor1, len1, 0);
                if (count1 != 0) {
                    System.arraycopy(tmp, cursor1, a, dest, count1);
                    dest += count1;
                    cursor1 += count1;
                    len1 -= count1;
// -->>>>>>>> HERE IS WHERE GALLOPING TOO FAR WILL TRIGGER THE EXCEPTION
                    if (len1 <= 1)  // len1 == 1 || len1 == 0
                        break outer;
                }
                a[dest++] = a[cursor2++];
                if (--len2 == 0)
                    break outer;

                count2 = gallopLeft((Comparable) tmp[cursor1], a, cursor2, len2, 0);
                if (count2 != 0) {
                    System.arraycopy(a, cursor2, a, dest, count2);
                    dest += count2;
                    cursor2 += count2;
                    len2 -= count2;
                    if (len2 == 0)
                        break outer;
                }
                a[dest++] = tmp[cursor1++];
                if (--len1 == 1)
                    break outer;
                minGallop--;
            } while (count1 >= MIN_GALLOP | count2 >= MIN_GALLOP);
            if (minGallop < 0)
                minGallop = 0;
            minGallop += 2;  // Penalize for leaving gallop mode
        }  // End of "outer" loop
        this.minGallop = minGallop < 1 ? 1 : minGallop;  // Write back to field

        if (len1 == 1) {
            assert len2 > 0;
            System.arraycopy(a, cursor2, a, dest, len2);
            a[dest + len2] = tmp[cursor1]; //  Last elt of run 1 to end of merge
        } else if (len1 == 0) {
            throw new IllegalArgumentException(
                "Comparison method violates its general contract!");
        } else {
            assert len2 == 0;
            assert len1 > 1;
            System.arraycopy(tmp, cursor1, a, dest, len1);
        }
    }

The method merges two sorted runs. It does a usual merge but starts "galloping" once it notices that one side keeps "winning" (i.e., is always less than the other). Galloping tries to make things faster by looking ahead several elements instead of comparing one element at a time. Since the runs should be sorted, looking ahead is fine.
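
To illustrate the look-ahead idea, here is a simplified sketch of a gallop on a sorted `int[]`. It is not the JDK's `gallopRight` (which also takes a base, a length and a hint); the class `GallopSketch` and method `countLessOrEqual` are hypothetical names. It probes offsets 1, 3, 7, ... and then finishes with a binary search between the last two probes.

```java
final class GallopSketch {
    /**
     * Returns how many leading elements of the sorted array are <= key.
     * Simplified illustration only, not the JDK implementation.
     */
    static int countLessOrEqual(int key, int[] sorted) {
        int n = sorted.length;
        if (n == 0 || key < sorted[0]) {
            return 0;
        }
        // Exponential phase: sorted[lastOfs] <= key is known to hold.
        int lastOfs = 0;
        int ofs = 1;
        while (ofs < n && sorted[ofs] <= key) {
            lastOfs = ofs;
            ofs = ofs * 2 + 1;
            if (ofs <= 0) {
                ofs = n; // int overflow guard
            }
        }
        if (ofs > n) {
            ofs = n;
        }
        // Binary phase: the answer lies in (lastOfs, ofs].
        int lo = lastOfs + 1;
        int hi = ofs;
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (sorted[mid] <= key) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```

Both phases skip over elements without comparing them, which is only valid if the run really is sorted. If the comparator lied while the run was being built, a gallop can report a count that covers the whole run, and that is how `len1` can drop to 0 in `mergeLo`.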

You can see that the exception is only thrown when len1 is 0 at the end. The first observation is the following: during the usual merge, the exception can never be thrown, since the loop exits as soon as len1 reaches 1. Thus, the exception can only be thrown as the result of a gallop.

This already gives a strong hint that the exception behaviour is unreliable: as long as you have small data sets (so small that a generated run may never gallop, as MIN_GALLOP is 7) or the generated runs always happen to produce a merge that never gallops, you will never receive the exception. Thus, without further reviewing the gallopRight method, we can conclude that you cannot rely on the exception: it may never be thrown, no matter how wrong your comparator is.
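
A small experiment along these lines, as a sketch: sort a list with a comparator that ignores its arguments and answers at random. The names `SmallInputDemo` and `sortThrows` are made up here. As far as I can tell from the OpenJDK source, inputs shorter than 32 elements are handled by a binary insertion sort without any merge, so the short list should never throw. For the long list I make no promise either way, since it depends on whether a gallop happens to overshoot.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

final class SmallInputDemo {
    /** Sorts n random integers with a random comparator; reports whether sort threw. */
    static boolean sortThrows(int n, long seed) {
        Random rnd = new Random(seed);
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            data.add(rnd.nextInt());
        }
        // About as broken as a comparator can be: returns -1, 0 or 1 at random.
        Comparator<Integer> random = (a, b) -> rnd.nextInt(3) - 1;
        try {
            data.sort(random);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("n=20:    threw=" + sortThrows(20, 1));
        System.out.println("n=10000: threw=" + sortThrows(10_000, 1));
    }
}
```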

From the documentation:

IllegalArgumentException - (optional) if the natural ordering of the array elements is found to violate the Comparable contract

I didn't find much on the mentioned contract, but IMHO it should represent a total order (i.e. the relation defined by the compareTo method has to be transitive, antisymmetric, and total). If that requirement isn't met, sort might throw an IllegalArgumentException. (I say might because failure to meet this requirement could go unnoticed.)
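
Since the sort itself cannot be relied on to detect a violation, one option is to test the comparator directly on a sample of values. The following is only a brute-force sketch (cubic in the sample size, and meaningful only for deterministic comparators); `ContractCheck` and `findViolation` are hypothetical names. It checks the three rules stated in the `Comparator.compare` documentation: opposite signs when the arguments are swapped, transitivity, and consistency of "equal" elements against every third element.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class ContractCheck {
    /** Returns a description of the first violation found on the sample, or null if none. */
    static <T> String findViolation(Comparator<? super T> c, List<T> sample) {
        for (T x : sample) {
            for (T y : sample) {
                int xy = Integer.signum(c.compare(x, y));
                // sgn(compare(x, y)) must equal -sgn(compare(y, x))
                if (xy != -Integer.signum(c.compare(y, x))) {
                    return "asymmetric signs: " + x + ", " + y;
                }
                for (T z : sample) {
                    int yz = Integer.signum(c.compare(y, z));
                    int xz = Integer.signum(c.compare(x, z));
                    // x > y and y > z must imply x > z
                    if (xy > 0 && yz > 0 && xz <= 0) {
                        return "not transitive: " + x + ", " + y + ", " + z;
                    }
                    // x == y must imply sgn(compare(x, z)) == sgn(compare(y, z))
                    if (xy == 0 && xz != yz) {
                        return "inconsistent equality: " + x + ", " + y + ", " + z;
                    }
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // "Within 10 means equal", rewritten without Guava for this example.
        Comparator<Integer> withinTen = (a, b) ->
                Math.abs(a - b) < 10 ? 0 : Integer.compare(a, b);
        System.out.println(findViolation(withinTen, Arrays.asList(0, 9, 18)));
        System.out.println(findViolation(Comparator.<Integer>naturalOrder(), Arrays.asList(0, 9, 18)));
    }
}
```

Running such a check over a sample of the real data that triggered the exception may point to the offending triple more directly than reading the sort's stack trace.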

EDIT: added links to the properties that make a relation a total order.
