
What is the best algorithm to sort a list on the basis of two criteria?

I have a list which I need to sort based on two criteria. The first criterion is a Boolean, let's say isBig. The second one is a Long, which represents a timestamp.

I need to order the elements of the list in this way: first the elements with isBig = true, and then those with isBig = false. Within these groups, the individual elements should be ordered descending on the basis of their timestamp.

Basically, I expect the result to be something like this:

isBig - 2015/10/29
isBig - 2015/10/28
isBig - 2015/10/27
!isBig - 2015/10/30
!isBig - 2015/10/27
!isBig - 2015/10/26

Let's say the object is this:

public class Item {
    Boolean isBig;
    Long timestamp;
    // ...
}

and the list is just List<Item> list.

I figured out that one method would be to write three for loops: the first to build the two groups, isBig and !isBig, and the second and third to sort the elements within them. Finally I merge the two lists.

Is there a more efficient algorithm for sorting lists on the basis of two criteria?

You can sort the list directly using a custom comparison method which checks both criteria.

Use the Collections.sort method and pass a custom comparator with the compare method overridden as follows:

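 int compare(Item o1, Item o2) {
   // Items with isBig = true come first.
   if (o1.isBig && !o2.isBig)
     return -1;
   if (!o1.isBig && o2.isBig)
     return 1;
   // Same group: larger (newer) timestamps come first, i.e. descending order.
   if (o1.timestamp > o2.timestamp)
     return -1;
   if (o1.timestamp < o2.timestamp)
     return 1;
   return 0;
 }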
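As a rough end-to-end sketch, a comparator like this can be passed to Collections.sort as an anonymous class. The Item constructor, the sortedLabels helper, and the small stand-in numbers used instead of real timestamps are assumptions made for illustration; the timestamp comparison is written for the descending order the question asks for.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortItemsDemo {

    // Same fields as the Item class in the question; the constructor is an assumption.
    static class Item {
        Boolean isBig;
        Long timestamp;

        Item(boolean isBig, long timestamp) {
            this.isBig = isBig;
            this.timestamp = timestamp;
        }
    }

    // Big items first; within each group, largest (newest) timestamp first.
    static final Comparator<Item> BIG_THEN_NEWEST = new Comparator<Item>() {
        @Override
        public int compare(Item o1, Item o2) {
            if (o1.isBig && !o2.isBig)
                return -1;
            if (!o1.isBig && o2.isBig)
                return 1;
            return Long.compare(o2.timestamp, o1.timestamp);
        }
    };

    // Hypothetical helper: sorts a copy of the list and renders it as one string.
    static String sortedLabels(List<Item> items) {
        List<Item> copy = new ArrayList<>(items);
        Collections.sort(copy, BIG_THEN_NEWEST);
        StringBuilder sb = new StringBuilder();
        for (Item item : copy) {
            if (sb.length() > 0)
                sb.append(", ");
            sb.append(item.isBig ? "isBig-" : "!isBig-").append(item.timestamp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Item> list = new ArrayList<>();
        // Stand-in timestamps (dates written as numbers), not real epoch values.
        list.add(new Item(false, 20151027L));
        list.add(new Item(true, 20151028L));
        list.add(new Item(false, 20151030L));
        list.add(new Item(true, 20151029L));
        System.out.println(sortedLabels(list));
    }
}
```

Sorting a copy keeps the caller's list untouched, which makes the helper easier to test; calling Collections.sort(list, BIG_THEN_NEWEST) directly sorts in place.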

If you are obsessed with performance, you could possibly speed it up by a few percent with a more sophisticated approach, but for a list of a few hundred elements the gains would be negligible.

An optimized comparison method:

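int compare(Item o1, Item o2) {
   // The isBig term is -2, 0 or +2, so it always outweighs the timestamp term (-1, 0 or +1).
   int bigness = (o2.isBig ? 2 : 0) - (o1.isBig ? 2 : 0);
   // o2 - o1 gives descending timestamps; the subtraction cannot overflow for non-negative timestamps.
   long diff = o2.timestamp - o1.timestamp;
   return bigness + Long.signum(diff);
}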

It has no explicit if statements, which means it will probably be faster than the naive version above. The ternary operators may still be compiled to conditional branches, so measure before relying on the difference.
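One way to gain confidence in an arithmetic comparator of this kind is to check that it always agrees in sign with a plain if-based version. The sketch below does that on random pairs. The class and method names are hypothetical, both versions are written for the descending timestamp order the question asks for, and timestamps are assumed to be non-negative so the subtraction cannot overflow.

```java
import java.util.Random;

public class ComparatorAgreement {

    // Minimal stand-in for the question's Item class (primitive fields for brevity).
    static class Item {
        final boolean isBig;
        final long timestamp;

        Item(boolean isBig, long timestamp) {
            this.isBig = isBig;
            this.timestamp = timestamp;
        }
    }

    // Plain version: big items first, then largest timestamp first.
    static int naive(Item o1, Item o2) {
        if (o1.isBig && !o2.isBig)
            return -1;
        if (!o1.isBig && o2.isBig)
            return 1;
        return Long.compare(o2.timestamp, o1.timestamp);
    }

    // Arithmetic version: the flag term is -2, 0 or +2, so it outweighs
    // the timestamp term, which is -1, 0 or +1.
    static int arithmetic(Item o1, Item o2) {
        int bigness = (o2.isBig ? 2 : 0) - (o1.isBig ? 2 : 0);
        long diff = o2.timestamp - o1.timestamp; // assumes non-negative timestamps
        return bigness + Long.signum(diff);
    }

    // Returns true if both versions agree in sign on `trials` random pairs.
    static boolean agree(long seed, int trials) {
        Random rand = new Random(seed);
        for (int i = 0; i < trials; i++) {
            // A small bound makes equal timestamps common; nextInt(bound) is never negative.
            Item a = new Item(rand.nextBoolean(), rand.nextInt(50));
            Item b = new Item(rand.nextBoolean(), rand.nextInt(50));
            if (Integer.signum(naive(a, b)) != Integer.signum(arithmetic(a, b)))
                return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agree(42L, 100_000));
    }
}
```

Only the sign of a comparator's result matters to the sort, which is why the check compares signs rather than raw values.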

That's probably everything that can be done for performance. If we knew something more about your data (for instance, there are always more big objects than small ones, or all the timestamps are unique, or all the timestamps are from a certain narrow range, etc.) we could probably propose a better solution. However, if we assume that your data is arbitrary and has no specific pattern, then the very best solution is to use a standard sort utility like I've shown above.

Splitting the list into two sublists and sorting them separately will definitely be slower. Actually, the sorting algorithm will most probably divide the data into two groups and then recursively into four groups, and so on. However, the division won't follow the isBig criterion. If you want to learn more, read how quicksort and merge sort work.

In theory, the approach using two separate lists should be faster than the approach using a two-step Comparator, because a comparison based on one field is obviously faster than a comparison based on two. By using two lists you are speeding up the part of the algorithm that has O(n log n) time complexity (the sort), at the expense of an additional initial stage (splitting into two pieces) which has time complexity O(n). Since n log n grows faster than n, the two-list approach should be faster for very, very large values of n.

However, in practice the time differences are so tiny that you need extremely long lists before the two-list approach wins out, so it is very difficult to demonstrate the difference using lists before you start running into problems such as an OutOfMemoryError.
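For reference, a list-based version of the two-list idea might look like the sketch below: one O(n) pass to split, two sorts that compare only the timestamp, then a concatenation. The Item constructor and the method name splitAndSort are assumptions made for illustration, and timestamps are sorted in descending order as in the question. This is not the code that was timed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SplitSort {

    // Minimal stand-in for the question's Item class (primitive fields for brevity).
    static class Item {
        final boolean isBig;
        final long timestamp;

        Item(boolean isBig, long timestamp) {
            this.isBig = isBig;
            this.timestamp = timestamp;
        }
    }

    // Single-key comparator: largest (newest) timestamp first.
    static final Comparator<Item> NEWEST_FIRST =
            (a, b) -> Long.compare(b.timestamp, a.timestamp);

    static List<Item> splitAndSort(List<Item> input) {
        List<Item> big = new ArrayList<>();
        List<Item> small = new ArrayList<>();
        // O(n) split on the boolean key.
        for (Item item : input) {
            (item.isBig ? big : small).add(item);
        }
        // Two sorts, each comparing one field only.
        big.sort(NEWEST_FIRST);
        small.sort(NEWEST_FIRST);
        // Concatenate with the big group first.
        List<Item> result = new ArrayList<>(input.size());
        result.addAll(big);
        result.addAll(small);
        return result;
    }

    public static void main(String[] args) {
        List<Item> list = new ArrayList<>();
        list.add(new Item(false, 7L));
        list.add(new Item(true, 2L));
        list.add(new Item(true, 9L));
        for (Item item : splitAndSort(list)) {
            System.out.println((item.isBig ? "isBig " : "!isBig ") + item.timestamp);
        }
    }
}
```

The two temporary lists are the extra memory cost referred to above; the in-place array variant avoids them.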

However, if you use arrays rather than lists, and use clever tricks to do it in place rather than using separate data structures, it is possible to beat the two-step Comparator approach, as the code below demonstrates. Before anybody complains: yes, I know this is not a proper benchmark!

Even though sort2 is faster than sort1, I would probably not use it in production code. It is better to use familiar idioms and code that obviously works, rather than code that is harder to understand and maintain, even if it is slightly faster.

import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

public class Main {

    static Random rand = new Random();

    static Compound rand() {
        return new Compound(rand.nextBoolean(), rand.nextLong());
    }

    static Compound[] randArray() {
        int length = 100_000;
        Compound[] temp = new Compound[length];
        for (int i = 0; i < length; i++)
            temp[i] = rand();
        return temp;
    }

    static class Compound {
        boolean bool;
        long time;

        Compound(boolean bool, long time) {
            this.bool = bool;
            this.time = time;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) 
                return true;
            if (o == null || getClass() != o.getClass()) 
                return false;
            Compound compound = (Compound) o;
            return bool == compound.bool && time == compound.time;
        }   

        @Override
        public int hashCode() {
            int result = (bool ? 1 : 0);
            result = 31 * result + (int) (time ^ (time >>> 32));
            return result;
        }
    }

    static final Comparator<Compound> COMPARATOR = new Comparator<Compound>() {
        @Override
        public int compare(Compound o1, Compound o2) {
            int result = (o1.bool ? 0 : 1) - (o2.bool ? 0 : 1);
            return result != 0 ? result : Long.compare(o1.time, o2.time);
        }
    };

    static final Comparator<Compound> LONG_ONLY_COMPARATOR = new Comparator<Compound>() {
        @Override
        public int compare(Compound o1, Compound o2) {
            return Long.compare(o1.time, o2.time);
        }
    };

    static void sort1(Compound[] array) {
        Arrays.sort(array, COMPARATOR);
    }

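    static void sort2(Compound[] array) {
        int secondIndex = array.length;
        if (secondIndex == 0)
            return;
        int firstIndex = 0;
        // In-place partition: true elements are packed into [0, firstIndex),
        // false elements into [secondIndex, array.length). c is the element being placed.
        for (Compound c = array[0];;) {
            if (c.bool) {
                array[firstIndex++] = c;
                if (firstIndex == secondIndex)
                    break;
                c = array[firstIndex];
            } else {
                // Move c to the back and continue with the element it displaced.
                Compound c2 = array[--secondIndex];
                array[secondIndex] = c;
                if (firstIndex == secondIndex)
                    break;
                c = c2;
            }
        }
        // Each half now only needs sorting on the time field.
        Arrays.sort(array, 0, firstIndex, LONG_ONLY_COMPARATOR);
        Arrays.sort(array, secondIndex, array.length, LONG_ONLY_COMPARATOR);
    }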

    public static void main(String... args) {

        // Warm up the JVM and check the algorithm actually works.
        for (int i = 0; i < 20; i++) {
            Compound[] arr1 = randArray();
            Compound[] arr2 = arr1.clone();
            sort1(arr1);
            sort2(arr2);
            if (!Arrays.equals(arr1, arr2))
                throw new IllegalStateException();
            System.out.println(i);
        }

        // Begin the test proper.
        long normal = 0;
        long split = 0;
        for (int i = 0; i < 100; i++) {
            Compound[] array1 = randArray();
            Compound[] array2 = array1.clone();

            long time = System.nanoTime();
            sort1(array1);
            normal += System.nanoTime() - time;

            time = System.nanoTime();
            sort2(array2);
            split += System.nanoTime() - time;

            System.out.println(i);
            System.out.println("COMPARATOR:           " + normal);
            System.out.println("LONG_ONLY_COMPARATOR: " + split);
        }
    }
}

To sort objects on two parameters, you need to do the following.

  1. Implement a Comparator for each of the two keys you have: one Boolean and one timestamp.
  2. Pass these comparators to Collections.sort(), because the elements are objects compared on two keys rather than primitives.

     /** Comparator that orders items with isBig = true before the others. */
     public static Comparator<Item> booleanComparator = new Comparator<Item>() {
         @Override
         public int compare(Item e1, Item e2) {
             if (e1.isBig && !e2.isBig)
                 return -1;
             if (!e1.isBig && e2.isBig)
                 return 1;
             return 0;
         }
     };

    Use this object in the sort call: Collections.sort(list, booleanComparator);

This is called sorting by multiple keys, and it's easy to do. If you're working with a sort library function that takes a comparator callback to decide the relative ordering of two elements, define the comparator so that it first checks whether the two input values a and b have equal isBig values and, if not, immediately returns a.isBig > b.isBig (I'm assuming here that > is defined for boolean values; if not, substitute the obvious test). But if the isBig values are equal, you should return a.timestamp > b.timestamp.
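In Java 8 and later, the same multi-key idea can be written as a chain of key comparators rather than a hand-written compare method. The sketch below assumes an Item class with a constructor (not shown in the question) and a hypothetical sortedLabels helper; it orders big items first, then largest timestamp first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class MultiKeySort {

    // Minimal stand-in for the question's Item class (primitive fields for brevity).
    static class Item {
        final boolean isBig;
        final long timestamp;

        Item(boolean isBig, long timestamp) {
            this.isBig = isBig;
            this.timestamp = timestamp;
        }
    }

    // Primary key: isBig, reversed so that true sorts before false.
    // Secondary key: timestamp, reversed so that larger values come first.
    // The secondary comparator is consulted only when the primary one returns 0.
    static final Comparator<Item> BIG_THEN_NEWEST =
            Comparator.comparing((Item i) -> i.isBig).reversed()
                    .thenComparing(Comparator.comparingLong((Item i) -> i.timestamp).reversed());

    // Hypothetical helper: sorts a copy of the list and renders it as one string.
    static String sortedLabels(List<Item> items) {
        List<Item> copy = new ArrayList<>(items);
        copy.sort(BIG_THEN_NEWEST);
        StringBuilder sb = new StringBuilder();
        for (Item item : copy) {
            if (sb.length() > 0)
                sb.append(", ");
            sb.append(item.isBig ? "isBig-" : "!isBig-").append(item.timestamp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Item> list = new ArrayList<>();
        list.add(new Item(false, 4L));
        list.add(new Item(true, 1L));
        list.add(new Item(true, 6L));
        System.out.println(sortedLabels(list));
    }
}
```

Each link in the chain reads as one sort key, so adding a third key is a matter of appending another thenComparing call.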

You can define a custom comparator and use it to sort the List. E.g.

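class ItemComparator implements Comparator<Item> {
    @Override
    public int compare (Item a, Item b) {
        // Arguments swapped so that isBig = true sorts before isBig = false.
        int bc = Boolean.compare(b.isBig, a.isBig);
        if (bc != 0)
            return bc;
        // Arguments swapped so that larger (newer) timestamps come first.
        return Long.compare(b.timestamp, a.timestamp);
    }
}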

and use it like this 并像这样使用它

Collections.sort(list, new ItemComparator());
