
What is more efficient: sorted stream or sorting a list?

Assume we have some items in a collection and we want to sort them using a certain comparator, expecting the result in a list:

Collection<Item> items = ...;
Comparator<Item> itemComparator = ...;

One approach is to copy the items into a list and sort it, something like:

List<Item> sortedItems = new ArrayList<>(items);
Collections.sort(sortedItems, itemComparator);

Another approach is to use a sorted stream:

List<Item> sortedItems = items
    .stream()
    .sorted(itemComparator)
    .collect(Collectors.toList());

I wonder which approach is more efficient. Are there any advantages to a sorted stream (like faster sorting on multiple cores)?

Efficient in the sense of runtime complexity, or simply whichever is fastest.

I don't trust myself to implement a perfect benchmark, and studying SortedOps did not really enlighten me.
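For reference, the multi-core variants alluded to above could be written as in the sketch below. The class and method names (ParallelSortSketch, sortedParallelStream, parallelSortCopy) are made up for illustration; only parallelStream(), sorted(), and Arrays.parallelSort are real JDK calls. Whether either variant is actually faster depends on the input size and the machine, so it would have to be measured.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class ParallelSortSketch {

    // Stream approach, multi-core: same pipeline, but with a parallel source.
    static <T> List<T> sortedParallelStream(Collection<T> items, Comparator<? super T> cmp) {
        return items.parallelStream()
                .sorted(cmp)
                .collect(Collectors.toList());
    }

    // List approach, multi-core: copy into an array, then Arrays.parallelSort.
    @SuppressWarnings("unchecked")
    static <T> List<T> parallelSortCopy(Collection<T> items, Comparator<? super T> cmp) {
        T[] array = (T[]) items.toArray();   // Object[] at runtime; only used locally
        Arrays.parallelSort(array, cmp);
        return new ArrayList<>(Arrays.asList(array));
    }

    public static void main(String[] args) {
        List<Integer> items = Arrays.asList(5, 3, 9, 1, 3);
        System.out.println(sortedParallelStream(items, Comparator.<Integer>naturalOrder())); // [1, 3, 3, 5, 9]
        System.out.println(parallelSortCopy(items, Comparator.<Integer>naturalOrder()));     // [1, 3, 3, 5, 9]
    }
}
```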

To be honest, I don't trust myself too much with JMH either (unless I understand the assembly, which takes a lot of time in my case), especially since I've used @Setup(Level.Invocation). But here is a small test (I took the StringInput generation from another test of mine, but that should not matter; it's just some data to sort):

@State(Scope.Thread)
public static class StringInput {

    private String[] letters = { "q", "a", "z", "w", "s", "x", "e", "d", "c", "r", "f", "v", "t", "g", "b",
            "y", "h", "n", "u", "j", "m", "i", "k", "o", "l", "p" };

    public String s = "";

    public List<String> list;

    @Param(value = { "1000", "10000", "100000" })
    int next;

    @TearDown(Level.Invocation)
    public void tearDown() {
        s = null;
    }

    @Setup(Level.Invocation)
    public void setUp() {

        // Build a fresh random list before every invocation,
        // because testCollection sorts si.list in place.
        list = ThreadLocalRandom.current()
                .ints(next, 0, letters.length)
                .mapToObj(x -> letters[x])
                .collect(Collectors.toList());

    }
}


@Fork(1)
@Benchmark
public List<String> testCollection(StringInput si){
    Collections.sort(si.list, Comparator.naturalOrder());
    return si.list;
}

@Fork(1)
@Benchmark
public List<String> testStream(StringInput si){
    return si.list.stream()
            .sorted(Comparator.naturalOrder())
            .collect(Collectors.toList());
}

Results show that Collections.sort is faster, but not by a big margin:

Benchmark                                 (next)  Mode  Cnt   Score   Error  Units
streamvsLoop.StreamVsLoop.testCollection    1000  avgt    2   0.038          ms/op
streamvsLoop.StreamVsLoop.testCollection   10000  avgt    2   0.599          ms/op
streamvsLoop.StreamVsLoop.testCollection  100000  avgt    2  12.488          ms/op
streamvsLoop.StreamVsLoop.testStream        1000  avgt    2   0.048          ms/op
streamvsLoop.StreamVsLoop.testStream       10000  avgt    2   0.808          ms/op
streamvsLoop.StreamVsLoop.testStream      100000  avgt    2  15.652          ms/op

It is safe to say that the two forms of sort will have the same complexity, even without looking at the code. (If they didn't, then one form would be severely broken!)

Looking at the Java 8 source code for streams (specifically the internal class java.util.stream.SortedOps), the sorted() method adds a component to the stream pipeline that captures all of the stream elements into either an array or an ArrayList.

  • An array is used if and only if the pipeline assembly code can deduce the number of elements in the stream ahead of time.

  • Otherwise, an ArrayList is used to gather the elements to be sorted.

If an ArrayList is used, you incur the extra overhead of building and growing the list.
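Whether the element count is known ahead of time can be observed from outside, at least approximately. The sketch below uses the public Spliterator.SIZED characteristic as a stand-in for the internal flag that SortedOps consults; that the two correspond is an assumption of this example, and the class and method names are made up. A stream taken straight from an ArrayList reports an exact size, while adding a filter() removes that guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Stream;

class SizedStreamSketch {

    // True when the pipeline reports an exact element count up front.
    // Note: calling spliterator() is a terminal operation and consumes the stream.
    static boolean reportsExactSize(Stream<?> stream) {
        return stream.spliterator().hasCharacteristics(Spliterator.SIZED);
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>(Arrays.asList(3, 1, 2));

        // Straight from the list: the size is known.
        System.out.println(reportsExactSize(items.stream()));                    // true

        // After a filter, the number of surviving elements is unknown.
        System.out.println(reportsExactSize(items.stream().filter(x -> x > 1))); // false
    }
}
```

In the question's pipeline, sorted() follows items.stream() directly, so for an ordinary ArrayList source the size should be known.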

Now let us return to the two versions of the code:

List<Item> sortedItems = new ArrayList<>(items);
Collections.sort(sortedItems, itemComparator);

In this version, the ArrayList constructor copies the elements of items to an appropriately sized array, and then Collections.sort does an in-place sort of that array. (This happens under the covers.)

List<Item> sortedItems = items
    .stream()
    .sorted(itemComparator)
    .collect(Collectors.toList());

In this version, as we have seen above, the code associated with sorted() either builds and sorts an array (equivalent to what happens above) or it builds the ArrayList the slow way. But on top of that, there is the overhead of streaming the data from items to the collector.
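Whatever the performance difference, the two versions are interchangeable in what they produce. A minimal sketch that wraps each version in a helper (the names SortApproaches, sortCopy, and sortStream are invented for this example) and checks that both return the same list while leaving the source untouched:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class SortApproaches {

    // Version 1: copy into an ArrayList, then sort the copy in place.
    static <T> List<T> sortCopy(Collection<T> items, Comparator<? super T> cmp) {
        List<T> sorted = new ArrayList<>(items);
        Collections.sort(sorted, cmp);
        return sorted;
    }

    // Version 2: stream, sorted(), collect.
    static <T> List<T> sortStream(Collection<T> items, Comparator<? super T> cmp) {
        return items.stream()
                .sorted(cmp)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("pear", "fig", "apple");
        List<String> a = sortCopy(items, Comparator.<String>naturalOrder());
        List<String> b = sortStream(items, Comparator.<String>naturalOrder());
        System.out.println(a);           // [apple, fig, pear]
        System.out.println(a.equals(b)); // true
        System.out.println(items);       // [pear, fig, apple] (source is not modified)
    }
}
```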

Overall (with the Java 8 implementation at least), code examination tells me that the first version of the code cannot be slower than the second version, and in most (if not all) cases it will be faster. But as the list gets larger, the O(N log N) sorting will tend to dominate the O(N) overheads of copying. That means the relative difference between the two versions will get smaller.
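The "relative difference gets smaller" argument can be made explicit with a rough cost model. The constants a, b, and c below are assumptions introduced only for this sketch: a is the sort cost per comparison step, b the per-element copying cost both versions pay, and c the extra per-element streaming overhead.

```latex
\begin{aligned}
T_{\text{list}}(N)   &\approx a\,N\log N + b\,N \\
T_{\text{stream}}(N) &\approx a\,N\log N + (b + c)\,N \\
\frac{T_{\text{stream}}(N) - T_{\text{list}}(N)}{T_{\text{list}}(N)}
  &\approx \frac{c\,N}{a\,N\log N + b\,N} = \frac{c}{a\log N + b}
  \;\longrightarrow\; 0 \quad (N \to \infty)
\end{aligned}
```

Because log N grows slowly (log2 of 1,000 is about 10, and log2 of 100,000 is about 17), the shrinkage over the list sizes benchmarked here may be too small to stand out from measurement noise.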

If you really care, you should write a benchmark to test the actual difference with a specific implementation of Java and a specific input dataset. (Or adapt @Eugene's benchmark!)

Below is my benchmark (I am not really sure that it is correct):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OperationsPerInvocation;
import org.openjdk.jmh.annotations.OutputTimeUnit;

@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
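// NOTE: each benchmark method below performs a single sort per invocation,
// so this annotation makes JMH divide every reported score by N (50).
// The ratios between the benchmarks are unaffected.
@OperationsPerInvocation(MyBenchmark.N)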
public class MyBenchmark {

    public static final int N = 50;

    public static final int SIZE = 100000;

    static List<Integer> sourceList = new ArrayList<>();
    static {
        System.out.println("Generating the list");
        for (int i = 0; i < SIZE; i++) {
            sourceList.add(i);
        }
        System.out.println("Shuffling the list.");
        Collections.shuffle(sourceList);
    }

    @Benchmark
    public List<Integer> sortingList() {
        List<Integer> sortedList = new ArrayList<>(sourceList);
        Collections.sort(sortedList);
        return sortedList;
    }

    @Benchmark
    public List<Integer> sortedStream() {
        List<Integer> sortedList = sourceList.stream().sorted().collect(Collectors.toList());
        return sortedList;
    }

    @Benchmark
    public List<Integer> treeSet() {
        Set<Integer> sortedSet = new TreeSet<>(sourceList);
        List<Integer> sortedList = new ArrayList<>(sortedSet);
        return sortedList;
    }
}

Results:

Benchmark                 Mode  Cnt       Score       Error  Units
MyBenchmark.sortedStream  avgt  200  300691.436 ± 15894.717  ns/op
MyBenchmark.sortingList   avgt  200  262704.939 ±  5073.915  ns/op
MyBenchmark.treeSet       avgt  200  856577.553 ± 49296.565  ns/op

As in @Eugene's benchmark, sorting a list is slightly faster than a sorted stream (roughly 15% here). What surprises me a bit is that treeSet is significantly slower. I did not expect that.
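One caveat about the treeSet variant, independent of its speed: it only gives the same result as the two sorts when the input contains no duplicates. That holds in the benchmark above, since sourceList contains each value from 0 to SIZE - 1 exactly once, but not in general. A small sketch (the class and method names are made up for illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

class TreeSetVsSort {

    // Sorts via a TreeSet; elements that compare as equal are collapsed into one.
    static <T extends Comparable<? super T>> List<T> viaTreeSet(Collection<T> items) {
        return new ArrayList<>(new TreeSet<>(items));
    }

    // Sorts a copy of the input; duplicates are preserved.
    static <T extends Comparable<? super T>> List<T> viaSort(Collection<T> items) {
        List<T> sorted = new ArrayList<>(items);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        List<Integer> items = Arrays.asList(3, 1, 3, 2);
        System.out.println(viaTreeSet(items)); // [1, 2, 3]
        System.out.println(viaSort(items));    // [1, 2, 3, 3]
    }
}
```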

