What is more efficient: a sorted stream or sorting a list?
Assume we have some items in a collection and we want to sort them using a certain comparator, expecting the result in a list:
Collection<Item> items = ...;
Comparator<Item> itemComparator = ...;
One of the approaches is to sort the items in a list, something like:
List<Item> sortedItems = new ArrayList<>(items);
Collections.sort(sortedItems, itemComparator);
Another approach is to use a sorted stream:
List<Item> sortedItems = items
.stream()
.sorted(itemComparator)
.collect(Collectors.toList());
I wonder which approach is more efficient. Are there any advantages to a sorted stream (like faster sorting on multiple cores)?
"Efficient" in the sense of runtime complexity, i.e. fastest.
I don't trust myself to implement a perfect benchmark, and studying SortedOps did not really enlighten me.
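The multi-core part of the question is not measured by either benchmark here. Both approaches as written run on one thread: a sequential stream sorts on the calling thread, and Collections.sort is single-threaded. The standard multi-core counterparts (Java 8+) are a parallel stream and Arrays.parallelSort. The sketch below shows both. The class and method names are hypothetical helpers, and whether either variant is actually faster depends on data size, comparator cost and core count. Arrays.parallelSort sorts small arrays sequentially anyway.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class ParallelSortSketch {

    // Multi-core variant of the stream approach: the sort may be split across ForkJoinPool workers.
    static <T> List<T> parallelStreamSort(Collection<T> items, Comparator<? super T> cmp) {
        return items.parallelStream()
                .sorted(cmp)
                .collect(Collectors.toList());
    }

    // Multi-core variant of the list approach: copy into an array, then Arrays.parallelSort.
    static <T> List<T> parallelArraySort(Collection<T> items, Comparator<? super T> cmp) {
        @SuppressWarnings("unchecked")
        T[] array = (T[]) items.toArray();
        Arrays.parallelSort(array, cmp);
        return Arrays.asList(array); // fixed-size List view over the sorted array
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("q", "a", "z", "w");
        System.out.println(parallelStreamSort(items, Comparator.<String>naturalOrder())); // [a, q, w, z]
        System.out.println(parallelArraySort(items, Comparator.<String>naturalOrder()));  // [a, q, w, z]
    }
}
```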
To be honest, I don't fully trust myself with JMH either (unless I understand the assembly, which takes a lot of time in my case), especially since I've used @Setup(Level.Invocation). But here is a small test. I took the StringInput generation from another test I did, but that should not matter; it's just some data to sort:
@State(Scope.Thread)
public static class StringInput {
private String[] letters = { "q", "a", "z", "w", "s", "x", "e", "d", "c", "r", "f", "v", "t", "g", "b",
"y", "h", "n", "u", "j", "m", "i", "k", "o", "l", "p" };
public String s = "";
public List<String> list;
@Param(value = { "1000", "10000", "100000" })
int next;
@TearDown(Level.Invocation)
public void tearDown() {
s = null;
}
@Setup(Level.Invocation)
public void setUp() {
list = ThreadLocalRandom.current()
.ints(next, 0, letters.length)
// map each random index to a letter; the elements are already Strings,
// so no further conversion step is needed
.mapToObj(x -> letters[x])
.collect(Collectors.toList());
}
}
@Fork(1)
@Benchmark
public List<String> testCollection(StringInput si){
// sorts si.list in place (no copy); @Setup(Level.Invocation) regenerates it before each call
Collections.sort(si.list, Comparator.naturalOrder());
return si.list;
}
@Fork(1)
@Benchmark
public List<String> testStream(StringInput si){
return si.list.stream()
.sorted(Comparator.naturalOrder())
.collect(Collectors.toList());
}
The results show that Collections.sort is faster, but not by a big margin:
Benchmark (next) Mode Cnt Score Error Units
streamvsLoop.StreamVsLoop.testCollection 1000 avgt 2 0.038 ms/op
streamvsLoop.StreamVsLoop.testCollection 10000 avgt 2 0.599 ms/op
streamvsLoop.StreamVsLoop.testCollection 100000 avgt 2 12.488 ms/op
streamvsLoop.StreamVsLoop.testStream 1000 avgt 2 0.048 ms/op
streamvsLoop.StreamVsLoop.testStream 10000 avgt 2 0.808 ms/op
streamvsLoop.StreamVsLoop.testStream 100000 avgt 2 15.652 ms/op
It is safe to say that the two forms of sort will have the same complexity, even without looking at the code. (If they didn't, one form would be severely broken!)
Looking at the Java 8 source code for streams (specifically the internal class java.util.stream.SortedOps), the sorted() method adds a component to the stream pipeline. That component captures all of the stream elements into either an array or an ArrayList.
An array is used if and only if the pipeline assembly code can deduce the number of elements in the stream ahead of time.
Otherwise, an ArrayList is used to gather the elements to be sorted.
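As far as I can tell from the Java 8 sources, SortedOps makes this choice from the pipeline's internal SIZED flag. When the flag is known, it uses SizedRefSortingSink, which fills a presized array. Otherwise it uses RefSortingSink, which fills an ArrayList. That flag is not public API, but the public Spliterator.SIZED characteristic of a pipeline reflects the same "size known up front" information.

The hedged sketch below uses that characteristic to show which pipelines know their size. The helper name isSized is hypothetical. Note that calling spliterator() consumes the stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Stream;

public class SizedDemo {

    // True if the pipeline reports an exact element count up front (Spliterator.SIZED).
    // Note: spliterator() is a terminal operation, so the stream cannot be reused afterwards.
    static boolean isSized(Stream<?> stream) {
        return stream.spliterator().hasCharacteristics(Spliterator.SIZED);
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(Arrays.asList(3, 1, 2));
        // An ArrayList source knows its size, so a sorted() stage can presize its buffer.
        System.out.println(isSized(list.stream()));                    // true
        // filter() may drop elements, so the count is no longer known in advance.
        System.out.println(isSized(list.stream().filter(x -> x > 1))); // false
    }
}
```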
If an ArrayList is used, you incur the extra overhead of building and growing the list.
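To get a feel for the growing overhead, the sketch below counts the element copies made while appending N elements to a growable array. It assumes a growth policy modelled on Java 8's ArrayList: first real capacity 10, then newCapacity = old + old/2. This is an illustrative model with a hypothetical class name, not the actual ArrayList code.

```java
public class GrowthCost {

    // Counts element copies made while appending n elements to a growable array that
    // starts at capacity 10 and grows by ~1.5x when full (an assumed, ArrayList-like policy).
    static long copiesWhileGrowing(int n) {
        long copies = 0;
        int capacity = 10;
        for (int size = 0; size < n; size++) {
            if (size == capacity) {                    // array is full: allocate a bigger one
                copies += size;                        // and copy every existing element over
                capacity = capacity + (capacity >> 1);
            }
        }
        return copies;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 100_000}) {
            System.out.println(n + " elements -> " + copiesWhileGrowing(n) + " extra copies");
        }
    }
}
```

In this model the extra copying stays amortized O(N), on the order of 2N copies. That is small next to the O(N log N) comparisons of the sort itself, but it is work that the presized array path avoids.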
Now back to the two versions of the code:
List<Item> sortedItems = new ArrayList<>(items);
Collections.sort(sortedItems, itemComparator);
In this version, the ArrayList constructor copies the elements of items into an appropriately sized array, and then Collections.sort does an in-place sort of that array. (This happens under the covers.)
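Stripped of the List wrapper, the first version boils down to roughly the following: one exact-size copy followed by an in-place sort. This is a simplified sketch with a hypothetical method name. The real path goes through ArrayList's constructor and List.sort, but the work is essentially the same.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

public class CopyThenSort {

    // Approximates new ArrayList<>(items) followed by Collections.sort(list, cmp).
    static <T> List<T> copyThenSort(Collection<T> items, Comparator<? super T> cmp) {
        @SuppressWarnings("unchecked")
        T[] array = (T[]) items.toArray(); // single O(N) copy into an array of exactly items.size()
        Arrays.sort(array, cmp);           // in-place O(N log N) sort (TimSort by default)
        return Arrays.asList(array);       // fixed-size List view over the sorted array, no extra copy
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("q", "a", "z", "w");
        System.out.println(copyThenSort(items, Comparator.<String>naturalOrder())); // [a, q, w, z]
    }
}
```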
List<Item> sortedItems = items
.stream()
.sorted(itemComparator)
.collect(Collectors.toList());
In this version, as we have seen above, the code associated with sorted() either builds and sorts an array (equivalent to what happens above) or builds the ArrayList the slow way. On top of that, there is the overhead of streaming the data from items into the collector.
Overall (with the Java 8 implementation at least), code examination tells me that the first version of the code cannot be slower than the second, and in most (if not all) cases it will be faster. But as the list gets larger, the O(N log N) sorting will tend to dominate the O(N) overheads of copying. That means the relative difference between the two versions will get smaller.
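The argument can be made explicit with a rough cost model. Here a, s and c are hypothetical per-element constants, not measured values: a is the cost of copying an element, s is the extra per-element stream/collector overhead, and c is the per-comparison sorting cost.

```latex
T_{\text{list}}(N)   \approx aN + cN\log N
T_{\text{stream}}(N) \approx (a + s)N + cN\log N
\frac{T_{\text{stream}}(N) - T_{\text{list}}(N)}{T_{\text{list}}(N)}
  \approx \frac{sN}{aN + cN\log N}
  = \frac{s}{a + c\log N}
  \xrightarrow{\;N \to \infty\;} 0
```

Because the relative gap falls only like 1/log N, this model predicts it shrinks slowly as N grows.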
If you really care, you should write a benchmark to measure the actual difference with a specific Java implementation and a specific input dataset. (Or adapt @Eugene's benchmark!)
Below is my benchmark (I'm not really sure it is correct):
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OperationsPerInvocation;
import org.openjdk.jmh.annotations.OutputTimeUnit;
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
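// Note: each invocation performs a single sort, but this annotation makes JMH divide
// every score by N (50), so the ns/op figures below are 1/50 of the time per sort.
@OperationsPerInvocation(MyBenchmark.N)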
public class MyBenchmark {
public static final int N = 50;
public static final int SIZE = 100000;
static List<Integer> sourceList = new ArrayList<>();
static {
System.out.println("Generating the list");
for (int i = 0; i < SIZE; i++) {
sourceList.add(i);
}
System.out.println("Shuffling the list.");
Collections.shuffle(sourceList);
}
@Benchmark
public List<Integer> sortingList() {
List<Integer> sortedList = new ArrayList<>(sourceList);
Collections.sort(sortedList);
return sortedList;
}
@Benchmark
public List<Integer> sortedStream() {
List<Integer> sortedList = sourceList.stream().sorted().collect(Collectors.toList());
return sortedList;
}
@Benchmark
public List<Integer> treeSet() {
Set<Integer> sortedSet = new TreeSet<>(sourceList);
List<Integer> sortedList = new ArrayList<>(sortedSet);
return sortedList;
}
}
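A quick way to check part of "is it correct" is to confirm that the three benchmarked strategies really produce the same list. The sketch below runs them once outside JMH; the class and method names are hypothetical. One caveat it surfaces is that TreeSet silently drops duplicates. That is harmless for this benchmark's distinct integers, but it would break the comparison for data with repeated values, such as the letters in @Eugene's test.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class SortEquivalenceCheck {

    // Runs the three strategies from MyBenchmark once and reports whether they agree.
    static boolean allAgree(List<Integer> source) {
        List<Integer> sortingList = new ArrayList<>(source);
        Collections.sort(sortingList);
        List<Integer> sortedStream = source.stream().sorted().collect(Collectors.toList());
        // TreeSet removes duplicates, so this only matches when all values are distinct.
        List<Integer> treeSet = new ArrayList<>(new TreeSet<>(source));
        return sortingList.equals(sortedStream) && sortingList.equals(treeSet);
    }

    public static void main(String[] args) {
        // Same kind of input as MyBenchmark: the distinct values 0..SIZE-1, shuffled.
        List<Integer> source = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            source.add(i);
        }
        Collections.shuffle(source);
        System.out.println(allAgree(source)); // true
    }
}
```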
Results:
Benchmark Mode Cnt Score Error Units
MyBenchmark.sortedStream avgt 200 300691.436 ± 15894.717 ns/op
MyBenchmark.sortingList avgt 200 262704.939 ± 5073.915 ns/op
MyBenchmark.treeSet avgt 200 856577.553 ± 49296.565 ns/op
As in @Eugene's benchmark, sorting the list is slightly faster than the sorted stream (here the stream takes roughly 14% longer). What surprises me a bit is that treeSet is significantly slower. I did not expect that.