Java parallel stream performance

Question

While toying with the new Java streams, I noticed something strange about the performance of parallel streams. I used a simple program that reads the words from a text file and counts the words with length > 5 (the test file has 30000 words):

    String contents = new String(Files.readAllBytes(Paths.get("text.txt")));
    List<String> words = Arrays.asList(contents.split("[\\P{L}]+"));
    long startTime;
    for (int i = 0; i < 100; i++) {
        startTime = System.nanoTime();
        words.parallelStream().filter(w -> w.length() > 5).count();
        System.out.println("Time elapsed [PAR]: " + (System.nanoTime() - startTime));
        startTime = System.nanoTime();
        words.stream().filter(w -> w.length() > 5).count();
        System.out.println("Time elapsed [SEQ]: " + (System.nanoTime() - startTime));
        System.out.println("------------------");
    }

This generates the following output on my machine (I show only the first and the last 5 loop iterations):

Time elapsed [PAR]: 114185196
Time elapsed [SEQ]: 3222664
------------------
Time elapsed [PAR]: 569611
Time elapsed [SEQ]: 797113
------------------
Time elapsed [PAR]: 678231
Time elapsed [SEQ]: 414807
------------------
Time elapsed [PAR]: 755633
Time elapsed [SEQ]: 679085
------------------
Time elapsed [PAR]: 755633
Time elapsed [SEQ]: 393425
------------------
...
Time elapsed [PAR]: 90232
Time elapsed [SEQ]: 163785
------------------
Time elapsed [PAR]: 80396
Time elapsed [SEQ]: 154805
------------------
Time elapsed [PAR]: 83817
Time elapsed [SEQ]: 154377
------------------
Time elapsed [PAR]: 81679
Time elapsed [SEQ]: 186449
------------------
Time elapsed [PAR]: 68849
Time elapsed [SEQ]: 154804
------------------

Why is the first iteration 100 times slower than the rest? Why is the parallel stream slower than the sequential one in the first iterations, yet twice as fast in the last iterations? Why do both the sequential and parallel streams become faster over time? Is this related to loop optimization?

Later edit: At Luigi's suggestion, I implemented the benchmark using JUnitBenchmarks:

    List<String> words = null;

    @Before
    public void setup() {
        try {
            String contents = new String(Files.readAllBytes(Paths.get("text.txt")));
            words = Arrays.asList(contents.split("[\\P{L}]+"));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    @BenchmarkOptions(benchmarkRounds = 100)
    @Test
    public void parallelTest() {
        words.parallelStream().filter(w -> w.length() > 5).count();
    }

    @BenchmarkOptions(benchmarkRounds = 100)
    @Test
    public void sequentialTest() {
        words.stream().filter(w -> w.length() > 5).count();
    }

I also bumped up the number of words in the test file to 300000. The new results are:

Benchmark.sequentialTest: [measured 100 out of 105 rounds, threads: 1 (sequential)]

round: 0.08 [+- 0.04], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 62, GC.time: 1.53, time.total: 8.65, time.warmup: 0.81, time.bench: 7.85

Benchmark.parallelTest: [measured 100 out of 105 rounds, threads: 1 (sequential)]

round: 0.06 [+- 0.02], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 32, GC.time: 0.79, time.total: 6.82, time.warmup: 0.39, time.bench: 6.43

So it seems that the initial results were caused by a wrong microbenchmark configuration...
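The "measured 100 out of 105 rounds" lines above suggest that a few warm-up rounds were run and excluded from the reported averages. A minimal plain-Java sketch of that idea follows; it is not how JUnitBenchmarks is implemented. The class `WarmupBench`, the helpers `syntheticWords` and `averageNanos`, and the generated word list (a stand-in for text.txt) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

public class WarmupBench {
    // Results are written here so the measured work always has an observable effect.
    static volatile long sink;

    // Hypothetical stand-in for the words read from text.txt:
    // alternates a 5-letter word and an 8-letter word.
    static List<String> syntheticWords(int n) {
        List<String> words = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            words.add(i % 2 == 0 ? "short" : "longword");
        }
        return words;
    }

    // Runs the task warmupRounds times without timing it, then returns the
    // mean time in nanoseconds over measuredRounds timed runs.
    static double averageNanos(LongSupplier task, int warmupRounds, int measuredRounds) {
        for (int i = 0; i < warmupRounds; i++) {
            sink = task.getAsLong();
        }
        long total = 0;
        for (int i = 0; i < measuredRounds; i++) {
            long start = System.nanoTime();
            sink = task.getAsLong();
            total += System.nanoTime() - start;
        }
        return (double) total / measuredRounds;
    }

    public static void main(String[] args) {
        List<String> words = syntheticWords(300_000);
        // 5 warm-up rounds and 100 measured rounds, mirroring the output above.
        double seq = averageNanos(
                () -> words.stream().filter(w -> w.length() > 5).count(), 5, 100);
        double par = averageNanos(
                () -> words.parallelStream().filter(w -> w.length() > 5).count(), 5, 100);
        System.out.printf("SEQ avg: %.0f ns%n", seq);
        System.out.printf("PAR avg: %.0f ns%n", par);
    }
}
```

Unlike the original loop, this times each variant in its own series of rounds and reports a mean instead of individual iterations, so a single slow first call no longer dominates the picture. The absolute numbers will still vary by machine and JVM.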

Answer 1

The HotSpot JVM starts executing the program in interpreted mode and compiles frequently used parts to native code after some analysis. The initial iterations of loops are generally slow because of this.
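One way to watch this happen is to print, next to each iteration's elapsed time, how much JIT compilation time the JVM reports through the standard `CompilationMXBean`. The sketch below assumes a synthetic word list instead of text.txt; the class `JitWatch` and its helpers are hypothetical names. Because compilation typically runs on background threads, the per-iteration attribution is only approximate.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;
import java.util.Collections;
import java.util.List;

public class JitWatch {
    // Accumulated JIT compilation time in milliseconds,
    // or -1 if this JVM does not report it.
    static long jitMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1;
        }
        return jit.getTotalCompilationTime();
    }

    // Same pipeline as in the question: count words longer than 5 characters.
    static long countLongWords(List<String> words) {
        return words.stream().filter(w -> w.length() > 5).count();
    }

    public static void main(String[] args) {
        // Hypothetical input: 30000 copies of one 11-letter word.
        List<String> words = Collections.nCopies(30_000, "placeholder");
        for (int i = 0; i < 20; i++) {
            long jitBefore = jitMillis();
            long start = System.nanoTime();
            long n = countLongWords(words);
            long elapsed = System.nanoTime() - start;
            long jitAfter = jitMillis();
            System.out.printf("iter %2d: %d long words, %9d ns, JIT +%d ms%n",
                    i, n, elapsed, jitAfter - jitBefore);
        }
    }
}
```

If the explanation above holds on a given machine, the early iterations should show both the longest elapsed times and most of the reported compilation time, with both tapering off in later iterations.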

Answer 1: score 2, accepted, 2014-04-10 21:26:27
