
Java 8: performance of Streams vs Collections

I'm new to Java 8. I still don't know the API in depth, but I've made a small informal benchmark to compare the performance of the new Streams API vs the good old Collections.

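The test consists of filtering a list of Integer and, for each even number, calculating the square root and storing it in a result List of Double.

Here is the code: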

    public static void main(String[] args) {
        //Calculating square root of even numbers from 1 to N       
        int min = 1;
        int max = 1000000;

        List<Integer> sourceList = new ArrayList<>();
        for (int i = min; i < max; i++) {
            sourceList.add(i);
        }

        List<Double> result = new LinkedList<>();


        //Collections approach
        long t0 = System.nanoTime();
        long elapsed = 0;
        for (Integer i : sourceList) {
            if(i % 2 == 0){
                result.add(Math.sqrt(i));
            }
        }
        elapsed = System.nanoTime() - t0;       
        System.out.printf("Collections: Elapsed time:\t %d ns \t(%f seconds)%n", elapsed, elapsed / Math.pow(10, 9));


        //Stream approach
        Stream<Integer> stream = sourceList.stream();       
        t0 = System.nanoTime();
        result = stream.filter(i -> i%2 == 0).map(i -> Math.sqrt(i)).collect(Collectors.toList());
        elapsed = System.nanoTime() - t0;       
        System.out.printf("Streams: Elapsed time:\t\t %d ns \t(%f seconds)%n", elapsed, elapsed / Math.pow(10, 9));


        //Parallel stream approach
        stream = sourceList.stream().parallel();        
        t0 = System.nanoTime();
        result = stream.filter(i -> i%2 == 0).map(i -> Math.sqrt(i)).collect(Collectors.toList());
        elapsed = System.nanoTime() - t0;       
        System.out.printf("Parallel streams: Elapsed time:\t %d ns \t(%f seconds)%n", elapsed, elapsed / Math.pow(10, 9));      
    }

And here are the results on a dual-core machine:

    Collections: Elapsed time:        94338247 ns   (0,094338 seconds)
    Streams: Elapsed time:           201112924 ns   (0,201113 seconds)
    Parallel streams: Elapsed time:  357243629 ns   (0,357244 seconds)

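For this particular test, streams are about twice as slow as collections, and parallelism doesn't help (or am I using it the wrong way?).

Questions:

  • Is this test fair? Have I made any mistake?
  • Are streams slower than collections? Has anyone made a good formal benchmark on this?
  • Which approach should I strive for?

Updated results.

As advised by @pveentjer, I ran the test 1k times after JVM warmup (1k iterations):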

    Collections: Average time:      206884437,000000 ns     (0,206884 seconds)
    Streams: Average time:           98366725,000000 ns     (0,098367 seconds)
    Parallel streams: Average time: 167703705,000000 ns     (0,167704 seconds)

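In this case streams are more performant. I wonder what would be observed in an app where the filtering function is only called once or twice during runtime.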
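A warm-up-then-average measurement like the one described above could be sketched roughly as follows. This is not the exact harness behind the numbers above: the class name `WarmupTiming`, the helper `averageNanos`, and the small iteration counts in `main` are assumptions made for illustration, and the result list is an ArrayList rather than the LinkedList from the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;

class WarmupTiming {

    // Hypothetical helper: runs `task` untimed `warmup` times, then returns the
    // average duration in nanoseconds over `runs` timed executions.
    static <T> double averageNanos(Supplier<T> task, int warmup, int runs) {
        long sink = 0; // results are folded into `sink` so the calls are not dead code
        for (int i = 0; i < warmup; i++) {
            sink += System.identityHashCode(task.get());
        }
        long total = 0;
        for (int i = 0; i < runs; i++) {
            long t0 = System.nanoTime();
            T result = task.get();
            total += System.nanoTime() - t0;
            sink += System.identityHashCode(result);
        }
        if (sink == Long.MIN_VALUE) {
            System.out.println(sink); // practically never true; keeps `sink` live
        }
        return (double) total / runs;
    }

    // Plain for-each loop: square roots of the even numbers.
    static List<Double> loop(List<Integer> source) {
        List<Double> result = new ArrayList<>();
        for (Integer i : source) {
            if (i % 2 == 0) {
                result.add(Math.sqrt(i));
            }
        }
        return result;
    }

    // Same computation as a sequential stream pipeline.
    static List<Double> stream(List<Integer> source) {
        return source.stream()
                .filter(i -> i % 2 == 0)
                .map(i -> Math.sqrt(i))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> source = new ArrayList<>();
        for (int i = 1; i < 1_000_000; i++) {
            source.add(i);
        }
        // Assumed counts, kept small so the sketch finishes quickly;
        // the post used 1k warm-up iterations and 1k measured runs.
        int warmup = 20;
        int runs = 20;
        System.out.printf("Collections: Average time: %f ns%n",
                averageNanos(() -> loop(source), warmup, runs));
        System.out.printf("Streams: Average time: %f ns%n",
                averageNanos(() -> stream(source), warmup, runs));
    }
}
```

A hand-rolled loop like this still leaves pitfalls such as dead-code elimination and GC noise only partly addressed, which is why a harness such as JMH is generally the safer choice.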

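  1. Stop using LinkedList for anything but heavy removal from the middle of the list using an iterator.

  2. Stop writing benchmarking code by hand; use JMH.

Proper benchmarks: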

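import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OperationsPerInvocation;
import org.openjdk.jmh.annotations.OutputTimeUnit;

@OutputTimeUnit(TimeUnit.NANOSECONDS)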
@BenchmarkMode(Mode.AverageTime)
@OperationsPerInvocation(StreamVsVanilla.N)
public class StreamVsVanilla {
    public static final int N = 10000;

    static List<Integer> sourceList = new ArrayList<>();
    static {
        for (int i = 0; i < N; i++) {
            sourceList.add(i);
        }
    }

    @Benchmark
    public List<Double> vanilla() {
        List<Double> result = new ArrayList<>(sourceList.size() / 2 + 1);
        for (Integer i : sourceList) {
            if (i % 2 == 0){
                result.add(Math.sqrt(i));
            }
        }
        return result;
    }

    @Benchmark
    public List<Double> stream() {
        return sourceList.stream()
                .filter(i -> i % 2 == 0)
                .map(Math::sqrt)
                .collect(Collectors.toCollection(
                    () -> new ArrayList<>(sourceList.size() / 2 + 1)));
    }
}

Results:

Benchmark                   Mode   Samples         Mean   Mean error    Units
StreamVsVanilla.stream      avgt        10       17.588        0.230    ns/op
StreamVsVanilla.vanilla     avgt        10       10.796        0.063    ns/op

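Just as I expected, the stream implementation is noticeably slower. The JIT is able to inline all the lambda stuff, but it doesn't produce code as perfectly concise as the vanilla version.

Generally, Java 8 streams are not magic. They can't speed up things that are already well implemented (with, probably, plain iterations or Java 5's for-each statements replaced with Iterable.forEach() and Collection.removeIf() calls). Streams are more about coding convenience and safety. A convenience vs. speed tradeoff is at work here.

1) With your benchmark, you see times of less than 1 second. That means side effects can strongly influence your results. So I increased your task tenfold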

    int max = 10_000_000;

and ran your benchmark. My results:

Collections: Elapsed time:   8592999350 ns  (8.592999 seconds)
Streams: Elapsed time:       2068208058 ns  (2.068208 seconds)
Parallel streams: Elapsed time:  7186967071 ns  (7.186967 seconds)

Without the edit ( int max = 1_000_000 ) the results were

Collections: Elapsed time:   113373057 ns   (0.113373 seconds)
Streams: Elapsed time:       135570440 ns   (0.135570 seconds)
Parallel streams: Elapsed time:  104091980 ns   (0.104092 seconds)

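This matches your results: streams are slower than collections. Conclusion: much of the time was spent on stream initialization and value transfer.

2) After increasing the task size, streams became faster (that's OK), but parallel streams remained too slow. What's wrong? Note: you have collect(Collectors.toList()) in your command. Collecting into a single collection essentially introduces a performance bottleneck and overhead under concurrent execution. The relative cost of that overhead can be estimated by replacing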

collecting to collection -> counting the element count

For streams this can be done with collect(Collectors.counting()). I got these results:

Collections: Elapsed time:   41856183 ns    (0.041856 seconds)
Streams: Elapsed time:       546590322 ns   (0.546590 seconds)
Parallel streams: Elapsed time:  1540051478 ns  (1.540051 seconds)

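That is for the big task ( int max = 10000000 ). Conclusion: collecting the items into the container took most of the time. The slowest part is adding to the list. BTW, a simple ArrayList is used for Collectors.toList().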
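The replacement described above, counting elements instead of collecting them into a list, could look roughly like this. The class and method names are hypothetical and the timing in `main` is only a rough single-shot measurement, not the code that produced the numbers above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class CountInsteadOfCollect {

    // Same filter/map pipeline as in the question, but the terminal step only
    // counts the elements instead of adding them to a list.
    static long countSequential(List<Integer> source) {
        return source.stream()
                .filter(i -> i % 2 == 0)
                .map(i -> Math.sqrt(i))
                .collect(Collectors.counting());
    }

    // Parallel variant of the same pipeline.
    static long countParallel(List<Integer> source) {
        return source.parallelStream()
                .filter(i -> i % 2 == 0)
                .map(i -> Math.sqrt(i))
                .collect(Collectors.counting());
    }

    public static void main(String[] args) {
        List<Integer> source = new ArrayList<>();
        for (int i = 1; i < 1_000_000; i++) {
            source.add(i);
        }

        long t0 = System.nanoTime();
        long sequential = countSequential(source);
        long elapsedSequential = System.nanoTime() - t0;

        t0 = System.nanoTime();
        long parallel = countParallel(source);
        long elapsedParallel = System.nanoTime() - t0;

        System.out.printf("Streams: %d elements, %d ns%n", sequential, elapsedSequential);
        System.out.printf("Parallel streams: %d elements, %d ns%n", parallel, elapsedParallel);
    }
}
```

Comparing these timings with the collect(Collectors.toList()) versions gives a rough idea of how much of the run time goes into building the result list.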

    public static void main(String[] args) {
        //Calculating square root of even numbers from 1 to N
        int min = 1;
        int max = 10000000;

        List<Integer> sourceList = new ArrayList<>();
        for (int i = min; i < max; i++) {
            sourceList.add(i);
        }

        List<Double> result = new LinkedList<>();


        //Collections approach
        long t0 = System.nanoTime();
        long elapsed = 0;
        for (Integer i : sourceList) {
            if(i % 2 == 0){
                result.add( doSomeCalculate(i));
            }
        }
        elapsed = System.nanoTime() - t0;
        System.out.printf("Collections: Elapsed time:\t %d ns \t(%f seconds)%n", elapsed, elapsed / Math.pow(10, 9));


        //Stream approach
        Stream<Integer> stream = sourceList.stream();
        t0 = System.nanoTime();
        result = stream.filter(i -> i%2 == 0).map(i -> doSomeCalculate(i))
                .collect(Collectors.toList());
        elapsed = System.nanoTime() - t0;
        System.out.printf("Streams: Elapsed time:\t\t %d ns \t(%f seconds)%n", elapsed, elapsed / Math.pow(10, 9));


        //Parallel stream approach
        stream = sourceList.stream().parallel();
        t0 = System.nanoTime();
        result = stream.filter(i -> i%2 == 0).map(i ->  doSomeCalculate(i))
                .collect(Collectors.toList());
        elapsed = System.nanoTime() - t0;
        System.out.printf("Parallel streams: Elapsed time:\t %d ns \t(%f seconds)%n", elapsed, elapsed / Math.pow(10, 9));
    }

    static double doSomeCalculate(int input) {
        for(int i=0; i<100000; i++){
            Math.sqrt(i+input);
        }
        return Math.sqrt(input);
    }

I changed the code a bit, ran it on my 8-core MacBook Pro, and got a reasonable result:

Collections: Elapsed time:      1522036826 ns   (1.522037 seconds)
Streams: Elapsed time:          4315833719 ns   (4.315834 seconds)
Parallel streams: Elapsed time:  261152901 ns   (0.261153 seconds)

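For what you are trying to do, I would not use the regular Java APIs anyway. There is a ton of boxing/unboxing going on, so there is a huge performance overhead.

Personally, I think a lot of APIs are badly designed because they create a lot of object litter.

Try using primitive arrays of double/int, do it single-threaded, and see what the performance is.

PS: You might want to have a look at JMH to take care of the benchmarking. It handles some of the typical pitfalls, like warming up the JVM.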
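The primitive-array suggestion above might be sketched as follows. The class and method names are hypothetical, and the IntStream variant is an addition for comparison rather than something this answer proposed; both avoid creating Integer and Double objects.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

class PrimitiveSqrt {

    // Single-threaded loop over primitives: square roots of the even numbers
    // in [min, max), with no boxing involved.
    static double[] loop(int min, int max) {
        // Upper bound on the number of even values in the range.
        double[] result = new double[Math.max(0, (max - min) / 2 + 1)];
        int n = 0;
        for (int i = min; i < max; i++) {
            if (i % 2 == 0) {
                result[n++] = Math.sqrt(i);
            }
        }
        return Arrays.copyOf(result, n); // trim to the number of values written
    }

    // Same computation using the primitive stream specializations.
    static double[] stream(int min, int max) {
        return IntStream.range(min, max)
                .filter(i -> i % 2 == 0)
                .mapToDouble(Math::sqrt)
                .toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(loop(1, 10)));
        System.out.println(Arrays.equals(loop(1, 10), stream(1, 10)));
    }
}
```

Either method can be dropped into a JMH benchmark in place of the boxed versions to see how much of the measured time was boxing overhead.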

Interesting results for Java 8 and Java 11. I used the code provided by leventov with little modification:

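import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OperationsPerInvocation;
import org.openjdk.jmh.annotations.OutputTimeUnit;

@OutputTimeUnit(TimeUnit.NANOSECONDS)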
@BenchmarkMode(Mode.AverageTime)
@OperationsPerInvocation(BenchmarkMain.N)
public class BenchmarkMain {

    public static final int N = 10000;

    static List<Integer> sourceList = new ArrayList<>();
    static {
        for (int i = 0; i < N; i++) {
            sourceList.add(i);
        }
    }

    @Benchmark
    public List<Double> vanilla() {
        List<Double> result = new ArrayList<>(sourceList.size() / 2 + 1);
        for (Integer i : sourceList) {
            if (i % 2 == 0){
                result.add(Math.sqrt(i));
            }
        }
        return result;
    }

    @Benchmark
    public List<Double> stream() {
        return sourceList.stream()
                .filter(i -> i % 2 == 0)
                .map(Math::sqrt)
                .collect(Collectors.toCollection(
                    () -> new ArrayList<>(sourceList.size() / 2 + 1)));
    }

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) throws IOException {
        org.openjdk.jmh.Main.main(args);

    }

}

Java 8:

# JMH version: 1.31
# VM version: JDK 1.8.0_262, OpenJDK 64-Bit Server VM, 25.262-b19
# VM invoker: /opt/jdk1.8.0_262/jre/bin/java
# VM options: <none>
# Blackhole mode: full + dont-inline hint
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
...
Benchmark              Mode  Cnt   Score   Error  Units
BenchmarkMain.stream   avgt   25  10.680 ± 0.744  ns/op
BenchmarkMain.vanilla  avgt   25   6.490 ± 0.159  ns/op

Java 11:

# JMH version: 1.31
# VM version: JDK 11.0.2, OpenJDK 64-Bit Server VM, 11.0.2+9
# VM invoker: /opt/jdk-11.0.2/bin/java
# VM options: <none>
# Blackhole mode: full + dont-inline hint
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
...
Benchmark              Mode  Cnt  Score   Error  Units
BenchmarkMain.stream   avgt   25  5.521 ± 0.057  ns/op
BenchmarkMain.vanilla  avgt   25  7.359 ± 0.118  ns/op

Using Java 17, my results:

Collections: Elapsed time:109585000 ns  (0.109585 seconds)
Streams: Elapsed time:42179700 ns   (0.042180 seconds)
Parallel streams: Elapsed time:76177100 ns  (0.076177 seconds)

Using List.of instead of LinkedList, the results change:

Collections: Elapsed time:49681300 ns   (0.049681 seconds)
Streams: Elapsed time:38930300 ns   (0.038930 seconds)
Parallel streams: Elapsed time:49190500 ns  (0.049191 seconds)
