
Java 8 - Best way to transform a list: map or foreach?

I have a list myListToParse where I want to filter the elements, apply a method to each element, and add the results to another list, myFinalList .

With Java 8 I noticed that I can do this in two different ways. I would like to know which of them is more efficient, and understand why one way is better than the other.

I'm open to any suggestion about a third way.

Method 1:

myFinalList = new ArrayList<>();
myListToParse.stream()
        .filter(elt -> elt != null)
        .forEach(elt -> myFinalList.add(doSomething(elt)));

Method 2:

myFinalList = myListToParse.stream()
        .filter(elt -> elt != null)
        .map(elt -> doSomething(elt))
        .collect(Collectors.toList()); 

Don't worry about any performance differences; they're normally going to be minimal in this case.

Method 2 is preferable because:

  1. it doesn't require mutating a collection that exists outside the lambda expression.

  2. it's more readable because the different steps performed in the collection pipeline are written sequentially: first a filter operation, then a map operation, then collecting the result (for more info on the benefits of collection pipelines, see Martin Fowler's excellent article).

  3. you can easily change the way values are collected by replacing the Collector that is used. In some cases you may need to write your own Collector , but then the benefit is that you can easily reuse it.
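To illustrate point 3, here is a minimal sketch of the same filter/collect pipeline with the Collector swapped out; the input values and the TreeSet choice are illustrative, not from the question:

```java
import java.util.*;
import java.util.stream.Collectors;

public class CollectorSwap {
    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", null, "b", "a");

        // Same pipeline, three different Collectors: a List, a sorted Set, a joined String.
        List<String> asList = input.stream()
                .filter(Objects::nonNull)
                .collect(Collectors.toList());

        Set<String> asSet = input.stream()
                .filter(Objects::nonNull)
                .collect(Collectors.toCollection(TreeSet::new));

        String joined = input.stream()
                .filter(Objects::nonNull)
                .collect(Collectors.joining(", "));

        System.out.println(asList);  // [a, b, a]
        System.out.println(asSet);   // [a, b]
        System.out.println(joined);  // a, b, a
    }
}
```

Only the terminal collect call changes; the filtering and mapping steps stay untouched.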

I agree with the existing answers that the second form is better because it does not have any side effects and is easier to parallelise (just use a parallel stream).

Performance-wise, they appear to be equivalent until you start using parallel streams. In that case, map will perform much better. See the micro-benchmark results below:

Benchmark                         Mode  Samples    Score   Error  Units
SO28319064.forEach                avgt      100  187.310 ± 1.768  ms/op
SO28319064.map                    avgt      100  189.180 ± 1.692  ms/op
SO28319064.mapWithParallelStream  avgt      100   55.577 ± 0.782  ms/op

You can't boost the first example in the same manner because forEach is a terminal method - it returns void - so you are forced to use a stateful lambda. But that is really a bad idea if you are using parallel streams.
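A minimal sketch of why the stateful-lambda approach breaks down under parallelism (the class name and sizes are illustrative):

```java
import java.util.*;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class StatefulLambdaHazard {
    public static void main(String[] args) {
        // Unsafe: ArrayList is not thread-safe, so a parallel forEach adding to it
        // can silently drop elements or throw ArrayIndexOutOfBoundsException.
        // List<Integer> unsafe = new ArrayList<>();
        // IntStream.range(0, 100_000).parallel().forEach(unsafe::add); // don't do this

        // Workaround: synchronize the target list. Correct, but the adds are
        // serialized and the encounter order is lost.
        List<Integer> synced = Collections.synchronizedList(new ArrayList<>());
        IntStream.range(0, 100_000).parallel().forEach(synced::add);

        // Preferred: let collect() build thread-local lists and merge them;
        // no shared mutable state, and encounter order is preserved.
        List<Integer> collected = IntStream.range(0, 100_000).parallel()
                .boxed()
                .collect(Collectors.toList());

        System.out.println(synced.size());    // 100000, but in scrambled order
        System.out.println(collected.get(0)); // 0 - encounter order kept
    }
}
```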

Finally, note that your second snippet can be written in a slightly more concise way with method references and static imports:

myFinalList = myListToParse.stream()
    .filter(Objects::nonNull)
    .map(this::doSomething)
    .collect(toList()); 

One of the main benefits of using streams is the ability to process data in a declarative way, that is, using a functional style of programming. You also get multi-threading capability for free, meaning there is no need to write any extra multi-threaded code to make your stream concurrent.

Assuming the reason you are exploring this style of programming is that you want to exploit these benefits, then your first code sample is potentially not functional, since the forEach method is classed as terminal (meaning that it can produce side effects).

The second way is preferred from a functional programming point of view, since the map function can accept stateless lambda functions. More explicitly, the lambda passed to the map function should be:

  1. Non-interfering, meaning that the function should not alter the source of the stream if it is non-concurrent (e.g. ArrayList ).
  2. Stateless, to avoid unexpected results when doing parallel processing (caused by thread-scheduling differences).

Another benefit of the second approach is that if the stream is parallel and the collector is concurrent and unordered, these characteristics can provide useful hints to the reduction operation to do the collecting concurrently.
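As a hedged sketch of such a concurrent, unordered collector: Collectors.groupingByConcurrent has the CONCURRENT and UNORDERED characteristics, so on a parallel stream all threads can insert into a single ConcurrentMap instead of merging partial maps (the grouping key here is illustrative):

```java
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ConcurrentCollect {
    public static void main(String[] args) {
        // CONCURRENT + UNORDERED: one shared ConcurrentMap, no per-thread merge step.
        ConcurrentMap<Integer, List<Integer>> byMod = IntStream.range(0, 1000)
                .boxed()
                .parallel()
                .collect(Collectors.groupingByConcurrent(i -> i % 3));

        System.out.println(byMod.get(0).size()); // 334 values divisible by 3
    }
}
```

The trade-off is that the order of values within each group is no longer guaranteed.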

If you use Eclipse Collections, you can use the collectIf() method.

MutableList<Integer> source =
    Lists.mutable.with(1, null, 2, null, 3, null, 4, null, 5);

MutableList<String> result = source.collectIf(Objects::nonNull, String::valueOf);

Assert.assertEquals(Lists.immutable.with("1", "2", "3", "4", "5"), result);

It evaluates eagerly and should be a bit faster than using a Stream.

Note: I am a committer for Eclipse Collections.

I prefer the second way.

When you use the first way, if you decide to use a parallel stream to improve performance, you'll have no control over the order in which the elements are added to the output list by forEach .

When you use toList , the Streams API will preserve the order even if you use a parallel stream.
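A minimal sketch of that guarantee (values and class name are illustrative): because the source is ordered, collect merges the per-thread chunks back in encounter order.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class OrderPreserved {
    public static void main(String[] args) {
        // Parallel execution, but collect reassembles results in encounter order.
        List<Integer> squares = IntStream.range(0, 10)
                .boxed()
                .parallel()
                .map(i -> i * i)
                .collect(Collectors.toList());

        System.out.println(squares); // [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
    }
}
```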

There is a third option - using stream().toArray() - see the comments under why didn't stream have a toList method . It turns out to be slower than forEach() or collect(), and less expressive. It might be optimised in later JDK builds, so I'm adding it here just in case.

Assuming List<String> :

    myFinalList = Arrays.asList(
            myListToParse.stream()
                    .filter(Objects::nonNull)
                    .map(this::doSomething)
                    .toArray(String[]::new)
    );

With a micro-micro-benchmark, 1M entries, 20% nulls, and a simple transform in doSomething():

private LongSummaryStatistics benchmark(final String testName, final Runnable methodToTest, int samples) {
    long[] timing = new long[samples];
    for (int i = 0; i < samples; i++) {
        long start = System.currentTimeMillis();
        methodToTest.run();
        timing[i] = System.currentTimeMillis() - start;
    }
    final LongSummaryStatistics stats = Arrays.stream(timing).summaryStatistics();
    System.out.println(testName + ": " + stats);
    return stats;
}
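The helper above can be invoked like this; the following is a self-contained adaptation (made static, with an illustrative workload and sample count, neither taken from the original benchmark):

```java
import java.util.Arrays;
import java.util.List;
import java.util.LongSummaryStatistics;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class BenchmarkDemo {
    // Static variant of the timing helper shown above.
    static LongSummaryStatistics benchmark(String testName, Runnable methodToTest, int samples) {
        long[] timing = new long[samples];
        for (int i = 0; i < samples; i++) {
            long start = System.currentTimeMillis();
            methodToTest.run();
            timing[i] = System.currentTimeMillis() - start;
        }
        LongSummaryStatistics stats = Arrays.stream(timing).summaryStatistics();
        System.out.println(testName + ": " + stats);
        return stats;
    }

    public static void main(String[] args) {
        List<Integer> data = IntStream.range(0, 100_000).boxed().collect(Collectors.toList());
        // Time the collect() variant over 5 runs; stats.getCount() will be 5.
        LongSummaryStatistics stats = benchmark("collect",
                () -> data.stream().map(i -> i * 2).collect(Collectors.toList()), 5);
    }
}
```

Note that System.currentTimeMillis() and a handful of samples give only rough numbers; a harness like JMH (used for the table earlier in this thread) is more trustworthy.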

The results are:

parallel:

toArray: LongSummaryStatistics{count=10, sum=3721, min=321, average=372.100000, max=535}
forEach: LongSummaryStatistics{count=10, sum=3502, min=249, average=350.200000, max=389}
collect: LongSummaryStatistics{count=10, sum=3325, min=265, average=332.500000, max=368}

sequential:

toArray: LongSummaryStatistics{count=10, sum=5493, min=517, average=549.300000, max=569}
forEach: LongSummaryStatistics{count=10, sum=5316, min=427, average=531.600000, max=571}
collect: LongSummaryStatistics{count=10, sum=5380, min=444, average=538.000000, max=557}

Parallel without nulls and the filter (so the stream is SIZED ): toArray has the best performance in such a case, and .forEach() fails with an IndexOutOfBoundsException on the recipient ArrayList; I had to replace it with .forEachOrdered() .

toArray: LongSummaryStatistics{count=100, sum=75566, min=707, average=755.660000, max=1107}
forEach: LongSummaryStatistics{count=100, sum=115802, min=992, average=1158.020000, max=1254}
collect: LongSummaryStatistics{count=100, sum=88415, min=732, average=884.150000, max=1014}

Maybe Method 3.

I always prefer to keep logic separate.

Predicate<Long> greaterThan100 = new Predicate<Long>() {
    @Override
    public boolean test(Long currentParameter) {
        return currentParameter > 100;
    }
};
        
List<Long> sourceLongList = Arrays.asList(1L, 10L, 50L, 80L, 100L, 120L, 133L, 333L);
List<Long> resultList = sourceLongList.parallelStream().filter(greaterThan100).collect(Collectors.toList());
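The same separation of logic can be kept with a lambda in place of the anonymous class; a minimal self-contained sketch (class name is illustrative):

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class NamedPredicate {
    public static void main(String[] args) {
        // The predicate still lives in its own named variable, just written as a lambda.
        Predicate<Long> greaterThan100 = currentParameter -> currentParameter > 100;

        List<Long> sourceLongList = Arrays.asList(1L, 10L, 50L, 80L, 100L, 120L, 133L, 333L);
        List<Long> resultList = sourceLongList.stream()
                .filter(greaterThan100)
                .collect(Collectors.toList());

        System.out.println(resultList); // [120, 133, 333]
    }
}
```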

If using third-party libraries is OK, cyclops-react defines lazy extended collections with this functionality built in. For example, we could simply write:

ListX myListToParse;

ListX myFinalList = myListToParse.filter(elt -> elt != null)
                                 .map(elt -> doSomething(elt));

myFinalList is not evaluated until first access (and thereafter the materialized list is cached and reused).

[Disclosure: I am the lead developer of cyclops-react]
