Why does Java Optional performance increase with the number of chained calls?

I was recently asked about the performance of Java 8 Optional. After some searching, I found this question and several blog posts, with contradicting answers. So I benchmarked it using JMH, and I don't understand my findings.

Here is the gist of my benchmark code (full code is available on GitHub):

@State(Scope.Benchmark)
public class OptionalBenchmark {

  private Room room;

  @Param({ "empty", "small", "large", "full" })
  private String filling;

  @Setup
  public void setUp () {
    switch (filling) {
      case "empty":
        room = null;
        break;
      case "small":
        room = new Room(new Flat(new Floor(null)));
        break;
      case "large":
        room = new Room(new Flat(new Floor(new Building(new Block(new District(null))))));
        break;
      case "full":
        room = new Room(new Flat(new Floor(new Building(new Block(new District(new City(new Country("France"))))))));
        break;
      default:
        throw new IllegalStateException("Unsupported filling.");
    }
  }

  @Benchmark
  public String nullChecks () {
    if (room == null) {
      return null;
    }

    Flat flat = room.getFlat();
    if (flat == null) {
      return null;
    }

    Floor floor = flat.getFloor();
    if (floor == null) {
      return null;
    }

    Building building = floor.getBuilding();
    if (building == null) {
      return null;
    }

    Block block = building.getBlock();
    if (block == null) {
      return null;
    }

    District district = block.getDistrict();
    if (district == null) {
      return null;
    }

    City city = district.getCity();
    if (city == null) {
      return null;
    }

    Country country = city.getCountry();
    if (country == null) {
      return null;
    }

    return country.getName();
  }

  @Benchmark
  public String optionalsWithMethodRefs () {
    return Optional.ofNullable (room)
        .map (Room::getFlat)
        .map (Flat::getFloor)
        .map (Floor::getBuilding)
        .map (Building::getBlock)
        .map (Block::getDistrict)
        .map (District::getCity)
        .map (City::getCountry)
        .map (Country::getName)
        .orElse (null);
  }

  @Benchmark
  public String optionalsWithLambdas () {
    return Optional.ofNullable (room)
        .map (room -> room.getFlat ())
        .map (flat -> flat.getFloor ())
        .map (floor -> floor.getBuilding ())
        .map (building -> building.getBlock ())
        .map (block -> block.getDistrict ())
        .map (district -> district.getCity ())
        .map (city -> city.getCountry ())
        .map (country -> country.getName ())
        .orElse (null);
  }

}

And the results I got were:

Benchmark                                  (filling)   Mode  Cnt           Score         Error  Units
OptionalBenchmark.nullChecks                   empty  thrpt  200   468835378.093 ±  895576.864  ops/s
OptionalBenchmark.nullChecks                   small  thrpt  200   306602013.907 ±  136966.520  ops/s
OptionalBenchmark.nullChecks                   large  thrpt  200   259996142.619 ±  307584.215  ops/s
OptionalBenchmark.nullChecks                    full  thrpt  200   275954974.981 ± 4154597.959  ops/s
OptionalBenchmark.optionalsWithLambdas         empty  thrpt  200   460491457.335 ±  322920.650  ops/s
OptionalBenchmark.optionalsWithLambdas         small  thrpt  200    98604468.453 ±   68320.074  ops/s
OptionalBenchmark.optionalsWithLambdas         large  thrpt  200    67648427.470 ±  206810.285  ops/s
OptionalBenchmark.optionalsWithLambdas          full  thrpt  200   167124820.392 ± 1229924.561  ops/s
OptionalBenchmark.optionalsWithMethodRefs      empty  thrpt  200   460690135.554 ±  273853.568  ops/s
OptionalBenchmark.optionalsWithMethodRefs      small  thrpt  200    98639064.680 ±   56848.805  ops/s
OptionalBenchmark.optionalsWithMethodRefs      large  thrpt  200    68138436.113 ±  158409.539  ops/s
OptionalBenchmark.optionalsWithMethodRefs       full  thrpt  200   169603006.971 ±   52646.423  ops/s

First of all, when given a null reference, Optional and null checks behave pretty much the same. I guess this is because there is only one instance of Optional.empty(), so any .map() method call on it just returns itself.
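The guess above about the shared empty instance can be checked with a small sketch. The class and method names below are made up for illustration, and the identity comparison relies on OpenJDK implementation behavior rather than on anything the Optional API documentation guarantees:

```java
import java.util.Optional;

public class EmptyOptionalDemo {

    // Returns true when map() on an empty Optional hands back the shared empty instance.
    static boolean mapOnEmptyReturnsSameInstance() {
        Optional<String> empty = Optional.ofNullable(null);
        Optional<Integer> mapped = empty.map(String::length);
        // Identity comparison is deliberate: this probes an implementation detail,
        // not a documented contract. The casts avoid a generic type mismatch.
        return (Object) mapped == (Object) Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(mapOnEmptyReturnsSameInstance());
    }
}
```

If this holds on the JVM in use, the "empty" scenario never allocates, which would be consistent with it matching the plain null checks.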

When the given object is non-null and contains a chain of non-null attributes, however, a new Optional has to be instantiated on each call to .map(). Hence, performance degrades much more quickly than with null checks. Makes sense. Except for my full filling, where the performance suddenly increases. So what is the magic going on here? Am I doing something wrong in my benchmark?

Edit

The parameters from my first run were the JMH defaults: each benchmark was run in 10 different forks, with 20 warmup iterations of 1s each, and then 20 measurement iterations of 1s each. I believe those values are sane, since I trust the libraries I use. However, since I was told I wasn't warming up enough, here is the result of a longer test (200 warmup iterations and 200 measurement iterations for each of the 10 forks):

# JMH version: 1.19
# VM version: JDK 1.8.0_152, VM 25.152-b16
# VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0_152.jdk/Contents/Home/jre/bin/java
# VM options: <none>
# Warmup: 200 iterations, 1 s each
# Measurement: 200 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time

# Run complete. Total time: 17:49:25

Benchmark                                  (filling)   Mode   Cnt           Score         Error  Units
OptionalBenchmark.nullChecks                   empty  thrpt  2000   471803721.972 ±  116120.114  ops/s
OptionalBenchmark.nullChecks                   small  thrpt  2000   289181482.246 ± 3967502.916  ops/s
OptionalBenchmark.nullChecks                   large  thrpt  2000   260222478.406 ±  105074.121  ops/s
OptionalBenchmark.nullChecks                    full  thrpt  2000   282487728.710 ±   71214.637  ops/s
OptionalBenchmark.optionalsWithLambdas         empty  thrpt  2000   460931830.242 ±  335263.946  ops/s
OptionalBenchmark.optionalsWithLambdas         small  thrpt  2000    98688943.879 ±   20485.863  ops/s
OptionalBenchmark.optionalsWithLambdas         large  thrpt  2000    67262330.106 ±   50465.262  ops/s
OptionalBenchmark.optionalsWithLambdas          full  thrpt  2000   168070919.770 ±  352435.666  ops/s
OptionalBenchmark.optionalsWithMethodRefs      empty  thrpt  2000   460998599.579 ±   85063.337  ops/s
OptionalBenchmark.optionalsWithMethodRefs      small  thrpt  2000    98707338.408 ±   17231.648  ops/s
OptionalBenchmark.optionalsWithMethodRefs      large  thrpt  2000    68052673.021 ±   55285.427  ops/s
OptionalBenchmark.optionalsWithMethodRefs       full  thrpt  2000   169259067.479 ±  174402.212  ops/s

As you can see, we have almost the same figures.

Even such a powerful tool as JMH cannot save you from all benchmarking pitfalls. I've found two different issues with this benchmark.

1.

The HotSpot JIT compiler speculatively optimizes code based on the runtime profile. In the given "full" scenario, Optional never sees null values. That's why the Optional.ofNullable method (also called by Optional.map) happens to be optimized exclusively for the non-null path, which constructs a new non-empty Optional. In this case the JIT is able to eliminate all short-lived allocations and perform all map operations without intermediate objects.

public static <T> Optional<T> ofNullable(T value) {
    return value == null ? empty() : of(value);
}

In the "small" and "large" scenarios, the mapping sequence finally ends with Optional.empty(). That is, both branches of the ofNullable method are compiled, and the JIT is no longer able to eliminate allocations of the intermediate Optional objects: the data flow graph appears to be too complex for Escape Analysis to succeed.
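To make the two branches easier to see, here is a stripped-down stand-in for Optional. MiniOptional is a hypothetical class written for this illustration, not the JDK source. It only shows how every map() on a present value funnels through ofNullable(), which either returns the shared empty instance or allocates a new wrapper:

```java
import java.util.function.Function;

// Hypothetical, simplified stand-in for java.util.Optional (not the JDK source).
public final class MiniOptional<T> {

    private static final MiniOptional<?> EMPTY = new MiniOptional<>(null);

    private final T value;

    private MiniOptional(T value) {
        this.value = value;
    }

    @SuppressWarnings("unchecked")
    static <T> MiniOptional<T> empty() {
        return (MiniOptional<T>) EMPTY;
    }

    static <T> MiniOptional<T> ofNullable(T value) {
        // Two branches: the shared empty instance, or a fresh allocation.
        return value == null ? empty() : new MiniOptional<>(value);
    }

    <U> MiniOptional<U> map(Function<? super T, ? extends U> mapper) {
        // A present value always goes through ofNullable() with the mapper's result.
        return value == null ? empty() : ofNullable(mapper.apply(value));
    }

    T orElse(T other) {
        return value != null ? value : other;
    }

    public static void main(String[] args) {
        // "full"-like chain: only the allocating branch of ofNullable() is taken.
        String full = MiniOptional.ofNullable("France")
                .map(String::trim)
                .map(String::toUpperCase)
                .orElse(null);
        // "small"-like chain: one mapper returns null, so the empty branch is taken too.
        String small = MiniOptional.ofNullable("France")
                .map(s -> (String) null)
                .map(String::toUpperCase)
                .orElse(null);
        System.out.println(full + " " + small);
    }
}
```

In the first chain the profile of ofNullable() contains only non-null arguments. In the second chain it sees both, which is the situation described above for "small" and "large".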

Check it by running JMH with -prof gc, and you'll see that "small" allocates 48 bytes (3 Optionals) per iteration, "large" allocates 96 bytes (6 Optionals), and "full" allocates nothing.

Benchmark                                                      (filling)  Mode  Cnt     Score     Error   Units
OptionalBenchmark.optionalsWithMethodRefs:·gc.alloc.rate.norm      empty  avgt    5    ≈ 10⁻⁶              B/op
OptionalBenchmark.optionalsWithMethodRefs:·gc.alloc.rate.norm      small  avgt    5    48,000 ±   0,001    B/op
OptionalBenchmark.optionalsWithMethodRefs:·gc.alloc.rate.norm      large  avgt    5    96,000 ±   0,001    B/op
OptionalBenchmark.optionalsWithMethodRefs:·gc.alloc.rate.norm       full  avgt    5    ≈ 10⁻⁵              B/op

If you replace new Country("France") with new Country(null), the optimization will also break, and the "full" scenario will become slower than "small" and "large", as expected.

Alternatively, the following dummy loop added to setUp will also prevent ofNullable from being over-optimized, making the benchmark results more realistic.

    for (int i = 0; i < 1000; i++) {
        Optional.ofNullable(null);
    }

2.

Surprisingly, the nullChecks benchmark also appears faster in the "full" scenario. The reason here is class initialization barriers. Note that only the "full" case initializes all related classes. In the "small" and "large" cases, the nullChecks method refers to some classes that are not yet initialized. This prevents nullChecks from being compiled efficiently.

If you explicitly initialize all the classes in setUp, e.g. by creating a dummy object, then the "empty", "small" and "large" scenarios of nullChecks will become faster.

Room dummy = new Room(new Flat(new Floor(new Building(new Block(new District(new City(new Country("France"))))))));
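
The lazy class initialization behind this second issue can be observed directly. The sketch below uses a hypothetical nested Country class and a flag set from its static initializer. It only demonstrates that a class mentioned in a method body stays uninitialized until it is actually instantiated; it does not measure the JIT effect itself:

```java
public class ClassInitDemo {

    static boolean countryInitialized = false;

    // Hypothetical stand-in for one of the benchmark's model classes.
    static class Country {
        static {
            // Runs once, when the class is initialized.
            countryInitialized = true;
        }

        final String name;

        Country(String name) {
            this.name = name;
        }
    }

    // Refers to Country in its body, but only touches it when the argument is non-null.
    static String nameOf(Object maybeCountry) {
        if (maybeCountry == null) {
            return null;
        }
        return ((Country) maybeCountry).name;
    }

    public static void main(String[] args) {
        nameOf(null);
        System.out.println("after null path: " + countryInitialized);
        new Country("France");
        System.out.println("after instantiation: " + countryInitialized);
    }
}
```

Creating the dummy object in setUp plays the role of the new Country("France") call here: it forces initialization of every class in the chain before the benchmark method is compiled.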
