Why does Java Optional performance increase with the number of chained calls?
I was recently asked about the performance of Java 8 Optional. After some searching, I found this question and several blog posts, with contradictory answers. So I benchmarked it using JMH, and I don't understand my findings.
Here is the gist of my benchmark code (the full code is available on GitHub):
    @State(Scope.Benchmark)
    public class OptionalBenchmark {
        private Room room;

        @Param({ "empty", "small", "large", "full" })
        private String filling;

        @Setup
        public void setUp () {
            switch (filling) {
                case "empty":
                    room = null;
                    break;
                case "small":
                    room = new Room(new Flat(new Floor(null)));
                    break;
                case "large":
                    room = new Room(new Flat(new Floor(new Building(new Block(new District(null))))));
                    break;
                case "full":
                    room = new Room(new Flat(new Floor(new Building(new Block(new District(new City(new Country("France"))))))));
                    break;
                default:
                    throw new IllegalStateException("Unsupported filling.");
            }
        }

        @Benchmark
        public String nullChecks () {
            if (room == null) {
                return null;
            }
            Flat flat = room.getFlat();
            if (flat == null) {
                return null;
            }
            Floor floor = flat.getFloor();
            if (floor == null) {
                return null;
            }
            Building building = floor.getBuilding();
            if (building == null) {
                return null;
            }
            Block block = building.getBlock();
            if (block == null) {
                return null;
            }
            District district = block.getDistrict();
            if (district == null) {
                return null;
            }
            City city = district.getCity();
            if (city == null) {
                return null;
            }
            Country country = city.getCountry();
            if (country == null) {
                return null;
            }
            return country.getName();
        }

        @Benchmark
        public String optionalsWithMethodRefs () {
            return Optional.ofNullable (room)
                    .map (Room::getFlat)
                    .map (Flat::getFloor)
                    .map (Floor::getBuilding)
                    .map (Building::getBlock)
                    .map (Block::getDistrict)
                    .map (District::getCity)
                    .map (City::getCountry)
                    .map (Country::getName)
                    .orElse (null);
        }

        @Benchmark
        public String optionalsWithLambdas () {
            return Optional.ofNullable (room)
                    .map (room -> room.getFlat ())
                    .map (flat -> flat.getFloor ())
                    .map (floor -> floor.getBuilding ())
                    .map (building -> building.getBlock ())
                    .map (block -> block.getDistrict ())
                    .map (district -> district.getCity ())
                    .map (city -> city.getCountry ())
                    .map (country -> country.getName ())
                    .orElse (null);
        }
    }
And the results I got were:
    Benchmark                                  (filling)  Mode    Cnt          Score         Error  Units
    OptionalBenchmark.nullChecks               empty      thrpt   200  468835378.093 ±  895576.864  ops/s
    OptionalBenchmark.nullChecks               small      thrpt   200  306602013.907 ±  136966.520  ops/s
    OptionalBenchmark.nullChecks               large      thrpt   200  259996142.619 ±  307584.215  ops/s
    OptionalBenchmark.nullChecks               full       thrpt   200  275954974.981 ± 4154597.959  ops/s
    OptionalBenchmark.optionalsWithLambdas     empty      thrpt   200  460491457.335 ±  322920.650  ops/s
    OptionalBenchmark.optionalsWithLambdas     small      thrpt   200   98604468.453 ±   68320.074  ops/s
    OptionalBenchmark.optionalsWithLambdas     large      thrpt   200   67648427.470 ±  206810.285  ops/s
    OptionalBenchmark.optionalsWithLambdas     full       thrpt   200  167124820.392 ± 1229924.561  ops/s
    OptionalBenchmark.optionalsWithMethodRefs  empty      thrpt   200  460690135.554 ±  273853.568  ops/s
    OptionalBenchmark.optionalsWithMethodRefs  small      thrpt   200   98639064.680 ±   56848.805  ops/s
    OptionalBenchmark.optionalsWithMethodRefs  large      thrpt   200   68138436.113 ±  158409.539  ops/s
    OptionalBenchmark.optionalsWithMethodRefs  full       thrpt   200  169603006.971 ±   52646.423  ops/s
First of all, when given a null reference, Optional and null checks behave pretty much the same. I guess this is because there is only one instance of Optional.empty(), so any .map() method call on it just returns itself.
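The guess about Optional.empty() can be checked with a small standalone sketch. The class and method names here (EmptyOptionalDemo and its two helpers) are made up for illustration and are not part of the benchmark. Note that the Javadoc does not promise that the empty Optional is a singleton, so this only reflects how the OpenJDK implementation behaves in practice:

```java
import java.util.Optional;

final class EmptyOptionalDemo {
    // True when map() on an empty Optional hands back the shared empty instance
    // instead of allocating anything (observed OpenJDK behavior, not an API guarantee).
    static boolean mapOnEmptyReturnsSharedInstance() {
        Optional<String> empty = Optional.ofNullable(null);
        Optional<Integer> mapped = empty.map(String::length);
        // Cast to Object so that differently parameterized Optionals can be compared by identity.
        return (Object) mapped == (Object) Optional.empty();
    }

    // For contrast: true when map() on a present value returns a distinct Optional object.
    static boolean mapOnPresentCreatesNewInstance() {
        Optional<String> present = Optional.of("France");
        Optional<String> mapped = present.map(s -> s);
        return (Object) mapped != (Object) present;
    }

    public static void main(String[] args) {
        System.out.println("map on empty returns shared instance: " + mapOnEmptyReturnsSharedInstance());
        System.out.println("map on present creates new instance:  " + mapOnPresentCreatesNewInstance());
    }
}
```

The identity comparison is only meant as a diagnostic here; regular code should not rely on == for Optional values.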
When the given object is non-null and contains a chain of non-null attributes, however, a new Optional has to be instantiated on each call to .map(). Hence, performance degrades much more quickly than with null checks. Makes sense. Except for my full filling, where the performance suddenly increases. So what is the magic going on here? Am I doing something wrong in my benchmark?
The parameters for my first run were the JMH defaults: each benchmark was run in 10 different forks, with 20 warmup iterations of 1 s each, followed by 20 measurement iterations of 1 s each. I believe those values are sane, since I trust the libraries I use. However, since I was told I wasn't warming up enough, here is the result of a longer test (200 warmup iterations and 200 measurement iterations for each of the 10 forks):
    # JMH version: 1.19
    # VM version: JDK 1.8.0_152, VM 25.152-b16
    # VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0_152.jdk/Contents/Home/jre/bin/java
    # VM options: <none>
    # Warmup: 200 iterations, 1 s each
    # Measurement: 200 iterations, 1 s each
    # Timeout: 10 min per iteration
    # Threads: 1 thread, will synchronize iterations
    # Benchmark mode: Throughput, ops/time

    # Run complete. Total time: 17:49:25

    Benchmark                                  (filling)  Mode    Cnt          Score         Error  Units
    OptionalBenchmark.nullChecks               empty      thrpt  2000  471803721.972 ±  116120.114  ops/s
    OptionalBenchmark.nullChecks               small      thrpt  2000  289181482.246 ± 3967502.916  ops/s
    OptionalBenchmark.nullChecks               large      thrpt  2000  260222478.406 ±  105074.121  ops/s
    OptionalBenchmark.nullChecks               full       thrpt  2000  282487728.710 ±   71214.637  ops/s
    OptionalBenchmark.optionalsWithLambdas     empty      thrpt  2000  460931830.242 ±  335263.946  ops/s
    OptionalBenchmark.optionalsWithLambdas     small      thrpt  2000   98688943.879 ±   20485.863  ops/s
    OptionalBenchmark.optionalsWithLambdas     large      thrpt  2000   67262330.106 ±   50465.262  ops/s
    OptionalBenchmark.optionalsWithLambdas     full       thrpt  2000  168070919.770 ±  352435.666  ops/s
    OptionalBenchmark.optionalsWithMethodRefs  empty      thrpt  2000  460998599.579 ±   85063.337  ops/s
    OptionalBenchmark.optionalsWithMethodRefs  small      thrpt  2000   98707338.408 ±   17231.648  ops/s
    OptionalBenchmark.optionalsWithMethodRefs  large      thrpt  2000   68052673.021 ±   55285.427  ops/s
    OptionalBenchmark.optionalsWithMethodRefs  full       thrpt  2000  169259067.479 ±  174402.212  ops/s
As you can see, we have almost the same figures.
Even a tool as powerful as JMH cannot save you from every benchmarking pitfall. I've found two different issues with this benchmark.
The HotSpot JIT compiler speculatively optimizes code based on the runtime profile. In the given "full" scenario, Optional never sees null values. That's why the Optional.ofNullable method (also called by Optional.map) happens to be optimized exclusively for the non-null path, which constructs a new non-empty Optional. In this case the JIT is able to eliminate all short-lived allocations and perform all map operations without intermediate objects.
    public static <T> Optional<T> ofNullable(T value) {
        return value == null ? empty() : of(value);
    }
In the "small" and "large" scenarios, the mapping sequence eventually ends with Optional.empty(). That is, both branches of the ofNullable method are compiled, and the JIT is no longer able to eliminate the allocations of intermediate Optional objects: the data flow graph appears to be too complex for Escape Analysis to succeed.
Check it by running JMH with -prof gc, and you'll see that "small" allocates 48 bytes (3 Optionals) per iteration, "large" allocates 96 bytes (6 Optionals), and "full" allocates nothing.
    Benchmark                                                      (filling)  Mode  Cnt   Score    Error  Units
    OptionalBenchmark.optionalsWithMethodRefs:·gc.alloc.rate.norm  empty      avgt    5  ≈ 10⁻⁶            B/op
    OptionalBenchmark.optionalsWithMethodRefs:·gc.alloc.rate.norm  small      avgt    5  48,000 ± 0,001    B/op
    OptionalBenchmark.optionalsWithMethodRefs:·gc.alloc.rate.norm  large      avgt    5  96,000 ± 0,001    B/op
    OptionalBenchmark.optionalsWithMethodRefs:·gc.alloc.rate.norm  full       avgt    5  ≈ 10⁻⁵            B/op
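The 48 and 96 byte figures can be reproduced on paper by counting how many non-empty Optionals the chain creates before it hits a null. The sketch below does this with a hypothetical Node class standing in for Room, Flat, Floor and so on, and it assumes 16 bytes per Optional, which is what the profiler numbers above imply (48 / 3) but which depends on the JVM's object layout:

```java
import java.util.Optional;

final class OptionalAllocationEstimate {
    // Assumption: one Optional takes 16 bytes on the JVM used above (48 B / 3 Optionals).
    static final int ASSUMED_OPTIONAL_BYTES = 16;

    // Hypothetical stand-in for the Room -> Flat -> Floor -> ... object chain.
    static final class Node {
        private final Node next;
        Node(Node next) { this.next = next; }
        Node getNext() { return next; }
    }

    // Builds a chain of the given number of non-null objects, ending in null.
    static Node chain(int length) {
        Node head = null;
        for (int i = 0; i < length; i++) {
            head = new Node(head);
        }
        return head;
    }

    // Counts the non-empty Optionals produced by ofNullable plus up to mapCalls map() calls.
    static int countNonEmptyOptionals(Node head, int mapCalls) {
        Optional<Node> current = Optional.ofNullable(head);
        int count = current.isPresent() ? 1 : 0;
        for (int i = 0; i < mapCalls && current.isPresent(); i++) {
            current = current.map(Node::getNext);
            if (current.isPresent()) {
                count++;
            }
        }
        return count;
    }

    // Estimated bytes per operation when no allocation is eliminated by the JIT.
    static int estimatedBytes(Node head, int mapCalls) {
        return countNonEmptyOptionals(head, mapCalls) * ASSUMED_OPTIONAL_BYTES;
    }

    public static void main(String[] args) {
        // "small" has 3 non-null objects (Room, Flat, Floor); "large" has 6.
        System.out.println("small: " + estimatedBytes(chain(3), 8) + " B/op");
        System.out.println("large: " + estimatedBytes(chain(6), 8) + " B/op");
    }
}
```

This only counts objects the source code asks for; it says nothing about which of them the JIT actually removes, which is exactly what -prof gc measures.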
If you replace new Country("France") with new Country(null), the optimization will also break, and the "full" scenario will become, as expected, slower than "small" and "large".
Alternatively, the following dummy loop added to setUp will also prevent ofNullable from being over-optimized, making the benchmark results more realistic.
    for (int i = 0; i < 1000; i++) {
        Optional.ofNullable(null);
    }
Surprisingly, the nullChecks benchmark also appears faster in the "full" scenario. The reason here is class initialization barriers. Note that only the "full" case initializes all related classes. In the "small" and "large" cases, the nullChecks method refers to some classes that are not yet initialized. This prevents nullChecks from being compiled efficiently.
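To see why some classes stay uninitialized, recall that a Java class is initialized lazily, on first instantiation or static access, and not merely because a method mentions it. The following sketch illustrates only that laziness, not the JIT effect itself; ClassInitDemo and its nested City are hypothetical names and unrelated to the benchmark's own classes:

```java
import java.util.ArrayList;
import java.util.List;

final class ClassInitDemo {
    // Records the names of nested classes whose static initializer has run.
    static final List<String> INITIALIZED = new ArrayList<>();

    // Hypothetical stand-in for one of the benchmark's domain classes.
    static final class City {
        static {
            INITIALIZED.add("City");
        }
        String getName() {
            return "Paris";
        }
    }

    // Refers to City in its signature and body, like nullChecks does,
    // but calling it with null never triggers City's initialization.
    static String nameOf(City city) {
        return city == null ? null : city.getName();
    }

    static boolean cityInitialized() {
        return INITIALIZED.contains("City");
    }

    public static void main(String[] args) {
        nameOf(null);
        System.out.println("after nameOf(null): initialized = " + cityInitialized());
        nameOf(new City());
        System.out.println("after new City():   initialized = " + cityInitialized());
    }
}
```

In the benchmark, the "small" filling never constructs a Building, Block, District, City or Country, so those classes are in the same state as City before the first new City() here.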
If you explicitly initialize all the classes in setUp, e.g. by creating a dummy object, then the "empty", "small" and "large" scenarios of nullChecks will become faster.
    Room dummy = new Room(new Flat(new Floor(new Building(new Block(new District(new City(new Country("France"))))))));