
Does Java strictfp modifier have any effect on modern CPUs?

I know the meaning of the strictfp modifier on methods (and on classes), according to the JLS:

JLS 8.4.3.5, strictfp methods:

The effect of the strictfp modifier is to make all float or double expressions within the method body be explicitly FP-strict (§15.4).

JLS 15.4 FP-strict expressions:

Within an FP-strict expression, all intermediate values must be elements of the float value set or the double value set, implying that the results of all FP-strict expressions must be those predicted by IEEE 754 arithmetic on operands represented using single and double formats.

Within an expression that is not FP-strict, some leeway is granted for an implementation to use an extended exponent range to represent intermediate results; the net effect, roughly speaking, is that a calculation might produce "the correct answer" in situations where exclusive use of the float value set or double value set might result in overflow or underflow.
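To make the JLS wording concrete, here is a minimal sketch of what "the correct answer" could look like when intermediates have more exponent range. BigDecimal is only a stand-in for an extended exponent range (an assumption for illustration, not how any JVM or FPU evaluates non-FP-strict code), the class and method names are hypothetical, and the factor 2 is used instead of 1.0000001 so that every step is exact.

```java
import java.math.BigDecimal;

// Illustration only: BigDecimal has an effectively unbounded exponent, so it
// stands in for "an extended exponent range". This is NOT how a JVM or an
// x87 FPU actually evaluates non-FP-strict expressions.
class ExtendedRangeSketch {

    private static final BigDecimal TWO = BigDecimal.valueOf(2);

    // Every intermediate is an element of the double value set.
    static double overflowInDouble() {
        double v = Double.MAX_VALUE;
        return v * 2 / 2;          // v * 2 overflows to Infinity and stays there
    }

    // Same expression, but the intermediate product cannot overflow.
    static double overflowWithWideExponent() {
        return new BigDecimal(Double.MAX_VALUE).multiply(TWO).divide(TWO).doubleValue();
    }

    // Every intermediate is an element of the double value set.
    static double underflowInDouble() {
        double v = Double.MIN_VALUE;
        return v / 2 * 2;          // v / 2 rounds to 0.0 (round-to-nearest, ties-to-even)
    }

    // Same expression, but the intermediate quotient cannot underflow.
    static double underflowWithWideExponent() {
        return new BigDecimal(Double.MIN_VALUE).divide(TWO).multiply(TWO).doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(overflowInDouble());
        System.out.println(overflowWithWideExponent());
        System.out.println(underflowInDouble());
        System.out.println(underflowWithWideExponent());
    }
}
```

With IEEE 754 double arithmetic the two "InDouble" methods return Infinity and 0.0, while the wide-exponent variants recover Double.MAX_VALUE and Double.MIN_VALUE. That gap is the kind of difference a non-FP-strict implementation would be permitted to show.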

I've been trying to come up with a way to get an actual difference between an expression in a strictfp method and the same expression in a method that is not strictfp. I've tried this on two laptops, one with an Intel Core i3 CPU and one with an Intel Core i7 CPU, and I can't get any difference.

A lot of posts suggest that native floating point, when strictfp is not used, could be using 80-bit floating point numbers, and so have extra representable numbers below the smallest positive Java double (the one closest to zero) or above the highest possible 64-bit Java double.

I tried the code below with and without the strictfp modifier, and it gives exactly the same results.

public static strictfp void withStrictFp() {
    double v = Double.MAX_VALUE;
    System.out.println(v * 1.0000001 / 1.0000001);
    v = Double.MIN_VALUE;
    System.out.println(v / 2 * 2);
}

Actually, I assume that any difference would only show up when the code is compiled to native code, so I am running it with the -Xcomp JVM argument. But there is no difference.
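A side-by-side check of this kind could be sketched as follows. The class and method names are hypothetical; the idea is simply to evaluate the same two expressions in a strictfp method and in a default method and compare the raw bit patterns. Note that since Java 17 (JEP 306) all floating-point evaluation is strict, so on such a JDK the modifier is redundant and javac reports a warning for it.

```java
import java.util.Arrays;

// Sketch: compare a strictfp method with a default one, bit for bit.
// The operands are passed as parameters so the expressions are not
// compile-time constants.
class StrictFpCompare {

    // On Java 17+ the strictfp modifier is a no-op and javac warns about it.
    static strictfp double[] strict(double max, double min) {
        return new double[] { max * 1.0000001 / 1.0000001, min / 2 * 2 };
    }

    static double[] relaxed(double max, double min) {
        return new double[] { max * 1.0000001 / 1.0000001, min / 2 * 2 };
    }

    // True when both arrays hold exactly the same IEEE 754 bit patterns.
    static boolean sameBits(double[] a, double[] b) {
        if (a.length != b.length) {
            return false;
        }
        for (int i = 0; i < a.length; i++) {
            if (Double.doubleToRawLongBits(a[i]) != Double.doubleToRawLongBits(b[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[] s = strict(Double.MAX_VALUE, Double.MIN_VALUE);
        double[] r = relaxed(Double.MAX_VALUE, Double.MIN_VALUE);
        System.out.println("strictfp : " + Arrays.toString(s));
        System.out.println("default  : " + Arrays.toString(r));
        System.out.println("identical: " + sameBits(s, r));
    }
}
```

Under FP-strict rules the first array must be [Infinity, 0.0]. Whether the second array matches depends on the JVM version and the hardware; running with -Xcomp, as above, forces the methods to be compiled before the comparison.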

I found another post explaining how you can get the assembly code generated by HotSpot (OpenJDK documentation). I'm running my code with java -Xcomp -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly. The first expression (v * 1.0000001 / 1.0000001) is compiled to the following, both with the strictfp modifier and without it:

  0x000000010f10a0a9: movsd  -0xb1(%rip),%xmm0        # 0x000000010f10a000
                                                ;   {section_word}
  0x000000010f10a0b1: mulsd  -0xb1(%rip),%xmm0        # 0x000000010f10a008
                                                ;   {section_word}
  0x000000010f10a0b9: divsd  -0xb1(%rip),%xmm0        # 0x000000010f10a010
                                                ;   {section_word}

There is nothing in that code that truncates the result of each step to 64 bits, as I had expected. Looking up the documentation of movsd, mulsd and divsd, they all mention that these (SSE) instructions operate on 64-bit floating point values, not the 80-bit values I expected. So it seems logical that the double value set these instructions operate on is already the IEEE 754 value set, and that there would be no difference between having strictfp and not having it.

My questions are:

  1. Is this analysis correct? I don't use Intel assembly very often, so I'm not confident of my conclusion.
  2. Is there any (other) modern CPU architecture (that has a JVM) for which there is a difference between operation with and without the strictfp modifier?

If by “modern” you mean processors supporting the sort of SSE2 instructions that you quote in your question as produced by your compiler (mulsd, …), then the answer is no, strictfp does not make a difference, because the instruction set offers no way to take advantage of the absence of strictfp. The available instructions are already optimal for computing to the precise specifications of strictfp. In other words, on that kind of modern CPU, you get strictfp semantics all the time for the same price.

If by “modern” you mean the historical 387 FPU, then it is possible to observe a difference when an intermediate computation would overflow or underflow in strictfp mode: without strictfp, the computation may not overflow, or, on underflow, it may keep more precision bits than expected.

A typical strictfp computation compiled for the 387 will look like the assembly in this answer, with well-placed multiplications by well-chosen powers of two to make underflow behave the same as in IEEE 754 binary64. A round-trip of the result through a 64-bit memory location then takes care of overflows.

The same computation compiled without strictfp would produce one 387 instruction per basic operation, for instance just the multiplication instruction fmulp for a source-level multiplication. (The 387 would have been configured at the beginning of the program to use the same significand width as binary64, 53 bits.)
