

Why is fast inverse square root so odd and slow on Java?

I'm trying to implement Fast Inverse Square Root in Java in order to speed up vector normalization. However, when I implement the single-precision version, I get speeds about the same as 1F / (float)Math.sqrt() at first, and then it quickly drops to half the speed. This is interesting, because while Math.sqrt uses (I presume) a native method, the control also involves a floating-point division, which I've heard is really slow. My code for computing the numbers is as follows:

public static float fastInverseSquareRoot(float x){
    float xHalf = 0.5F * x;
    // Reinterpret the float's bits as an int to build the initial guess.
    int temp = Float.floatToRawIntBits(x);
    temp = 0x5F3759DF - (temp >> 1);
    float newX = Float.intBitsToFloat(temp);
    // One Newton-Raphson step refines the guess.
    newX = newX * (1.5F - xHalf * newX * newX);
    return newX;
}
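As a sanity check on the function above, here is a minimal sketch that compares it against a double-precision 1.0 / Math.sqrt(x). The class name FisrAccuracyCheck, the helper maxRelativeError, and the sampled range are assumptions made for illustration; they are not part of the original program. The commonly cited error for this constant with one Newton step is a little under 0.2 percent, which is usually acceptable for normalizing vectors.

```java
class FisrAccuracyCheck {

    // Same algorithm as in the question, restated so this sketch is self-contained.
    static float fastInverseSquareRoot(float x) {
        float xHalf = 0.5F * x;
        int bits = Float.floatToRawIntBits(x);
        bits = 0x5F3759DF - (bits >> 1);
        float y = Float.intBitsToFloat(bits);
        return y * (1.5F - xHalf * y * y);
    }

    // Hypothetical helper: largest relative error over geometrically spaced samples.
    static double maxRelativeError(float from, float to, float factor) {
        double worst = 0.0;
        for (float x = from; x <= to; x *= factor) {
            double exact = 1.0 / Math.sqrt(x);
            double approx = fastInverseSquareRoot(x);
            worst = Math.max(worst, Math.abs(approx - exact) / exact);
        }
        return worst;
    }

    public static void main(String[] args) {
        double worst = maxRelativeError(1e-3F, 1e6F, 1.001F);
        System.out.printf("max relative error: %.6f%n", worst);
    }
}
```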

Using a short program I've written to run each method 16 million times, aggregate the results, and repeat, I get results like this:

1F / Math.sqrt() took 65209490 nanoseconds.
Fast Inverse Square Root took 65456128 nanoseconds.
Fast Inverse Square Root was 0.378224 percent slower than 1F / Math.sqrt()

1F / Math.sqrt() took 64131293 nanoseconds.
Fast Inverse Square Root took 26214534 nanoseconds.
Fast Inverse Square Root was 59.123647 percent faster than 1F / Math.sqrt()

1F / Math.sqrt() took 27312205 nanoseconds.
Fast Inverse Square Root took 56234714 nanoseconds.
Fast Inverse Square Root was 105.895914 percent slower than 1F / Math.sqrt()

1F / Math.sqrt() took 26493281 nanoseconds.
Fast Inverse Square Root took 56004783 nanoseconds.
Fast Inverse Square Root was 111.392402 percent slower than 1F / Math.sqrt()

I consistently get numbers that are about the same for both, followed by an iteration where Fast Inverse Square Root saves about 60 percent of the time required by 1F / Math.sqrt(), followed by several iterations in which Fast Inverse Square Root takes about twice as long as the control. I'm confused why FISR would go from the same speed -> 60 percent faster -> 100 percent slower, and it happens every time I run my program.

EDIT: The above data is from running it in Eclipse. When I run the program with javac/java I get completely different data:

1F / Math.sqrt() took 57870498 nanoseconds.
Fast Inverse Square Root took 88206794 nanoseconds.
Fast Inverse Square Root was 52.421004 percent slower than 1F / Math.sqrt()

1F / Math.sqrt() took 54982400 nanoseconds.
Fast Inverse Square Root took 83777562 nanoseconds.
Fast Inverse Square Root was 52.371599 percent slower than 1F / Math.sqrt()

1F / Math.sqrt() took 21115822 nanoseconds.
Fast Inverse Square Root took 76705152 nanoseconds.
Fast Inverse Square Root was 263.259133 percent slower than 1F / Math.sqrt()

1F / Math.sqrt() took 20159210 nanoseconds.
Fast Inverse Square Root took 80745616 nanoseconds.
Fast Inverse Square Root was 300.539585 percent slower than 1F / Math.sqrt()

1F / Math.sqrt() took 21814675 nanoseconds.
Fast Inverse Square Root took 85261648 nanoseconds.
Fast Inverse Square Root was 290.845374 percent slower than 1F / Math.sqrt()

EDIT2: After a few responses, it seems the speed stabilizes after several iterations, but the number it stabilizes to is highly volatile. Does anyone have any idea why?

Here's my code (not exactly concise, but here's the whole thing):

public class FastInverseSquareRootTest {

    // Times the control loop and the experimental loop back to back.
    public static FastInverseSquareRootTest conductTest() {
        float result = 0F;
        long startTime, endTime, midTime;
        startTime = System.nanoTime();
        for (float x = 1F; x < 4_000_000F; x += 0.25F) {
            result = 1F / (float) Math.sqrt(x);
        }
        midTime = System.nanoTime();
        for (float x = 1F; x < 4_000_000F; x += 0.25F) {
            result = fastInverseSquareRoot(x);
        }
        endTime = System.nanoTime();
        return new FastInverseSquareRootTest(midTime - startTime, endTime
                - midTime);
    }

    public static float fastInverseSquareRoot(float x) {
        float xHalf = 0.5F * x;
        int temp = Float.floatToRawIntBits(x);
        temp = 0x5F3759DF - (temp >> 1);
        float newX = Float.intBitsToFloat(temp);
        newX = newX * (1.5F - xHalf * newX * newX);
        return newX;
    }

    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 7; i++) {
            System.out.println(conductTest());
        }
    }

    private long controlDiff;

    private long experimentalDiff;

    private double percentError;

    public FastInverseSquareRootTest(long controlDiff, long experimentalDiff) {
        this.experimentalDiff = experimentalDiff;
        this.controlDiff = controlDiff;
        this.percentError = 100D * (experimentalDiff - controlDiff)
                / controlDiff;
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        sb.append(String.format("1F / Math.sqrt() took %d nanoseconds.%n",
                controlDiff));
        sb.append(String.format(
                "Fast Inverse Square Root took %d nanoseconds.%n",
                experimentalDiff));
        sb.append(String
                .format("Fast Inverse Square Root was %f percent %s than 1F / Math.sqrt()%n",
                        Math.abs(percentError), percentError > 0D ? "slower"
                                : "faster"));
        return sb.toString();
    }
}

The JIT optimiser seems to have thrown the call to Math.sqrt away, since its result is never used.

With your unmodified code, I got

1F / Math.sqrt() took 65358495 nanoseconds.
Fast Inverse Square Root took 77152791 nanoseconds.
Fast Inverse Square Root was 18,045544 percent slower than 1F / Math.sqrt()

1F / Math.sqrt() took 52872498 nanoseconds.
Fast Inverse Square Root took 75242075 nanoseconds.
Fast Inverse Square Root was 42,308531 percent slower than 1F / Math.sqrt()

1F / Math.sqrt() took 23386359 nanoseconds.
Fast Inverse Square Root took 73532080 nanoseconds.
Fast Inverse Square Root was 214,422951 percent slower than 1F / Math.sqrt()

1F / Math.sqrt() took 23790209 nanoseconds.
Fast Inverse Square Root took 76254902 nanoseconds.
Fast Inverse Square Root was 220,530610 percent slower than 1F / Math.sqrt()

1F / Math.sqrt() took 23885467 nanoseconds.
Fast Inverse Square Root took 74869636 nanoseconds.
Fast Inverse Square Root was 213,452678 percent slower than 1F / Math.sqrt()

1F / Math.sqrt() took 23473514 nanoseconds.
Fast Inverse Square Root took 73063699 nanoseconds.
Fast Inverse Square Root was 211,260168 percent slower than 1F / Math.sqrt()

1F / Math.sqrt() took 23738564 nanoseconds.
Fast Inverse Square Root took 71917013 nanoseconds.
Fast Inverse Square Root was 202,954353 percent slower than 1F / Math.sqrt()

consistently slower times for fastInverseSquareRoot, all in the same ballpark, while the Math.sqrt loop sped up considerably.

Changing the code so that the calls to Math.sqrt couldn't be avoided,

    for (float x = 1F; x < 4_000_000F; x += 0.25F) {
        result += 1F / (float) Math.sqrt(x);
    }
    midTime = System.nanoTime();
    for (float x = 1F; x < 4_000_000F; x += 0.25F) {
        result -= fastInverseSquareRoot(x);
    }
    endTime = System.nanoTime();
    // Using the result keeps the JIT from discarding either loop.
    if (result == 0) System.out.println("Wow!");

I got

1F / Math.sqrt() took 184884684 nanoseconds.
Fast Inverse Square Root took 85298761 nanoseconds.
Fast Inverse Square Root was 53,863804 percent faster than 1F / Math.sqrt()

1F / Math.sqrt() took 182183542 nanoseconds.
Fast Inverse Square Root took 83040574 nanoseconds.
Fast Inverse Square Root was 54,419278 percent faster than 1F / Math.sqrt()

1F / Math.sqrt() took 165269658 nanoseconds.
Fast Inverse Square Root took 81922280 nanoseconds.
Fast Inverse Square Root was 50,431143 percent faster than 1F / Math.sqrt()

1F / Math.sqrt() took 163272877 nanoseconds.
Fast Inverse Square Root took 81906141 nanoseconds.
Fast Inverse Square Root was 49,834815 percent faster than 1F / Math.sqrt()

1F / Math.sqrt() took 165314846 nanoseconds.
Fast Inverse Square Root took 81124465 nanoseconds.
Fast Inverse Square Root was 50,927296 percent faster than 1F / Math.sqrt()

1F / Math.sqrt() took 164079534 nanoseconds.
Fast Inverse Square Root took 80453629 nanoseconds.
Fast Inverse Square Root was 50,966689 percent faster than 1F / Math.sqrt()

1F / Math.sqrt() took 162350821 nanoseconds.
Fast Inverse Square Root took 79854355 nanoseconds.
Fast Inverse Square Root was 50,813704 percent faster than 1F / Math.sqrt()

much slower times for Math.sqrt, and only moderately slower times for fastInverseSquareRoot (it now has to do a subtraction in each iteration).
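The same idea can be sketched as a self-contained program. This is only an illustration under stated assumptions: the class name ConsumedResultBenchmark and the helpers sumExact and sumFast are hypothetical, and moving each loop into its own method is a choice of this sketch, not something the original code does. The point is that every computed value flows into a sum that is eventually printed, so the JIT cannot treat either loop as dead code.

```java
class ConsumedResultBenchmark {

    static float fastInverseSquareRoot(float x) {
        float xHalf = 0.5F * x;
        int bits = 0x5F3759DF - (Float.floatToRawIntBits(x) >> 1);
        float y = Float.intBitsToFloat(bits);
        return y * (1.5F - xHalf * y * y);
    }

    // Control: accumulates 1/sqrt(x) for x = 1, 1.25, 1.5, ... (n terms).
    // The sum is only a sink that keeps the work alive, not an accurate total.
    static float sumExact(int n) {
        float sum = 0F;
        float x = 1F;
        for (int i = 0; i < n; i++) {
            sum += 1F / (float) Math.sqrt(x);
            x += 0.25F;
        }
        return sum;
    }

    // Experiment: same inputs, using the bit-trick approximation.
    static float sumFast(int n) {
        float sum = 0F;
        float x = 1F;
        for (int i = 0; i < n; i++) {
            sum += fastInverseSquareRoot(x);
            x += 0.25F;
        }
        return sum;
    }

    public static void main(String[] args) {
        final int n = 16_000_000;
        float sink = 0F;
        for (int round = 0; round < 7; round++) {
            long start = System.nanoTime();
            float exact = sumExact(n);
            long mid = System.nanoTime();
            float fast = sumFast(n);
            long end = System.nanoTime();
            sink += exact - fast;
            System.out.printf("exact: %d ns, fast: %d ns%n", mid - start, end - mid);
        }
        // Printing the sink makes every result observable.
        System.out.println("sink = " + sink);
    }
}
```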

My output for the code posted is:

1F / Math.sqrt() took 165769968 nanoseconds.
Fast Inverse Square Root took 251809517 nanoseconds.
Fast Inverse Square Root was 51.902977 percent slower than 1F / Math.sqrt()

1F / Math.sqrt() took 162953919 nanoseconds.
Fast Inverse Square Root took 251212721 nanoseconds.
Fast Inverse Square Root was 54.161816 percent slower than 1F / Math.sqrt()

1F / Math.sqrt() took 161524902 nanoseconds.
Fast Inverse Square Root took 36242909 nanoseconds.
Fast Inverse Square Root was 77.562030 percent faster than 1F / Math.sqrt()

1F / Math.sqrt() took 162289014 nanoseconds.
Fast Inverse Square Root took 36552036 nanoseconds.
Fast Inverse Square Root was 77.477196 percent faster than 1F / Math.sqrt()

1F / Math.sqrt() took 163157620 nanoseconds.
Fast Inverse Square Root took 36152720 nanoseconds.
Fast Inverse Square Root was 77.841844 percent faster than 1F / Math.sqrt()

1F / Math.sqrt() took 162511997 nanoseconds.
Fast Inverse Square Root took 36426705 nanoseconds.
Fast Inverse Square Root was 77.585221 percent faster than 1F / Math.sqrt()

1F / Math.sqrt() took 162302698 nanoseconds.
Fast Inverse Square Root took 36797410 nanoseconds.
Fast Inverse Square Root was 77.327912 percent faster than 1F / Math.sqrt()

It seems the JIT kicked in, and the performance of the fast version improved roughly sevenfold (from about 252 ms to about 36 ms per run). Hopefully someone with a better grasp of the JIT will come and explain this. My environment: Java 6, Eclipse.

My JIT had two steps of getting faster: the first is probably algorithm optimization and the second could be assembly-level optimization.

1F / Math.sqrt() took 78202645 nanoseconds.
Fast Inverse Square Root took 79248400 nanoseconds.
Fast Inverse Square Root was 1,337237 percent slower than 1F / Math.sqrt()

1F / Math.sqrt() took 76856008 nanoseconds.
Fast Inverse Square Root took 24788247 nanoseconds.
Fast Inverse Square Root was 67,747158 percent faster than 1F / Math.sqrt()

1F / Math.sqrt() took 24162119 nanoseconds.
Fast Inverse Square Root took 70651968 nanoseconds.
Fast Inverse Square Root was 192,407996 percent slower than 1F / Math.sqrt()

1F / Math.sqrt() took 24163301 nanoseconds.
Fast Inverse Square Root took 70598983 nanoseconds.
Fast Inverse Square Root was 192,174414 percent slower than 1F / Math.sqrt()

1F / Math.sqrt() took 24201621 nanoseconds.
Fast Inverse Square Root took 70667344 nanoseconds.
Fast Inverse Square Root was 191,994259 percent slower than 1F / Math.sqrt()

1F / Math.sqrt() took 24219835 nanoseconds.
Fast Inverse Square Root took 70698568 nanoseconds.
Fast Inverse Square Root was 191,903591 percent slower than 1F / Math.sqrt()

1F / Math.sqrt() took 24231663 nanoseconds.
Fast Inverse Square Root took 70633991 nanoseconds.
Fast Inverse Square Root was 191,494608 percent slower than 1F / Math.sqrt()
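Since the timings in these runs shift between rounds as the JIT recompiles the code, one common way to get steadier numbers is to run some untimed warm-up rounds first and then report a median of the timed rounds. The following is a minimal sketch of that approach, not a definitive harness: the class name WarmedUpTiming, the helpers workload and medianNanos, and the round counts are all assumptions for illustration. On HotSpot, running with the -XX:+PrintCompilation option should additionally show when each method gets compiled.

```java
import java.util.Arrays;

class WarmedUpTiming {

    static float fastInverseSquareRoot(float x) {
        float xHalf = 0.5F * x;
        int bits = 0x5F3759DF - (Float.floatToRawIntBits(x) >> 1);
        float y = Float.intBitsToFloat(bits);
        return y * (1.5F - xHalf * y * y);
    }

    // Hypothetical workload: n calls whose results are summed so they stay live.
    static float workload(int n) {
        float sum = 0F;
        float x = 1F;
        for (int i = 0; i < n; i++) {
            sum += fastInverseSquareRoot(x);
            x += 0.25F;
        }
        return sum;
    }

    // Runs untimed warm-up rounds, then returns the median of the timed rounds.
    static long medianNanos(int warmupRounds, int timedRounds, int n) {
        float sink = 0F;
        for (int i = 0; i < warmupRounds; i++) {
            sink += workload(n);
        }
        long[] samples = new long[timedRounds];
        for (int i = 0; i < timedRounds; i++) {
            long start = System.nanoTime();
            sink += workload(n);
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        // Reading the sink keeps the summed results observable.
        if (Float.isNaN(sink)) {
            System.out.println("unexpected NaN in sink");
        }
        return samples[timedRounds / 2];
    }

    public static void main(String[] args) {
        System.out.println("median ns: " + medianNanos(10, 11, 16_000_000));
    }
}
```

The median is used here rather than the mean so that a single slow round, for example one interrupted by a compilation, does not dominate the reported figure.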

