
C# Method 100x slower with three returns vs two?

I'm seeing some strange behavior when performance testing a method I wrote. If I comment out any one of the returns inside the if statements, the run time drops from 400 ms to 4 ms, almost as if the code is being compiled away and not actually run. That would make sense if removing a return left only return true or only return false, since with a single possible outcome the compiler could set the result to a constant bool instead of running the code.

Does anyone know what might be going on, or have a recommendation for a better way to run the test?

My test code:

Vec3 spherePos = new Vec3(43.7527, 75.9756, 0);
double sphereRadisSq = 50 * 50;
Vec3 rayPos = new Vec3(-5.32301, 5.97157, -112.983);
Vec3 rayDir = new Vec3(0.457841, 0.680324, 0.572312);

sw.Reset();
sw.Start();
bool res = false;
for (int i = 0; i < 10000000; i++)
{
   res = Intersect.RaySphereFast(rayPos, rayDir, spherePos, sphereRadisSq);
}      
sw.Stop();
Debug.Log($"testTime: {sw.ElapsedMilliseconds} ms");
Debug.Log(res);

And the static method:

public static bool RaySphereFast(Vec3 _rp, Vec3 _rd, Vec3 _sp, double _srsq) 
{
    double rs = Vec3.DistanceFast(_rp, _sp);
    if (rs < _srsq)
    {
        return (true); // <-- When I disable this one
    }
    Vec3 p = Vec3.ProjectFast(_sp, _rp, _rd);
    double pr = Vec3.Dot(_rd, (p - _rp));
    if (pr < 0)
    {
        return (false); // <--  Or when I disable this one
    }
    double ps = Vec3.DistanceFast(p, _sp);
    if (ps < _srsq) 
    {
        return (true); // <--  Or when I disable this one
    }
    return (false);
}

Vec3 struct (slimmed down):

public struct Vec3
{
    public Vec3(double _x, double _y, double _z)
    {
        x = _x;
        y = _y;
        z = _z;
    }

    public double x { get; }
    public double y { get; }
    public double z { get; }

    public static double DistanceFast(Vec3 _v0, Vec3 _v1) 
    {
        double x = (_v1.x - _v0.x);
        double y = (_v1.y - _v0.y);
        double z = (_v1.z - _v0.z);
        return ((x * x) + (y * y) + (z * z));
    }

    public static double Dot(Vec3 _v0, Vec3 _v1)
    {
        return ((_v0.x * _v1.x) + (_v0.y * _v1.y) + (_v0.z * _v1.z));
    }

    public static Vec3 ProjectFast(Vec3 _p, Vec3 _a, Vec3 _d) 
    {
        Vec3 ap = _p - _a;
        return (_a + Vec3.Dot(ap, _d) * _d);
    }

    public static Vec3 operator +(Vec3 _v0, Vec3 _v1)
    {
        return (new Vec3(_v0.x + _v1.x, _v0.y + _v1.y, _v0.z + _v1.z));
    }

    public static Vec3 operator -(Vec3 _v0, Vec3 _v1)
    {
        return new Vec3(_v0.x - _v1.x, _v0.y - _v1.y, _v0.z - _v1.z);
    }

    public static Vec3 operator *(double _d1, Vec3 _v0)
    {
        return new Vec3(_d1 * _v0.x, _d1 * _v0.y, _d1 * _v0.z);
    }
}

This is likely happening because, when you comment out one of the returns, the complexity of the method falls below the threshold above which automatic inlining is disabled.

This inlining is not visible in the generated IL; it is done by the JIT compiler.

We can test this hypothesis by decorating the method in question with a [MethodImpl(MethodImplOptions.AggressiveInlining)] attribute.
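For reference, here is a minimal sketch of how the attribute is applied. MethodImpl and MethodImplOptions live in System.Runtime.CompilerServices. The class InliningSketch and the method IsInsideSphere are hypothetical stand-ins for Intersect.RaySphereFast, not code from the question. The attribute is a request to the JIT, not a guarantee.

```csharp
using System.Runtime.CompilerServices;

public static class InliningSketch
{
    // Hypothetical stand-in for Intersect.RaySphereFast.
    // The attribute asks the JIT to inline this method even when it would
    // exceed the default inlining heuristics; the JIT may still decline.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static bool IsInsideSphere(double dx, double dy, double dz, double radiusSq)
    {
        // Squared distance compared against squared radius, as in the question.
        double distSq = (dx * dx) + (dy * dy) + (dz * dz);
        return distSq < radiusSq;
    }
}
```

In the question's code, the same attribute line would go directly above the RaySphereFast declaration.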

When I tried this with your code, I obtained the following results (release, x64 build):

Original code:                      302 ms
First return commented out:           2 ms
Decorated with AggressiveInlining:    2 ms

The time with the first return commented out is the same as the time I obtain when decorating the method with AggressiveInlining (leaving the first return enabled).

Therefore I conclude that the hypothesis is correct.

There are a few interesting things going on here. As others have pointed out, when you comment out one of the returns, the method RaySphereFast becomes small enough to inline, and indeed the JIT decides to inline it. This in turn inlines all of the helper methods that it calls. As a result, the loop body ends up with no calls.

Once that happens, the JIT "struct promotes" the various Vec3 instances, and since you have initialized all the fields with constants, the JIT propagates those constants and folds them at the various operations. Because of this, the JIT realizes that the result of the call will always be true.

Since every iteration of the loop returns the same value, the JIT realizes that none of the computations in the loop are actually necessary (since the result is known) and deletes them all. So in the "fast" version you are timing an empty loop:

G_M52940_IG04:
       BF01000000           mov      edi, 1
       FFC1                 inc      ecx
       81F980969800         cmp      ecx, 0x989680
       7CF1                 jl       SHORT G_M52940_IG04

while in the "slow" version the call doesn't get inlined and none of this optimization kicks in:

G_M32193_IG04:
       488D4C2478           lea      rcx, bword ptr [rsp+78H]
       C4617B1109           vmovsd   qword ptr [rcx], xmm9
       C4617B115108         vmovsd   qword ptr [rcx+8], xmm10
       C4617B115910         vmovsd   qword ptr [rcx+16], xmm11
       488D4C2460           lea      rcx, bword ptr [rsp+60H]
       C4617B1121           vmovsd   qword ptr [rcx], xmm12
       C4617B116908         vmovsd   qword ptr [rcx+8], xmm13
       C4617B117110         vmovsd   qword ptr [rcx+16], xmm14
       488D4C2448           lea      rcx, bword ptr [rsp+48H]
       C4E17B1131           vmovsd   qword ptr [rcx], xmm6
       C4E17B117908         vmovsd   qword ptr [rcx+8], xmm7
       C4617B114110         vmovsd   qword ptr [rcx+16], xmm8
       488D4C2478           lea      rcx, bword ptr [rsp+78H]
       488D542460           lea      rdx, bword ptr [rsp+60H]
       4C8D442448           lea      r8, bword ptr [rsp+48H]
       C4E17B101D67010000   vmovsd   xmm3, qword ptr [reloc @RWD64]
       E8D2F8FFFF           call     X:RaySphereFast(struct,struct,struct,double):bool
       8BD8                 mov      ebx, eax
       FFC7                 inc      edi
       81FF80969800         cmp      edi, 0x989680
       7C95                 jl       SHORT G_M32193_IG04

If you are really interested in benchmarking the speed of RaySphereFast, make sure to invoke it with different or non-constant arguments on each iteration, and also make sure to consume the result of each iteration.
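A minimal sketch of that advice follows, under stated assumptions. InsideSphere is a hypothetical stand-in for RaySphereFast, and the names VaryingInputBenchmark, CountHits and Run are made up for illustration. The inputs come from an array filled at run time, and every result feeds a counter that is printed at the end.

```csharp
using System;
using System.Diagnostics;

public static class VaryingInputBenchmark
{
    // Hypothetical stand-in for the method under test; substitute
    // Intersect.RaySphereFast with Vec3 arguments in real use.
    public static bool InsideSphere(double x, double y, double z, double radiusSq)
    {
        return (x * x) + (y * y) + (z * z) < radiusSq;
    }

    // Calls the method with a different input on each iteration and
    // returns the number of hits so the caller can consume the results.
    public static int CountHits(double[] xs, double y, double z, double radiusSq, int iterations)
    {
        int hits = 0;
        for (int i = 0; i < iterations; i++)
        {
            // The input is read from run-time data, so the JIT cannot
            // fold the call into a constant.
            double x = xs[i % xs.Length];
            if (InsideSphere(x, y, z, radiusSq))
            {
                hits++; // every result contributes to the returned total
            }
        }
        return hits;
    }

    public static void Run()
    {
        var rng = new Random(12345);
        double[] xs = new double[1024];
        for (int i = 0; i < xs.Length; i++)
        {
            xs[i] = rng.NextDouble() * 100.0;
        }

        Stopwatch sw = Stopwatch.StartNew();
        int hits = CountHits(xs, 5.0, 5.0, 50.0 * 50.0, 10000000);
        sw.Stop();
        Console.WriteLine($"testTime: {sw.ElapsedMilliseconds} ms, hits: {hits}");
    }
}
```

The array lookup and the modulo add their own cost to every iteration, so the measured time is an upper bound on the method's cost rather than an exact figure.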

Just to add an (obvious) disclaimer to the answer from @Matthew Watson:

The results depend on the .NET version, the JIT version, and so on. FYI, I cannot reproduce such a difference; the results come back pretty much equivalent in my environment.

I'm using BenchmarkDotNet with .NET Core 2.1.0; see the details below.

// * Summary *

BenchmarkDotNet=v0.11.1, OS=Windows 10.0.17134.228 (1803/April2018Update/Redstone4)
Intel Core i7-4700MQ CPU 2.40GHz (Max: 1.08GHz) (Haswell), 1 CPU, 8 logical and 4 physical cores
Frequency=2338346 Hz, Resolution=427.6527 ns, Timer=TSC
.NET Core SDK=2.2.100-preview1-009349
  [Host]     : .NET Core 2.1.0 (CoreCLR 4.6.26515.07, CoreFX 4.6.26515.06), 64bit RyuJIT
  DefaultJob : .NET Core 2.1.0 (CoreCLR 4.6.26515.07, CoreFX 4.6.26515.06), 64bit RyuJIT


                 Method |     Mean |     Error |    StdDev |
----------------------- |---------:|----------:|----------:|
 RaySphereFast_Original | 40.06 ns | 0.3693 ns | 0.3455 ns |
 RaySphereFast_NoReturn | 40.46 ns | 0.0860 ns | 0.0805 ns |

// * Legends *
  Mean   : Arithmetic mean of all measurements
  Error  : Half of 99.9% confidence interval
  StdDev : Standard deviation of all measurements
  1 ns   : 1 Nanosecond (0.000000001 sec)

// ***** BenchmarkRunner: End *****
Run time: 00:00:34 (34.86 sec), executed benchmarks: 2

// * Artifacts cleanup *
