简体   繁体   English

包含固定数组的托管不安全结构上的 C# 'fixed' 语句的开销是多少?

[英]What is the overhead of the C# 'fixed' statement on a managed unsafe struct containing fixed arrays?

I've been trying to determine what the true cost of using the fixed statement within C# for managed unsafe structs that contain fixed arrays.我一直在尝试确定在 C# 中对包含固定数组的托管不安全结构使用固定语句的真正成本。 Please note I am not referring to unmanaged structs.请注意,我指的不是非托管结构。

Specifically, is there a reason to avoid the pattern shown by 'MultipleFixed' class below?具体来说,是否有理由避免下面的“MultipleFixed”类显示的模式? Is the cost of simply fixing the data non zero, near zero (== cost similar to setting & clearing a single flag when entering/exiting the fixed scope), or is it significant enough to avoid when possible?简单地修复数据的成本是非零的,接近于零(== 成本类似于在进入/退出固定范围时设置和清除单个标志),还是在可能的情况下足以避免?

Obviously, these classes are contrived to help explain the question.显然,这些课程旨在帮助解释问题。 This is for a high usage data structure in an XNA game where the read/write performance of this data is critical, so if I need to fix the array and pass it around everywhere I'll do that, but if there is no difference at all I'd prefer to keep the fixed() local to the methods to help with keeping the function signatures slightly more portable to platforms that don't support unsafe code.这适用于XNA游戏中的高使用率数据结构,其中该数据的读/写性能至关重要,所以如果我需要修复数组并将其传递到任何地方,我会这样做,但如果没有区别我宁愿将 fixed() 保留在方法的本地,以帮助保持函数签名对不支持不安全代码的平台更具可移植性。 (Yeah, it's some extra grunt code, but whatever it takes...) (是的,这是一些额外的繁琐代码,但无论如何......)

unsafe struct ByteArray
{
   public fixed byte Data[1024];
}

class MultipleFixed
{
   unsafe void SetValue(ref ByteArray bytes, int index, byte value)
   {
       fixed(byte* data = bytes.Data)
       {
           data[index] = value;
       }
   }

    unsafe bool Validate(ref ByteArray bytes, int index, byte expectedValue)
    {
       fixed(byte* data = bytes.Data)
       {
           return data[index] == expectedValue;
       }
    }

    void Test(ref ByteArray bytes)
    {
        SetValue(ref bytes, 0, 1);
        Validate(ref bytes, 0, 1);
    }
}

class SingleFixed
{
   unsafe void SetValue(byte* data, int index, byte value)
   {
       data[index] = value;
   }

    unsafe bool Validate(byte* data, int index, byte expectedValue)
    {
       return data[index] == expectedValue;
    }

    unsafe void Test(ref ByteArray bytes)
    {
        fixed(byte* data = bytes.Data)
        {
            SetValue(data, 0, 1);
            Validate(data, 0, 1);
        }
    }
}

Also, I looked for similar questions and the closest I found was this , but this question is different in that it is concerned only with pure managed code and the specific costs of using fixed in that context.此外,我寻找了类似的问题,我发现最接近的是this ,但这个问题的不同之处在于它只涉及纯托管代码以及在该上下文中使用固定的具体成本。

That was actually an interesting question that I had myself.这实际上是我自己提出的一个有趣的问题。

The results I managed to obtain suggest slightly different reasons for the performance loss than the 'fixed' statement itself.我设法获得的结果表明性能损失的原因与“固定”语句本身略有不同。

You can see the tests I run and the results below, but there are following observations I draw from those:您可以在下面看到我运行的测试和结果,但我从中得出以下观察结果:

  • the performance of using 'fixed' with pure pointers (x*), without IntPtr, is as good as in managed code;在没有 IntPtr 的情况下,使用带有纯指针 (x*) 的“固定”的性能与托管代码一样好; in Release Mode, it is even way better if fixed is not used too often - that's the most performant way of accessing multiple array values在发布模式下,如果不经常使用固定会更好- 这是访问多个数组值的最高效方式
  • in Debug Mode, using 'fixed' (inside a loop) has a big negative performance impact, but in Release Mode, it works almost as good as normal array access (method FixedAccess);在调试模式下,使用 'fixed'(在循环内)会对性能产生很大的负面影响,但在发布模式下,它的效果几乎与普通数组访问(方法 FixedAccess)一样好;
  • using 'ref' on a reference-type parameter value (float[]) was consistently more or equally performant (both modes)在引用类型参数值 (float[]) 上使用 'ref' 始终具有更高或相同的性能(两种模式)
  • Debug Mode has a significant performance drop vs Release Mode when using IntPtr arithmetic (IntPtrAccess), but for both modes the performance was worse than normal array access使用 IntPtr 算术 (IntPtrAccess) 时,调试模式与发布模式相比性能显着下降,但对于这两种模式,性能都比普通数组访问差
  • if the used offset is not aligned to the array's values' offset, the performance is terrible, regardless of the mode (it actually takes the same amount of time to for both modes).如果使用的偏移量未与数组值的偏移量对齐,则无论模式如何,性能都很糟糕(这两种模式实际上需要相同的时间)。 That holds true for 'float', but it doesn't have any impact for 'int'这适用于'float',但它对'int'没有任何影响

Running the tests multiple times, gives slightly different, but broadly consistent results.多次运行测试,给出的结果略有不同,但大致一致。 Probably I should have run many series of tests and taken the average times, but I had no time for that :)可能我应该进行许多系列的测试并取平均时间,但我没有时间这样做:)

The test class first:先测试类:

class Test {
    public static void NormalAccess (float[] array, int index) {
        array[index] = array[index] + 2;
    }

    public static void NormalRefAccess (ref float[] array, int index) {
        array[index] = array[index] + 2;
    }

    public static void IntPtrAccess (IntPtr arrayPtr, int index) {
        unsafe {
            var array = (float*) IntPtr.Add (arrayPtr, index << 2);
            (*array) = (*array) + 2;
        }
    }

    public static void IntPtrMisalignedAccess (IntPtr arrayPtr, int index) {
        unsafe {
            var array = (float*) IntPtr.Add (arrayPtr, index); // getting bits of a float
            (*array) = (*array) + 2;
        }
    }

    public static void FixedAccess (float[] array, int index) {
        unsafe {
            fixed (float* ptr = &array[index])
                (*ptr) = (*ptr) + 2;
        }
    }

    public unsafe static void PtrAccess (float* ptr) {
        (*ptr) = (*ptr) + 2;
    }

}

And the tests themselves:以及测试本身:

    static int runs = 1000*1000*100;
    public static void Print (string name, Stopwatch sw) {
        Console.WriteLine ("{0}, items/sec = {1:N} \t {2}", sw.Elapsed, (runs / sw.ElapsedMilliseconds) * 1000, name);
    }

    static void Main (string[] args) {
        var buffer = new float[1024*1024*100];
        var len = buffer.Length;

        var sw = new Stopwatch();
        for (int i = 0; i < 1000; i++) {
            Test.FixedAccess (buffer, 55);
            Test.NormalAccess (buffer, 66);
        }

        Console.WriteLine ("Starting {0:N0} items", runs);


        sw.Restart ();
        for (int i = 0; i < runs; i++)
            Test.NormalAccess (buffer, i % len);
        sw.Stop ();

        Print ("Normal access", sw);

        sw.Restart ();
        for (int i = 0; i < runs; i++)
            Test.NormalRefAccess (ref buffer, i % len);
        sw.Stop ();

        Print ("Normal Ref access", sw);

        sw.Restart ();
        unsafe {
            fixed (float* ptr = &buffer[0])
                for (int i = 0; i < runs; i++) {
                    Test.IntPtrAccess ((IntPtr) ptr, i % len);
                }
        }
        sw.Stop ();

        Print ("IntPtr access (fixed outside loop)", sw);

        sw.Restart ();
        unsafe {
            fixed (float* ptr = &buffer[0])
                for (int i = 0; i < runs; i++) {
                    Test.IntPtrMisalignedAccess ((IntPtr) ptr, i % len);
                }
        }
        sw.Stop ();

        Print ("IntPtr Misaligned access (fixed outside loop)", sw);

        sw.Restart ();
        for (int i = 0; i < runs; i++)
            Test.FixedAccess (buffer, i % len);
        sw.Stop ();

        Print ("Fixed access (fixed inside loop)", sw);

        sw.Restart ();
        unsafe {
            fixed (float* ptr = &buffer[0]) {
                for (int i = 0; i < runs; i++) {
                    Test.PtrAccess (ptr + (i % len));
                }
            }
        }
        sw.Stop ();

        Print ("float* access (fixed outside loop)", sw);

        sw.Restart ();
        unsafe {
            for (int i = 0; i < runs; i++) {
                fixed (float* ptr = &buffer[i % len]) {
                    Test.PtrAccess (ptr);
                }
            }
        }
        sw.Stop ();

        Print ("float* access (fixed in loop)", sw);

And finally the results:最后是结果:

Debug mode调试模式

Starting 100,000,000 items
00:00:01.0373583, items/sec = 96,432,000.00      Normal access
00:00:00.8582307, items/sec = 116,550,000.00     Normal Ref access
00:00:01.8822085, items/sec = 53,134,000.00      IntPtr access (fixed outside loop)
00:00:10.5356369, items/sec = 9,492,000.00       IntPtr Misaligned access (fixed outside loop)
00:00:01.6860701, items/sec = 59,311,000.00      Fixed access (fixed inside loop)
00:00:00.7577868, items/sec = 132,100,000.00     float* access (fixed outside loop)
00:00:01.0387792, items/sec = 96,339,000.00      float* access (fixed in loop)

Release mode发布模式

Starting 100,000,000 items
00:00:00.7454832, items/sec = 134,228,000.00     Normal access
00:00:00.6619090, items/sec = 151,285,000.00     Normal Ref access
00:00:00.9859089, items/sec = 101,522,000.00     IntPtr access (fixed outside loop)
00:00:10.1289018, items/sec = 9,873,000.00       IntPtr Misaligned access (fixed outside loop)
00:00:00.7899355, items/sec = 126,742,000.00     Fixed access (fixed inside loop)
00:00:00.5718507, items/sec = 175,131,000.00     float* access (fixed outside loop)
00:00:00.6842333, items/sec = 146,198,000.00     float* access (fixed in loop)

Empirically, the overhead appears to be, in the best case, ~270% on 32 bit JIT and ~200% on 64 bit (and the overhead gets worse the more times you "call" fixed ).根据经验,在最好的情况下,开销似乎在 32 位 JIT 上约为 270%,在 64 位上约为 200%(并且开销会随着您“调用” fixed的次数越多而变得更糟)。 So I'd try to minimise your fixed blocks if performance is really critical.所以如果性能真的很关键,我会尽量减少你的fixed块。

Sorry, I'm not familiar enough with fixed / unsafe code to know why that's the case抱歉,我对固定/不安全代码不够熟悉,不知道为什么会这样


Details细节

I also added some TestMore methods which call your two test methods 10 times instead of 2 to give a more real world scenario of multiple methods being called on your fixed struct.我还添加了一些TestMore方法,它们调用您的两个测试方法 10 次而不是 2 次,以提供在您的fixed结构上调用多个方法的更真实的世界场景。

The code I used:我使用的代码:

class Program
{
    static void Main(string[] args)
    {
        var someData = new ByteArray();
        int iterations = 1000000000;
        var multiple = new MultipleFixed();
        var single = new SingleFixed();

        // Warmup.
        for (int i = 0; i < 100; i++)
        {
            multiple.Test(ref someData);
            single.Test(ref someData);
            multiple.TestMore(ref someData);
            single.TestMore(ref someData);
        }

        // Environment.
        if (Debugger.IsAttached)
            Console.WriteLine("Debugger is attached!!!!!!!!!! This run is invalid!");
        Console.WriteLine("CLR Version: " + Environment.Version);
        Console.WriteLine("Pointer size: {0} bytes", IntPtr.Size);
        Console.WriteLine("Iterations: " + iterations);

        Console.Write("Starting run for Single... ");
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            single.Test(ref someData);
        }
        sw.Stop();
        Console.WriteLine("Completed in {0:N3}ms - {1:N2}/sec", sw.Elapsed.TotalMilliseconds, iterations / sw.Elapsed.TotalSeconds);

        Console.Write("Starting run for More Single... ");
        sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            single.Test(ref someData);
        }
        sw.Stop();
        Console.WriteLine("Completed in {0:N3}ms - {1:N2}/sec", sw.Elapsed.TotalMilliseconds, iterations / sw.Elapsed.TotalSeconds);


        Console.Write("Starting run for Multiple... ");
        sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            multiple.Test(ref someData);
        }
        sw.Stop();
        Console.WriteLine("Completed in {0:N3}ms - {1:N2}/sec", sw.Elapsed.TotalMilliseconds, iterations / sw.Elapsed.TotalSeconds);

        Console.Write("Starting run for More Multiple... ");
        sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            multiple.TestMore(ref someData);
        }
        sw.Stop();
        Console.WriteLine("Completed in {0:N3}ms - {1:N2}/sec", sw.Elapsed.TotalMilliseconds, iterations / sw.Elapsed.TotalSeconds);


        Console.ReadLine();
    }
}

unsafe struct ByteArray
{
    public fixed byte Data[1024];
}

class MultipleFixed
{
    unsafe void SetValue(ref ByteArray bytes, int index, byte value)
    {
        fixed (byte* data = bytes.Data)
        {
            data[index] = value;
        }
    }

    unsafe bool Validate(ref ByteArray bytes, int index, byte expectedValue)
    {
        fixed (byte* data = bytes.Data)
        {
            return data[index] == expectedValue;
        }
    }

    public void Test(ref ByteArray bytes)
    {
        SetValue(ref bytes, 0, 1);
        Validate(ref bytes, 0, 1);
    }
    public void TestMore(ref ByteArray bytes)
    {
        SetValue(ref bytes, 0, 1);
        Validate(ref bytes, 0, 1);
        SetValue(ref bytes, 0, 2);
        Validate(ref bytes, 0, 2);
        SetValue(ref bytes, 0, 3);
        Validate(ref bytes, 0, 3);
        SetValue(ref bytes, 0, 4);
        Validate(ref bytes, 0, 4);
        SetValue(ref bytes, 0, 5);
        Validate(ref bytes, 0, 5);
    }
}

class SingleFixed
{
    unsafe void SetValue(byte* data, int index, byte value)
    {
        data[index] = value;
    }

    unsafe bool Validate(byte* data, int index, byte expectedValue)
    {
        return data[index] == expectedValue;
    }

    public unsafe void Test(ref ByteArray bytes)
    {
        fixed (byte* data = bytes.Data)
        {
            SetValue(data, 0, 1);
            Validate(data, 0, 1);
        }
    }
    public unsafe void TestMore(ref ByteArray bytes)
    {
        fixed (byte* data = bytes.Data)
        {
            SetValue(data, 0, 1);
            Validate(data, 0, 1);
            SetValue(data, 0, 2);
            Validate(data, 0, 2);
            SetValue(data, 0, 3);
            Validate(data, 0, 3);
            SetValue(data, 0, 4);
            Validate(data, 0, 4);
            SetValue(data, 0, 5);
            Validate(data, 0, 5);
        }
    }
}

And the results in .NET 4.0, 32 bit JIT: .NET 4.0 32 位 JIT 中的结果:

CLR Version: 4.0.30319.239
Pointer size: 4 bytes
Iterations: 1000000000
Starting run for Single... Completed in 2,092.350ms - 477,931,580.94/sec
Starting run for More Single... Completed in 2,236.767ms - 447,073,934.63/sec
Starting run for Multiple... Completed in 5,775.922ms - 173,132,528.92/sec
Starting run for More Multiple... Completed in 26,637.862ms - 37,540,550.36/sec

And in .NET 4.0, 64 bit JIT:在 .NET 4.0、64 位 JIT 中:

CLR Version: 4.0.30319.239
Pointer size: 8 bytes
Iterations: 1000000000
Starting run for Single... Completed in 2,907.946ms - 343,885,316.72/sec
Starting run for More Single... Completed in 2,904.903ms - 344,245,585.63/sec
Starting run for Multiple... Completed in 5,754.893ms - 173,765,185.93/sec
Starting run for More Multiple... Completed in 18,679.593ms - 53,534,358.13/sec

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM