Surprisingly, benchmarking the following code gives better results for managed arrays (consistently about 10% faster). I'm testing in Unity, so maybe this is related to Mono?
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

unsafe void Bench()
{
    // Locals
    int i, j;
    const int bufSize = 1024 * 1024;
    const int numIterations = 1000;
    const float gain = 1.6745f;
    float[] managedBuffer;
    IntPtr ptr;
    float* unmanagedBuffer;
    Stopwatch stopwatch;

    // Allocations: fill the managed buffer with random samples
    managedBuffer = new float[bufSize];
    for (i = 0; i < bufSize; i++)
    {
        managedBuffer[i] = UnityEngine.Random.value;
    }

    // Unmanaged copy of the same data (not tracked by the GC)
    ptr = Marshal.AllocHGlobal(bufSize * sizeof(float));
    try
    {
        unmanagedBuffer = (float*)ptr.ToPointer();
        Marshal.Copy(managedBuffer, 0, ptr, bufSize);

        stopwatch = new Stopwatch();
        stopwatch.Start();

        // Unmanaged array iterations
        for (i = 0; i < numIterations; i++)
        {
            for (j = 0; j < bufSize; j++)
            {
                unmanagedBuffer[j] *= gain;
            }
        }
        UnityEngine.Debug.Log(stopwatch.ElapsedMilliseconds);

        stopwatch.Reset();
        stopwatch.Start();

        // Managed array iterations
        for (i = 0; i < numIterations; i++)
        {
            for (j = 0; j < bufSize; j++)
            {
                managedBuffer[j] *= gain;
            }
        }
        UnityEngine.Debug.Log(stopwatch.ElapsedMilliseconds);
    }
    finally
    {
        // Memory from AllocHGlobal is never freed by the GC; release it explicitly
        Marshal.FreeHGlobal(ptr);
    }
}
I'm experimenting with unsafe code for an audio application that is quite performance-critical. I'm hoping to increase performance and reduce or eliminate garbage collection.
Any insights appreciated!
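One way to make a comparison like this less sensitive to run order, JIT compilation and cache warm-up is to run each variant a few times untimed, then time several runs and compare the best times. The sketch below assumes plain .NET rather than Unity; the helper name `BenchHelper.BestOfMs` and its default run counts are hypothetical choices, not part of the original code.

```csharp
using System;
using System.Diagnostics;

static class BenchHelper
{
    // Runs `body` warmupRuns times without timing it, then times it
    // measuredRuns times and returns the fastest run in milliseconds.
    public static double BestOfMs(Action body, int warmupRuns = 2, int measuredRuns = 5)
    {
        if (body == null) throw new ArgumentNullException("body");
        if (measuredRuns < 1) throw new ArgumentOutOfRangeException("measuredRuns");

        // Untimed warm-up runs
        for (int r = 0; r < warmupRuns; r++)
        {
            body();
        }

        double best = double.MaxValue;
        var sw = new Stopwatch();
        for (int r = 0; r < measuredRuns; r++)
        {
            // Reset + Start instead of Restart, which older Mono profiles may lack
            sw.Reset();
            sw.Start();
            body();
            sw.Stop();
            best = Math.Min(best, sw.Elapsed.TotalMilliseconds);
        }
        return best;
    }
}
```

Each variant would be passed as a lambda that performs one pass of the gain loop (for example `() => { for (int j = 0; j < buf.Length; j++) buf[j] *= 1.6745f; }`), measured separately, and ideally also measured in the opposite order.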
Not really an answer, but it needs more space than a comment allows.
If you use ILSpy to inspect the IL, the difference is as follows (Release build, default settings, on my PC running 64-bit Windows 7):
// unmanaged
IL_005a: ldloc.s unmanagedBuffer
IL_005c: ldloc.1
IL_005d: conv.i
IL_005e: ldc.i4.4
IL_005f: mul
IL_0060: add
IL_0061: dup
IL_0062: ldind.r4
IL_0063: ldc.r4 1.6745
IL_0068: mul
IL_0069: stind.r4
IL_006a: ldloc.1
IL_006b: ldc.i4.1
IL_006c: add
IL_006d: stloc.1
// managed
IL_00a4: ldloc.2
IL_00a5: ldloc.1
IL_00a6: ldelema [mscorlib]System.Single
IL_00ab: dup
IL_00ac: ldobj [mscorlib]System.Single
IL_00b1: ldc.r4 1.6745
IL_00b6: mul
IL_00b7: stobj [mscorlib]System.Single
IL_00bc: ldloc.1
IL_00bd: ldc.i4.1
IL_00be: add
IL_00bf: stloc.1
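I don't know how much machine code corresponds to each IL instruction, but it might be an optimization problem (note how much work is required to calculate the address in the case of the unmanaged buffer: conv.i, ldc.i4.4, mul and add on every iteration, versus a single ldelema for the managed array).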
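If the per-iteration address computation is the cost, one common workaround is to advance a pointer instead of recomputing `base + j * 4` each time, and to pin the managed array with `fixed` so the same pointer loop can run on it without copying to unmanaged memory. This is a sketch under the assumption that it compiles with unsafe code enabled (as the question's code already requires); the class and method names are hypothetical, and whether the pointer-walking form is actually faster depends on what the JIT (Mono or .NET) already does with the indexed loop.

```csharp
using System;

static unsafe class GainLoops
{
    // Indexed form: the address buf + j * sizeof(float) is computed per element,
    // matching the conv.i / ldc.i4.4 / mul / add sequence in the IL above.
    public static void ApplyIndexed(float* buf, int count, float gain)
    {
        for (int j = 0; j < count; j++)
        {
            buf[j] *= gain;
        }
    }

    // Pointer-walking form: advances the pointer instead of recomputing the address.
    public static void ApplyPointerWalk(float* buf, int count, float gain)
    {
        float* p = buf;
        float* end = buf + count;
        while (p < end)
        {
            *p *= gain;
            p++;
        }
    }

    // Pins a managed array so the pointer loop runs on it in place,
    // with no extra copy and no allocation.
    public static void ApplyToArray(float[] data, float gain)
    {
        if (data == null) throw new ArgumentNullException("data");
        if (data.Length == 0) return;
        fixed (float* p = data)
        {
            ApplyPointerWalk(p, data.Length, gain);
        }
    }
}
```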
I noticed a non-linear correlation between the number of iterations and the time. The measurements were posted as a table image (not reproduced here): the first column is numIterations, the second is the unmanaged time (ms), and the last is the managed time (ms).
Up to 170 iterations the relationship is linear, and then something starts happening, regardless of the increment (the table uses a step of 10; I also tried 5, and it is likewise linear up to 170). This bugs me, and I'd really like to get a real answer here.
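One thing worth checking is floating-point overflow. With gain = 1.6745f and starting values of at most 1.0, the values exceed float.MaxValue (about 3.4e38) after roughly ln(3.4e38) / ln(1.6745) ≈ 88.7 / 0.5155 ≈ 172 multiplications, so a value of 1.0 becomes +Infinity on about the 173rd multiplication (smaller values a little later). That is suspiciously close to the 170 threshold, and it also means that in the question's benchmark (numIterations = 1000) most of the timed work operates on infinities. Whether infinities, or the overflow itself, actually slow the loop on a given CPU and runtime is a hypothesis, not something established here. The sketch below (hypothetical names, plain .NET, `System.Random` instead of `UnityEngine.Random`) counts the multiplications until overflow and sweeps iteration counts, refilling the buffer before each measurement so earlier runs don't leave already-overflowed values behind.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

static class IterationSweep
{
    // Number of multiplications by `gain` until `start` overflows to Infinity
    // in single precision; -1 if it has not overflowed after maxSteps.
    public static int MultiplicationsUntilInfinity(float start, float gain, int maxSteps = 10000)
    {
        float v = start;
        for (int k = 1; k <= maxSteps; k++)
        {
            v = (float)(v * gain); // explicit cast forces rounding to float precision
            if (float.IsInfinity(v)) return k;
        }
        return -1;
    }

    // Times kernel(buffer, n) for n = step, 2*step, ..., up to max.
    // The buffer is refilled with values in [0, 1) before every measurement.
    // Each row is { n, elapsed ms, ms per iteration }; for linear scaling
    // the last column should stay roughly constant.
    public static List<double[]> Run(Action<float[], int> kernel, int bufSize, int step, int max)
    {
        if (kernel == null) throw new ArgumentNullException("kernel");
        if (step <= 0) throw new ArgumentOutOfRangeException("step");

        var rng = new Random(42);
        var buffer = new float[bufSize];
        var rows = new List<double[]>();
        var sw = new Stopwatch();
        for (int n = step; n <= max; n += step)
        {
            for (int j = 0; j < buffer.Length; j++)
            {
                buffer[j] = (float)rng.NextDouble();
            }
            sw.Reset();
            sw.Start();
            kernel(buffer, n);
            sw.Stop();
            double ms = sw.Elapsed.TotalMilliseconds;
            rows.Add(new double[] { n, ms, ms / n });
        }
        return rows;
    }
}
```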
Not an answer, but I need the space. Using C# and Visual Studio 2013, I saw different assembly for the multiplication:
UNMANAGED
00007FFC013555D9 movsxd rcx,dword ptr [rbp+0D8h]
00007FFC013555E0 mov rax,qword ptr [rbp+0C0h]
00007FFC013555E7 lea rax,[rax+rcx*4]
00007FFC013555EB mov qword ptr [rbp+50h],rax
00007FFC013555EF mov rax,qword ptr [rbp+50h]
00007FFC013555F3 movss xmm0,dword ptr [7FFC013558A0h]
00007FFC013555FB mulss xmm0,dword ptr [rax]
00007FFC013555FF mov rax,qword ptr [rbp+50h]
00007FFC01355603 movss dword ptr [rax],xmm0
MANAGED
00007FFC01355722 movsxd rcx,dword ptr [rbp+0D8h]
00007FFC01355729 mov rax,qword ptr [rbp+0D0h]
00007FFC01355730 mov rax,qword ptr [rax+8]
00007FFC01355734 mov qword ptr [rbp+78h],rcx
00007FFC01355738 cmp qword ptr [rbp+78h],rax
00007FFC0135573C jae 00007FFC01355748
00007FFC0135573E mov rax,qword ptr [rbp+78h]
00007FFC01355742 mov qword ptr [rbp+78h],rax
00007FFC01355746 jmp 00007FFC0135574D
00007FFC01355748 call 00007FFC60E86590
00007FFC0135574D mov rcx,qword ptr [rbp+0D0h]
00007FFC01355754 mov rax,qword ptr [rbp+78h]
00007FFC01355758 lea rax,[rcx+rax*4+10h]
00007FFC0135575D mov qword ptr [rbp+80h],rax
00007FFC01355764 mov rax,qword ptr [rbp+80h]
00007FFC0135576B movss xmm0,dword ptr [7FFC013558A0h]
00007FFC01355773 mulss xmm0,dword ptr [rax]
00007FFC01355777 mov rax,qword ptr [rbp+80h]
00007FFC0135577E movss dword ptr [rax],xmm0
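Obviously, the bigger the code, the slower the execution... The extra instructions in the managed version are essentially the array bounds check: the index is compared with the array length (cmp/jae), and on failure execution jumps to a call, presumably the helper that throws IndexOutOfRangeException.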
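On Microsoft's .NET JIT, a loop whose bound is the array's own `Length` is the pattern the JIT recognizes for removing that per-element bounds check; a loop bounded by a separate value (such as the `bufSize` constant in the question) generally keeps it. Note also that the listing above reloads values from stack slots (`[rbp+50h]`, `[rbp+78h]`, ...) right after storing them, which suggests it was captured with JIT optimizations disabled (for example under the debugger), so Release codegen may differ. The sketch below uses hypothetical names, and whether Unity's Mono JIT applies the same elimination is not established here.

```csharp
using System;

static class ManagedGain
{
    // Bound is buffer.Length: the .NET JIT can prove every index is in range
    // and drop the cmp/jae check (Mono's behavior may differ).
    public static void Apply(float[] buffer, float gain)
    {
        if (buffer == null) throw new ArgumentNullException("buffer");
        for (int j = 0; j < buffer.Length; j++)
        {
            buffer[j] *= gain;
        }
    }

    // Bound is an unrelated count: the range check stays, and an index past
    // the end throws IndexOutOfRangeException.
    public static void ApplyWithCount(float[] buffer, int count, float gain)
    {
        for (int j = 0; j < count; j++)
        {
            buffer[j] *= gain;
        }
    }
}
```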