
C# stackalloc slower than regular variables?

I have two functions that implement uint128 multiplication in two different ways: one uses local variables, the other uses stackalloc "arrays".

Variable Version

public static UInt128 operator *(UInt128 i, UInt128 j) {

 ulong I0 = i._uint0; ulong I1 = i._uint1; ulong I2 = i._uint2; ulong I3 = i._uint3;
 ulong J0 = j._uint0; ulong J1 = j._uint1; ulong J2 = j._uint2; ulong J3 = j._uint3;
 ulong R0 = 0; ulong R1 = 0; ulong R2 = 0; ulong R3 = 0;

 if (I0 != 0) {
   R0 += I0 * J0;
   R1 += I0 * J1;
   R2 += I0 * J2;
   R3 += I0 * J3;
 }
 if (I1 != 0) {
   R1 += I1 * J0;
   R2 += I1 * J1;
   R3 += I1 * J2;
 }
 if (I2 != 0) {
   R2 += I2 * J0;
   R3 += I2 * J1;
 }
 R3 += I3 * J0;

 R1 += R0 >> 32; R0 &= uint.MaxValue;
 R2 += R1 >> 32; R1 &= uint.MaxValue;
 R3 += R2 >> 32; R2 &= uint.MaxValue;
 R3 &= uint.MaxValue;

 return new UInt128((uint)R3, (uint)R2, (uint)R1, (uint)R0);
}

Stackalloc Version

The [0 + 1], [1 + 1], etc. are left in for clarity only; the C# compiler will fold them into constants anyway.

public unsafe static UInt128 operator *(UInt128 i, UInt128 j) {

  var I = stackalloc ulong[4];
  var J = stackalloc ulong[4];
  var R = stackalloc ulong[4];

  I[0] = i._uint0; I[1] = i._uint1; I[2] = i._uint2; I[3] = i._uint3;
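  J[0] = j._uint0; J[1] = j._uint1; J[2] = j._uint2; J[3] = j._uint3;

  // R is never explicitly zeroed here; the += below assumes the block starts at zero,
  // although C# leaves the contents of stackalloc memory undefined.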
  if (I[0] != 0) {
    R[0] += I[0] * J[0];
    R[0 + 1] += I[0] * J[1];
    R[0 + 2] += I[0] * J[2];
    R[0 + 3] += I[0] * J[3];
  }
  if (I[1] != 0) {
    R[1] += I[1] * J[0];
    R[1 + 1] += I[1] * J[1];
    R[1 + 2] += I[1] * J[2];
  }
  if (I[2] != 0) {
    R[2] += I[2] * J[0];
    R[2 + 1] += I[2] * J[1];
  }
  R[3] += I[3] * J[0];


  R[1] += R[0] >> 32; R[0] &= uint.MaxValue;
  R[2] += R[1] >> 32; R[1] &= uint.MaxValue;
  R[3] += R[2] >> 32; R[2] &= uint.MaxValue;
  R[3] &= uint.MaxValue;

  return new UInt128((uint)R[3], (uint)R[2], (uint)R[1], (uint)R[0]);
}

For some reason, the "variable" version is about 20% faster than the "stackalloc" version on both x86 and x64 (with optimizations enabled), using the C# 7.2 compiler and running on .NET Framework 4.6.1. I haven't checked newer or older frameworks, but I suspect the result will be similar, so my question is not specific to 4.6.1: stackalloc seems to be slower in general.

Is there any reason the stackalloc version is slower, considering that both versions allocate exactly the same amount of memory (12 * sizeof(ulong)) and perform exactly the same operations in the same order? I would really prefer to work with arrays via stackalloc instead of separate variables.
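
For anyone who wants to reproduce the comparison, here is a minimal sketch of a timing harness. It is not the exact benchmark behind the ~20% figure. U128, MulBench, MulLocals, MulStackalloc and Same are hypothetical names made up for this example, the struct is an assumed stand-in for my UInt128 (four 32-bit limbs, _uint0 least significant), and the stackalloc variant zeroes R explicitly so it does not depend on what the runtime puts in the block. It needs to be compiled with /unsafe and optimizations on.

```csharp
using System;
using System.Diagnostics;

// Hypothetical stand-in for the question's UInt128: four 32-bit limbs,
// with _uint0 the least significant.
public struct U128
{
    public readonly uint _uint0, _uint1, _uint2, _uint3;

    public U128(uint u3, uint u2, uint u1, uint u0)
    {
        _uint3 = u3; _uint2 = u2; _uint1 = u1; _uint0 = u0;
    }
}

public static class MulBench
{
    // Same arithmetic as the "variable" version in the question.
    public static U128 MulLocals(U128 i, U128 j)
    {
        ulong I0 = i._uint0, I1 = i._uint1, I2 = i._uint2, I3 = i._uint3;
        ulong J0 = j._uint0, J1 = j._uint1, J2 = j._uint2, J3 = j._uint3;
        ulong R0 = 0, R1 = 0, R2 = 0, R3 = 0;

        if (I0 != 0) { R0 += I0 * J0; R1 += I0 * J1; R2 += I0 * J2; R3 += I0 * J3; }
        if (I1 != 0) { R1 += I1 * J0; R2 += I1 * J1; R3 += I1 * J2; }
        if (I2 != 0) { R2 += I2 * J0; R3 += I2 * J1; }
        R3 += I3 * J0;

        R1 += R0 >> 32; R0 &= uint.MaxValue;
        R2 += R1 >> 32; R1 &= uint.MaxValue;
        R3 += R2 >> 32; R2 &= uint.MaxValue;
        R3 &= uint.MaxValue;
        return new U128((uint)R3, (uint)R2, (uint)R1, (uint)R0);
    }

    // Same arithmetic as the "stackalloc" version, plus explicit zeroing of R.
    public static unsafe U128 MulStackalloc(U128 i, U128 j)
    {
        ulong* I = stackalloc ulong[4];
        ulong* J = stackalloc ulong[4];
        ulong* R = stackalloc ulong[4];

        I[0] = i._uint0; I[1] = i._uint1; I[2] = i._uint2; I[3] = i._uint3;
        J[0] = j._uint0; J[1] = j._uint1; J[2] = j._uint2; J[3] = j._uint3;
        R[0] = 0; R[1] = 0; R[2] = 0; R[3] = 0;

        if (I[0] != 0) { R[0] += I[0] * J[0]; R[1] += I[0] * J[1]; R[2] += I[0] * J[2]; R[3] += I[0] * J[3]; }
        if (I[1] != 0) { R[1] += I[1] * J[0]; R[2] += I[1] * J[1]; R[3] += I[1] * J[2]; }
        if (I[2] != 0) { R[2] += I[2] * J[0]; R[3] += I[2] * J[1]; }
        R[3] += I[3] * J[0];

        R[1] += R[0] >> 32; R[0] &= uint.MaxValue;
        R[2] += R[1] >> 32; R[1] &= uint.MaxValue;
        R[3] += R[2] >> 32; R[2] &= uint.MaxValue;
        R[3] &= uint.MaxValue;
        return new U128((uint)R[3], (uint)R[2], (uint)R[1], (uint)R[0]);
    }

    // Limb-by-limb equality, used to check that both versions agree.
    public static bool Same(U128 x, U128 y)
    {
        return x._uint0 == y._uint0 && x._uint1 == y._uint1
            && x._uint2 == y._uint2 && x._uint3 == y._uint3;
    }

    public static void Main()
    {
        var a = new U128(0, 0x12345678, 0x9ABCDEF0, 0x0FEDCBA9);
        var b = new U128(0, 0, 0x00000003, 0x87654321);
        const int N = 10000000;

        // Warm up so JIT compilation is not included in the timings.
        Console.WriteLine("results agree: " + Same(MulLocals(a, b), MulStackalloc(a, b)));

        uint sink = 0; // consumed below so the calls cannot be optimized away

        var sw = Stopwatch.StartNew();
        for (int k = 0; k < N; k++) sink ^= MulLocals(a, b)._uint0;
        sw.Stop();
        Console.WriteLine("locals:     " + sw.ElapsedMilliseconds + " ms");

        sw.Restart();
        for (int k = 0; k < N; k++) sink ^= MulStackalloc(a, b)._uint0;
        sw.Stop();
        Console.WriteLine("stackalloc: " + sw.ElapsedMilliseconds + " ms");

        Console.WriteLine("sink: " + sink);
    }
}
```

The absolute numbers will depend on the machine, the JIT and the bitness, so only the ratio between the two lines is of interest. A Stopwatch loop like this is a rough measurement at best.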
