Implementation and performance of “is T x” for value types

Question

I have a situation similar to this:

interface IStorage
{
    bool TryGetValue<T>(out T result) where T : struct;
}

class Storage<T> : IStorage where T : struct
{
    readonly T value;

    public Storage(T val)
    {
        value = val;
    }

    public bool TryGetValue<T2>(out T2 result) where T2 : struct
    {
        if(value is T2 val)
        {
            result = val;
            return true;
        }
        result = default;
        return false;
    }
}

In the program, instances implementing IStorage are passed around, and they can be queried for a value of a particular type. I could use something like IStorage<T> and type test it to check if it supports the type, but it would make the code messier since there are other types implementing IStorage that decide whether they support the type or not at runtime.

Now I wonder about value is T2 val. Its purpose is to check whether T and T2 are the same type (and hence compatible), since both are value types. Specialized for T, TryGetValue should return true, and it should return false for all other types.

I am not sure whether this is the best implementation of the check. There are basically two general steps to solve it:

Determine if T and T2 are the same.
Reinterpret value as T2 and return it.
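
The two steps above can be sketched directly, as a minimal illustration rather than a definitive answer. The class name TypeofStorage is hypothetical, and the sketch assumes that Unsafe.As from System.Runtime.CompilerServices is available (in-box on recent .NET versions, a separate package on older ones). Step 1 is a plain typeof comparison, which the JIT can usually resolve to a constant for value-type instantiations; step 2 reinterprets a local copy of the field.

```csharp
using System;
using System.Runtime.CompilerServices;

interface IStorage
{
    bool TryGetValue<T>(out T result) where T : struct;
}

// Hypothetical variant of Storage<T>; not the original implementation.
sealed class TypeofStorage<T> : IStorage where T : struct
{
    readonly T value;

    public TypeofStorage(T val)
    {
        value = val;
    }

    public bool TryGetValue<T2>(out T2 result) where T2 : struct
    {
        // Step 1: determine whether T and T2 are the same type.
        if (typeof(T) == typeof(T2))
        {
            // Step 2: reinterpret a local copy of the value as T2.
            // This is only safe because the check above guarantees
            // that T and T2 are identical, so no boxing is involved.
            T copy = value;
            result = Unsafe.As<T, T2>(ref copy);
            return true;
        }
        result = default;
        return false;
    }
}
```

For example, new TypeofStorage<int>(42).TryGetValue<int>(out var i) yields true with i set to 42, while a TryGetValue<long> call on the same instance yields false.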

There are two other solutions to this problem that could be considered: casting the value to object and checking and unboxing it, or __refvalue(__makeref(value), T2) but that is probably not guaranteed to work on all platforms.
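
For comparison, the first of these alternatives (cast to object, then check and unbox) could look like the sketch below. The class name ObjectCastStorage is hypothetical. Written this way, the source boxes the value exactly once and reuses the boxed object for both the type test and the unboxing; whether the runtime then removes that single box is a separate question.

```csharp
using System;

interface IStorage
{
    bool TryGetValue<T>(out T result) where T : struct;
}

// Hypothetical variant of Storage<T> that boxes explicitly, once.
sealed class ObjectCastStorage<T> : IStorage where T : struct
{
    readonly T value;

    public ObjectCastStorage(T val)
    {
        value = val;
    }

    public bool TryGetValue<T2>(out T2 result) where T2 : struct
    {
        // Box the value a single time.
        object boxed = value;

        // Type-test and unbox the same boxed object.
        if (boxed is T2 val)
        {
            result = val;
            return true;
        }
        result = default;
        return false;
    }
}
```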

Now value is T2 val looks quite nice and conveys the meaning well, but I am also wondering about the performance implications and possible optimizations. When I disassemble the method, it turns into this:

  .locals init ([0] !!T2 val,
           [1] !T V_1)
  IL_0000:  ldarg.0
  IL_0001:  ldfld      !0 value
  IL_0006:  dup
  IL_0007:  stloc.1
  IL_0008:  box        !T
  IL_000d:  isinst     !!T2
  IL_0012:  brfalse.s  IL_0029
  IL_0014:  ldloc.1
  IL_0015:  box        !T
  IL_001a:  unbox.any  !!T2
  IL_001f:  stloc.0
  IL_0020:  ldarg.1
  IL_0021:  ldloc.0
  IL_0022:  stobj      !!T2
  IL_0027:  ldc.i4.1
  IL_0028:  ret
  IL_0029:  ldarg.1
  IL_002a:  initobj    !!T2
  IL_0030:  ldc.i4.0
  IL_0031:  ret

So it turns out the expression boxes the value not once but twice: the first time for isinst and the second time for unbox.any. Not only does it hide boxing (which is generally considered quite expensive), it does it twice.

I have two questions: Is there a better way to achieve this kind of specialization? Is it possible that this CIL code, while looking quite inefficient, is optimized later at runtime by the JIT?

In this particular case, I'd expect the runtime to infer that the only instantiation where T is T2 should return true, and it should ignore all other code, including the check. Could this be the case?

Answer 1

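"Is it possible this CIL code, while looking quite inefficient, is optimized later at runtime by JIT?" - No, it seems the resulting JIT-compiled code is also bloated, though more tests are needed to verify this. The listing spills every intermediate result to the stack, which may mean it was captured with JIT optimizations disabled. In my small .NET Framework 4.8 test, if (value is T2 val) compiles to: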

00007FFDEF110E3D  mov         rdx,qword ptr [rbp+90h]  
00007FFDEF110E44  add         rdx,8  
00007FFDEF110E48  vmovdqu     xmm0,xmmword ptr [rdx]  
00007FFDEF110E4D  vmovdqu     xmmword ptr [rbp+40h],xmm0  
00007FFDEF110E53  lea         rdx,[rbp+40h]  
00007FFDEF110E57  mov         rcx,7FFDEF006C68h  
00007FFDEF110E61  call        00007FFE4E642570  
00007FFDEF110E66  mov         qword ptr [rbp+30h],rax  
00007FFDEF110E6A  mov         rdx,qword ptr [rbp+30h]  
00007FFDEF110E6E  mov         rcx,7FFDEF006C68h  
00007FFDEF110E78  call        00007FFE4E643D00  
00007FFDEF110E7D  test        rax,rax  
00007FFDEF110E80  je          00007FFDEF110ED7  
00007FFDEF110E82  lea         rdx,[rbp+40h]  
00007FFDEF110E86  mov         rcx,7FFDEF006C68h  
00007FFDEF110E90  call        00007FFE4E642570  
00007FFDEF110E95  mov         qword ptr [rbp+28h],rax  
00007FFDEF110E99  mov         rdx,qword ptr [rbp+28h]  
00007FFDEF110E9D  mov         rcx,7FFDEF006C68h  
00007FFDEF110EA7  call        00007FFE4E643D00  
00007FFDEF110EAC  mov         qword ptr [rbp+20h],rax  
00007FFDEF110EB0  mov         rdx,qword ptr [rbp+20h]  
00007FFDEF110EB4  mov         rcx,7FFDEF006C68h  
00007FFDEF110EBE  call        00007FFE4E6BC030  
00007FFDEF110EC3  vmovdqu     xmm0,xmmword ptr [rax]  
00007FFDEF110EC8  vmovdqu     xmmword ptr [rbp+58h],xmm0  
00007FFDEF110ECE  mov         dword ptr [rbp+38h],1  
00007FFDEF110ED5  jmp         00007FFDEF110EDC  
00007FFDEF110ED7  xor         eax,eax  
00007FFDEF110ED9  mov         dword ptr [rbp+38h],eax  
00007FFDEF110EDC  mov         eax,dword ptr [rbp+38h]  
00007FFDEF110EDF  movzx       eax,al  
00007FFDEF110EE2  mov         dword ptr [rbp+54h],eax  
00007FFDEF110EE5  cmp         dword ptr [rbp+54h],0  
00007FFDEF110EE9  je          00007FFDEF110F08

This is one of the constraints of using a value type: such operations become a mess. If this is a common use case for you, maybe this shouldn't be a struct?
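
Since more tests are needed, one rough way to check whether the boxing survives at runtime is to measure allocations around repeated calls. The sketch below is only an assumption-laden probe: AllocationProbe and MeasureBytes are hypothetical names, it relies on GC.GetAllocatedBytesForCurrentThread being available on the target runtime, and the result depends on the runtime version, the build configuration (Release, no debugger attached) and tiered compilation, so no particular number should be expected.

```csharp
using System;

interface IStorage
{
    bool TryGetValue<T>(out T result) where T : struct;
}

// The Storage<T> from the question, unchanged in behavior.
sealed class Storage<T> : IStorage where T : struct
{
    readonly T value;

    public Storage(T val)
    {
        value = val;
    }

    public bool TryGetValue<T2>(out T2 result) where T2 : struct
    {
        if (value is T2 val)
        {
            result = val;
            return true;
        }
        result = default;
        return false;
    }
}

// Hypothetical helper: reports bytes allocated on the current
// thread while TryGetValue<int> is called repeatedly.
static class AllocationProbe
{
    public static long MeasureBytes(IStorage storage, int iterations)
    {
        // Warm-up pass so the method has been JIT-compiled before measuring.
        for (int i = 0; i < iterations; i++)
        {
            storage.TryGetValue<int>(out _);
        }

        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < iterations; i++)
        {
            storage.TryGetValue<int>(out _);
        }
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }
}
```

A result near zero for AllocationProbe.MeasureBytes(new Storage<int>(42), 100000) would suggest the boxes were optimized away; a result that grows with the iteration count would suggest they were not.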
