Why are reference types "slower" when used as generic type arguments in .NET 5?

Question

Today I ran into this issue: when a reference type is used as the type argument of an outer generic type, methods in nested types run slower by a factor of about 10. It does not matter which type I use; all reference types seem to slow the code down. (Sorry for the title, maybe somebody can find a more suitable one.)

Tested with .NET 5/Release builds.

What am I missing?

EDIT 2:

I'll try to explain the problem a little further and clean up the code. If you still want to see the old version, here is a copy:

https://gist.github.com/sneusse/1b5ee408dd3fdd74fcf9d369e144b35f

The new code illustrates the same issue with hopefully less distraction.

The class WthGeneric<T> is instantiated twice.
The first instance uses a reference type as the type argument (here: object).
The second instance uses a value type as the type argument (here: long).
As both are instances of the same class, both have the same method WhatIsHappeningHere.
Neither of the instances uses the generic argument in any way.

This leads to the question: why does the same instance method take 10x longer on one instance than on the other?

Output:

System.Object: 516,8448ms
System.Int64: 50,6958ms

Code:

using System;
using System.Diagnostics;
using System.Linq;

namespace Perf
{
    public interface IWthGeneric
    {
        int WhatIsHappeningHere();
    }
    
    // This is a generic class. Note that the generic
    // type argument 'T' is _NOT_ used at all!
    public class WthGeneric<T> : IWthGeneric
    {
        // This is part of the issue.
        // If this field is not accessed or moved *outside*
        // of the generic 'WthGeneric' class, the code is fast again
        // ** also with reference types **
        public static int StaticVar = 12;

        static class NestedClass
        {
            public static int Add(int value) => StaticVar + value;
        }

        public int WhatIsHappeningHere()
        {
            var x = 0;
            for (int i = 0; i < 100000000; i++)
            {
                x += NestedClass.Add(i);
            }
            return x;
        }
    }
    
    public class RunMe
    {
        public static void Run()
        {
            // The interface is used so nothing could ever get inlined.
            var wthObject  = (IWthGeneric) new WthGeneric<object>();
            var wthValueType = (IWthGeneric) new WthGeneric<long>();

            void Test(IWthGeneric instance)
            {
                var sw = Stopwatch.StartNew();
                var x  = instance.WhatIsHappeningHere();
                Console.WriteLine(
                    $"{instance.GetType().GetGenericArguments().First()}: " +
                    $"{sw.Elapsed.TotalMilliseconds}ms");
            }

            for (int i = 0; i < 10; i++)
            {
                Test(wthObject);
                Test(wthValueType);
            }
        }
    }
}
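
For reference, the variant described in the code comment above (the static field moved outside the generic class) might look roughly like the following minimal sketch. The names SharedState, WthGenericFixed and Sum are hypothetical and not part of the original code. Note that this changes the semantics: there is now a single field shared by all instantiations instead of one copy per closed generic type.

```csharp
using System;

// Hypothetical non-generic holder: exactly one copy of the field exists,
// no matter which type arguments WthGenericFixed<T> is instantiated with.
public static class SharedState
{
    public static int StaticVar = 12;
}

public class WthGenericFixed<T>
{
    static class NestedClass
    {
        // Reads a field of a non-generic class, so no per-T lookup is involved.
        public static int Add(int value) => SharedState.StaticVar + value;
    }

    // Same loop as WhatIsHappeningHere, with the iteration count as a parameter.
    public int Sum(int count)
    {
        var x = 0;
        for (int i = 0; i < count; i++)
        {
            x += NestedClass.Add(i);
        }
        return x;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(new WthGenericFixed<object>().Sum(10)); // 165
        Console.WriteLine(new WthGenericFixed<long>().Sum(10));   // 165
    }
}
```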

Answer 1

I'm ready to say this is the jitter's fault. Perhaps "fault" is too strong a word: the jitter simply does not optimize this case.

Using SharpLab to look at the JIT asm of this code:

using SharpLab.Runtime;

[JitGeneric(typeof(int))]
public class A<T>
{
    public static int X;

    public static class B
    {
        public static int C() => X;
    }
}

Note: The attribute JitGeneric(typeof(int)) tells SharpLab to JIT this code with the generic argument int. Without a concrete generic argument, it is not possible to JIT a generic type.

We get this:

; Core CLR v5.0.321.7212 on x86

A`1[[System.Int32, System.Private.CoreLib]]..ctor()
    L0000: ret

A`1+B[[System.Int32, System.Private.CoreLib]].C()
    L0000: mov ecx, 0x2051c600
    L0005: xor edx, edx
    L0007: call 0x5e646b70
    L000c: mov eax, [eax+4]
    L000f: ret

Meanwhile, for this code:

using SharpLab.Runtime;

[JitGeneric(typeof(object))]
public class A<T>
{
    public static int X;

    public static class B
    {
        public static int C() => X;
    }
}

Note: Yes, this is the same class, except now I'm telling SharpLab to JIT it for the generic argument object.

We get this:

; Core CLR v5.0.321.7212 on x86

A`1[[System.__Canon, System.Private.CoreLib]]..ctor()
    L0000: ret

A`1+B[[System.__Canon, System.Private.CoreLib]].C()
    L0000: push ebp
    L0001: mov ebp, esp
    L0003: push eax
    L0004: mov [ebp-4], ecx
    L0007: mov edx, [ecx+0x20]
    L000a: mov edx, [edx]
    L000c: mov edx, [edx+8]
    L000f: test edx, edx
    L0011: je short L0015
    L0013: jmp short L0021
    L0015: mov edx, 0x2046cec4
    L001a: call 0x5e4e4090
    L001f: mov edx, eax
    L0021: mov ecx, edx
    L0023: call 0x5e4fa760
    L0028: mov eax, [eax+4]
    L002b: mov esp, ebp
    L002d: pop ebp
    L002e: ret

We observe that for the reference-type generic argument we get much longer code. Is that code necessary? Well, we are accessing a public static field of a generic class. Let us see how that looks if the other class is not nested:

using SharpLab.Runtime;

public static class Bint
{
    public static int C() => A<int>.X;
}

public static class Bobject
{
    public static int C() => A<object>.X;
}

[JitGeneric(typeof(object))]
public class A<T>
{
    public static int X;
}

We get this code:

; Core CLR v5.0.321.7212 on x86

Bint.C()
    L0000: mov ecx, 0x209fc618
    L0005: xor edx, edx
    L0007: call 0x5e646b70
    L000c: mov eax, [eax+4]
    L000f: ret

Bobject.C()
    L0000: mov ecx, 0x209fc618
    L0005: mov edx, 1
    L000a: call 0x5e646b70
    L000f: mov eax, [eax+4]
    L0012: ret

A`1[[System.__Canon, System.Private.CoreLib]]..ctor()
    L0000: ret

Therefore, no, we don't need the long version of the code. We must conclude that the jitter is not optimizing this case appropriately.

Answer 2

Not 100% sure, but I think I know why the JIT is not optimizing this:

As I understand it, a generic type generally has only one version of the JITted code shared by all reference-type arguments (it shows up as System.__Canon in the listings above), and the actual type parameter is passed in as a typeref parameter. For value types, by contrast, each instantiation is generated separately.

This is because a reference type always looks the same to the JIT: a pointer to an object whose first field is a pointer to its typeref and method table. But value types are all different, so each must be custom-built.
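
As a small illustration of why shared code needs the type at run time, consider the sketch below. Box<T> and TypeName are hypothetical names chosen for this example. The program does not prove by itself that the method body is shared (that is what the SharpLab listings are for); it only shows that one source-level method must produce different results per type argument, which shared code can only do by looking the type up at run time.

```csharp
using System;

public class Box<T>
{
    // For reference-type T the runtime may compile this body once and reuse it,
    // so the concrete T cannot be baked into the code and must be looked up.
    public string TypeName() => typeof(T).Name;
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(new Box<object>().TypeName()); // Object
        Console.WriteLine(new Box<string>().TypeName()); // String
        Console.WriteLine(new Box<long>().TypeName());   // Int64
    }
}
```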

You say you don't use the type parameter, but actually you do. When you access a static field of a generic type, each instantiated generic type needs a separate copy of the static field.

So the code must now do a pointer lookup to the type parameter's typeref to get the static field's value.

But in the value-type version the typeref is statically known, so it is a direct memory access every time.
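
The point about separate copies of a static field can be checked with a short program. This is a minimal sketch; Holder<T> and Counter are hypothetical names used only for illustration.

```csharp
using System;

public class Holder<T>
{
    // Each closed generic type (Holder<object>, Holder<string>, Holder<long>, ...)
    // gets its own independent copy of this static field.
    public static int Counter;
}

public static class Program
{
    public static void Main()
    {
        Holder<object>.Counter = 1;
        Holder<string>.Counter = 2;
        Holder<long>.Counter = 3;

        // The writes above did not overwrite each other.
        Console.WriteLine(Holder<object>.Counter); // 1
        Console.WriteLine(Holder<string>.Counter); // 2
        Console.WriteLine(Holder<long>.Counter);   // 3
    }
}
```

Holder<object> and Holder<string> are both reference-type instantiations, yet their fields are distinct, so code shared between them has to find the right field through the actual type argument.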

2 answers

solution1
3 2021-04-12 20:13:47

solution2
3 ACCEPTED 2021-04-12 20:31:24
