.NET application hangs with GC thread deadlock

We have a problem with our application, which uses a mixture of managed (C#) and unmanaged (C++) code. Basically we have an exe that invokes a bunch of assemblies, and one of these assemblies is an MC++ wrapper of our C++ library. The application is a console app. Most of the time it works fine, but occasionally it hangs without any errors or exceptions.

Using memory dumps and symbols we've been able to do some diagnosis in WinDbg, but I'm not sure whether what we are seeing is a deadlock. I've searched for the CLR method names that come up in the stack but haven't been able to find cases where one thread is trying to allocate memory and gets deadlocked with the GC.

So far I've tried WinDbg with the sos, sosex and psscor4 extensions. Interestingly, sosex has a command to check for deadlocks (!dlk), but it reports no deadlocks.

It's hard to post the code because it's a large and complex app. There is a mixture of .NET 3.5 and 4.0 assemblies. There are threads in both managed and unmanaged code.

I would appreciate it if someone could look at the stack traces and confirm that this is a possible deadlock with the GC thread. Even better, can you suggest some other way of debugging deadlocks/hangs in .NET apps that use C# and MC++?

Here's what I have so far:

List of threads when the app hangs (!threads):

ThreadCount:      8
UnstartedThread:  0
BackgroundThread: 5
PendingThread:    0
DeadThread:       0
Hosted Runtime:   no
                                           PreEmptive                                                   Lock
       ID  OSID        ThreadOBJ     State GC       GC Alloc Context                  Domain           Count APT Exception
   0    1   de0 00000000008069f0      a020 Enabled  0000000000000000:0000000000000000 00000000007fa280     0 MTA
   2    2  2130 000000000080bd30      b220 Enabled  0000000000000000:0000000000000000 00000000007fa280     0 MTA (Finalizer)
   4    3  14fc 000000001d182880   200b020 Enabled  0000000000000000:0000000000000000 00000000007fa280     0 MTA
   5    4  20d0 000000001d18b400      b220 Enabled  0000000000000000:0000000000000000 00000000007fa280     2 MTA (GC)
   6    5  18a8 000000001d19f6a0      b020 Enabled  0000000000000000:0000000000000000 00000000007fa280     0 MTA
   7    6  18a0 000000001d1c6f10       220 Enabled  0000000000000000:0000000000000000 00000000007fa280     0 Ukn
   8    7  12f4 000000001d1c1ee0       220 Enabled  0000000000000000:0000000000000000 00000000007fa280     0 Ukn
  10    8  2170 000000001d1c2ad0       220 Enabled  0000000000000000:0000000000000000 00000000007fa280     0 Ukn

       OSID     Special thread type
    1   2570    DbgHelper 
    2   2130    Finalizer 
    5   20d0    SuspendEE 
   12   1890    GC
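
To keep the thread numbers straight when reading the stacks, the special-thread table can be turned into a small lookup. This is only a reading aid sketched in Python, not WinDbg output: the table text is copied from the listing above, `parse_special_threads` is a hypothetical helper, and it assumes the first column is the debugger thread number and the second is the OS thread id in hex.

```python
import re

# Special-thread table copied from the !threads output above.
SPECIAL_THREADS = """\
    1   2570    DbgHelper
    2   2130    Finalizer
    5   20d0    SuspendEE
   12   1890    GC
"""

def parse_special_threads(text):
    """Return {role: (debugger_thread_number, os_thread_id)} for each table row."""
    roles = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\d+)\s+([0-9a-fA-F]+)\s+(\S+)\s*$", line)
        if match:
            number, osid, role = match.groups()
            # ASSUMPTION: column 1 is the debugger thread number, column 2 the hex OSID.
            roles[role] = (int(number), int(osid, 16))
    return roles

roles = parse_special_threads(SPECIAL_THREADS)
for role, (number, osid) in roles.items():
    print(f"{role}: thread {number}, OSID {osid:#x}")
```

Read this way, the thread that started the EE suspension is thread 5 (OSID 0x20d0), and the dedicated GC thread is thread 12 (OSID 0x1890).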

This is what the stack of the GC thread looks like:

OS Thread Id: 0x1890 (12)
Child-SP         RetAddr          Call Site
0000000023e9f898 000000007799e4e8 ntdll!ZwWaitForSingleObject+0xa
0000000023e9f8a0 000000007799e3db ntdll!RtlpWaitOnCriticalSection+0xe8
0000000023e9f950 000007fef95d603e ntdll!RtlEnterCriticalSection+0xd1
0000000023e9f980 000007fef947bc41 clr!UnsafeEEEnterCriticalSection+0x1f
0000000023e9f9b0 000007fef947613a clr!CrstBase::Enter+0x1a1
0000000023e9f9f0 000007fef95da3a2 clr!ThreadStore::LockThreadStore+0x9a
0000000023e9fa20 000007fef9679675 clr!WKS::GCHeap::SuspendEE+0x82
0000000023e9fb20 000007fef9677eb2 clr!WKS::gc_heap::bgc_suspend_EE+0x25
0000000023e9fb50 000007fef98455b0 clr!WKS::gc_heap::background_mark_phase+0x236
0000000023e9fbb0 000007fef9677b76 clr! ?? ::FNODOBFM::`string'+0x9f85d
0000000023e9fc00 00000000773d652d clr!WKS::gc_heap::gc_thread_function+0xd3
0000000023e9fc30 000000007797c521 KERNEL32!BaseThreadInitThunk+0xd
0000000023e9fc60 0000000000000000 ntdll!RtlUserThreadStart+0x1d

To me it looks like the GC thread is waiting for a critical section. We were able to find the critical section's address and then its owner thread (!critsec). The owner thread's stack (!dumpstack) looked like the one below; I've trimmed it to keep this post short.

OS Thread Id: 0x20d0 (5)
Child-SP         RetAddr          Call Site
000000001fc5dd38 000007fefe0510dc ntdll!ZwWaitForSingleObject+0xa
000000001fc5dd40 000007fef9478817 KERNELBASE!WaitForSingleObjectEx+0x79
000000001fc5dde0 000007fef94787c0 clr!CLREvent::WaitEx+0x170
000000001fc5de20 000007fef947866b clr!CLREvent::WaitEx+0xf8
000000001fc5de80 000007fef967a15b clr!CLREvent::WaitEx+0x5e
000000001fc5df20 000007fef967a001 clr!WKS::gc_heap::user_thread_wait+0x49
000000001fc5df50 000007fef95dbb4e clr! ?? ::FNODOBFM::`string'+0x9fcc4
000000001fc5e030 000007fef95da22e clr!WKS::GCHeap::GarbageCollectGeneration+0x14e
000000001fc5e080 000007fef95d9e4e clr!WKS::gc_heap::try_allocate_more_space+0x25f
000000001fc5e150 000007fef95d9fc8 clr!WKS::GCHeap::Alloc+0x7e
000000001fc5e180 000007fef947407c clr!AllocateArrayEx+0xa6b
000000001fc5e2f0 000007fef8555b75 clr!JIT_NewArr1+0x45c
000000001fc5e4c0 000007fef8561103 mscorlib_ni!System.Reflection.CustomAttributeData.GetCustomAttributeRecords(System.Reflection.RuntimeModule, Int32)+0x115
000000001fc5e590 000007fef855db55 mscorlib_ni!System.Reflection.CustomAttribute.IsCustomAttributeDefined(System.Reflection.RuntimeModule, Int32, System.RuntimeType, Boolean)+0x103
000000001fc5e720 000007fef856c8ac mscorlib_ni!System.Reflection.CustomAttribute.IsDefined(System.RuntimeType, System.RuntimeType, Boolean)+0x75
000000001fc5e770 000007fef857fe46 mscorlib_ni!System.Enum.InternalFormat(System.RuntimeType, System.Object)+0x2c
000000001fc5e7b0 000007fef8554f3b mscorlib_ni!System.Text.StringBuilder.AppendFormat(System.IFormatProvider, System.String, System.Object[])+0x2e6
000000001fc5e850 000007ff03c640fc mscorlib_ni!System.String.Format(System.IFormatProvider, System.String, System.Object[])+0x7b
000000001fc5e8b0 000007ff03c638a6 MyLibrary1!NumberCache.NumberEntry.ToString()+0x26c
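
Taken together, the two stacks can be read as a wait-for cycle. The sketch below is a minimal Python illustration of that reasoning and not something the debugger produced: the edges are entered by hand, `find_cycle` is a hypothetical helper, and the edge from 0x20d0 back to the GC thread is an assumption based only on the frame name `user_thread_wait` (the allocating thread appears to wait on an event that the background GC thread would signal).

```python
# Wait-for edges, transcribed by hand from the two stacks above.
# Key: OS thread id of the waiter; value: OS thread id it is waiting on.
WAITS_FOR = {
    0x1890: 0x20d0,  # GC thread blocked in LockThreadStore; !critsec named 0x20d0 as owner
    0x20d0: 0x1890,  # ASSUMPTION: user_thread_wait is waiting for the background GC thread
}

def find_cycle(waits_for, start):
    """Follow wait-for edges from `start`; return the cycle as a list, or None."""
    path = []
    seen = set()
    current = start
    while current in waits_for and current not in seen:
        seen.add(current)
        path.append(current)
        current = waits_for[current]
    if current in seen:
        # Trim any lead-in so the list starts and ends on the repeated thread.
        return path[path.index(current):] + [current]
    return None

cycle = find_cycle(WAITS_FOR, 0x1890)
print(" -> ".join(hex(t) for t in cycle) if cycle else "no cycle")
```

If the assumed second edge does not hold, for example because the event is signalled by some other thread, the cycle disappears and the hang would need a different explanation.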

This line in the second callstack looks suspicious:

000000001fc5df50 000007fef95dbb4e clr! ?? ::FNODOBFM::`string'+0x9fcc4 

Look how large the offset is, and I don't see any module name. Are you missing some symbols?

Maybe there is a finalizer in that library that is causing a problem.

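I'm not much of an expert, but I'm curious: could the finalizer thread have crashed (I mean an unhandled exception on the finalizer thread) while holding a lock on some resource that the GC thread is now trying to acquire?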
