Unexplained thread creation and handle count increase in a WCF service

We have several WCF services hosted in IIS on a Windows Server 2008 R2 Amazon EC2 instance with 32 cores, running .NET Framework 4.5.2. The problem is an unexplained increase in handle counts: some of our services accrue hundreds of thousands of open handles after running for more than a day. When I forced a garbage collection using a third-party tool, the handle count dropped to around 2k.

To investigate, I created a simple service with no functionality and started it under IIS. No client requests were made to this service. Within one hour, there were 20k+ handles open under the service's process. Watching the process with procmon, I could see bursts of 20+ thread exits followed by thread creates every 40 seconds or so.

I then switched the service's application pool from .NET Framework v4.0 to v2.0 and started the service again; the handle count stayed at approximately 500 open handles for the entire hour. I was not able to reproduce the issue on several of my own machines (not at Amazon).

I'm aware that there were significant thread pool changes in CLR 4.0 (http://msdn.microsoft.com/en-us/magazine/ff960958.aspx), but I do not know why 1) there are bursts of thread creation with no client requests or work being performed by the service, and 2) the thread handles and associated event handles are not being released.

I recently ran into this issue with a .NET service (hosted in IIS on Server 2012 R2 with .NET 4.5.1). While sitting idle, it would accumulate more than 30,000 handles. Using !htrace in WinDbg, I could see that the handles were all being created in this stack:

Call Site
clr!Thread::CreateNewOSThread+0x7f
clr!Thread::CreateNewThread+0x90
clr!ThreadpoolMgr::CreateUnimpersonatedThread+0xc7
clr!ThreadpoolMgr::MaybeAddWorkingWorker+0x113
clr!ManagedPerAppDomainTPCount::SetAppDomainRequestsActive+0x24
clr!ThreadpoolMgr::SetAppDomainRequestsActive+0x2a
clr!ThreadPoolNative::RequestWorkerThread+0x2b
mscorlib_ni!System.Threading.ThreadPoolWorkQueue.Dispatch()
mscorlib_ni![ContextTransitionFrame: 0000002b15e4ef28] 
clr!CallDescrWorkerInternal+0x83
clr!CallDescrWorkerWithHandler+0x4a
clr!MethodDescCallSite::CallTargetWorker+0x380
clr!QueueUserWorkItemManagedCallback+0x2a
clr!ManagedThreadBase_DispatchInner+0x2d
clr!ManagedThreadBase_DispatchMiddle+0x6c
clr!ManagedThreadBase_DispatchOuter+0x75
clr!ManagedThreadBase_DispatchInCorrectAD+0x15
clr!Thread::DoADCallBack+0x25b
clr!ManagedThreadBase_DispatchInner+0x69
clr!ManagedThreadBase_DispatchMiddle+0x6c
clr!ManagedThreadBase_DispatchOuter+0x75
clr!ManagedThreadBase_FullTransitionWithAD+0x2f
clr!ManagedPerAppDomainTPCount::DispatchWorkItem+0xe3
clr!ThreadpoolMgr::ExecuteWorkRequest+0x64
clr!ThreadpoolMgr::WorkerThreadStart+0x2b6
clr!Thread::intermediateThreadProc+0x7d
KERNEL32!BaseThreadInitThunk+0xd
ntdll!RtlUserThreadStart+0x1d

Each call to CreateNewOSThread created 1 thread handle and 4 event handles, which were not being cleaned up (the thread would finish running, but the handles would stick around). I never tracked down what was adding tasks to the thread pool, but I did notice that because the service was "idle", the GC was never running.

For some reason, the thread pool manager doesn't dispose of the handles when the worker thread is allowed to exit, and instead relies on the garbage collector to do it.

As a test, I added a method to the service that manually invokes the garbage collector. After observing the linear increase in handles, I triggered a GC on the service and watched the handle count drop back down to normal levels.
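The manual GC test described above could look roughly like the following. This is a minimal sketch, not the original author's code: the class and method names (`HandleDiagnostics`, `CurrentHandleCount`, `CollectAndReport`) are made up for illustration, and how the method gets exposed (for example as a WCF operation) is left out. It reads the process handle count before and after a full blocking collection.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper; the names are illustrative and not taken from the original service.
public static class HandleDiagnostics
{
    // Reads the number of handles currently open in this process.
    // A fresh Process object is created on each call so the value is not a cached one.
    public static int CurrentHandleCount()
    {
        using (Process process = Process.GetCurrentProcess())
        {
            return process.HandleCount;
        }
    }

    // Forces a full collection, waits for finalizers to run, then collects again
    // so that objects released by those finalizers are reclaimed as well.
    public static string CollectAndReport()
    {
        int before = CurrentHandleCount();

        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        int after = CurrentHandleCount();
        return string.Format("Handles before GC: {0}, after GC: {1}", before, after);
    }
}
```

Calling `HandleDiagnostics.CollectAndReport()` on an idle service that has been accumulating handles should show whether a collection releases them. It is probably best treated as a diagnostic aid for confirming the behavior rather than as a permanent workaround.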

In a w3wp.exe instance hosting one WCF service, there are at least 3 AppDomains under .NET 4: one called SharedAppDomains, which includes over 20 .NET Framework assemblies; one called Default; and one named something like /LM/W3Svc.... (some funky name), which contains your WCF application assemblies as well as some direct dependencies. What tool told you that there's only one app domain not containing other assemblies? The simplest way to check is to run Process Explorer as Administrator and look at the .NET Assemblies of the w3wp.exe instance.

Nevertheless, even if your WCF service is sitting idle and not responding to incoming requests, w3wp.exe itself is not idle, since it is a hosting process responsible for many housekeeping tasks. For my Hello World WCF service on .NET 4.5.1, in a .NET 4 app pool of IIS 7 on Windows 7, the w3wp.exe thread count jumps between 44 and 47. Memory usage and the other resource figures are basically stable.

You mentioned that the problem only occurs on AWS machines, not on your other machines. So you had better find out all loaded app domains and their assemblies by running Process Explorer as Administrator, compare them with the list for the w3wp.exe instance on your own PC, and single out the usual suspects that could be doing more work than expected. Of course, it could be that w3wp.exe got compromised and is doing dodgy things; at this stage, however, just check the assemblies and app domains first. This is not really an answer, but the comment area on SO restricts the length of comments. Hopefully this gives you somewhere to start checking.
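As a complement to Process Explorer, the assembly list can also be dumped from inside the service so that the output from the AWS machine and a local machine can be diffed as text. The sketch below is an assumption-laden illustration: `AppDomainDiagnostics` and `DescribeCurrentDomain` are hypothetical names, and the code only sees the AppDomain it runs in, not the other domains in the same w3wp.exe process.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper; it only covers the AppDomain in which it executes.
public static class AppDomainDiagnostics
{
    // Returns the current AppDomain's name followed by its loaded assemblies,
    // sorted by full name so two reports can be compared line by line.
    public static string DescribeCurrentDomain()
    {
        AppDomain domain = AppDomain.CurrentDomain;
        var report = new StringBuilder();
        report.AppendLine("AppDomain: " + domain.FriendlyName);

        foreach (string name in domain.GetAssemblies()
                                      .Select(a => a.FullName)
                                      .OrderBy(n => n, StringComparer.Ordinal))
        {
            report.AppendLine("  " + name);
        }

        return report.ToString();
    }
}
```

Saving the output of `AppDomainDiagnostics.DescribeCurrentDomain()` on each machine and comparing the two files is one way to spot assemblies that are loaded only on the affected instance.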
