
Service occasionally hangs when stopping: suspended threads

I wrote a Windows service in C# targeting .NET 4.0. On the odd occasion it hangs completely when I try to stop it. Looking at a dump file, I've noticed that a number of my threads are suspended, even though my code never suspends them.

The environment is Windows Server 2008 R2 64-bit, though I've observed the same hang on Windows 7 64-bit. .NET 4.0 is the latest version installed.

There's a lot of code, so I'm only posting the snippets I hope are relevant. I can post more if required.

Basic design:

Main() starts a new thread to handle logging to a file (the code for which is in a separate DLL), then starts the service.

public static void Main(string[] args)
{
    ...
    else if (Args.RunService)
    {
        Logger.Options.LogToFile = true;
        MSPO.Logging.Logger.Start();
        RunService();
        MSPO.Logging.Logger.Stop();
    }
    ...
}

private static void RunService()
{
    service = new ProcessThrottlerService();
    System.ServiceProcess.ServiceBase.Run(service);
}
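The logger itself isn't posted. However, the thread 10 stack further down shows MSPO.Logging.MessageQueue.EnqueueMessage calling BlockingCollection`1.TryAddWithNoTimeValidation, which suggests a BlockingCollection-backed queue drained by the logger thread. The following is a minimal sketch of that pattern, assuming that design. The class name QueueLogger is hypothetical and is not the real MSPO.Logging API. Stop() uses CompleteAdding() so the consumer thread exits on its own after flushing whatever is still queued.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

// Hypothetical stand-in for MSPO.Logging.Logger: a single background thread
// drains a BlockingCollection<string> and writes each message to a TextWriter.
public sealed class QueueLogger
{
    private readonly BlockingCollection<string> queue = new BlockingCollection<string>();
    private readonly TextWriter output;
    private Thread worker;

    public QueueLogger(TextWriter output)
    {
        this.output = output;
    }

    // Counterpart of Logger.Start(): spins up the consumer thread.
    public void Start()
    {
        worker = new Thread(Drain) { IsBackground = true, Name = "Logger" };
        worker.Start();
    }

    // Safe to call from any thread; returns false once Stop() has been called.
    public bool Log(string message)
    {
        try
        {
            return queue.TryAdd(message);
        }
        catch (InvalidOperationException)
        {
            // TryAdd throws after CompleteAdding(); treat that as "logger stopped".
            return false;
        }
    }

    // Counterpart of Logger.Stop(): no more messages are accepted, and the
    // consumer thread exits after writing everything already queued.
    public void Stop()
    {
        queue.CompleteAdding();
        worker.Join();
        output.Flush();
    }

    private void Drain()
    {
        // Blocks while the queue is empty; the loop ends once CompleteAdding()
        // has been called and the remaining items have been consumed.
        foreach (string message in queue.GetConsumingEnumerable())
        {
            output.WriteLine(message);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var logger = new QueueLogger(Console.Out);
        logger.Start();
        logger.Log("service starting");
        // ServiceBase.Run(service) would block here until the service stops.
        logger.Log("service stopped");
        logger.Stop();
    }
}
```

With this shape, calling Stop() after ServiceBase.Run returns can only block for as long as it takes to write out the messages that are already queued.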

The main thread stays there until ServiceBase.Run returns.

OnStart() in the service creates a new thread and starts it.

protected override void OnStart(string[] args)
{
    serviceThread = new MainServiceThread();
    serviceThread.StartThread();
    base.OnStart(args);
}

I create a ManualResetEventSlim which is used as the stop signal for the rest of the program. OnStop() sets the event.

protected override void OnStop()
{
    if (serviceThread != null)
    {
        serviceThread.StopThread(); // Event is signalled in there
        serviceThread.WaitForThreadToReturn(); // This calls thread.Join() on the MainServiceThread thread
    }
    base.OnStop();
}

The MainServiceThread creates the event, starts another thread (the process handler), and then just waits on the event.

private void StartHandlerAndWaitForServiceStop()
{
    processHandler.Start(serviceStopEvent);
    serviceStopEvent.Wait();
    processHandler.Stop();
}
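For readers following the start/stop handshake, here is a hedged reconstruction of MainServiceThread assembled only from the method names in the snippets above (StartThread, StopThread, WaitForThreadToReturn, StartHandlerAndWaitForServiceStop). The real processHandler isn't shown, so it is replaced by two hypothetical delegates. The point it illustrates: OnStart returns immediately, while OnStop only signals the event and then joins the thread.

```csharp
using System;
using System.Threading;

// Hypothetical reconstruction of MainServiceThread. The process handler is
// reduced to two delegates so the sketch runs on its own.
public sealed class MainServiceThread
{
    private readonly ManualResetEventSlim serviceStopEvent = new ManualResetEventSlim(false);
    private readonly Action<ManualResetEventSlim> startHandler; // stands in for processHandler.Start
    private readonly Action stopHandler;                        // stands in for processHandler.Stop
    private Thread thread;

    public MainServiceThread(Action<ManualResetEventSlim> startHandler, Action stopHandler)
    {
        this.startHandler = startHandler;
        this.stopHandler = stopHandler;
    }

    // Called from OnStart(): must return quickly, so the work runs on its own thread.
    public void StartThread()
    {
        thread = new Thread(StartHandlerAndWaitForServiceStop) { Name = "MainServiceThread" };
        thread.Start();
    }

    // Called from OnStop(): only signals; the waiting happens in WaitForThreadToReturn().
    public void StopThread()
    {
        serviceStopEvent.Set();
    }

    public void WaitForThreadToReturn()
    {
        thread.Join();
    }

    private void StartHandlerAndWaitForServiceStop()
    {
        startHandler(serviceStopEvent);
        serviceStopEvent.Wait();   // blocks until StopThread() is called
        stopHandler();
    }
}

public static class Program
{
    public static void Main()
    {
        var main = new MainServiceThread(
            stop => Console.WriteLine("handler started"),
            () => Console.WriteLine("handler stopped"));
        main.StartThread();   // what OnStart() does
        main.StopThread();    // what OnStop() does first...
        main.WaitForThreadToReturn(); // ...then it blocks here until the chain unwinds
        Console.WriteLine("OnStop() would return here");
    }
}
```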

The processHandler thread subscribes to this WMI query:

watcher = new ManagementEventWatcher(new ManagementScope("root\\CIMV2"),
    new WqlEventQuery("SELECT * FROM Win32_ProcessStartTrace"));
watcher.EventArrived += HandleNewProcessCreated;

If the new process name is of interest, I create a new "throttler" thread which effectively just suspends the process, sleeps, resumes the process, and sleeps again, in a loop:

while (true)
{
    ntresult = Ntdll.NtResumeProcess(processHandle);
    if (ntresult != Ntdll.NTSTATUS.STATUS_SUCCESS)
    {
        if (ntresult != Ntdll.NTSTATUS.STATUS_PROCESS_IS_TERMINATING)
            LogSuspendResumeFailure("resume", ntresult);
        break;
    }
    Thread.Sleep(resumeTime);

    ntresult = Ntdll.NtSuspendProcess(processHandle);
    if (ntresult != Ntdll.NTSTATUS.STATUS_SUCCESS)
    {
        if (ntresult != Ntdll.NTSTATUS.STATUS_PROCESS_IS_TERMINATING)
            LogSuspendResumeFailure("suspend", ntresult);
        break;
    }
    Thread.Sleep(suspendTime);

    if (++loop >= loopsBeforeCheckingStopEvent)
    {
        if (stopEvent.IsSet) break;
        loop = 0;
    }
}
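As posted, the loop checks stopEvent only every loopsBeforeCheckingStopEvent iterations. It also breaks straight after a suspend-and-sleep, so the target process stays suspended unless code not shown here resumes it after the loop. The following is a sketch of one alternative, not the author's code. It replaces Thread.Sleep with ManualResetEventSlim.Wait(TimeSpan), which returns true as soon as the event is set. It also resumes the process in a finally block. The NtResumeProcess/NtSuspendProcess calls are abstracted as hypothetical Func<bool> delegates (true meaning STATUS_SUCCESS) so the loop can run anywhere.

```csharp
using System;
using System.Threading;

public static class Throttler
{
    // resume/suspend stand in for the NtResumeProcess/NtSuspendProcess calls
    // (hypothetical: each returns true on STATUS_SUCCESS, false otherwise).
    public static void Run(
        Func<bool> resume,
        Func<bool> suspend,
        ManualResetEventSlim stopEvent,
        TimeSpan resumeTime,
        TimeSpan suspendTime)
    {
        bool suspended = false;
        try
        {
            while (true)
            {
                if (!resume()) return;                   // e.g. process is terminating
                suspended = false;
                // Wait(timeout) sleeps like Thread.Sleep but returns true as soon as
                // the stop event is set, so there is no need to count loops.
                if (stopEvent.Wait(resumeTime)) return;

                if (!suspend()) return;
                suspended = true;
                if (stopEvent.Wait(suspendTime)) return;
            }
        }
        finally
        {
            // Never leave the target process frozen when this thread exits.
            if (suspended) resume();
        }
    }
}

public static class Program
{
    public static void Main()
    {
        int resumes = 0, suspends = 0;
        var stop = new ManualResetEventSlim(false);
        var worker = new Thread(() => Throttler.Run(
            () => { Interlocked.Increment(ref resumes); return true; },
            () => { Interlocked.Increment(ref suspends); return true; },
            stop,
            TimeSpan.FromMilliseconds(22),    // resume time from the question
            TimeSpan.FromMilliseconds(78)));  // suspend time from the question
        worker.Start();
        Thread.Sleep(1000);
        stop.Set();        // what OnStop() does via StopThread()
        worker.Join();     // returns within one wait, not after N loops
        Console.WriteLine("resumes={0}, suspends={1}", resumes, suspends);
    }
}
```

Because every successful suspend is followed by a resume (in the loop or in finally), the resume count always ends one higher than the suspend count when all calls succeed.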

If the service receives a stop command, it sets the ManualResetEventSlim. Any threads throttling processes will see it within 1 second and break out of the loop and return. The process handler thread waits for all of those threads to return, and then returns as well. At that point the StartHandlerAndWaitForServiceStop() method posted above returns. The threads waiting further up the chain (including the OnStop() thread blocked in Join()) then return in turn.
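The step where the process handler waits for all throttler threads is where a single stuck thread turns into a hung service stop. The following is a hedged sketch of a debugging aid for that step. It uses Thread.Join(TimeSpan) instead of Join(), so any thread that doesn't finish is reported by name rather than blocking silently. ThrottlerThreads, StartNew and JoinAll are hypothetical names, not the author's code.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical helper for the process handler's "wait for every throttler
// thread" step. Join(timeout) replaces Join() so a stuck thread is reported
// by name instead of hanging the stop sequence.
public sealed class ThrottlerThreads
{
    private readonly List<Thread> threads = new List<Thread>();
    private readonly object sync = new object();

    public void StartNew(string name, ThreadStart body)
    {
        var thread = new Thread(body) { Name = name, IsBackground = true };
        lock (sync) { threads.Add(thread); }
        thread.Start();
    }

    // Returns the names of threads that had not finished when their timeout expired.
    public List<string> JoinAll(TimeSpan timeoutPerThread)
    {
        Thread[] snapshot;
        lock (sync) { snapshot = threads.ToArray(); }

        var stuck = new List<string>();
        foreach (Thread thread in snapshot)
        {
            if (!thread.Join(timeoutPerThread))
                stuck.Add(thread.Name);
        }
        return stuck;
    }
}

public static class Program
{
    public static void Main()
    {
        var stop = new ManualResetEventSlim(false);
        var group = new ThrottlerThreads();
        group.StartNew("throttler-1", () => stop.Wait());
        group.StartNew("throttler-2", () => stop.Wait());
        group.StartNew("ignores-stop", () => Thread.Sleep(5000));

        stop.Set();
        foreach (string name in group.JoinAll(TimeSpan.FromSeconds(1)))
            Console.WriteLine("Still running after stop: " + name);
    }
}
```

Any thread that has been suspended externally (by a debugger or another tool) would also end up in the returned list, which narrows down where a stop is stuck.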

The vast majority of the times I've stopped the service, it stops without any problems. This is regardless of whether I've got 0 or 500 throttler threads running, and regardless of whether any have ever been created while the service was running.

However, now and again, when I try to stop it (through services.msc), it hangs. Yesterday I managed to create a full dump of the process while it was in this state, using Process Explorer.

The dump file shows that a number of my threads are suspended:

0:010> ~
   0  Id: 1840.c34 Suspend: 0 Teb: 000007ff`fffdd000 Unfrozen
   1  Id: 1840.548 Suspend: 0 Teb: 000007ff`fffdb000 Unfrozen
   2  Id: 1840.9c0 Suspend: 0 Teb: 000007ff`fffd9000 Unfrozen
   3  Id: 1840.1da8 Suspend: 0 Teb: 000007ff`fffd7000 Unfrozen
   4  Id: 1840.b08 Suspend: 3 Teb: 000007ff`fffd5000 Unfrozen
   5  Id: 1840.1b5c Suspend: 0 Teb: 000007ff`ffef6000 Unfrozen
   6  Id: 1840.af0 Suspend: 2 Teb: 000007ff`ffef2000 Unfrozen
   7  Id: 1840.c60 Suspend: 0 Teb: 000007ff`ffef0000 Unfrozen
   8  Id: 1840.1d94 Suspend: 4 Teb: 000007ff`ffeee000 Unfrozen
   9  Id: 1840.1cd8 Suspend: 4 Teb: 000007ff`ffeec000 Unfrozen
. 10  Id: 1840.1c64 Suspend: 0 Teb: 000007ff`ffefa000 Unfrozen
  11  Id: 1840.1dc8 Suspend: 0 Teb: 000007ff`fffd3000 Unfrozen
  12  Id: 1840.8f4 Suspend: 0 Teb: 000007ff`ffefe000 Unfrozen

This ties in with what I was seeing in Process Explorer: of the two processes I was throttling, one was permanently suspended and the other permanently resumed. So those throttler threads were effectively suspended, as they were no longer doing their work. They could not have stopped without being suspended. I have error handling wrapped around the loop, and any exception would cause those threads to log the details and return. Besides, their call stacks showed no errors. Nor were they stuck in a sleep because of some error: the two sleeps are 22 ms and 78 ms, and everything was working fine before I tried to stop the service.

So I'm trying to understand how those threads could have become suspended. My only suspect is the GC, because it suspends threads while reclaiming/compacting memory.

I've pasted the content of !eestack and ~*kb here: http://pastebin.com/rfQK0Ak8

I should mention that I didn't have symbols, as I'd already rebuilt the application a number of times by the time I created the dump. However, as it's .NET, I assume that's less of an issue?

From eestack, these are what I believe are "my" threads:

  • Thread 0: Main service thread, it's still in the ServiceBase.Run method.
  • Thread 4: That is my logger thread. That thread will spend most of its life waiting on a blocking queue.
  • Thread 6: My MainServiceThread thread, which is just waiting on the event to be set.
  • Threads 8 and 9: Both are "throttler" threads, executing the loop I posted above.
  • Thread 10: This thread appears to be executing the OnStop() method, so is handling the service stop command.

That's all of them, and according to the dump file threads 4, 6, 8, and 9 are suspended. So all of "my" threads are suspended except the main thread and the thread handling the OnStop() method.

Now I don't know much about the GC and debugging .NET stuff, but thread 10 looks dodgy to me. Excerpt from call stack:

Thread  10
Current frame: ntdll!NtWaitForMultipleObjects+0xa
Child-SP         RetAddr          Caller, Callee
000000001a83d670 000007fefdd41420 KERNELBASE!WaitForMultipleObjectsEx+0xe8, calling ntdll!NtWaitForMultipleObjects
000000001a83d6a0 000007fef4dc3d7c clr!CExecutionEngine::ClrVirtualAlloc+0x3c, calling kernel32!VirtualAllocStub
000000001a83d700 000007fefdd419bc KERNELBASE!WaitForMultipleObjectsEx+0x224, calling ntdll!RtlActivateActivationContextUnsafeFast
000000001a83d710 000007fef4e9d3aa clr!WKS::gc_heap::grow_heap_segment+0xca, calling clr!StressLog::LogOn
000000001a83d730 000007fef4e9cc98 clr!WKS::gc_heap::adjust_limit_clr+0xec, calling clr!memset
000000001a83d740 000007fef4df398d clr!COMNumber::FormatInt32+0x8d, calling clr!LazyMachStateCaptureState
000000001a83d750 000007fef4df398d clr!COMNumber::FormatInt32+0x8d, calling clr!LazyMachStateCaptureState
000000001a83d770 00000000778a16d3 kernel32!WaitForMultipleObjectsExImplementation+0xb3, calling kernel32!WaitForMultipleObjectsEx
000000001a83d7d0 000007fef4e9ce73 clr!WKS::gc_heap::allocate_small+0x158, calling clr!WKS::gc_heap::a_fit_segment_end_p
000000001a83d800 000007fef4f8f8e1 clr!WaitForMultipleObjectsEx_SO_TOLERANT+0x91, calling kernel32!WaitForMultipleObjectsExImplementation
000000001a83d830 000007fef4dfb798 clr!Thread::GetApartment+0x34, calling clr!GetThread
000000001a83d860 000007fef4f8f6ed clr!Thread::GetFinalApartment+0x1a, calling clr!Thread::GetApartment
000000001a83d890 000007fef4f8f6ba clr!Thread::DoAppropriateAptStateWait+0x56, calling clr!WaitForMultipleObjectsEx_SO_TOLERANT
000000001a83d8d0 000007fef4f8f545 clr!Thread::DoAppropriateWaitWorker+0x1b1, calling clr!Thread::DoAppropriateAptStateWait
000000001a83d990 000007fef4ecf167 clr!ObjectNative::Pulse+0x147, calling clr!HelperMethodFrameRestoreState
000000001a83d9d0 000007fef4f8f63b clr!Thread::DoAppropriateWait+0x73, calling clr!Thread::DoAppropriateWaitWorker
000000001a83da50 000007fef4f0ff6a clr!Thread::JoinEx+0xa6, calling clr!Thread::DoAppropriateWait
000000001a83dac0 000007fef4defd90 clr!GCHolderBase<0,0,0,0>::EnterInternal+0x3c, calling clr!Thread::EnablePreemptiveGC
000000001a83daf0 000007fef4f1039a clr!ThreadNative::DoJoin+0xd8, calling clr!Thread::JoinEx
000000001a83db20 000007fef45f86f3 (MethodDesc 000007fef3cbe8d8 +0x1a3 System.Threading.SemaphoreSlim.Release(Int32)), calling 000007fef4dc31b0 (stub for System.Threading.Monitor.Exit(System.Object))
000000001a83db60 000007fef4dfb2a6 clr!FrameWithCookie<HelperMethodFrame_1OBJ>::FrameWithCookie<HelperMethodFrame_1OBJ>+0x36, calling clr!GetThread
000000001a83db90 000007fef4f1024d clr!ThreadNative::Join+0xfd, calling clr!ThreadNative::DoJoin
000000001a83dc40 000007ff001723f5 (MethodDesc 000007ff001612c0 +0x85 MSPO.Logging.MessageQueue.EnqueueMessage(System.String)), calling (MethodDesc 000007fef30fde88 +0 System.Collections.Concurrent.BlockingCollection`1[[System.__Canon, mscorlib]].TryAddWithNoTimeValidation(System.__Canon, Int32, System.Threading.CancellationToken))
000000001a83dcf0 000007ff001720e9 (MethodDesc 000007ff00044bb0 +0xc9 ProcessThrottler.Logging.Logger.Log(LogLevel, System.String)), calling (MethodDesc 000007ff00161178 +0 MSPO.Logging.MessageFormatter.QueueFormattedOutput(System.String, System.String))
000000001a83dd10 000007fef4f101aa clr!ThreadNative::Join+0x5a, calling clr!LazyMachStateCaptureState
000000001a83dd30 000007ff0018000b (MethodDesc 000007ff00163e10 +0x3b ProcessThrottler.Service.MainServiceThread.WaitForThreadToReturn()), calling 000007fef4f10150 (stub for System.Threading.Thread.JoinInternal())
000000001a83dd60 000007ff0017ff44 (MethodDesc 000007ff00049f30 +0xc4 ProcessThrottler.Service.ProcessThrottlerService.OnStop()), calling 000007ff0004d278 (stub for ProcessThrottler.Service.MainServiceThread.WaitForThreadToReturn())
000000001a83dda0 000007fef63fcefb (MethodDesc 000007fef63d65e0 +0xbb System.ServiceProcess.ServiceBase.DeferredStop())

I could post more code showing what each of my functions does, but I really don't think this is a deadlock in my code, as the threads would not become suspended in that case. Looking at the above call stack, I can see it doing some GC work after I tell it to log a string to a queue. But none of that GC activity looks dodgy, at least not compared to what I'm seeing in http://blogs.msdn.com/b/tess/archive/2008/02/11/hang-caused-by-gc-xml-deadlock.aspx. I have a config file that tells it to use gcServer, but I'm almost certain that setting isn't being used, because in my earlier testing GCSettings.IsServerGC always returned false.
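One quick cross-check: the GC frames in thread 10's stack are clr!WKS::gc_heap::..., and WKS is the workstation-GC variant inside clr.dll (server GC frames appear as SVR::gc_heap). So the dump itself is consistent with IsServerGC being false. The following is a small sketch for logging which GC mode the runtime actually chose at startup. GcInfo is a hypothetical name. The config element shown in the comment goes in the `<runtime>` section of the service executable's own .exe.config file. As far as I know, server GC is also not used on a single-processor machine, which is why ProcessorCount is logged alongside it.

```csharp
using System;
using System.Runtime;

public static class GcInfo
{
    // One line suitable for the service's startup log.
    public static string Describe()
    {
        return string.Format(
            "IsServerGC={0}, LatencyMode={1}, ProcessorCount={2}, Is64BitProcess={3}",
            GCSettings.IsServerGC,
            GCSettings.LatencyMode,
            Environment.ProcessorCount,
            Environment.Is64BitProcess);
    }
}

public static class Program
{
    public static void Main()
    {
        // For a .NET Framework service, server GC is requested in <exe name>.exe.config:
        //   <configuration><runtime><gcServer enabled="true"/></runtime></configuration>
        Console.WriteLine(GcInfo.Describe());
    }
}
```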

So... does anyone have any suggestions as to why my threads are suspended?

In response to Hans's comment, this is the method that calls OpenProcess to get the handle of the process to be suspended/resumed:

private void GetProcessHandle(CurrentProcessDetails process)
{
    IntPtr handle = Kernel32.OpenProcess(
        process.Settings.RequiredProcessAccessRights,
        false,
        (uint)process.ID
        );
    if (handle == IntPtr.Zero)
        throw new Win32ExceptionWrapper(
            string.Format("Failed to open process {0} {1}", 
            process.Settings.ProcessNameWithExt, process.IDString));
    process.Handle = handle;
}
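The Kernel32 wrapper and RequiredProcessAccessRights aren't shown. For reference, here is a hedged sketch of what the interop side might look like, assuming the handle is only used for NtSuspendProcess/NtResumeProcess. As far as I know, those calls need only PROCESS_SUSPEND_RESUME (0x0800 in winnt.h). The class and the OpenForSuspendResume helper are hypothetical. The sketch also closes the handle with CloseHandle, which the posted snippet doesn't show.

```csharp
using System;
using System.ComponentModel;
using System.Diagnostics;
using System.Runtime.InteropServices;

// Hypothetical version of the Kernel32 wrapper used by GetProcessHandle.
internal static class Kernel32
{
    // Access-right value from winnt.h.
    public const uint PROCESS_SUSPEND_RESUME = 0x0800;

    [DllImport("kernel32.dll", SetLastError = true)]
    public static extern IntPtr OpenProcess(uint dwDesiredAccess, bool bInheritHandle, uint dwProcessId);

    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    public static extern bool CloseHandle(IntPtr hObject);

    // Opens a process with only the right needed for suspend/resume.
    // The caller owns the handle and must release it with CloseHandle.
    public static IntPtr OpenForSuspendResume(int processId)
    {
        IntPtr handle = OpenProcess(PROCESS_SUSPEND_RESUME, false, (uint)processId);
        if (handle == IntPtr.Zero)
        {
            // GetLastWin32Error is meaningful because of SetLastError = true above.
            throw new Win32Exception(Marshal.GetLastWin32Error(),
                "OpenProcess failed for process " + processId);
        }
        return handle;
    }
}

public static class Program
{
    public static void Main()
    {
        if (Environment.OSVersion.Platform != PlatformID.Win32NT)
        {
            Console.WriteLine("kernel32 interop is Windows-only.");
            return;
        }

        // Open (but do not suspend!) the current process, then release the handle.
        IntPtr handle = Kernel32.OpenForSuspendResume(Process.GetCurrentProcess().Id);
        try
        {
            Console.WriteLine("Opened handle 0x{0:X}", handle.ToInt64());
        }
        finally
        {
            Kernel32.CloseHandle(handle);
        }
    }
}
```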

I've discovered the cause. It has nothing to do with my code. It's a bug in Process Explorer.

My program is written to target .NET 4.0. If I use Process Explorer to view any of my threads' call stacks, Process Explorer suspends the thread and doesn't resume it. What it should do is suspend the thread while it gets the call stack, and then resume it immediately. But it's not resuming the threads - not my managed threads, anyway.

I can replicate it with this very simple code:

using System;

namespace Test
{
    class Program
    {
        static void Main(string[] args)
        {
            for (int i = 0; i < int.MaxValue; i++)
            {
                Console.WriteLine(i.ToString());
            }
        }   
    }
}

If I compile that to target .NET 4.0 or higher, run it, and use Process Explorer to open the thread running the loop, the thread becomes suspended. The Resume button becomes available, and I can click it to resume the thread. Opening the thread multiple times results in it being suspended multiple times; I confirmed this by using WinDbg to view the suspend count of the thread.

If I compile it to target versions below 4.0 (tried 2.0 and 3.5), threads I open in Process Explorer do not remain suspended.
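To check for this without attaching WinDbg, here is a rough sketch that uses System.Diagnostics to count a process's OS threads that are currently in a wait with reason Suspended. SuspendedThreads is a hypothetical name. It only gives a point-in-time view and can't report the suspend count the way WinDbg's `~` command does. Run it against the service's PID before and after viewing a thread's stack in Process Explorer.

```csharp
using System;
using System.Diagnostics;

public static class SuspendedThreads
{
    // Counts OS threads of `process` that are currently waiting with reason
    // Suspended. This is a point-in-time snapshot: it cannot see the suspend
    // count (use WinDbg's ~ for that), only whether a thread is suspended now.
    public static int Count(Process process)
    {
        int count = 0;
        foreach (ProcessThread thread in process.Threads)
        {
            try
            {
                if (thread.ThreadState == System.Diagnostics.ThreadState.Wait &&
                    thread.WaitReason == ThreadWaitReason.Suspended)
                {
                    count++;
                }
            }
            catch (InvalidOperationException)
            {
                // Wait reason unavailable (e.g. the thread changed state or exited).
            }
            catch (NotSupportedException)
            {
                // Some non-Windows platforms cannot report thread states.
            }
        }
        return count;
    }
}

public static class Program
{
    public static void Main(string[] args)
    {
        // Pass a PID (e.g. the service's) or default to the current process.
        Process target = args.Length > 0
            ? Process.GetProcessById(int.Parse(args[0]))
            : Process.GetCurrentProcess();
        Console.WriteLine("{0} ({1}): {2} suspended thread(s)",
            target.ProcessName, target.Id, SuspendedThreads.Count(target));
    }
}
```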
