
A lot of hung requests in the IIS Worker Process

The web API hosted in IIS has unexpected behavior after one of the releases:

(screenshot: iis_connections counter graph)

These metrics were collected using the IIS Web Service(_Total)\Current Connections performance counter:

A connection is an active session on your server. When someone connects, the counter increments; when they disconnect, it decrements.

Up to a certain point everything works stably, but then the number of connections exceeds a threshold of roughly 5,000 and the API starts returning a 503.2 error.

From API:

The serverRuntime@appConcurrentRequestLimit setting is being exceeded.
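This limit is the serverRuntime/appConcurrentRequestLimit attribute in IIS configuration, whose default value of 5000 matches the observed connection ceiling. A sketch of where it lives (raising it would only postpone the symptom, not fix the underlying hang):

```xml
<!-- applicationHost.config, system.webServer section (or a <location> tag
     for the site). The default appConcurrentRequestLimit is 5000, which
     matches the ~5,000-connection ceiling observed above. -->
<system.webServer>
  <serverRuntime appConcurrentRequestLimit="5000" />
</system.webServer>
```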

From event logs:

A worker process '5352' serving application pool '%placeholder%' failed to stop a listener channel for protocol 'http' in the allotted time. The data field contains the error number.

Restarting solves this problem, but it is not a long-term solution.

(screenshot: hung requests)

I have a full memory dump of the application pool's worker process, but I have not found any problems such as deadlocks. I do not have much experience with WinDbg, so I used commands such as !clrstack, !dumpstack, !dlk, !runaway, !threadpool, and so on for the analysis.

There are not many busy worker or completion-port threads; everything looks good, even CPU utilization and the related metrics. I also tried the Debug Diagnostic Tool, but to no avail: it showed almost the same thing I had already seen in WinDbg.

What could cause the problem? How do I correlate the symptoms? What tools should I use, or what else can I try with the ones already mentioned? I understand that the problem is in the application, but I cannot diagnose it.

I am confused and do not know the direction of further analysis. If my question is off-topic for this site, I can move it.


UPD.

Especially if you are using HttpClient, check this one here. You might even be exhausting ports.

I am aware of that problem, but this is a different case. The component where the problem occurs is the last one in the call chain:

Client <-> API <-> DB


From your version control, find the calls that changed, to narrow the investigation and see where the new problem stems from.

I already tried that, unfortunately without success. I could simply roll back task by task, but I do not believe that is a good solution, because I need to understand how to diagnose problems like this.


List of threads by user-mode CPU time:

0:000> !runaway
 User Mode Time
  Thread       Time
   48:ea8      0 days 0:00:25.187
   50:18ec     0 days 0:00:23.171
   49:1b1c     0 days 0:00:22.593
   52:5c4      0 days 0:00:22.562
   51:1bd8     0 days 0:00:22.312
   45:109c     0 days 0:00:22.187
   47:152c     0 days 0:00:22.078
   46:1988     0 days 0:00:20.859
   56:2b8      0 days 0:00:17.078
   26:1f40     0 days 0:00:16.281
   24:140c     0 days 0:00:16.265
   27:17c8     0 days 0:00:16.187
   22:181c     0 days 0:00:16.109
   25:1f88     0 days 0:00:16.031
   23:9fc      0 days 0:00:15.968
   20:10ec     0 days 0:00:15.765
   21:1f74     0 days 0:00:15.750
   57:ff8      0 days 0:00:12.390
    9:1ef4     0 days 0:00:04.734
   59:1b8c     0 days 0:00:04.375
    7:18b4     0 days 0:00:04.187
    6:1798     0 days 0:00:04.000
    4:1ac0     0 days 0:00:03.671
   10:13d4     0 days 0:00:03.484
    8:1f70     0 days 0:00:03.203
   55:434      0 days 0:00:03.171
   60:1e34     0 days 0:00:03.031
    5:1f0      0 days 0:00:02.468
   44:16d0     0 days 0:00:02.203
   61:bd4      0 days 0:00:02.156
   40:1c34     0 days 0:00:02.031
   43:177c     0 days 0:00:02.000
   38:1b5c     0 days 0:00:01.890
   36:2210     0 days 0:00:01.890
    3:2264     0 days 0:00:01.796
   39:1e5c     0 days 0:00:01.765
   34:1ea8     0 days 0:00:01.734
   62:16ac     0 days 0:00:01.718
   37:10a4     0 days 0:00:01.609
   42:2028     0 days 0:00:01.593
   35:10b4     0 days 0:00:01.515
   41:187c     0 days 0:00:01.453
   64:1764     0 days 0:00:00.703
   65:124c     0 days 0:00:00.593
   63:13a0     0 days 0:00:00.453
   58:1a9c     0 days 0:00:00.421
   70:14dc     0 days 0:00:00.406
   54:8d8      0 days 0:00:00.390
   69:704      0 days 0:00:00.265
   66:1d0c     0 days 0:00:00.156
   28:2120     0 days 0:00:00.140
   72:18c4     0 days 0:00:00.109
   73:b40      0 days 0:00:00.015
    0:1330     0 days 0:00:00.015
   77:50c      0 days 0:00:00.000
   76:1840     0 days 0:00:00.000
   75:1614     0 days 0:00:00.000
   74:1c1c     0 days 0:00:00.000
   71:824      0 days 0:00:00.000
   68:18e8     0 days 0:00:00.000
   67:1518     0 days 0:00:00.000
   53:1ed4     0 days 0:00:00.000
   33:1838     0 days 0:00:00.000
   32:1e6c     0 days 0:00:00.000
   31:1a40     0 days 0:00:00.000
   30:608      0 days 0:00:00.000
   29:2e0      0 days 0:00:00.000
   19:176c     0 days 0:00:00.000
   18:1fa0     0 days 0:00:00.000
   17:1394     0 days 0:00:00.000
   16:14f0     0 days 0:00:00.000
   15:13cc     0 days 0:00:00.000
   14:5d0      0 days 0:00:00.000
   13:944      0 days 0:00:00.000
   12:4f8      0 days 0:00:00.000
   11:2360     0 days 0:00:00.000
    2:310      0 days 0:00:00.000
    1:fe8      0 days 0:00:00.000

Output of !threadpool :

0:000> !threadpool
CPU utilization: 14%
Worker Thread: Total: 11 Running: 0 Idle: 9 MaxLimit: 32767 MinLimit: 8
Work Request in Queue: 0
--------------------------------------
Number of Timers: 2
--------------------------------------
Completion Port Thread:Total: 7 Free: 7 MaxFree: 16 CurrentLimit: 7 MaxLimit: 1000 MinLimit: 8

All of them have Http Status 200 and are not completed. Unfortunately, no threads are associated with any of the requests.

0:000> !whttp /running
HttpContext    Thread Time Out Running  Status Verb
0000029e1953fb58   -- 00:01:50 00:14:22    200 GET       
0000029e19ce1b58   -- 00:01:50 00:18:38    200 GET       
0000029e19dabc00   -- 00:01:50 00:16:08    200 GET      
0000029e19db28c8   -- 00:01:50 00:14:15    200 GET       
0000029e19db9898   -- 00:01:50 00:10:51    200 GET       
0000029e19dc9cf8   -- 00:01:50 00:18:38    200 GET       
0000029e19de4188   -- 00:01:50 00:10:51    200 GET       
0000029e19e48350   -- 00:01:50 00:10:45    200 GET       
0000029e19ea3428   -- 00:01:50 00:18:05    200 GET       
0000029e19eaab88   -- 00:01:50 00:10:45    200 GET       
0000029e19ec91a0   -- 00:01:50 00:10:44    200 GET       
0000029e19f74e30   -- 00:01:50 00:10:39    200 GET       
0000029e19fa8ca8   -- 00:01:50 00:10:39    200 GET       
0000029e19fe56d8   -- 00:01:50 00:10:50    200 GET       
0000029e19ffa778   -- 00:01:50 00:14:21    200 GET       
0000029e1a0b6088   -- 00:01:50 00:10:38    200 GET       
0000029e1a12b040   -- 00:01:50 00:10:38    200 GET       
0000029e1a16cd50   -- 00:01:50 00:10:37    200 GET       
0000029e1a2b22e0   -- 00:01:50 00:19:18    200 GET       
0000029e1a2cf618   -- 00:01:50 00:19:27    200 GET       
0000029e1a2f3620   -- 00:01:50 00:19:18    200 GET       
0000029e1a2ff808   -- 00:01:50 00:19:18    200 GET       
0000029e1a30aa20   -- 00:01:50 00:19:22    200 GET       
0000029e1a314b98   -- 00:01:50 00:19:22    200 GET       
0000029e1a3352a0   -- 00:01:50 00:19:17    200 GET       
0000029e1a34e6f8   -- 00:01:50 00:18:38    200 GET       
0000029e1a353248   -- 00:01:50 00:19:10    200 GET       
0000029e1a371260   -- 00:01:50 00:19:20    200 GET       
0000029e1a39f800   -- 00:01:50 00:18:37    200 GET       
0000029e1a3b32e8   -- 00:01:50 00:18:36    200 GET       
0000029e1a3d18b8   -- 00:01:50 00:18:03    200 GET       
0000029e1a3d6f40   -- 00:01:50 00:18:22    200 GET       
0000029e1a3e2670   -- 00:01:50 00:19:14    200 GET       
0000029e1a404510   -- 00:01:50 00:18:53    200 GET       
0000029e1a413eb8   -- 00:01:50 00:18:38    200 GET       
0000029e1a416a48   -- 00:01:50 00:18:53    200 GET       
0000029e1a41c888   -- 00:01:50 00:19:13    200 GET       
0000029e1a4288c0   -- 00:01:50 00:19:13    200 GET       
0000029e1a442118   -- 00:01:50 00:19:13    200 GET       
0000029e1a48b098   -- 00:01:50 00:18:53    200 GET       
0000029e1a51edc0   -- 00:01:50 00:14:15    200 GET       
0000029e1a52a420   -- 00:01:50 00:18:04    200 GET       
0000029e1a55bb48   -- 00:01:50 00:19:12    200 GET      

In this case there are no request queues, no high thread contention, nothing like that; just a black hole and spikes in the IIS connection count.


I noticed that the GC generation sizes start growing at the moment the problem arises. Maybe I can get something by analyzing the heap.

(screenshot: GC generation sizes)

This answer can be divided into two parts:

  • [Part 1] Finding the cause
  • [Part 2] Finding the culprit

[Part 1]

As the screenshot in the question shows, the generation sizes increase at the moment the problem begins. Therefore, we can try to analyze the heap for anomalies that will point us to the cause of the problem.

The WinDbg command output has been shortened to fit within the character limit.


Having the full dump of the process, I retrieved the list of HTTP requests that had not completed at the time of the problem:

0:000> !aspxpagesext -n -nq
   Address          Completed Timeout Time (secs) ThreadId ReturnCode Verb
AS 00000284fcc095c8        no     110        4870                 200 GET 
AS 00000286000192a8        no     110        1858                 200 GET 
AS 00000286000242e0        no     110        1854                 200 GET 
AS 000002860002ff78        no     110        1851                 200 GET 
AS 00000286000433c8        no     110        1829                 200 GET 
AS 00000286000527d8        no     110        1824                 200 GET 
AS 000002860006caa0        no     110        1074                 200 GET 
AS 00000286004d26c0        no     110        1561                 200 GET 
AS 00000286005386f0        no     110        1557                 200 GET 
AS 00000286005d5090        no     110        1546                 200 GET 
AS 00000286005edb68        no     110        1381                 200 GET 
AS 0000028600603290        no     110        1539                 200 GET 
AS 0000028600604818        no     110        1539                 200 GET 
AS 000002860062b640        no     110        1537                 200 GET 
AS 0000028600636f10        no     110        1369                 200 GET 
AS 0000028600642a08        no     110        1536                 200 GET 
AS 000002860065bbc8        no     110        1346                 200 GET 
AS 0000028600666a30        no     110        1339                 200 GET 
AS 0000028600673748        no     110        1332                 200 GET 
AS 000002860067fb28        no     110        1327                 200 GET 
AS 000002860068e000        no     110        1322                 200 GET 
AS 00000286006a1450        no     110        1320                 200 GET 
AS 00000286006b0418        no     110        1320                 200 GET 
AS 00000286006bfc48        no     110        1318                 200 GET 
AS 00000286006cf7e8        no     110        1317                 200 GET 
AS 00000286006e2dd8        no     110        1311                 200 GET 
AS 00000286006ee748        no     110        1309                 200 GET 
AS 00000286006fb200        no     110        1308                 200 GET 
AS 000002860070c870        no     110        1288                 200 GET 
AS 000002880224d568        no     110          13                 200 GET 
3038 contexts found (2815 displayed).
   AS == possibly async request
   SL == session locked

No request has an associated thread, so we cannot look at a call stack, but we can find the roots that keep the request objects from being garbage collected.


000002880224d568 is the address of one of the most recent HttpContext instances; the server had been processing it for 13 seconds. Analysis of other requests showed similar results:

0:000> !gcroot 000002880224d568
Thread a38:
    000000a31747ed00 00007ffc85d931d3 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
        rbp-80: 000000a31747ed40
            ->  00000287fc9157e0 System.Threading.Thread
            ->  00000285fc8c8438 System.Runtime.Remoting.Contexts.Context
            ->  00000285fc8c8150 System.AppDomain
            ->  00000285fcc1f9d8 System.EventHandler
            ->  00000284fc9205c8 System.Object[]
            ->  00000285fc8d6ab8 System.EventHandler
            ->  00000285fc8d6530 System.Web.Hosting.HostingEnvironment
            ->  00000285fc8d66d8 System.Collections.Hashtable
            ->  00000285fc8d6728 System.Collections.Hashtable+bucket[]
            ->  00000285fc94fb08 System.Web.Hosting.ApplicationMonitors+AppMonitorRegisteredObject
            ->  00000285fc94d9b8 System.Web.Hosting.ApplicationMonitors
            ->  00000285fc94d9d0 System.Web.Hosting.AspNetMemoryMonitor
            ->  00000285fc94f040 System.Web.Hosting.LowPhysicalMemoryMonitor
            ->  00000285fc94f318 System.Threading.Timer
            ->  00000285fc94f390 System.Threading.TimerHolder
            ->  00000285fc94f338 System.Threading.TimerQueueTimer
            ->  00000285fc94d0c0 System.Threading.TimerQueueTimer
            ->  00000285fc8e47f8 System.Threading.TimerQueueTimer
            ->  00000285fc8e4710 System.Threading.TimerQueueTimer
            ->  00000285fc8e4628 System.Threading.TimerQueueTimer
            ->  00000285fc8e4540 System.Threading.TimerQueueTimer
            ->  00000285fc8e43f8 System.Threading.TimerQueueTimer
            ->  00000285fc8db808 System.Threading.TimerQueueTimer
            ->  00000285fc8db7a8 System.Threading.TimerCallback
            ->  00000285fc8db4e8 System.Web.RequestTimeoutManager
            ->  00000285fc8db520 System.Web.Util.DoubleLinkList[]
            ->  00000285fc8db690 System.Web.Util.DoubleLinkList
            ->  00000286fcaca7b0 System.Web.RequestTimeoutManager+RequestTimeoutEntry
            ->  00000286fcddae98 System.Web.RequestTimeoutManager+RequestTimeoutEntry
            ->  00000284fce820f8 System.Web.RequestTimeoutManager+RequestTimeoutEntry
            ->  00000287fcfd1ac8 System.Web.RequestTimeoutManager+RequestTimeoutEntry
            ->  00000287fd075278 System.Web.RequestTimeoutManager+RequestTimeoutEntry
            ->  0000028801784ed8 System.Web.RequestTimeoutManager+RequestTimeoutEntry
            ->  0000028700fb7f20 System.Web.RequestTimeoutManager+RequestTimeoutEntry
            ->  00000285fec96f00 System.Web.RequestTimeoutManager+RequestTimeoutEntry
            ->  000002860076f5f8 System.Web.RequestTimeoutManager+RequestTimeoutEntry
            ->  000002860162ef58 System.Web.RequestTimeoutManager+RequestTimeoutEntry
            ->  0000028601d98e90 System.Web.RequestTimeoutManager+RequestTimeoutEntry
            ->  000002850246e118 System.Web.RequestTimeoutManager+RequestTimeoutEntry
            ->  000002880224ef58 System.Web.RequestTimeoutManager+RequestTimeoutEntry
            ->  000002880224d568 System.Web.HttpContext

HandleTable:
    00000284fbca15e0 (pinned handle)
    -> 00000288fc893f30 System.Object[]
    -> 00000284fc911968 NamespaceClientApi.Models.Helpers.KeyedSemaphoreSlim
    -> 00000284fc9119a8 System.Collections.Generic.Dictionary`2[[System.String, mscorlib],[NamespaceClientApi.Models.Helpers.KeyedSemaphoreSlim+SemaphoreWrapper, Project.Library.SomeClientApiModels]]
    -> 00000287ff97efc0 System.Collections.Generic.Dictionary`2+Entry[[System.String, mscorlib],[NamespaceClientApi.Models.Helpers.KeyedSemaphoreSlim+SemaphoreWrapper, Project.Library.SomeClientApiModels]][]
    -> 00000285fd3a4880 NamespaceClientApi.Models.Helpers.KeyedSemaphoreSlim+SemaphoreWrapper
    -> 00000285fd3a48b0 System.Threading.SemaphoreSlim
    -> 00000285fd3a8698 System.Threading.SemaphoreSlim+TaskNode
    -> 00000285fd3b5168 System.Threading.SemaphoreSlim+TaskNode
    -> 00000287fcc17908 System.Threading.SemaphoreSlim+TaskNode
    -> 00000284fce05cf0 System.Threading.SemaphoreSlim+TaskNode
    -> 00000287fcc37988 System.Threading.SemaphoreSlim+TaskNode
    -> 00000285fd3cd0e8 System.Threading.SemaphoreSlim+TaskNode
    -> 00000286fcdbf498 System.Threading.SemaphoreSlim+TaskNode
    -> 00000286fcaab498 System.Threading.SemaphoreSlim+TaskNode
    -> 00000287fcd16370 System.Threading.SemaphoreSlim+TaskNode
    -> 00000285fd3ee190 System.Threading.SemaphoreSlim+TaskNode
    -> 0000028502949e58 System.Action
    -> 0000028502949e38 System.Runtime.CompilerServices.AsyncMethodBuilderCore+MoveNextRunner
    -> 0000028502949ee8 NamespaceClientApi.Models.Helpers.KeyedSemaphoreSlim+SemaphoreWrapper+<WaitAsync>d__8
    -> 0000028502949e98 System.Threading.Tasks.Task`1[[System.IDisposable, mscorlib]]
    -> 000002850294a138 System.Threading.Tasks.SynchronizationContextAwaitTaskContinuation
    -> 000002850294a038 System.Action
    -> 000002850294a018 System.Runtime.CompilerServices.AsyncMethodBuilderCore+MoveNextRunner
    -> 000002850294a0c8 NamespaceClientApi.Models.Helpers.MemoryLocalCache+<GetCacheItemAsync>d__9`1[[System.Collections.Generic.HashSet`1[[System.ValueTuple`2[[System.Int64, mscorlib],[System.Int32, mscorlib]], mscorlib]], System.Core]]
    -> 000002850294a078 System.Threading.Tasks.Task`1[[System.Collections.Generic.HashSet`1[[System.ValueTuple`2[[System.Int64, mscorlib],[System.Int32, mscorlib]], mscorlib]], System.Core]]
    -> 000002850294a338 System.Threading.Tasks.SynchronizationContextAwaitTaskContinuation
    -> 000002850294a268 System.Action
    -> 000002850294a248 System.Runtime.CompilerServices.AsyncMethodBuilderCore+MoveNextRunner
    -> 000002850294a2f8 Namespace.Services.ProjectInfo+<EventFlagsAsync>d__88
    -> 000002850294a2a8 System.Threading.Tasks.Task`1[[System.Collections.Generic.HashSet`1[[System.ValueTuple`2[[System.Int64, mscorlib],[System.Int32, mscorlib]], mscorlib]], System.Core]]
    -> 000002850294a540 System.Threading.Tasks.SynchronizationContextAwaitTaskContinuation
    -> 000002850294a468 System.Action
    -> 000002850294a448 System.Runtime.CompilerServices.AsyncMethodBuilderCore+MoveNextRunner
    -> 000002850294a4f8 Namespace.Filters.EventPropertiesFilter+<SetEventFlags>d__2
    -> 000002850294a4a8 System.Threading.Tasks.Task`1[[Namespace.Filters.EventPropertiesFilter, Project.Library.Some]]
    -> 000002850294a740 System.Threading.Tasks.SynchronizationContextAwaitTaskContinuation
    -> 000002850294a670 System.Action
    -> 000002850294a650 System.Runtime.CompilerServices.AsyncMethodBuilderCore+MoveNextRunner
    -> 000002850294a700 Namespace.Services.ProjectInfo+<InitEventFlagsAsync>d__90
    -> 000002850294a6b0 System.Threading.Tasks.Task`1[[Namespace.Filters.EventPropertiesFilter, Project.Library.Some]]
    -> 000002850294a9f0 System.Threading.Tasks.SynchronizationContextAwaitTaskContinuation

    00000284fbcc4ec0 (strong handle)
    -> 000002880224dc08 System.Web.RootedObjects
    -> 000002880224d568 System.Web.HttpContext

Found 3 unique roots (run '!GCRoot -all' to see all roots).

They are all stuck in the EventFlagsAsync asynchronous method, which uses a cache manager based on SemaphoreSlim. The actual number of System.Threading.SemaphoreSlim+TaskNode objects is 1543.


From SemaphoreSlim.cs:

// Task in a linked list of asynchronous waiters
private sealed class TaskNode : Task<bool>, IThreadPoolWorkItem
{
    internal TaskNode Prev, Next;
    internal TaskNode() : base() { }

    [SecurityCritical]
    void IThreadPoolWorkItem.ExecuteWorkItem()
    {
        bool setSuccessfully = TrySetResult(true);
        Contract.Assert(setSuccessfully, "Should have been able to complete task");
    }

    [SecurityCritical]
    void IThreadPoolWorkItem.MarkAborted(ThreadAbortException tae)
    {
        /* nop */
    }
}


It seems that a TaskNode is created each time by the CreateAndAddAsyncWaiter method when the semaphore is already held by another caller. This builds up the kind of queue that was mentioned above:

/// <summary>
/// Asynchronously waits to enter the <see cref="SemaphoreSlim"/>,
/// using a 32-bit signed integer to measure the time interval, 
/// while observing a <see cref="T:System.Threading.CancellationToken"/>.
/// </summary>
/// <param name="millisecondsTimeout">
/// The number of milliseconds to wait, or <see cref="Timeout.Infinite"/>(-1) to wait indefinitely.
/// </param>
/// <param name="cancellationToken">The <see cref="T:System.Threading.CancellationToken"/> to observe.</param>
/// <returns>
/// A task that will complete with a result of true if the current thread successfully entered 
/// the <see cref="SemaphoreSlim"/>, otherwise with a result of false.
/// </returns>
/// <exception cref="T:System.ObjectDisposedException">The current instance has already been
/// disposed.</exception>
/// <exception cref="ArgumentOutOfRangeException"><paramref name="millisecondsTimeout"/> is a negative number other than -1,
/// which represents an infinite time-out.
/// </exception>
public Task<bool> WaitAsync(int millisecondsTimeout, CancellationToken cancellationToken)
{
    CheckDispose();

    // Validate input
    if (millisecondsTimeout < -1)
    {
        throw new ArgumentOutOfRangeException(
            "totalMilliSeconds", millisecondsTimeout, GetResourceString("SemaphoreSlim_Wait_TimeoutWrong"));
    }

    // Bail early for cancellation
    if (cancellationToken.IsCancellationRequested)
        return Task.FromCancellation<bool>(cancellationToken);

    lock (m_lockObj)
    {
        // If there are counts available, allow this waiter to succeed.
        if (m_currentCount > 0)
        {
            --m_currentCount;
            if (m_waitHandle != null && m_currentCount == 0) m_waitHandle.Reset();
            return s_trueTask;
        }
            // If there aren't, create and return a task to the caller.
            // The task will be completed either when they've successfully acquired
            // the semaphore or when the timeout expired or cancellation was requested.
        else
        {
            Contract.Assert(m_currentCount == 0, "m_currentCount should never be negative");
            var asyncWaiter = CreateAndAddAsyncWaiter();
            return (millisecondsTimeout == Timeout.Infinite && !cancellationToken.CanBeCanceled) ?
                asyncWaiter :
                WaitUntilCountOrTimeoutAsync(asyncWaiter, millisecondsTimeout, cancellationToken);
        }
    }
}
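This queueing behavior is easy to reproduce in isolation. A minimal sketch using nothing but SemaphoreSlim itself: the first waiter succeeds synchronously, and every subsequent WaitAsync call while the count is zero parks a TaskNode in the waiter list, exactly the objects piling up in the dump.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class SemaphoreQueueDemo
{
    static void Main()
    {
        var sem = new SemaphoreSlim(1, 1);

        // Count is available: this waiter succeeds synchronously (s_trueTask).
        Task<bool> first = sem.WaitAsync(0);

        // m_currentCount is now 0, so each further call allocates a TaskNode
        // and links it into the waiter list; until someone calls Release(),
        // these tasks never complete.
        Task second = sem.WaitAsync();
        Task third = sem.WaitAsync();

        Console.WriteLine(first.Result);        // True
        Console.WriteLine(second.IsCompleted);  // False: parked as a TaskNode
        Console.WriteLine(third.IsCompleted);   // False

        sem.Release();                          // completes the oldest TaskNode
        Console.WriteLine(second.Wait(1000));   // True
        Console.WriteLine(third.IsCompleted);   // still False
    }
}
```

If Release is never called, the waiter list only grows, and each entry transitively roots whatever state its continuation captured, including the HttpContext.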

OK, now we need to get to the SemaphoreWrapper object, in whose context the threads are coordinated. Let's access the async state machine, which is responsible for the transitions between states, and then move on to the object whose method is being executed:

0:000> !do poi(0000028502949e38+10)
Name:        Namespace.Models.Helpers.KeyedSemaphoreSlim+SemaphoreWrapper+<WaitAsync>d__8
MethodTable: 00007ffc28e470a8
EEClass:     00007ffc28e1e9e8
Size:        80(0x50) bytes
File:        C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\root\392547e9\cbe69b5a\assembly\dl3\6151fdbb\00ebcf32_4ebad501\Project.Library.SomeModels.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ffc85fb9180  4000426       10         System.Int32  1 instance                0 <>1__state
00007ffc28e25f48  4000427       18 ...sable, mscorlib]]  1 instance 0000028502949f00 <>t__builder
00007ffc28e464d8  4000428        8 ...+SemaphoreWrapper  0 instance 00000285fd3a4880 <>4__this
00007ffc85fab338  4000429       30 ...CancellationToken  1 instance 0000028502949f18 cancelToken
00007ffc85fadf70  400042a       38 ...iguredTaskAwaiter  1 instance 0000028502949f20 <>u__1

0:000> !do 00000285fd3a4880
Name:        Namespace.Models.Helpers.KeyedSemaphoreSlim+SemaphoreWrapper
MethodTable: 00007ffc28e464d8
EEClass:     00007ffc28e1e698
Size:        48(0x30) bytes
File:        C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\root\392547e9\cbe69b5a\assembly\dl3\6151fdbb\00ebcf32_4ebad501\Project.Library.SomeModels.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ffc28e46e68  40003fe        8 ...ClientApiModels]]  0 instance 00000285fd3a4840 _parentRelease
00007ffc85fa6fd0  40003ff       10 ...ing.SemaphoreSlim  0 instance 00000285fd3a48b0 _semaphoreSlim
00007ffc85fb9180  4000400       20         System.Int32  1 instance             1547 _useCount
00007ffc85fb6830  4000401       18        System.String  0 instance 00000285fd3a4800 <Key>k__BackingField

_useCount is a private field of SemaphoreWrapper that indicates how many attempts have been made to acquire the semaphore. In this case it is 1547, which looks bad. Apparently we are dealing with a deadlock: one of the tasks never finishes, so the queue keeps growing. As a result, we see the API freeze and the growth of the GC generation sizes.
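To illustrate the failure mode, here is a hypothetical reconstruction of the KeyedSemaphoreSlim pattern, based only on the member names visible in the dump (_wrapperMap, _semaphoreSlim, _useCount, WaitAsync); the real implementation certainly differs in detail. The key point: if the holder's release path is ever skipped, because the holding task never completes, every later caller parks forever as a TaskNode while the use count keeps climbing.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch of a keyed semaphore; names mirror the !do output.
public sealed class KeyedSemaphoreSlim
{
    private readonly object _lock = new object();
    private readonly Dictionary<string, SemaphoreWrapper> _wrapperMap
        = new Dictionary<string, SemaphoreWrapper>();

    public Task<IDisposable> WaitAsync(string key, CancellationToken ct)
    {
        SemaphoreWrapper wrapper;
        lock (_lock)
        {
            if (!_wrapperMap.TryGetValue(key, out wrapper))
            {
                wrapper = new SemaphoreWrapper(key);
                _wrapperMap[key] = wrapper;
            }
            wrapper.UseCount++;          // every caller lands in this counter...
        }
        return wrapper.WaitAsync(ct);    // ...then queues on the inner semaphore
    }

    public sealed class SemaphoreWrapper : IDisposable
    {
        private readonly SemaphoreSlim _semaphoreSlim = new SemaphoreSlim(1, 1);
        public string Key { get; }
        public int UseCount;             // corresponds to _useCount in the dump

        public SemaphoreWrapper(string key) => Key = key;

        public async Task<IDisposable> WaitAsync(CancellationToken ct)
        {
            await _semaphoreSlim.WaitAsync(ct).ConfigureAwait(false);
            return this;
        }

        // If the holder's task never completes, Dispose is never called, so
        // every later caller stays parked as a SemaphoreSlim+TaskNode forever.
        public void Dispose() => _semaphoreSlim.Release();
    }
}
```

With this shape, a single holder that never completes is enough to explain both the growing TaskNode chain and the large _useCount value.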


[Part 2]

The following listings dump the objects that are useful for understanding the state of the cache manager at the moment the problem occurred:

0:000> !do poi(000002850294a018+10)
Name:        Namespace.Models.Helpers.MemoryLocalCache+<GetCacheItemAsync>d__9`1[[System.Collections.Generic.HashSet`1[[System.ValueTuple`2[[System.Int64, mscorlib],[System.Int32, mscorlib]], mscorlib]], System.Core]]
MethodTable: 00007ffc291bbf58
EEClass:     00007ffc28e1ad98
Size:        112(0x70) bytes
File:        C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\root\392547e9\cbe69b5a\assembly\dl3\6151fdbb\00ebcf32_4ebad501\Project.Library.SomeModels.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ffc85fb9180  4000410       30         System.Int32  1 instance                0 <>1__state
00007ffc85f68e40  4000411       38 ...Canon, mscorlib]]  1 instance 000002850294a100 <>t__builder
00007ffc85fb6830  4000412        8        System.String  0 instance 0000028502949c98 key
00007ffc288c8510  4000413       10 ....MemoryLocalCache  0 instance 000002880225d460 <>4__this
00007ffc85fb6830  4000414       18        System.String  0 instance 00000286fc9bbb30 methodName
00007ffc85fab338  4000415       50 ...CancellationToken  1 instance 000002850294a118 ct
0000000000000000  4000416       20                       0 instance 0000028502949bb8 updateMethod
00007ffc85fb9180  4000417       34         System.Int32  1 instance              600 cacheTime
00007ffc858ab3e8  4000418       28   System.IDisposable  0 instance 0000000000000000 <>7__wrap1
00007ffc28e24bc8  4000419       58 ...sable, mscorlib]]  1 instance 000002850294a120 <>u__1
00007ffc85f6d2a0  400041a       60 ...Canon, mscorlib]]  1 instance 000002850294a128 <>u__2

0:000> !do 000002880225d460
Name:        Namespace.Models.Helpers.MemoryLocalCache
MethodTable: 00007ffc288c8510
EEClass:     00007ffc288aa6c8
Size:        32(0x20) bytes
File:        C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\root\392547e9\cbe69b5a\assembly\dl3\6151fdbb\00ebcf32_4ebad501\Project.Library.SomeModels.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ffc85fb6830  40002c7        8        System.String  0 instance 0000000000000000 _dbPrefix
00007ffc8fabf6e8  40002c8       10 ...ching.MemoryCache  0 instance 00000287fc909e10 _cache
00007ffc28e45998  40002c9       20 ...eyedSemaphoreSlim  0   static 00000284fc911968 _lock

0:000> !do 00000284fc911968
Name:        Namespace.Models.Helpers.KeyedSemaphoreSlim
MethodTable: 00007ffc28e45998
EEClass:     00007ffc28e1d758
Size:        40(0x28) bytes
File:        C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\root\392547e9\cbe69b5a\assembly\dl3\6151fdbb\00ebcf32_4ebad501\Project.Library.SomeModels.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ffc85fb6e10  40002c4        8        System.Object  0 instance 00000284fc911990 _lock
00007ffc28e46580  40002c5       10 ...ClientApiModels]]  0 instance 00000284fc9119a8 _wrapperMap
00007ffc85fa0b50  40002c6       18       System.Boolean  1 instance                0 _isDisposed

0:000> !do 00000284fc9119a8
Name:        System.Collections.Generic.Dictionary`2[[System.String, mscorlib],[Namespace.Models.Helpers.KeyedSemaphoreSlim+SemaphoreWrapper, Project.Library.SomeModels]]
MethodTable: 00007ffc28e46580
EEClass:     00007ffc85964200
Size:        80(0x50) bytes
File:        C:\Windows\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ffc85fb9118  400186d        8       System.Int32[]  0 instance 00000287ff97e148 buckets
00007ffc86a89ae0  400186e       10 ...non, mscorlib]][]  0 instance 00000287ff97efc0 entries
00007ffc85fb9180  400186f       38         System.Int32  1 instance              765 count
00007ffc85fb9180  4001870       3c         System.Int32  1 instance            39027 version
00007ffc85fb9180  4001871       40         System.Int32  1 instance              763 freeList
00007ffc85fb9180  4001872       44         System.Int32  1 instance                2 freeCount
00007ffc85f77d10  4001873       18 ...Canon, mscorlib]]  0 instance 00000285fc8c9050 comparer
00007ffc85f75060  4001874       20 ...Canon, mscorlib]]  0 instance 0000000000000000 keys
00007ffc85f86870  4001875       28 ...Canon, mscorlib]]  0 instance 0000000000000000 values
00007ffc85fb6e10  4001876       30        System.Object  0 instance 0000000000000000 _syncRoot

0:000> .foreach /pS 16 /ps 1 (addr {!da 00000287ff97efc0}) { .if ($sicmp("${addr}", "null") != 0) { !do poi(${addr}+8) } }
<Note: this object has an invalid CLASS field>
Invalid object
Name:        Namespace.Models.Helpers.KeyedSemaphoreSlim+SemaphoreWrapper
MethodTable: 00007ffc28e464d8
EEClass:     00007ffc28e1e698
Size:        48(0x30) bytes
File:        C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\root\392547e9\cbe69b5a\assembly\dl3\6151fdbb\00ebcf32_4ebad501\Project.Library.SomeModels.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ffc28e46e68  40003fe        8 ...ClientApiModels]]  0 instance 00000285fd3a4840 _parentRelease
00007ffc85fa6fd0  40003ff       10 ...ing.SemaphoreSlim  0 instance 00000285fd3a48b0 _semaphoreSlim
00007ffc85fb9180  4000400       20         System.Int32  1 instance             1547 _useCount
00007ffc85fb6830  4000401       18        System.String  0 instance 00000285fd3a4800 <Key>k__BackingField

Name:        Namespace.Models.Helpers.KeyedSemaphoreSlim+SemaphoreWrapper
MethodTable: 00007ffc28e464d8
EEClass:     00007ffc28e1e698
Size:        48(0x30) bytes
File:        C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Temporary ASP.NET Files\root\392547e9\cbe69b5a\assembly\dl3\6151fdbb\00ebcf32_4ebad501\Project.Library.SomeModels.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ffc28e46e68  40003fe        8 ...ClientApiModels]]  0 instance 00000285fd3cbd18 _parentRelease
00007ffc85fa6fd0  40003ff       10 ...ing.SemaphoreSlim  0 instance 00000285fd3cbd88 _semaphoreSlim
00007ffc85fb9180  4000400       20         System.Int32  1 instance              846 _useCount
00007ffc85fb6830  4000401       18        System.String  0 instance 00000285fd3cbcb0 <Key>k__BackingField

After analyzing the private _wrapperMap field of the KeyedSemaphoreSlim object, we can see which keys the problem occurred with, and therefore which endpoint. In this case there are two keys with abnormally large values of the _useCount field.


_useCount = 1547:

0:000> !do 00000285fd3a4800
Name:        System.String
MethodTable: 00007ffc85fb6830
EEClass:     00007ffc85896cb8
Size:        62(0x3e) bytes
File:        C:\Windows\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
String:      _EventFlagsAsync_7
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ffc85fb9180  4000273        8         System.Int32  1 instance               18 m_stringLength
00007ffc85fb79e8  4000274        c          System.Char  1 instance               5f m_firstChar
00007ffc85fb6830  4000278       a0        System.String  0   shared           static Empty
                                 >> Domain:Value  00000284fc4e9070:NotInit  00000284fc600120:NotInit  <<

The key of the wrapper with _useCount = 846:

0:000> !do 00000285fd3cbcb0
Name:        System.String
MethodTable: 00007ffc85fb6830
EEClass:     00007ffc85896cb8
Size:        102(0x66) bytes
File:        C:\Windows\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
String:      _GetLiveMenuCacheAsync_13
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ffc85fb9180  4000273        8         System.Int32  1 instance               38 m_stringLength
00007ffc85fb79e8  4000274        c          System.Char  1 instance               5f m_firstChar
00007ffc85fb6830  4000278       a0        System.String  0   shared           static Empty
                                 >> Domain:Value  00000284fc4e9070:NotInit  00000284fc600120:NotInit  <<

We can conclude that the problem occurs in the EventFlagsAsync method, which is called as part of GetLiveMenuCacheAsync, an endpoint with almost the same name. Let's see what EventFlagsAsync is and how it is called:

internal async Task<EventPropertiesFilter> InitEventFlagsAsync()
{
    return await _eventProperties.SetEventFlags(EventFlagsAsync(_days));
}

public async Task<HashSet<ValueTuple<long, int>>> EventFlagsAsync(int days)
{
    return await _cache.GetCacheItemAsync<HashSet<ValueTuple<long, int>>>(days.ToString(),
        async () => await GetEventFlagsAsync(days), (int)MemoryLocalCache.CacheTimeSec.Min10, CancellationToken.None);
}

private async Task<HashSet<ValueTuple<long, int>>> GetEventFlagsAsync(int days)
{
    HashSet<ValueTuple<long, int>> ecs = new HashSet<ValueTuple<long, int>>();
    foreach (var ef in await _dbConnection.QueryAsync<ValueTuple<long, int>>(Query_EventFlags, 
        new { StartDate = DateTime.UtcNow.AddHours(-6), EndDate = DateTime.UtcNow.AddDays(days) })) 
    {
        ecs.Add(ef);
    }
    
    return ecs;
}

public async Task<EventPropertiesFilter> SetEventFlags(Task<HashSet<ValueTuple<long, int>>> eventFlags)
{
    _eventFlags = _eventFlags ?? await eventFlags;
    return this;
}

As a result, we arrive at the line of source code where the Task is handled incorrectly, which is what led to this whole story:

_eventFlags = _eventFlags ?? await eventFlags;
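Why this line is dangerous: EventFlagsAsync(_days) is started eagerly by the caller (method arguments are evaluated before the call), so the cache lookup and the keyed semaphore behind it are entered on every request, even when _eventFlags is already populated and the result would be discarded; the short-circuited await also leaves any exception from that hot task unobserved. On top of that, the read-check-assign on _eventFlags is not synchronized, so many concurrent requests can all start the expensive work at once. One safer variant, sketched under the assumption that the signature can be changed to accept a factory (this is a suggested rewrite, not the author's actual fix):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public class EventPropertiesFilter
{
    private HashSet<ValueTuple<long, int>> _eventFlags;
    private readonly SemaphoreSlim _initLock = new SemaphoreSlim(1, 1);

    // Sketch: take a factory instead of a hot Task, so the cache (and its
    // keyed semaphore) is only touched when the flags are actually missing.
    public async Task<EventPropertiesFilter> SetEventFlags(
        Func<Task<HashSet<ValueTuple<long, int>>>> eventFlagsFactory)
    {
        if (_eventFlags == null)
        {
            await _initLock.WaitAsync();
            try
            {
                if (_eventFlags == null) // double-check under the lock
                    _eventFlags = await eventFlagsFactory();
            }
            finally
            {
                _initLock.Release();
            }
        }

        return this;
    }
}

// The caller (InitEventFlagsAsync) would then pass a lambda instead of a
// started task:
//     return await _eventProperties.SetEventFlags(() => EventFlagsAsync(_days));
```

With the factory, requests that find _eventFlags already set never touch the cache layer at all, so nothing queues on the keyed semaphore.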

UPD

There is an extension that makes working with asynchronous methods easier: !dumpasync.

The !dumpasync command is a WinDbg extension that produces a list of async method callstacks that may not appear on thread callstacks. This information is useful when diagnosing the cause for an app hang that is waiting for async methods to complete.

Unfortunately, at the time it did not work with my dump, so I had to manage without it; the extension may still be worth trying (I have not had time to explore it more closely, although it deserves it).

Since the problem appeared right after a release, I'd suggest the following.

Check the changelog

From your version control history, identify the changed calls to narrow down the area under investigation and find where the new problem stems from.

Check long-running requests

Check this answer here for tracking long-running requests (for example, appcmd list requests shows the requests currently executing in IIS and how long they have been running). Most likely some requests are waiting for a very long time, or many of them are stuck and never complete.

Check for changes in the consumers

Especially if you are using HttpClient, check this one here. You might even be exhausting ports.
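Creating a new HttpClient per request can leave sockets in TIME_WAIT and eventually exhaust the ephemeral port range. A minimal sketch of the usual mitigation (the class and member names here are illustrative, not taken from the question's code):

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Sketch: one shared HttpClient for the application's lifetime, so the
// underlying connections are pooled instead of opening (and abandoning)
// a new socket per request.
public static class SharedHttp
{
    public static readonly HttpClient Client = new HttpClient
    {
        Timeout = TimeSpan.FromSeconds(30)
    };

    public static Task<string> GetStringAsync(string url) =>
        Client.GetStringAsync(url);
}
```

Comparing netstat output before and after such a change (counting TIME_WAIT entries) is a quick way to confirm whether port exhaustion was in play.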
