HttpClient concurrent behavior different when running in Powershell than in Visual Studio

I'm migrating millions of users from on-prem AD to Azure AD B2C, using the MS Graph API to create the users in B2C. I've written a .NET Core 3.1 console application to perform this migration. To speed things along I'm making concurrent calls to the Graph API. This is working great - sort of.

During development I saw acceptable performance while running from Visual Studio 2019, but for testing I'm running from the command line in PowerShell 7. From PowerShell, the performance of concurrent calls through HttpClient is very bad. There appears to be a limit on the number of concurrent calls HttpClient allows when running from PowerShell, so batches of more than 40 to 50 concurrent requests start to stack up. It seems to run 40 to 50 requests concurrently while blocking the rest.

I'm not looking for assistance with async programming. I'm looking for a way to troubleshoot the difference between the run-time behavior under Visual Studio and the run-time behavior from the PowerShell command line. Running in release mode from Visual Studio's green arrow button behaves as expected. Running from the command line does not.

I fill a task list with async calls and then await Task.WhenAll(tasks). Each call takes between 300 and 400 milliseconds. When running from Visual Studio it works as expected. I make concurrent batches of 1000 calls and each individually completes within the expected time. The whole task block takes just a few milliseconds longer than the longest individual call.

The behavior changes when I run the same build from the PowerShell command line. The first 40 to 50 calls take the expected 300 to 400 milliseconds, but then the individual call times grow to as much as 20 seconds each. I think the calls are being serialized, so only 40 to 50 execute at a time while the others wait.

After hours of trial and error I was able to narrow it down to the HttpClient. To isolate the problem I mocked the calls to HttpClient.SendAsync with a method that does Task.Delay(300) and returns a mock result. In this case running from the console behaves identically to running from Visual Studio.

I'm using IHttpClientFactory and I've even tried adjusting the connection limit on ServicePointManager.

Here's my registration code.

    public static IServiceCollection RegisterHttpClient(this IServiceCollection services, int batchSize)
    {
        ServicePointManager.DefaultConnectionLimit = batchSize;
        ServicePointManager.MaxServicePoints = batchSize;
        ServicePointManager.SetTcpKeepAlive(true, 1000, 5000);

        services.AddHttpClient(MSGraphRequestManager.HttpClientName, c =>
        {
            c.Timeout = TimeSpan.FromSeconds(360);
            c.DefaultRequestHeaders.Add("User-Agent", "xxxxxxxxxxxx");
        })
        .ConfigurePrimaryHttpMessageHandler(() => new DefaultHttpClientHandler(batchSize));

        return services;
    }

Here's the DefaultHttpClientHandler.

    internal class DefaultHttpClientHandler : HttpClientHandler
    {
        public DefaultHttpClientHandler(int maxConnections)
        {
            this.MaxConnectionsPerServer = maxConnections;
            this.UseProxy = false;
            this.AutomaticDecompression = System.Net.DecompressionMethods.GZip | System.Net.DecompressionMethods.Deflate;
        }
    }

Here's the code that sets up the tasks.

        var timer = Stopwatch.StartNew();
        var tasks = new Task<(UpsertUserResult, TimeSpan)>[users.Length];
        for (var i = 0; i < users.Length; ++i)
        {
            tasks[i] = this.CreateUserAsync(users[i]);
        }

        var results = await Task.WhenAll(tasks);
        timer.Stop();

Here's how I mocked out the HttpClient.

        var httpClient = this.httpClientFactory.CreateClient(HttpClientName);
        #if use_http
            using var response = await httpClient.SendAsync(request);
        #else
            await Task.Delay(300);
            var graphUser = new User { Id = "mockid" };
            using var response = new HttpResponseMessage(HttpStatusCode.OK) { Content = new StringContent(JsonConvert.SerializeObject(graphUser)) };
        #endif
        var responseContent = await response.Content.ReadAsStringAsync();

Here are metrics for 10k B2C users created via GraphAPI using 500 concurrent requests. The first 500 requests are longer than normal because the TCP connections are being created.

Here's a link to the console run metrics.

Here's a link to the Visual Studio run metrics.

The block times in the VS run metrics are different than what I said in this post because I moved all the synchronous file access to the end of the process in an effort to isolate the problematic code as much as possible for the test runs.

The project is compiled using .NET Core 3.1. I'm using Visual Studio 2019 16.4.5.

Two things come to mind. First, most Microsoft PowerShell code was written for versions 1 and 2, where System.Threading.Thread.ApartmentState is MTA. In versions 3 through 5 the default apartment state changed to STA.

Second, it sounds like System.Threading.ThreadPool is being used to manage the threads. How big is your thread pool?

If those do not solve the issue, start digging under System.Threading.
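One way to answer the thread pool question is to log the pool limits at startup and compare the output of a Visual Studio run against a command-line run. This is only a minimal sketch: the class name `ThreadPoolReport` and its `Describe` method are hypothetical, and only standard `System.Threading.ThreadPool` calls are used.

```csharp
using System;
using System.Threading;

public static class ThreadPoolReport
{
    // Builds a one-line summary of the current thread pool limits and availability.
    public static string Describe()
    {
        ThreadPool.GetMinThreads(out int minWorker, out int minIo);
        ThreadPool.GetMaxThreads(out int maxWorker, out int maxIo);
        ThreadPool.GetAvailableThreads(out int freeWorker, out int freeIo);

        return $"min worker/IO = {minWorker}/{minIo}, " +
               $"max worker/IO = {maxWorker}/{maxIo}, " +
               $"available worker/IO = {freeWorker}/{freeIo}, " +
               $"processors = {Environment.ProcessorCount}";
    }

    public static void Main()
    {
        // Run once from Visual Studio and once from the command line, then compare.
        Console.WriteLine(Describe());
    }
}
```

If the two runs print the same numbers, the thread pool configuration is probably not what differs between the environments.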

When I read your question I thought of this blog. https://devblogs.microsoft.com/oldnewthing/20170623-00/?p=96455

A colleague demonstrated with a sample program that creates a thousand work items, each of which simulates a network call that takes 500ms to complete. In the first demonstration, the network calls were blocking synchronous calls, and the sample program limited the thread pool to ten threads in order to make the effect more apparent. Under this configuration, the first few work items were quickly dispatched to threads, but then the latency started to build as there were no more threads available to service new work items, so the remaining work items had to wait longer and longer for a thread to become available to service it. The average latency to the start of the work item was over two minutes.
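The effect described in that quote can be approximated with a small experiment. The sketch below is not the blog's sample program: it leaves the thread pool at its default size rather than capping it at ten threads, and the names `StartLatencyDemo` and `MaxStartLatencyMsAsync` are made up for illustration. It measures how long each work item waits before it starts, once with a blocking simulated call and once with an awaited one.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class StartLatencyDemo
{
    // Queues `items` work items and returns the worst delay (in ms) between
    // queuing a work item and the moment it actually began running.
    public static async Task<long> MaxStartLatencyMsAsync(int items, int callMs, bool blocking)
    {
        var clock = Stopwatch.StartNew();
        var tasks = Enumerable.Range(0, items).Select(_ =>
        {
            long queuedAt = clock.ElapsedMilliseconds;
            return Task.Run(async () =>
            {
                long startLatency = clock.ElapsedMilliseconds - queuedAt;
                if (blocking)
                    Thread.Sleep(callMs);      // holds a pool thread for the whole simulated call
                else
                    await Task.Delay(callMs);  // frees the thread while the simulated call is pending
                return startLatency;
            });
        }).ToArray();

        long[] latencies = await Task.WhenAll(tasks);
        return latencies.Max();
    }

    public static async Task Main()
    {
        Console.WriteLine($"blocking: {await MaxStartLatencyMsAsync(200, 500, true)} ms");
        Console.WriteLine($"async:    {await MaxStartLatencyMsAsync(200, 500, false)} ms");
    }
}
```

If the blocking variant shows a much larger worst-case start latency than the awaited one, that matches the thread starvation pattern the blog describes. The exact numbers depend on the machine and on how quickly the pool adds threads.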

Update 1: I ran PowerShell 7.0 from the Start menu and the apartment state was STA. Is the apartment state different in the two versions?

PS C:\Program Files\PowerShell\7>  [System.Threading.Thread]::CurrentThread

ManagedThreadId    : 12
IsAlive            : True
IsBackground       : False
IsThreadPoolThread : False
Priority           : Normal
ThreadState        : Running
CurrentCulture     : en-US
CurrentUICulture   : en-US
ExecutionContext   : System.Threading.ExecutionContext
Name               : Pipeline Execution Thread
ApartmentState     : STA

Update 2: I wish I had a better answer, but you will have to compare the two environments until something stands out.

PS C:\Windows\system32> [System.Net.ServicePointManager].GetProperties() | select name

Name                               
----                               
SecurityProtocol                   
MaxServicePoints                   
DefaultConnectionLimit             
MaxServicePointIdleTime            
UseNagleAlgorithm                  
Expect100Continue                  
EnableDnsRoundRobin                
DnsRefreshTimeout                  
CertificatePolicy                  
ServerCertificateValidationCallback
ReusePort                          
CheckCertificateRevocationList     
EncryptionPolicy            

Update 3:

https://docs.microsoft.com/en-us/uwp/api/windows.web.http.httpclient

In addition, every HttpClient instance uses its own connection pool, isolating its requests from requests executed by other HttpClient instances.

If an app using HttpClient and related classes in the Windows.Web.Http namespace downloads large amounts of data (50 megabytes or more), then the app should stream those downloads and not use the default buffering. If the default buffering is used the client memory usage will get very large, potentially resulting in reduced performance.

Just keep comparing the two environments and the issue should stand out.

Add-Type -AssemblyName System.Net.Http
$client = New-Object -TypeName System.Net.Http.HttpClient
$client | format-list *

DefaultRequestHeaders        : {}
BaseAddress                  : 
Timeout                      : 00:01:40
MaxResponseContentBufferSize : 2147483647
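The same comparison can be made from inside the console application itself, so that both launch methods report on the actual process rather than on a separate PowerShell session. The following is a rough sketch under assumptions: `EnvironmentSnapshot` and `Capture` are hypothetical names, and the selection of values (debugger state, GC mode, connection limit, `DOTNET_` and `COMPlus_` environment variables) is only a guess at what might differ, not a confirmed cause.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;
using System.Net;
using System.Runtime;
using System.Runtime.InteropServices;

public static class EnvironmentSnapshot
{
    // Collects process-level settings as sorted text lines that are easy to diff.
    public static List<string> Capture()
    {
        var lines = new List<string>
        {
            $"Framework: {RuntimeInformation.FrameworkDescription}",
            $"BaseDirectory: {AppContext.BaseDirectory}",
            $"CurrentDirectory: {Environment.CurrentDirectory}",
            $"DebuggerAttached: {Debugger.IsAttached}",
            $"ServerGC: {GCSettings.IsServerGC}",
            $"DefaultConnectionLimit: {ServicePointManager.DefaultConnectionLimit}",
        };

        // Runtime-related environment variables may differ between launch methods.
        foreach (DictionaryEntry entry in Environment.GetEnvironmentVariables())
        {
            string name = (string)entry.Key;
            if (name.StartsWith("DOTNET_", StringComparison.OrdinalIgnoreCase) ||
                name.StartsWith("COMPlus_", StringComparison.OrdinalIgnoreCase))
            {
                lines.Add($"Env {name}={entry.Value}");
            }
        }

        lines.Sort(StringComparer.Ordinal);
        return lines;
    }

    public static void Main()
    {
        foreach (string line in Capture())
            Console.WriteLine(line);
    }
}
```

Redirecting the output of each run to a file and diffing the two files should make any difference between the environments easy to spot.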
