HttpClient async requests failing

I need to fetch content from some 3000 URLs. I'm using HttpClient: I create a Task for each URL, add the tasks to a list, and then await Task.WhenAll. Something like this:

    var tasks = new List<Task<string>>();
    foreach (var url in urls) {
        var task = Task.Run(() => httpClient.GetStringAsync(url));
        tasks.Add(task);
    }

    var t = Task.WhenAll(tasks);

However, many tasks end up in the Faulted or Canceled state. I thought the problem might be with the specific URLs, but no: I can fetch those URLs in parallel with curl without any problem.

I tried HttpClientHandler and WinHttpHandler with various timeouts, etc. Several hundred URLs always end with an error. Then I tried fetching the URLs in batches of 10, and that works: no errors, but very slow. Curl fetches 3000 URLs in parallel very fast. Then I tried requesting httpbin.org 3000 times to verify that the issue is not with my particular URLs:

    var handler = new HttpClientHandler() { MaxConnectionsPerServer = 5000 };
    var httpClient = new HttpClient(handler);

    var tasks = new List<Task<HttpResponseMessage>>();
    foreach (var _ in Enumerable.Range(1, 3000)) {
        var task = Task.Run(() => httpClient.GetAsync("http://httpbin.org"));
        tasks.Add(task);
    }

    var t = Task.WhenAll(tasks);
    try { await t.ConfigureAwait(false); } catch { }

    int ok = 0, faulted = 0, cancelled = 0;

    foreach (var task in tasks) {
        switch (task.Status) {
            case TaskStatus.RanToCompletion: ok++; break;
            case TaskStatus.Faulted: faulted++; break;
            case TaskStatus.Canceled: cancelled++; break;

        }
    }

    Console.WriteLine($"RanToCompletion: {ok} Faulted: {faulted} Canceled: {cancelled}");

Again, several hundred tasks always end in error.

So, what is the issue here? Why can't I fetch those URLs with async?

I'm using .NET Core, so the suggestion to use ServicePointManager (Trying to run multiple HTTP requests in parallel, but being limited by Windows (registry)) is not applicable.

Also, the URLs I need to fetch point to different hosts. The httpbin code is just a test to show that the problem is not that my URLs are invalid.

As Fildor said in the comments, httpClient.GetStringAsync returns a Task, so you don't need to wrap it in Task.Run.

I ran this code in a console app. It took 50 seconds to complete. In your comment, you wrote that curl performs 3000 requests in less than a minute, which is about the same.

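    // Requires: using System; using System.Collections.Generic; using System.Diagnostics;
    //           using System.Linq; using System.Net.Http; using System.Threading.Tasks;
    var httpClient = new HttpClient();
    var tasks = new List<Task<string>>();
    var sw = Stopwatch.StartNew();

    for (int i = 0; i < 3000; i++)
    {
        // GetStringAsync already returns a Task<string>; no Task.Run wrapper.
        var task = httpClient.GetStringAsync("http://httpbin.org");
        tasks.Add(task);
    }

    // Blocks until every request finishes; throws AggregateException if any task fails.
    Task.WaitAll(tasks.ToArray());
    sw.Stop();

    Console.WriteLine(sw.Elapsed);
    Console.WriteLine(tasks.All(t => t.IsCompleted));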

Also, all requests completed successfully.

In your code, you are waiting for the tasks started with Task.Run. But you need to wait for the completion of the tasks started by calling httpClient.Get...
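To make the idea above concrete, here is a minimal sketch that keeps the tasks returned by the client itself and then tallies their final states, the same way the question's test does. StubHandler, FetchDemo, FetchAllAsync and the example.invalid URLs are hypothetical names introduced only for illustration; the stub replaces the network so the sketch runs offline, and it is assumed (not taken from the original post) that every 10th request fails. With a real HttpClientHandler the counts depend on the network.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the network so the sketch runs offline:
// every 10th request throws, all others return 200 OK.
public class StubHandler : HttpMessageHandler
{
    private int _count;

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        int n = Interlocked.Increment(ref _count);
        await Task.Yield(); // finish asynchronously, as a real request would
        if (n % 10 == 0)
            throw new HttpRequestException($"simulated failure for request {n}");
        return new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent("ok")
        };
    }
}

public static class FetchDemo
{
    // Starts every request without Task.Run and counts the final task states.
    public static async Task<(int ok, int faulted, int canceled)> FetchAllAsync(
        HttpClient httpClient, IEnumerable<string> urls)
    {
        // GetStringAsync already returns a Task<string>; keep that task itself.
        List<Task<string>> tasks =
            urls.Select(url => httpClient.GetStringAsync(url)).ToList();

        try
        {
            await Task.WhenAll(tasks).ConfigureAwait(false);
        }
        catch
        {
            // WhenAll rethrows only the first failure; each task is inspected below.
        }

        int ok = tasks.Count(t => t.Status == TaskStatus.RanToCompletion);
        int faulted = tasks.Count(t => t.Status == TaskStatus.Faulted);
        int canceled = tasks.Count(t => t.Status == TaskStatus.Canceled);

        // Print one underlying error so a failure is not just a number.
        Task<string> firstFaulted = tasks.FirstOrDefault(t => t.IsFaulted);
        if (firstFaulted != null)
            Console.WriteLine(firstFaulted.Exception.InnerException.Message);

        return (ok, faulted, canceled);
    }

    public static async Task Main()
    {
        using (var httpClient = new HttpClient(new StubHandler()))
        {
            // Assumed placeholder URLs; the stub never resolves them.
            var urls = Enumerable.Range(1, 100)
                                 .Select(i => $"http://example.invalid/{i}");
            var (ok, faulted, canceled) = await FetchAllAsync(httpClient, urls);
            Console.WriteLine(
                $"RanToCompletion: {ok} Faulted: {faulted} Canceled: {canceled}");
        }
    }
}
```

To try this against real hosts, construct the HttpClient with its default handler instead of StubHandler and pass your own URL list. Looking at the inner exception of a faulted task is usually the quickest way to see why a request failed rather than only how many did.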
