async HttpClient requests slowing down
I have a list of 10,000,000 URLs in a text file. I open each of them in my async/await method. At the beginning the speed is very good (about 10,000 URLs/min), but as the program runs it keeps dropping, reaching 500 URLs/min after ~10 hours. When I restart the program and run it from the beginning, the situation is the same: fast at first, then slower and slower. I'm working on Windows Server 2008 R2. I tested my code on various PCs, with the same results. Can you tell me where the problem is?

    int finishedUrls = 0;
    IEnumerable<string> urls = File.ReadLines("urlslist.txt");
    await urls.ForEachAsync(500, async url =>
    {
        Uri newUri;
        // Skip lines that are not absolute URLs.
        if (!Uri.TryCreate(url, UriKind.Absolute, out newUri)) return;
        _uri = newUri;
        var timeout = new CancellationTokenSource(TimeSpan.FromSeconds(30));
        string html = "";
        using (var _httpClient = new HttpClient { Timeout = TimeSpan.FromSeconds(30), MaxResponseContentBufferSize = 300000 })
        {
            using (var _req = new HttpRequestMessage(HttpMethod.Get, _uri))
            {
                using (var _response = await _httpClient.SendAsync(_req, HttpCompletionOption.ResponseContentRead, timeout.Token).ConfigureAwait(false))
                {
                    if (_response != null &&
                        (_response.StatusCode == HttpStatusCode.OK || _response.StatusCode == HttpStatusCode.NotFound))
                    {
                        using (var cancel = timeout.Token.Register(_response.Dispose))
                        {
                            var rawResponse = await _response.Content.ReadAsByteArrayAsync().ConfigureAwait(false);
                            html = Encoding.UTF8.GetString(rawResponse);
                        }
                    }
                }
            }
        }
        Interlocked.Increment(ref finishedUrls);
    });

http://blogs.msdn.com/b/pfxteam/archive/2012/03/05/10278165.aspx
I believe you are exhausting your I/O completion ports. You need to throttle your requests. If you need higher concurrency than a single box can handle, distribute your concurrent requests across more machines. I'd suggest using the TPL to manage the concurrency. I ran into this exact same behavior doing similar things.

Also, you should absolutely not be disposing your HttpClient per request. Pull that code out and use a single client.
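
As a rough illustration of the two suggestions above (one shared HttpClient, bounded concurrency), here is a minimal sketch. It is not the answerer's code: the names `ThrottledFetcher`, `TryFetchAsync` and `FetchAllAsync` are hypothetical, and a fixed set of worker loops pulling from a shared enumerator stands in for `ForEachAsync`. Unlike the question's counter, it counts only URLs whose body was actually read.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledFetcher
{
    // Fetches one URL with the shared client. Returns the body as text,
    // or null when the URL is invalid or the request fails or times out.
    public static async Task<string> TryFetchAsync(HttpClient client, string url)
    {
        Uri uri;
        if (!Uri.TryCreate(url, UriKind.Absolute, out uri)) return null;
        if (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps) return null;
        try
        {
            using (var response = await client.GetAsync(uri).ConfigureAwait(false))
            {
                if (response.StatusCode != HttpStatusCode.OK &&
                    response.StatusCode != HttpStatusCode.NotFound) return null;
                byte[] raw = await response.Content.ReadAsByteArrayAsync().ConfigureAwait(false);
                return Encoding.UTF8.GetString(raw);
            }
        }
        catch (HttpRequestException) { return null; }
        catch (OperationCanceledException) { return null; } // e.g. client.Timeout elapsed
    }

    // Runs at most maxConcurrency requests at a time. Each worker pulls the
    // next URL from the shared enumerator, so the full list is never buffered.
    public static async Task<int> FetchAllAsync(
        IEnumerable<string> urls, HttpClient client, int maxConcurrency)
    {
        int finished = 0;
        var sync = new object();
        using (var enumerator = urls.GetEnumerator())
        {
            Func<Task> worker = async () =>
            {
                while (true)
                {
                    string url;
                    lock (sync) // IEnumerator is not thread-safe
                    {
                        if (!enumerator.MoveNext()) return;
                        url = enumerator.Current;
                    }
                    string html = await TryFetchAsync(client, url).ConfigureAwait(false);
                    if (html != null) Interlocked.Increment(ref finished);
                }
            };
            var workers = Enumerable.Range(0, maxConcurrency).Select(_ => worker()).ToList();
            await Task.WhenAll(workers).ConfigureAwait(false);
        }
        return finished;
    }
}
```

A caller would create the client once, for example `new HttpClient { Timeout = TimeSpan.FromSeconds(30), MaxResponseContentBufferSize = 300000 }`, and pass it together with `File.ReadLines("urlslist.txt")` and a concurrency limit to `FetchAllAsync`. The right limit is an assumption to tune by measurement. Starting well below the question's 500 and raising it while watching throughput is one way to find it.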