Parallel.ForEach exceeds thread limit
I'm trying to build a stable multithreaded system that uses exactly the number of threads I set.
Here's the code I'm currently using:
public void Start()
{
    List<String> list = new List<String>(File.ReadAllLines("urls.txt"));
    int maxThreads = 100;
    var framework = new Sender();
    ThreadPool.SetMinThreads(maxThreads, maxThreads);
    Parallel.ForEach(list, new ParallelOptions { MaxDegreeOfParallelism = maxThreads }, delegate (string url)
    {
        framework.Send(url, "proxy:port");
    });
    Console.WriteLine("Done.");
}
It is fast and it works, but it exceeds the 100-thread limit. That wouldn't be a problem if the proxies I'm using weren't locked to 100 simultaneous connections, so a lot of requests get cancelled by my proxy provider. Any idea how I can keep this speed without exceeding the limit?
Thanks.
Your framework.Send method is returning immediately and processing asynchronously. To validate this, I created the following test method, which works as expected:
public static void Main()
{
    List<String> list = new List<String>(Enumerable.Range(0,10000).Select(i=>i.ToString()));
    int maxThreads = 100;
    ThreadPool.SetMinThreads(maxThreads, maxThreads);
    int currentCount = 0;
    int maxCount = 0;
    object locker = new object();
    Parallel.ForEach(list, new ParallelOptions { MaxDegreeOfParallelism = maxThreads }, delegate (string url)
    {
        lock (locker)
        {
            currentCount++;
            maxCount = Math.Max(currentCount, maxCount);
        }
        Thread.Sleep(10);
        lock (locker)
        {
            maxCount = Math.Max(currentCount, maxCount);
            currentCount--;
        }
    });
    Console.WriteLine("Max Threads: " + maxCount); //Max Threads: 100
    Console.Read();
}
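For contrast, here is a minimal sketch of what happens when the per-item call returns before its work is finished. FakeSendAsync is a hypothetical stand-in for Sender.Send (whose source isn't shown in the question); it simulates a request with a 200 ms Task.Delay and is deliberately not awaited. Because the loop body returns at once, MaxDegreeOfParallelism no longer bounds the number of operations in flight.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class FireAndForgetDemo
{
    private static readonly object Locker = new object();
    private static int inFlight;
    private static int maxInFlight;

    // Hypothetical stand-in for Sender.Send: starts the work, returns before it ends.
    private static async Task FakeSendAsync()
    {
        lock (Locker)
        {
            inFlight++;
            maxInFlight = Math.Max(inFlight, maxInFlight);
        }
        await Task.Delay(200); // simulated network round trip
        lock (Locker)
        {
            inFlight--;
        }
    }

    // Returns the highest number of simulated requests that were in flight at once.
    public static int Run(int items, int maxThreads)
    {
        inFlight = 0;
        maxInFlight = 0;
        var pending = new ConcurrentBag<Task>();
        Parallel.ForEach(Enumerable.Range(0, items),
            new ParallelOptions { MaxDegreeOfParallelism = maxThreads },
            _ => pending.Add(FakeSendAsync())); // not awaited: the body returns at once
        Task.WaitAll(pending.ToArray());        // let the simulated requests finish
        return maxInFlight;
    }

    public static void Main()
    {
        // With maxThreads = 4 the reported value is typically far above 4.
        Console.WriteLine("Max in flight: " + Run(200, 4));
    }
}
```

If the real Send behaves like this, the thread count stays within the limit while the connection count does not, which would match the symptoms described in the question.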
Parallel.For/Foreach are meant for data parallelism: processing a large amount of data that doesn't need to perform IO. In this case there's no reason to use more threads than cores that can run them.
This question, though, is about network IO, concurrent connections and throttling. If the proxy provider has a limit, MaxDegreeOfParallelism must be set to a value low enough that the limit isn't exceeded.
A better solution would be to use an ActionBlock with a limited MaxDegreeOfParallelism and a limit on its input buffer, so it doesn't get flooded with URLs that await processing.
// ActionBlock and ExecutionDataflowBlockOptions live in System.Threading.Tasks.Dataflow
static async Task Main()
{
    var maxConnections=20;
    var options=new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = maxConnections,
        BoundedCapacity = maxConnections * 2
    };
    var framework = new Sender();
    var myBlock=new ActionBlock<string>(url=>
    {
        framework.Send(...);
    }, options);
    //ReadLines doesn't load everything, it returns an IEnumerable<string> that loads
    //lines as needed
    var lines = File.ReadLines("urls.txt");
    foreach(var url in lines)
    {
        //Send each line to the block, waiting if the buffer is full
        await myBlock.SendAsync(url);
    }
    //Tell the block we are done
    myBlock.Complete();
    //And wait until it finishes everything
    await myBlock.Completion;
}
Setting the bounded capacity and MaxDegreeOfParallelism helps with concurrency limits, but not with requests/sec limits. To limit that, one could add a small delay after each request. The block's code would have to change to, e.g.:
var delay=250; // Milliseconds, 4 reqs/sec per connection
var myBlock=new ActionBlock<string>( async url=>
{
    framework.Send(...);
    await Task.Delay(delay);
}, options);
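The arithmetic behind the delay value can be made explicit. Ignoring the time the request itself takes, each connection issues at most 1000 / delay requests per second, so the whole block is capped at maxConnections * 1000 / delay. The helper below is a hypothetical sketch (the Throttle class and its method names are not part of any library) that computes this bound and inverts it to pick a delay for a target rate. The real rate will be lower, because each request also takes time.

```csharp
using System;

public static class Throttle
{
    // Upper bound on requests/sec for the whole block, ignoring request duration.
    public static double MaxRequestsPerSecond(int maxConnections, int delayMs)
        => maxConnections * 1000.0 / delayMs;

    // Smallest per-request delay (ms) that keeps the block at or below targetPerSecond.
    public static int DelayForRate(int maxConnections, double targetPerSecond)
        => (int)Math.Ceiling(maxConnections * 1000.0 / targetPerSecond);

    public static void Main()
    {
        // 20 connections with a 250 ms delay: at most 80 requests/sec overall.
        Console.WriteLine(MaxRequestsPerSecond(20, 250));
        // To stay under 40 requests/sec with 20 connections: 500 ms delay.
        Console.WriteLine(DelayForRate(20, 40));
    }
}
```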
This can be improved further if Sender.Send became an asynchronous method. It could use, for example, HttpClient, which only provides asynchronous methods, so it doesn't block waiting for a response. The changes would be minimal:
var myBlock=new ActionBlock<string>( async url=>
{
    await framework.SendAsync(...);
    await Task.Delay(delay);
}, options);
But the program would use fewer threads and less CPU: each call to await ... releases the current thread until a response is received.
Blocking a thread, on the other hand, starts with a spinwait, which means it wastes CPU cycles waiting for a response before putting the thread to sleep.
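As a rough sketch of what an asynchronous sender could look like: the AsyncSender class below is hypothetical (the original Sender's source isn't shown), it only issues a GET, and proxy configuration is left out. It accepts an optional HttpMessageHandler so it can be exercised without a network; StubHandler is a test double written for this example.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical asynchronous counterpart of Sender; not the original class.
public class AsyncSender
{
    private readonly HttpClient client;

    // One HttpClient is shared by all requests. Proxy settings would go on the handler.
    public AsyncSender(HttpMessageHandler handler = null)
    {
        client = handler == null ? new HttpClient() : new HttpClient(handler);
    }

    // Awaiting GetAsync frees the calling thread while the response is pending.
    public async Task<HttpStatusCode> SendAsync(string url)
    {
        using (HttpResponseMessage response = await client.GetAsync(url))
        {
            return response.StatusCode;
        }
    }
}

// Test double that answers every request with 200 OK without touching the network.
public class StubHandler : HttpMessageHandler
{
    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
        => Task.FromResult(new HttpResponseMessage(HttpStatusCode.OK));
}

public static class AsyncSenderDemo
{
    public static async Task Main()
    {
        var sender = new AsyncSender(new StubHandler());
        Console.WriteLine(await sender.SendAsync("http://example.invalid/"));
    }
}
```

Inside the ActionBlock, the call would then be awaited as in the snippet above, so a worker thread is occupied only while there is actual work to do.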