Parallel.ForEach exceeds the thread limit

I'm trying to build a stable multithreading system that uses exactly the number of threads I set.

Here's the code I'm actually using:

public void Start()
{

    List<String> list = new List<String>(File.ReadAllLines("urls.txt"));

    int maxThreads = 100;
    var framework = new Sender();

    ThreadPool.SetMinThreads(maxThreads, maxThreads);

    Parallel.ForEach(list, new ParallelOptions { MaxDegreeOfParallelism = maxThreads }, delegate (string url)
    {

        framework.Send(url, "proxy:port");

    });

    Console.WriteLine("Done.");

}

It is fast and it works, but it exceeds the 100-thread limit. That wouldn't be a problem if the proxies I'm using weren't locked to 100 simultaneous connections, so a lot of requests get cancelled by my proxy provider. Any idea how I can keep that speed without exceeding the limit?

Thanks.

Your framework.Send method is returning immediately and processing asynchronously, so the requests keep running after Parallel.ForEach has moved on to the next URL. To validate this, I created the following test method, whose loop body blocks; it works as expected and never goes above the limit:

public static void Main()
{
    List<String> list = new List<String>(Enumerable.Range(0,10000).Select(i=>i.ToString()));

    int maxThreads = 100;

    ThreadPool.SetMinThreads(maxThreads, maxThreads);

    int currentCount = 0;
    int maxCount = 0;
    object locker = new object();
    Parallel.ForEach(list, new ParallelOptions { MaxDegreeOfParallelism = maxThreads }, delegate (string url)
    {
        lock (locker)
        {
            currentCount++;
            maxCount = Math.Max(currentCount, maxCount);
        }
        Thread.Sleep(10);
        lock (locker)
        {
            maxCount = Math.Max(currentCount, maxCount);
            currentCount--;
        }
    });

    Console.WriteLine("Max Threads: " + maxCount); //Max Threads: 100
    Console.Read();
}
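The opposite case can be sketched as well. The following is a minimal illustration, not the asker's actual Sender: it assumes a hypothetical Send that starts the work with Task.Run and returns at once, with Task.Delay standing in for the network round trip. The names FireAndForgetDemo, Send and Run are made up for this example. The tasks are collected only so the demo can wait for them before reading the peak.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class FireAndForgetDemo
{
    private static readonly object Gate = new object();
    private static int current;
    private static int peak;

    // Hypothetical stand-in for a Send that starts the request and returns at once.
    private static void Send(string url, ConcurrentBag<Task> inFlight)
    {
        inFlight.Add(Task.Run(async () =>
        {
            lock (Gate) { current++; peak = Math.Max(peak, current); }
            await Task.Delay(200); // simulated network round trip
            lock (Gate) { current--; }
        }));
    }

    public static int Run(int maxThreads, int urlCount)
    {
        lock (Gate) { current = 0; peak = 0; }
        var inFlight = new ConcurrentBag<Task>();
        var urls = Enumerable.Range(0, urlCount).Select(i => i.ToString()).ToList();

        // The loop only limits how many Send calls run at once, and each call
        // returns immediately, so the limit does not apply to the requests.
        Parallel.ForEach(urls,
            new ParallelOptions { MaxDegreeOfParallelism = maxThreads },
            url => Send(url, inFlight));

        // The loop is already done here, yet the requests are still in flight.
        Task.WaitAll(inFlight.ToArray());
        lock (Gate) { return peak; }
    }

    public static void Main()
    {
        Console.WriteLine("Peak requests in flight: " + Run(10, 1000));
    }
}
```

With MaxDegreeOfParallelism set to 10, the reported peak would normally be well above 10, because the limit covers only the calls to Send and not the work they start. The exact number depends on the machine and the thread pool.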

Parallel.For/ForEach are meant for data parallelism - processing a large amount of data that doesn't need to perform IO. For that kind of CPU-bound work there's no reason to use more threads than there are cores to run them.

This question, though, is about network IO, concurrent connections and throttling. If the proxy provider has a limit, MaxDegreeOfParallelism must be set to a value low enough that the limit isn't exceeded.

A better solution would be to use an ActionBlock with a limited MaxDegreeOfParallelism and a bound on its input buffer, so it doesn't get flooded with URLs that are waiting to be processed.

static async Task Main()
{
    var maxConnections=20;
    var options=new ExecutionDataflowBlockOptions 
                {
                    MaxDegreeOfParallelism = maxConnections,
                    BoundedCapacity        = maxConnections * 2
                };
    var framework = new Sender();
    var myBlock=new ActionBlock<string>(url=>
                {
                    framework.Send(...);
                }, options);

    //ReadLines doesn't load everything, it returns an IEnumerable<string> that loads
    //lines as needed
    var lines = File.ReadLines("urls.txt");

    foreach(var url in lines)
    {
        //Send each line to the block, waiting if the buffer is full
        await myBlock.SendAsync(url);
    }
    //Tell the block we are done
    myBlock.Complete();
    //And wait until it finishes everything
    await myBlock.Completion;
}

Setting the bounded capacity and MaxDegreeOfParallelism helps with concurrency limits, but not with requests-per-second limits. To limit those, one could add a small delay after each request. The block's code would have to change to, e.g.:

    var delay=250; // Milliseconds, 4 reqs/sec per connection
    var myBlock=new ActionBlock<string>( async url=>
                {
                    framework.Send(...);
                    await Task.Delay(delay);
                }, options);

This can be improved further if Sender.Send became an asynchronous method. It could use, for example, HttpClient, which is built around asynchronous methods, so it doesn't block a thread while waiting for a response. The changes would be minimal:

    var myBlock=new ActionBlock<string>( async url=>
                {
                    await framework.SendAsync(...);
                    await Task.Delay(delay);
                }, options);

But the program would use fewer threads and less CPU - each call to await ... releases the current thread until a response is received.
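What such an asynchronous sender could look like is sketched below. This is only an illustration under stated assumptions: the class name AsyncSender, the ForProxy helper, the return type and the example addresses are all hypothetical, and the real Sender from the question is not shown anywhere. StubHandler is a test double included only so the sketch runs without network access.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical asynchronous sender; its name and shape are assumptions.
public sealed class AsyncSender
{
    private readonly HttpClient client;

    public AsyncSender(HttpMessageHandler handler)
    {
        // One HttpClient is created once and reused for every request.
        client = new HttpClient(handler);
    }

    // Routes requests through a proxy, e.g. "http://proxy.example:8080".
    public static AsyncSender ForProxy(string proxyAddress)
    {
        return new AsyncSender(new HttpClientHandler
        {
            Proxy = new WebProxy(proxyAddress),
            UseProxy = true
        });
    }

    public async Task<HttpStatusCode> SendAsync(string url)
    {
        // The calling thread is released while the response is pending.
        using (HttpResponseMessage response = await client.GetAsync(url))
        {
            return response.StatusCode;
        }
    }
}

// Test double: answers every request with 200 OK without touching the network.
public sealed class StubHandler : HttpMessageHandler
{
    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        return Task.FromResult(new HttpResponseMessage(HttpStatusCode.OK));
    }
}

public static class Program
{
    public static async Task Main()
    {
        var sender = new AsyncSender(new StubHandler());
        Console.WriteLine(await sender.SendAsync("http://example.test/"));
    }
}
```

In real use the sender would be created with something like AsyncSender.ForProxy and awaited inside the ActionBlock delegate, in place of the stub.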

Blocking a thread, on the other hand, starts with a spin-wait, which means it wastes CPU cycles waiting for a response before the thread is put to sleep.
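To check that an ActionBlock with an asynchronous delegate keeps the number of simultaneous requests within the limit, the counting trick from the Parallel.ForEach test can be reused. This is a rough sketch under assumptions: Task.Delay stands in for the real send call, and ThrottleDemo and RunAsync are names invented for the example. On .NET Framework the block types come from the System.Threading.Tasks.Dataflow NuGet package.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class ThrottleDemo
{
    public static async Task<int> RunAsync(int maxConnections, int urlCount)
    {
        int current = 0, peak = 0;
        var gate = new object();

        var options = new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = maxConnections,
            BoundedCapacity = maxConnections * 2
        };

        var block = new ActionBlock<string>(async url =>
        {
            lock (gate) { current++; peak = Math.Max(peak, current); }
            await Task.Delay(20); // stand-in for the real asynchronous send
            lock (gate) { current--; }
        }, options);

        foreach (var url in Enumerable.Range(0, urlCount).Select(i => i.ToString()))
        {
            // Waits whenever the block's bounded input buffer is full.
            await block.SendAsync(url);
        }

        block.Complete();
        await block.Completion;
        return peak;
    }

    public static async Task Main()
    {
        Console.WriteLine("Peak concurrent sends: " + await RunAsync(20, 500));
    }
}
```

Unlike a fire-and-forget call inside Parallel.ForEach, the block counts a message as in progress until the task returned by the delegate completes, so the peak should never exceed maxConnections.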


 