
C# Parallel vs. Threaded code performance

I've been testing the performance of System.Threading.Parallel against hand-rolled threading, and I'm surprised to see Parallel take longer to finish tasks than the threads do. I'm sure it's due to my limited knowledge of Parallel, which I have only just started reading up on.

I thought I'd share a few snippets in case anyone can point out why the parallel code runs slower than the threaded code. I also ran the same comparison for finding prime numbers and found the parallel code finishing much later than the threaded code.

public class ThreadFactory
{
    int workersCount;
    private List<Thread> threads = new List<Thread>();

    public ThreadFactory(int threadCount, int workCount, Action<int, int, string> action)
    {
        workersCount = threadCount;

        int totalWorkLoad = workCount;
        int workLoad = totalWorkLoad / workersCount;
        int extraLoad = totalWorkLoad % workersCount;

        for (int i = 0; i < workersCount; i++)
        {
            int min, max;
            if (i < (workersCount - 1))
            {
                min = (i * workLoad);
                max = ((i * workLoad) + workLoad - 1);
            }
            else
            {
                min = (i * workLoad);
                max = (i * workLoad) + (workLoad - 1 + extraLoad);
            }
            string name = "Working Thread#" + i; 

            Thread worker = new Thread(() => { action(min, max, name); });
            worker.Name = name;
            threads.Add(worker);
        }
    }

    public void StartWorking()
    {
        foreach (Thread thread in threads)
        {
            thread.Start();
        }

        foreach (Thread thread in threads)
        {
            thread.Join();
        }
    }
}

Here is the program:

Stopwatch watch = new Stopwatch();
watch.Start();
int path = 1;

List<int> numbers = new List<int>(Enumerable.Range(0, 10000));

if (path == 1)
{
    Parallel.ForEach(numbers, x =>
    {
        Console.WriteLine(x);
        Thread.Sleep(1);

    });
}
else
{
    ThreadFactory workers = new ThreadFactory(10, numbers.Count, (min, max, text) => {

        for (int i = min; i <= max; i++)
        {
            Console.WriteLine(numbers[i]);
            Thread.Sleep(1);
        }
    });

    workers.StartWorking();
}

watch.Stop();
Console.WriteLine(watch.Elapsed.TotalSeconds.ToString());

Console.ReadLine();

Update:

Taking locking into consideration, I tried the following snippet. The results are the same again: Parallel seems to finish much more slowly.

    path = 1;
    int cieling = 10000000;

    List<int> numbers = new List<int>();

    if (path == 1)
    {
        Parallel.For(0, cieling, x =>
        {
            lock (numbers)
            {
                numbers.Add(x);    
            }

        });
    }

    else
    {
        ThreadFactory workers = new ThreadFactory(10, cieling, (min, max, text) =>
        {

            for (int i = min; i <= max; i++)
            {
                lock (numbers)
                {
                    numbers.Add(i);    
                }                       

            }
        });

        workers.StartWorking();
    }

Update 2: A quick note that my machine has a quad-core processor, so Parallel has 4 cores available.

Referring to a blog post by Reed Copsey Jr:

Parallel.ForEach is a bit more complicated, however. When working with a generic IEnumerable, the number of items required for processing is not known in advance, and must be discovered at runtime. In addition, since we don't have direct access to each element, the scheduler must enumerate the collection to process it. Since IEnumerable is not thread safe, it must lock on elements as it enumerates, create temporary collections for each chunk to process, and schedule this out.

The locking and copying could make Parallel.ForEach take longer. The partitioning and the scheduler of ForEach could also add overhead. I tested your code and increased the sleep of each task; the results then get closer, but ForEach is still slower.

[Edit - more research]

I added the following to the execution loops:

if (Thread.CurrentThread.ManagedThreadId > maxThreadId)
   maxThreadId = Thread.CurrentThread.ManagedThreadId;

What this shows on my machine is that, with the current settings, ForEach uses 10 fewer threads than the other approach. If you want more threads out of ForEach, you would have to fiddle around with ParallelOptions and the scheduler.
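The largest ManagedThreadId is only a rough proxy for the number of threads in use, and the unsynchronized compare-and-assign above can lose updates when several threads run it at once. A minimal sketch of a more direct measurement follows, assuming the same print-and-sleep style workload as the question. The names ThreadUsageProbe and CountThreadsUsed are made up for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThreadUsageProbe
{
    // Hypothetical helper: runs a short sleep per item under Parallel.ForEach
    // and returns how many distinct managed threads executed at least one item.
    public static int CountThreadsUsed(int itemCount)
    {
        var seenThreadIds = new ConcurrentDictionary<int, byte>();

        Parallel.ForEach(Enumerable.Range(0, itemCount), x =>
        {
            // TryAdd is thread-safe; an ID that is already present is ignored.
            seenThreadIds.TryAdd(Thread.CurrentThread.ManagedThreadId, 0);
            Thread.Sleep(1);
        });

        return seenThreadIds.Count;
    }

    public static void Main()
    {
        Console.WriteLine("Distinct threads used: " + CountThreadsUsed(1000));
    }
}
```

The count depends on the machine and on how quickly the thread pool decides to add workers, so it is best compared between runs on the same box rather than read as an absolute number.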

See "Does Parallel.ForEach limit the number of active threads?"
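As a rough sketch of the ParallelOptions route: MaxDegreeOfParallelism is an upper bound on concurrency, not a request for that many threads, so on its own it can only hold ForEach back. Whether the loop actually reaches the cap still depends on the thread pool. The class and method names below are hypothetical.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class CappedForEach
{
    // Hypothetical helper: processes itemCount items with at most maxDegree
    // concurrent workers and returns how many items were processed.
    public static int Run(int itemCount, int maxDegree)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegree };
        int processed = 0;

        Parallel.ForEach(Enumerable.Range(0, itemCount), options, x =>
        {
            Thread.Sleep(1);                        // stand-in for the question's work
            Interlocked.Increment(ref processed);   // thread-safe counter
        });

        return processed;
    }

    public static void Main()
    {
        Console.WriteLine("Processed: " + Run(1000, 10));
    }
}
```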

I think I can answer your question. First of all, you didn't say how many cores your system has. If you are running a dual-core, only 4 threads will work under Parallel.For, while your Thread example works with 10 threads. More threads do better here because the task you are running (printing plus a short sleep) is very short for threading, and the thread overhead is very large compared to the task. I'm almost sure that if you write the same code without threads it will run faster.

Both of your methods work in pretty much the same way, but if you create all the threads in advance you save a lot, because Parallel.For uses the task pool, which adds some more overhead.

The comparison is not very fair to Threading.Parallel. You tell your custom thread pool that it'll need 10 threads. Threading.Parallel does not know how many threads it will need, so it tries to adapt at run time, taking into account things such as the current CPU load. Since the number of iterations in the test is small enough, you can see this thread-count adaptation penalty. Providing the same hint for Threading.Parallel will make it run much faster:


int workerThreads;
int completionPortThreads;
ThreadPool.GetMinThreads(out workerThreads, out completionPortThreads);
ThreadPool.SetMinThreads(10, completionPortThreads);

It's logical :-)

That would be the first time in history that adding one (or two) layers of code improved performance. When you use convenience libraries you should expect to pay the price. BTW, you haven't posted the numbers. You've got to publish results :-)

To make things a bit more fair (or biased :-) for Parallel, convert the list into an array.

Then, to make things totally unfair, split the work on your own: make an array of just 10 items and totally spoon-feed the actions to Parallel. At that point you are of course doing the job that Parallel promised to do for you, but it's bound to be an interesting number :-)
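One way to sketch that spoon-feeding, assuming the input is already an array, is to hand Parallel.ForEach a small number of index ranges via Partitioner.Create, so each work item is a whole chunk rather than a single element. The doubling in the loop body is a placeholder for real work, and RangeFeeding and ProcessInRanges are made-up names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class RangeFeeding
{
    // Hypothetical helper: splits the index space into about chunkCount
    // contiguous ranges and lets Parallel.ForEach process one range per work item.
    public static int[] ProcessInRanges(int[] numbers, int chunkCount)
    {
        var results = new int[numbers.Length];
        if (numbers.Length == 0)
            return results; // Partitioner.Create rejects an empty range

        // Ceiling division so that at most chunkCount ranges are produced.
        int rangeSize = Math.Max(1, (numbers.Length + chunkCount - 1) / chunkCount);
        var ranges = Partitioner.Create(0, numbers.Length, rangeSize);

        Parallel.ForEach(ranges, range =>
        {
            // Item1 is inclusive, Item2 is exclusive.
            for (int i = range.Item1; i < range.Item2; i++)
                results[i] = numbers[i] * 2; // placeholder work
        });

        return results;
    }

    public static void Main()
    {
        int[] numbers = Enumerable.Range(0, 10000).ToArray();
        int[] doubled = ProcessInRanges(numbers, 10);
        Console.WriteLine(doubled[doubled.Length - 1]);
    }
}
```

Each range writes to its own slice of the results array, so no lock is needed inside the loop.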

BTW, I just read Reed's blog. The partitioning used in this question is what he calls the most simple and naive partitioning, which makes it a very good elimination test indeed. You still need to check the zero-work case just to know if it's totally hosed.
