
Parallel.ForEach questions

I am using a Parallel.ForEach loop in C# / VS2010 to do processing and I have a couple of questions.

First of all, I have a process that needs to extract information from a remote web service and then build images (GDI) on the fly.

I have a class that encapsulates all of the functionality in a single object with two main methods, Load() and CreateImage(). All the GDI management and WebRequests are "blackboxed" inside this object.

I then create a GenericList that contains all the objects that need to be processed and I iterate through the list using the following code:

try
{
    Parallel.ForEach(MyLGenericList, ParallelOptions, (MyObject, loopState) =>
    {
        MyObject.DoLoad();
        MyObject.CreateImage();
        MyObject.Dispose();

        if (loopState.ShouldExitCurrentIteration || loopState.IsExceptional)
            loopState.Stop();
    });
}
catch (OperationCanceledException)
{
    // Cancel here
}
catch (Exception)
{
    throw; // rethrow without resetting the stack trace
}

Now my questions are:

  1. Given that there could be ten thousand items in the list to parse, is the above code the best way to approach this? Any other ideas are more than welcome.
  2. When I start the process, the objects are created / loaded and the images are created very fast, but after around six hundred objects the process starts to crawl. It does eventually finish. Is this normal?

Thanks in advance :) Adam

I am not sure that downloading data in parallel is a good idea, since it will block a lot of threads. Split your task into a producer and a consumer instead. Then you can parallelize each of them separately.

Here is an example of a single producer and multiple consumers.
(If the consumers are faster than the producer, you can just use a normal foreach instead of Parallel.ForEach.)

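var sources = new BlockingCollection<SourceData>();

// Single producer: fetch the data sequentially on a background task.
var producer = Task.Factory.StartNew(
    () => {
        try {
            foreach (var item in MyGenericList) {
                var data = webservice.FetchData(item);
                sources.Add(data);
            }
        }
        finally {
            // Always signal completion, otherwise the consumers wait forever.
            sources.CompleteAdding();
        }
    });

// Multiple consumers: build the images in parallel as data arrives.
Parallel.ForEach(sources.GetConsumingPartitioner(),
                 data => {
                     imageCreator.CreateImage(data);
                 });

producer.Wait(); // surfaces any exception thrown by the producer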

(The GetConsumingPartitioner extension is part of ParallelExtensionsExtras.)
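If pulling in ParallelExtensionsExtras is not an option, roughly the same pattern can be sketched with framework types only. The version below is a minimal, self-contained illustration, not the code from the answer above: it assumes .NET 4.5 or later (EnumerablePartitionerOptions is not available in .NET 4.0), and the class name, the integer payload, and the bounded capacity of 100 are made up for the demo. The producer stands in for the web-service calls and the consumer body stands in for the image creation.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class ProducerConsumerSketch
{
    // Pushes 'count' integers through a bounded queue and returns how many were consumed.
    public static int Run(int count)
    {
        int consumed = 0;
        using (var sources = new BlockingCollection<int>(100))
        {
            // Producer: stands in for the sequential web-service calls.
            var producer = Task.Factory.StartNew(() =>
            {
                try
                {
                    for (int i = 0; i < count; i++)
                        sources.Add(i); // blocks while the queue is full
                }
                finally
                {
                    sources.CompleteAdding(); // lets the consumers finish
                }
            });

            // NoBuffering hands items to workers one at a time,
            // so no item waits inside a partition chunk.
            var partitioner = Partitioner.Create(
                sources.GetConsumingEnumerable(),
                EnumerablePartitionerOptions.NoBuffering);

            // Consumers: stand in for the CPU-bound image creation.
            Parallel.ForEach(partitioner, item =>
            {
                Interlocked.Increment(ref consumed);
            });

            producer.Wait(); // rethrows producer exceptions, if any
        }
        return consumed;
    }

    public static void Main()
    {
        Console.WriteLine("Items consumed: " + Run(1000));
    }
}
```

The bounded capacity keeps the producer from running far ahead of the consumers, which also caps how many loaded objects are held in memory at once.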

Edit: A more complete example

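// SourceData stands for the element type of MyLGenericList.
var sources = new BlockingCollection<SourceData>();

var producerOptions = new ParallelOptions { MaxDegreeOfParallelism = 5 };  // at most 5 concurrent downloads
var consumerOptions = new ParallelOptions { MaxDegreeOfParallelism = -1 }; // -1 means no explicit limit

var producers = Task.Factory.StartNew(
    () => {
        try {
            Parallel.ForEach(MyLGenericList, producerOptions,
                myObject => {
                    myObject.DoLoad();
                    sources.Add(myObject);
                });
        }
        finally {
            // Always signal completion, otherwise the consumers wait forever.
            sources.CompleteAdding();
        }
    });

Parallel.ForEach(sources.GetConsumingPartitioner(), consumerOptions,
    myObject => {
        myObject.CreateImage();
        myObject.Dispose();
    });

producers.Wait(); // surfaces any exception thrown while loading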

With this code you can tune the number of parallel downloads while keeping the CPU busy with the image processing.

The Parallel.ForEach method with the default settings works best when the work that the loop body does is CPU bound. If you block, or hand the work off to another party synchronously, the scheduler thinks that the CPU still isn't busy and keeps cramming in more tasks, trying hard to use all the CPUs in the system.

In your case you just need to pick a reasonable number of overlapping downloads to run in parallel and set that value in your ForEach options, because you aren't going to saturate the CPUs with your loop.
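As a rough illustration of that advice, the sketch below caps the loop with ParallelOptions.MaxDegreeOfParallelism and measures how many iterations actually overlap. Everything in it is an assumption made for the demo: the class name, the limit of 4, and the Thread.Sleep call that stands in for a blocking web request. The right limit for real downloads depends on the remote service and has to be found by measurement.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedParallelismSketch
{
    // Runs 'itemCount' simulated blocking calls and returns the highest
    // number of calls that were in flight at the same time.
    public static int Run(int itemCount, int maxParallel)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };
        int inFlight = 0;
        int peak = 0;
        object gate = new object();

        Parallel.ForEach(Enumerable.Range(0, itemCount), options, item =>
        {
            int now = Interlocked.Increment(ref inFlight);
            lock (gate) { if (now > peak) peak = now; }

            Thread.Sleep(20); // stands in for a blocking web request

            Interlocked.Decrement(ref inFlight);
        });

        return peak;
    }

    public static void Main()
    {
        Console.WriteLine("Peak concurrency: " + Run(40, 4));
    }
}
```

With the cap in place, the loop never runs more than maxParallel bodies at once, so the scheduler cannot keep piling blocked work onto new threads.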
