What is the behaviour of await inside of a Parallel.ForEach() loop?

I have a computationally intensive program which I am attempting to parallelize. However, one of the limiting steps is an I/O operation controlled by a phenomenally inefficient API that I have no control over but have no choice but to use. It is imperative that my parallelization does not increase the number of I/O operations, or any benefit will likely disappear very quickly.

The layout is something like this: I have two classes, Foo and Bar. To calculate a Foo, which involves no small quantity of calculations, I must pass it one or a few instances of Bar, which I import from some other file in an extremely expensive I/O operation. I require a large number of both Foo and Bar instances, and many of these Bar instances will be used to calculate more than one Foo instance. As a result, I do not want to discard my Bar instances after I calculate each Foo, and I do not want to import any of them more than once. Potentially of note, to make matters more complicated, the API is 32-bit, whereas my program must be 64-bit to avoid MemoryException, so the API is handled by a locally hosted server which I communicate with using WCF.

Here is my proposed solution, but I am extremely new to parallelization, and in particular I am unsure of how the await will be handled inside of the ForEach loop with respect to freeing up processors:

ConcurrentDictionary<string, Task<Bar>> barList = new ConcurrentDictionary<string, Task<Bar>>();

Parallel.ForEach(fooList, foo =>
{
    // Note: this check-then-add is not atomic, so two threads can both start an import for the same name.
    if (!barList.ContainsKey(foo.RequiredBarName))
    {
        Task<Bar> importBar = Task.Run(() => Import.BarByName(foo.RequiredBarName));
        barList.TryAdd(foo.RequiredBarName, importBar);
    }
    barList.TryGetValue(foo.RequiredBarName, out Task<Bar> requiredBarTask);
    foo.RequiredBarTask = requiredBarTask;
    foo.CalculateStuff();
});

// where foo.CalculateStuff() looks something like this
public async void CalculateStuff()
{
    // do some stuff...
    Bar requiredBar = await this.RequiredBarTask;
    // do some more stuff with requiredBar
}

What will happen when the code runs into that await? Will the ThreadPool pick up a different Task, or will the processor just idle? If I then arrange some sort of WaitAll() outside of the Parallel.ForEach(), will I be able to parallelize through all of this efficiently? Does anyone have any better ideas of how I might implement this?

Edit to provide MCVE:

I cannot satisfy the Verifiable component of this, as I cannot give you the API and I certainly can't give you any of the data that the API might access. However, I will attempt to provide you with something up to the call out to the server.

The program can effectively go infinitely deep in the way it processes things; it is much easier to think of it as a parser of specific instructions which the client is allowed to build using the GUI and a set of "bricks". In this way Dataflow looks like it could offer a decent solution.

In this example I don't take care of circular references, or of one Channel calculating another Channel which has already been called for by the Parallel.ForEach() method; in my code this is handled by some logic and concurrent lists that check when various things have been called.

public abstract class Class
{
    public string Name { get; set; }
    public float[] Data { get; set; }

    // Abstract so that Channel and Input can override it.
    public abstract Task CalculateData(IsampleService proxy);
}

public class Channel : Class
{
    public Class[] ChildClasses { get; set; }

    public override async Task CalculateData(IsampleService proxy)
    {
        foreach (Class childClass in ChildClasses)
        {
            // not the real processing but this step could be anything. There is a class to handle what happens here, but it is unnecessary for this post.
            if (childClass.Data == null) await childClass.CalculateData(proxy);
            this.Data = childClass.Data;
        }
    }
}

public class Input : Class
{
    public override async Task CalculateData(IsampleService proxy)
    {
        this.Data = await proxy.ReturnData(this.Name);
    }
}

public static async Task ProcessDataForExport(Channel[] channelArray)
{
    ChannelFactory<IsampleService> factory = new ChannelFactory<IsampleService>(new NetNamedPipeBinding(), new EndpointAddress(baseAddress));

    IsampleService proxy = factory.CreateChannel();

    Parallel.ForEach(channelArray, channel =>
    {
        // The returned Task is not awaited here, so the loop does not wait for the calculation to finish.
        channel.CalculateData(proxy);
    });
    // Task.WhenAll() might be a better alternative to the Parallel.ForEach() here.
}

What will happen when the code runs into that await?

The same thing that happens for any await expression: once the expression that produces the Task to be awaited has been evaluated, and assuming that task has not already completed, the method returns to its caller. For all intents and purposes, that is the end of the method as far as the caller is concerned; the remainder of the method runs later, as a continuation, when the awaited task completes.
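To make that concrete, here is a minimal sketch (not taken from the question; AwaitReturnDemo, WorkAsync and the TaskCompletionSource "gate" are names I made up for illustration). It records the order in which things happen when a method awaits a task that has not completed yet.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class AwaitReturnDemo
{
    // Runs the experiment and returns the recorded sequence of events.
    public static async Task<List<string>> RunAsync()
    {
        var log = new List<string>();
        var gate = new TaskCompletionSource<int>();   // hypothetical stand-in for a slow operation

        // WorkAsync runs synchronously until its first await of an incomplete task,
        // then hands an unfinished Task back to us.
        Task pending = WorkAsync(gate.Task, log);
        log.Add("caller resumed; work completed = " + pending.IsCompleted);

        gate.SetResult(42);   // the awaited task completes, so the continuation can run
        await pending;        // wait until the rest of WorkAsync has finished
        return log;
    }

    private static async Task WorkAsync(Task<int> input, List<string> log)
    {
        log.Add("before await");
        int value = await input;            // input is not complete: the method returns here
        log.Add("after await: " + value);   // runs later, as a continuation
    }
}
```

The caller's entry lands between "before await" and "after await: 42", which is the "method returns at the await" behaviour described above. If the awaited task had already been complete, the method would simply have carried on without returning.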

Will the ThreadPool pick up a different Task, or will the processor just idle?

That depends on what else is going on. For example, what are you awaiting? If it's a computational task queued to the thread pool, and it hasn't already been assigned a thread pool thread, then sure, the thread pool might pick that up and start working on it.

If you're awaiting an I/O operation, then that won't necessarily keep the processor busy, but there may still be other tasks in the thread pool queue (such as other ones from the Parallel.ForEach() call). So that would give the processor something to work on.

Certainly, using await doesn't generally result in the processor being idle. In fact, the main reason for using it is to avoid just that (*). Because the await causes the current method to return, the current thread is free to proceed with other work, which means that if there otherwise weren't enough runnable threads to keep the processor busy, now it has something to do. :)

(*) (Well, sort of. Really, the main reason is to avoid blocking the current thread, but that has the side effect of making more work available for the processor to handle. :) )

If I then arrange some sort of WaitAll() outside of the Parallel.ForEach() will I be able to parallelize through all of this efficiently? Does anyone have any better ideas of how I might implement this?

I don't see enough useful detail in your question to answer that. Frankly, while I can't put my finger on it, the use of await from a Parallel.ForEach() delegate seems fishy to me somehow. As soon as the delegate awaits a task that hasn't completed, the delegate's method will return.

Hence, as far as Parallel.ForEach() knows, you're done with that item in the enumeration, but of course you're not. It will have to be finished elsewhere. At the very least, that seems like it would hinder the Parallel class's ability to know enough about the work it's doing to schedule it most effectively.
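Here is a small sketch of that effect, under my own assumptions rather than your real code (ForEachAsyncLambdaDemo and the "release" gate are invented for the example). An async lambda passed to Parallel.ForEach() is converted to an Action<T>, so it behaves like an async void method and the loop has no Task to wait on.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ForEachAsyncLambdaDemo
{
    // Returns how many items had actually finished at the moment Parallel.ForEach returned.
    public static int CompletedWhenLoopReturned()
    {
        int completed = 0;
        // Hypothetical gate standing in for a slow import that has not finished yet.
        var release = new TaskCompletionSource<bool>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        Parallel.ForEach(new[] { 1, 2, 3, 4 }, async item =>
        {
            await release.Task;                     // the delegate returns to the loop here
            Interlocked.Increment(ref completed);   // runs only after the gate is released
        });

        // The loop has "finished" every item, yet none of the work after the await has run.
        int seen = Volatile.Read(ref completed);

        release.SetResult(true);   // let the pending continuations run on the thread pool
        return seen;
    }
}
```

The count is still zero when the loop returns, so any WaitAll() placed after the loop would need its own handle on the outstanding work (for example the Task objects themselves), because Parallel.ForEach() does not keep track of it.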

But maybe that's okay. Or maybe it's not great, but is the best you're going to achieve given the framework you're tied to. Hard to say.
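For what it's worth, one shape you could experiment with is the Task.WhenAll() idea from the comment in your own code, combined with a cache of import tasks so that each Bar is requested only once. This is only a sketch under assumptions: ImportBarAsync is a made-up stand-in for your Import.BarByName call, a string plays the role of Bar, and the "calculation" is a placeholder.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class SharedImportSketch
{
    private static int importCalls;   // counts how many times the expensive import started

    // Hypothetical stand-in for the expensive I/O call (Import.BarByName in the question).
    private static async Task<string> ImportBarAsync(string name)
    {
        Interlocked.Increment(ref importCalls);
        await Task.Delay(50);          // simulated I/O latency
        return "bar:" + name;
    }

    // Lazy<T> makes sure only one import per name is started, even if two callers
    // race inside GetOrAdd and both run the value factory.
    private static readonly ConcurrentDictionary<string, Lazy<Task<string>>> Cache =
        new ConcurrentDictionary<string, Lazy<Task<string>>>();

    private static Task<string> GetBarAsync(string name) =>
        Cache.GetOrAdd(name, n => new Lazy<Task<string>>(() => ImportBarAsync(n))).Value;

    // Placeholder for calculating one Foo: await the shared import, then do CPU work.
    private static async Task<int> CalculateFooAsync(string requiredBarName)
    {
        string bar = await GetBarAsync(requiredBarName);
        return await Task.Run(() => bar.Length);   // CPU-bound part runs on the thread pool
    }

    // Calculates five "Foo" items that need only two distinct "Bar" imports,
    // and returns how many imports were actually started.
    public static async Task<int> RunAsync()
    {
        string[] requiredBars = { "a", "b", "a", "b", "a" };
        await Task.WhenAll(requiredBars.Select(name => CalculateFooAsync(name)));
        return Volatile.Read(ref importCalls);
    }
}
```

Here Task.WhenAll() gives you a single Task that completes only when every calculation has really finished, which is the piece Parallel.ForEach() cannot give you with async delegates. One caveat: a failed import stays in the cache as a faulted task, so real code would need to decide how to retry.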


I do encourage you to provide the MCVE that commenter Scott Chamberlain asked for. If he's right and your problem is addressable through the dataflow API, you would do well to give him the chance to provide you an answer that shows that.
