
How to yield from parallel tasks in .NET 4.5

I would like to use a .NET iterator with parallel Tasks/await. Something like this:

IEnumerable<TDest> Foo<TSrc, TDest>(IEnumerable<TSrc> source)
{
    Parallel.ForEach(
        source,
        s =>
        {
            // Ordering is NOT important
            // items can be yielded as soon as they are done
            // (not valid C#: yield return is not allowed inside a lambda)
            yield return ExecuteOrDownloadSomething(s);
        });
}

Unfortunately, .NET cannot natively handle this. Best answer so far, by @svick: use AsParallel().

BONUS: Is there any simple async/await code that implements multiple publishers and a single subscriber? The subscriber would yield, and the publishers would process (core libraries only).

This seems like a job for PLINQ:

return source.AsParallel().Select(s => ExecuteOrDownloadSomething(s));

This will execute the delegate in parallel using a limited number of threads, making results available as they complete (by default PLINQ may buffer output in small chunks before handing it to the consumer).
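If each result really should reach the consumer the moment it is computed, PLINQ's merge options can be set explicitly. The following is a minimal sketch, not the answerer's code: the class name `PlinqSketch` and the body of `ExecuteOrDownloadSomething` are hypothetical stand-ins, and the element types are fixed to `int` for brevity.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public static class PlinqSketch
{
    // Hypothetical stand-in for the real work: sleeps briefly, then returns s * s.
    public static int ExecuteOrDownloadSomething(int s)
    {
        Thread.Sleep(10 * s);
        return s * s;
    }

    public static IEnumerable<int> Foo(IEnumerable<int> source)
    {
        return source
            .AsParallel()
            // NotBuffered asks PLINQ to hand over each element as soon as it is
            // computed, instead of accumulating results in an output buffer first.
            .WithMergeOptions(ParallelMergeOptions.NotBuffered)
            .Select(s => ExecuteOrDownloadSomething(s));
    }
}
```

A caller would simply iterate: `foreach (var r in PlinqSketch.Foo(new[] { 3, 1, 2 })) { /* use r */ }`. The order of results is not guaranteed, which matches the requirement in the question.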

If the ExecuteOrDownloadSomething() method is IO-bound (e.g. it actually downloads something) and you don't want to waste threads, then using async-await might make sense, but it would be more complicated.

If you want to take full advantage of async, you shouldn't return IEnumerable, because it's synchronous (i.e. it blocks if no items are available). What you need is some sort of asynchronous collection, and you can use ISourceBlock (specifically, TransformBlock) from TPL Dataflow for that:

ISourceBlock<TDest> Foo<TSrc, TDest>(IEnumerable<TSrc> source)
{
    var block = new TransformBlock<TSrc, TDest>(
        async s => await ExecuteOrDownloadSomethingAsync(s),
        new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded
        });

    foreach (var item in source)
        block.Post(item);

    block.Complete();

    return block;
}

If the source is "slow" (i.e. you want to start processing the results from Foo() before iterating source is completed), you might want to move the foreach and Complete() call to a separate Task. An even better solution would be to make source into an ISourceBlock<TSrc> too.
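Moving the feeding loop to a separate Task might look roughly like this. It is a sketch under assumptions: the class name `DataflowSketch` and the body of `ExecuteOrDownloadSomethingAsync` are hypothetical, the element types are fixed to `int` and `string`, and the TPL Dataflow library (System.Threading.Tasks.Dataflow) is assumed to be referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class DataflowSketch
{
    // Hypothetical stand-in for an IO-bound operation.
    public static async Task<string> ExecuteOrDownloadSomethingAsync(int s)
    {
        await Task.Delay(10);
        return "item " + s;
    }

    public static ISourceBlock<string> Foo(IEnumerable<int> source)
    {
        var block = new TransformBlock<int, string>(
            s => ExecuteOrDownloadSomethingAsync(s),
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded
            });

        // Feed the block on a separate task so the caller gets the block back
        // immediately, even if enumerating source is slow.
        Task.Run(() =>
        {
            try
            {
                foreach (var item in source)
                    block.Post(item);
                block.Complete();
            }
            catch (Exception ex)
            {
                // If enumerating source throws, fault the block so consumers
                // are not left waiting forever.
                ((IDataflowBlock)block).Fault(ex);
            }
        });

        return block;
    }
}
```

A consumer can then drain the block asynchronously, for example with `while (await block.OutputAvailableAsync()) { var item = block.Receive(); }`.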

So it appears that what you really want to do is order a sequence of tasks based on when they complete. This is not terribly complex:

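// Requires: System.Collections.Concurrent, System.Collections.Generic,
// System.Linq, System.Threading, System.Threading.Tasks.
// As an extension method, this must live in a static class.
public static IEnumerable<Task<T>> Order<T>(this IEnumerable<Task<T>> tasks)
{
    var input = tasks.ToList();

    // Materialize with ToList(): a lazy Select would create fresh
    // TaskCompletionSource instances every time output is enumerated,
    // so the returned tasks would never be the ones completed below.
    var output = input.Select(task => new TaskCompletionSource<T>()).ToList();
    var collection = new BlockingCollection<TaskCompletionSource<T>>();
    foreach (var tcs in output)
        collection.Add(tcs);

    foreach (var task in input)
    {
        task.ContinueWith(t =>
        {
            // Take the next unused completion source, in FIFO order
            var tcs = collection.Take();
            switch (t.Status)
            {
                case TaskStatus.Canceled:
                    tcs.TrySetCanceled();
                    break;
                case TaskStatus.Faulted:
                    tcs.TrySetException(t.Exception.InnerExceptions);
                    break;
                case TaskStatus.RanToCompletion:
                    tcs.TrySetResult(t.Result);
                    break;
            }
        }
        , CancellationToken.None
        , TaskContinuationOptions.ExecuteSynchronously
        , TaskScheduler.Default);
    }

    return output.Select(tcs => tcs.Task);
}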

So here we create a TaskCompletionSource for each input task, then go through each of the tasks and set a continuation which grabs the next completion source from a BlockingCollection and sets its result. The first task completed grabs the first tcs that was returned, the second task completed gets the second tcs that was returned, and so on.

Now your code becomes quite simple:

var tasks = collection.Select(item => LongRunningOperationThatReturnsTask(item))
    .Order();
foreach(var task in tasks)
{
    var result = task.Result;//or you could `await` each result
    //....
}
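For the `await` option mentioned in the comment, a self-contained sketch could look like the following. It is an illustration, not the answerer's code: `CompletionOrderSketch`, `InCompletionOrder` and `ConsumeAsync` are hypothetical names, the helper is a compact variant of the same idea that uses an Interlocked counter rather than a BlockingCollection, and `LongRunningOperationThatReturnsTask` is faked with Task.Delay.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class CompletionOrderSketch
{
    // The k-th input task to finish completes the k-th returned task.
    public static List<Task<T>> InCompletionOrder<T>(IEnumerable<Task<T>> tasks)
    {
        var input = tasks.ToList();
        var sources = input.Select(_ => new TaskCompletionSource<T>()).ToList();
        int next = -1;

        foreach (var task in input)
        {
            task.ContinueWith(t =>
            {
                // Atomically claim the next unused completion source.
                var tcs = sources[Interlocked.Increment(ref next)];
                if (t.IsCanceled) tcs.TrySetCanceled();
                else if (t.IsFaulted) tcs.TrySetException(t.Exception.InnerExceptions);
                else tcs.TrySetResult(t.Result);
            },
            CancellationToken.None,
            TaskContinuationOptions.ExecuteSynchronously,
            TaskScheduler.Default);
        }

        return sources.Select(s => s.Task).ToList();
    }

    // Hypothetical stand-in: completes after delayMs and returns delayMs.
    public static async Task<int> LongRunningOperationThatReturnsTask(int delayMs)
    {
        await Task.Delay(delayMs);
        return delayMs;
    }

    // Awaits each result as it becomes available, without blocking a thread.
    public static async Task<List<int>> ConsumeAsync(IEnumerable<int> delays)
    {
        var tasks = InCompletionOrder(
            delays.Select(d => LongRunningOperationThatReturnsTask(d)));

        var results = new List<int>();
        foreach (var task in tasks)
            results.Add(await task);
        return results;
    }
}
```

Calling `await CompletionOrderSketch.ConsumeAsync(new[] { 300, 100, 200 })` should collect the results roughly shortest delay first, since each awaited task completes in the order the underlying operations finish.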

In the asynchronous library made by the MS robotics team, they had concurrency primitives which allowed for using an iterator to yield asynchronous code.

The library (CCR) is free (it didn't use to be). A nice introductory article can be found here: Concurrent affairs

Perhaps you can use this library alongside the .NET task library, or it'll inspire you to "roll your own".
