
Nesting await in Parallel.ForEach

In a metro app, I need to execute a number of WCF calls. There are a significant number of calls to be made, so I need to do them in a parallel loop. The problem is that the parallel loop exits before the WCF calls are all complete.

How would you refactor this to work as expected?

var ids = new List<string>() { "1", "2", "3", "4", "5", "6", "7", "8", "9", "10" };
var customers = new  System.Collections.Concurrent.BlockingCollection<Customer>();

Parallel.ForEach(ids, async i =>
{
    ICustomerRepo repo = new CustomerRepo();
    var cust = await repo.GetCustomer(i);
    customers.Add(cust);
});

foreach ( var customer in customers )
{
    Console.WriteLine(customer.ID);
}

Console.ReadKey();

The whole idea behind Parallel.ForEach() is that you have a set of threads and each thread processes part of the collection. As you noticed, this doesn't work with async-await, where you want to release the thread for the duration of the async call.
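To make the failure mode concrete, here is a minimal sketch rather than code from the question. FetchAsync is a hypothetical stand-in for repo.GetCustomer, and the class and method names are made up for illustration. The async lambda passed to Parallel.ForEach is converted to an Action<int>, so it is compiled as an async void method: each invocation returns to the loop at its first await, and the loop has no Task to wait on.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class AsyncVoidDemo
{
    // Hypothetical stand-in for repo.GetCustomer(id): completes after a short delay.
    public static async Task<string> FetchAsync(int id)
    {
        await Task.Delay(200);
        return "customer-" + id;
    }

    // Returns how many results were present right after Parallel.ForEach returned.
    public static int CountAfterParallelForEach()
    {
        var results = new ConcurrentBag<string>();

        // The lambda becomes an async void Action<int>; Parallel.ForEach treats
        // each call as finished as soon as it reaches the first await.
        Parallel.ForEach(Enumerable.Range(1, 5), async id =>
        {
            results.Add(await FetchAsync(id));
        });

        // Usually still 0 here, because the delays have not elapsed yet.
        return results.Count;
    }

    // Returns how many results are present when the tasks themselves are awaited.
    public static async Task<int> CountAfterWhenAllAsync()
    {
        string[] results = await Task.WhenAll(
            Enumerable.Range(1, 5).Select(id => FetchAsync(id)));
        return results.Length; // all five tasks have completed at this point
    }
}
```

Calling AsyncVoidDemo.CountAfterParallelForEach() typically reports fewer than five results (timing dependent), while awaiting CountAfterWhenAllAsync() always sees all five, because Task.WhenAll only completes once every task has completed.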

You could "fix" that by blocking the ForEach() threads, but that defeats the whole point of async-await.

What you could do is to use TPL Dataflow instead of Parallel.ForEach(), which supports asynchronous Tasks well.

Specifically, your code could be written using a TransformBlock that transforms each id into a Customer using the async lambda. This block can be configured to execute in parallel. You would link that block to an ActionBlock that writes each Customer to the console. After you set up the block network, you can Post() each id to the TransformBlock.

In code:

var ids = new List<string> { "1", "2", "3", "4", "5", "6", "7", "8", "9", "10" };

var getCustomerBlock = new TransformBlock<string, Customer>(
    async i =>
    {
        ICustomerRepo repo = new CustomerRepo();
        return await repo.GetCustomer(i);
    }, new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded
    });
var writeCustomerBlock = new ActionBlock<Customer>(c => Console.WriteLine(c.ID));
getCustomerBlock.LinkTo(
    writeCustomerBlock, new DataflowLinkOptions
    {
        PropagateCompletion = true
    });

foreach (var id in ids)
    getCustomerBlock.Post(id);

getCustomerBlock.Complete();
writeCustomerBlock.Completion.Wait();

That said, you probably want to limit the parallelism of the TransformBlock to some small constant. Also, you could limit the capacity of the TransformBlock and add the items to it asynchronously using SendAsync(), for example if the collection is too big.
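As a rough sketch of that bounded variant (not the code above, and under a few assumptions): the helper name TransformAllAsync and its parameters are hypothetical, results are collected into a list instead of being written to the console, and System.Threading.Tasks.Dataflow is assumed to be available (built into recent .NET versions, a NuGet package on older frameworks).

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class BoundedDataflowSketch
{
    // Hypothetical helper: runs 'transform' over 'source' with at most
    // 'maxParallelism' concurrent operations and a bounded input buffer.
    public static async Task<List<TOut>> TransformAllAsync<TIn, TOut>(
        IEnumerable<TIn> source, Func<TIn, Task<TOut>> transform, int maxParallelism)
    {
        var results = new List<TOut>();

        var transformBlock = new TransformBlock<TIn, TOut>(transform,
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = maxParallelism, // small constant instead of Unbounded
                BoundedCapacity = maxParallelism         // the block refuses items beyond this
            });

        // An ActionBlock processes one item at a time by default,
        // so adding to a plain List here is safe.
        var collectBlock = new ActionBlock<TOut>(item => results.Add(item));

        transformBlock.LinkTo(collectBlock,
            new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var item in source)
        {
            // SendAsync waits asynchronously while the block is full;
            // it returns false if the block will never accept the item (e.g. it faulted).
            if (!await transformBlock.SendAsync(item)) break;
        }

        transformBlock.Complete();
        await collectBlock.Completion;
        return results;
    }
}
```

With this shape the source is enumerated only as fast as the block drains, so a very large id collection is never buffered in full. For the question's code, the call would look something like await BoundedDataflowSketch.TransformAllAsync(ids, i => new CustomerRepo().GetCustomer(i), 4).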

An added benefit compared to your code (if it worked) is that the writing will start as soon as a single item is finished, rather than waiting until all of the processing is finished.

svick's answer is (as usual) excellent.

However, I find Dataflow to be more useful when you actually have large amounts of data to transfer, or when you need an async-compatible queue.

In your case, a simpler solution is to just use async-style parallelism:

var ids = new List<string>() { "1", "2", "3", "4", "5", "6", "7", "8", "9", "10" };

var customerTasks = ids.Select(i =>
  {
    ICustomerRepo repo = new CustomerRepo();
    return repo.GetCustomer(i);
  });
var customers = await Task.WhenAll(customerTasks);

foreach (var customer in customers)
{
  Console.WriteLine(customer.ID);
}

Console.ReadKey();

Using Dataflow as svick suggested may be overkill, and Stephen's answer does not provide the means to control the concurrency of the operation. However, that can be achieved rather simply:

public static async Task RunWithMaxDegreeOfConcurrency<T>(
     int maxDegreeOfConcurrency, IEnumerable<T> collection, Func<T, Task> taskFactory)
{
    var activeTasks = new List<Task>(maxDegreeOfConcurrency);
    foreach (var task in collection.Select(taskFactory))
    {
        activeTasks.Add(task);
        if (activeTasks.Count == maxDegreeOfConcurrency)
        {
            await Task.WhenAny(activeTasks.ToArray());
            //observe exceptions here
            activeTasks.RemoveAll(t => t.IsCompleted); 
        }
    }
    await Task.WhenAll(activeTasks.ToArray()).ContinueWith(t => 
    {
        //observe exceptions in a manner consistent with the above   
    });
}

The ToArray() calls can be optimized by using an array instead of a list and replacing completed tasks, but I doubt it would make much of a difference in most scenarios. Sample usage per the OP's question:

await RunWithMaxDegreeOfConcurrency(10, ids, async i =>
{
    ICustomerRepo repo = new CustomerRepo();
    var cust = await repo.GetCustomer(i);
    customers.Add(cust);
});

EDIT: Fellow SO user and TPL wiz Eli Arbel pointed me to a related article from Stephen Toub. As usual, his implementation is both elegant and efficient:

public static Task ForEachAsync<T>(
      this IEnumerable<T> source, int dop, Func<T, Task> body) 
{ 
    return Task.WhenAll( 
        from partition in Partitioner.Create(source).GetPartitions(dop) 
        select Task.Run(async delegate { 
            using (partition) 
                while (partition.MoveNext()) 
                    await body(partition.Current).ContinueWith(t => 
                          {
                              //observe exceptions
                          });
                      
        })); 
}

You can save effort with the new AsyncEnumerator NuGet Package, which didn't exist 4 years ago when the question was originally posted. It allows you to control the degree of parallelism:

using System.Collections.Async;
...

await ids.ParallelForEachAsync(async i =>
{
    ICustomerRepo repo = new CustomerRepo();
    var cust = await repo.GetCustomer(i);
    customers.Add(cust);
},
maxDegreeOfParallelism: 10);

Disclaimer: I'm the author of the AsyncEnumerator library, which is open source and licensed under MIT, and I'm posting this message just to help the community.

Wrap the Parallel.ForEach in a Task.Run() and, instead of the await keyword, use [yourasyncmethod].Result.

(You need the Task.Run so that you don't block the UI thread.)

Something like this:

var yourForeachTask = Task.Run(() =>
        {
            Parallel.ForEach(ids, i =>
            {
                ICustomerRepo repo = new CustomerRepo();
                var cust = repo.GetCustomer(i).Result;
                customers.Add(cust);
            });
        });
await yourForeachTask;

This should be pretty efficient, and easier than getting the whole TPL Dataflow working:

var customers = await ids.SelectAsync(async i =>
{
    ICustomerRepo repo = new CustomerRepo();
    return await repo.GetCustomer(i);
});

...

public static async Task<IList<TResult>> SelectAsync<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, Task<TResult>> selector, int maxDegreesOfParallelism = 4)
{
    var results = new List<TResult>();

    var activeTasks = new HashSet<Task<TResult>>();
    foreach (var item in source)
    {
        activeTasks.Add(selector(item));
        if (activeTasks.Count >= maxDegreesOfParallelism)
        {
            var completed = await Task.WhenAny(activeTasks);
            activeTasks.Remove(completed);
            results.Add(completed.Result);
        }
    }

    results.AddRange(await Task.WhenAll(activeTasks));
    return results;
}

An extension method for this that uses SemaphoreSlim and also allows setting the maximum degree of parallelism:

    /// <summary>
    /// Concurrently executes an async action for each item of an <see cref="IEnumerable{T}"/>.
    /// </summary>
    /// <typeparam name="T">Type of the items in the enumerable</typeparam>
    /// <param name="enumerable">Instance of <see cref="IEnumerable{T}"/></param>
    /// <param name="action">An async delegate to execute for each item</param>
    /// <param name="maxDegreeOfParallelism">Optional. The maximum degree of parallelism;
    /// must be greater than 0</param>
    /// <returns>A Task representing the async operation</returns>
    /// <exception cref="ArgumentOutOfRangeException">If maxDegreeOfParallelism is less than 1</exception>
    public static async Task ForEachAsyncConcurrent<T>(
        this IEnumerable<T> enumerable,
        Func<T, Task> action,
        int? maxDegreeOfParallelism = null)
    {
        if (maxDegreeOfParallelism.HasValue)
        {
            using (var semaphoreSlim = new SemaphoreSlim(
                maxDegreeOfParallelism.Value, maxDegreeOfParallelism.Value))
            {
                var tasksWithThrottler = new List<Task>();

                foreach (var item in enumerable)
                {
                    // Increment the number of currently running tasks and wait if they are more than limit.
                    await semaphoreSlim.WaitAsync();

                    tasksWithThrottler.Add(Task.Run(async () =>
                    {
                        await action(item).ContinueWith(res =>
                        {
                            // action is completed, so decrement the number of currently running tasks
                            semaphoreSlim.Release();
                        });
                    }));
                }

                // Wait for all tasks to complete.
                await Task.WhenAll(tasksWithThrottler.ToArray());
            }
        }
        else
        {
            await Task.WhenAll(enumerable.Select(item => action(item)));
        }
    }

Sample usage:

await enumerable.ForEachAsyncConcurrent(
    async item =>
    {
        await SomeAsyncMethod(item);
    },
    5);

I am a little late to the party, but you may want to consider using GetAwaiter().GetResult() to run your async code in a synchronous context while still running in parallel, as below:

 Parallel.ForEach(ids, i =>
{
    ICustomerRepo repo = new CustomerRepo();
    // Run this in thread which Parallel library occupied.
    var cust = repo.GetCustomer(i).GetAwaiter().GetResult();
    customers.Add(cust);
});

After introducing a bunch of helper methods, you will be able to run parallel queries with this simple syntax:

const int DegreeOfParallelism = 10;
IEnumerable<double> result = await Enumerable.Range(0, 1000000)
    .Split(DegreeOfParallelism)
    .SelectManyAsync(async i => await CalculateAsync(i).ConfigureAwait(false))
    .ConfigureAwait(false);

What happens here is: we split the source collection into 10 chunks ( .Split(DegreeOfParallelism) ), then run 10 tasks, each processing its items one by one ( .SelectManyAsync(...) ), and merge the results back into a single list.

It is worth mentioning that there is a simpler approach:

double[] result2 = await Enumerable.Range(0, 1000000)
    .Select(async i => await CalculateAsync(i).ConfigureAwait(false))
    .WhenAll()
    .ConfigureAwait(false);

But it needs a precaution: if the source collection is too big, it will schedule a Task for every item right away, which may cause a significant performance hit.

The extension methods used in the examples above look as follows:

public static class CollectionExtensions
{
    /// <summary>
    /// Splits collection into number of collections of nearly equal size.
    /// </summary>
    public static IEnumerable<List<T>> Split<T>(this IEnumerable<T> src, int slicesCount)
    {
        if (slicesCount <= 0) throw new ArgumentOutOfRangeException(nameof(slicesCount));

        List<T> source = src.ToList();
        var sourceIndex = 0;
        for (var targetIndex = 0; targetIndex < slicesCount; targetIndex++)
        {
            var list = new List<T>();
            int itemsLeft = source.Count - targetIndex;
            while (slicesCount * list.Count < itemsLeft)
            {
                list.Add(source[sourceIndex++]);
            }

            yield return list;
        }
    }

    /// <summary>
    /// Takes collection of collections, projects those in parallel and merges results.
    /// </summary>
    public static async Task<IEnumerable<TResult>> SelectManyAsync<T, TResult>(
        this IEnumerable<IEnumerable<T>> source,
        Func<T, Task<TResult>> func)
    {
        List<TResult>[] slices = await source
            .Select(async slice => await slice.SelectListAsync(func).ConfigureAwait(false))
            .WhenAll()
            .ConfigureAwait(false);
        return slices.SelectMany(s => s);
    }

    /// <summary>Runs selector and awaits results.</summary>
    public static async Task<List<TResult>> SelectListAsync<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, Task<TResult>> selector)
    {
        List<TResult> result = new List<TResult>();
        foreach (TSource source1 in source)
        {
            TResult result1 = await selector(source1).ConfigureAwait(false);
            result.Add(result1);
        }
        return result;
    }

    /// <summary>Wraps tasks with Task.WhenAll.</summary>
    public static Task<TResult[]> WhenAll<TResult>(this IEnumerable<Task<TResult>> source)
    {
        return Task.WhenAll<TResult>(source);
    }
}

Here is a simple generic implementation of a ForEachAsync method, based on an ActionBlock from the TPL Dataflow library, now embedded in the .NET 5 platform:

public static Task ForEachAsync<T>(this IEnumerable<T> source,
    Func<T, Task> action, int dop)
{
    // Arguments validation omitted
    var block = new ActionBlock<T>(action,
        new ExecutionDataflowBlockOptions() { MaxDegreeOfParallelism = dop });
    try
    {
        foreach (var item in source) block.Post(item);
        block.Complete();
    }
    catch (Exception ex) { ((IDataflowBlock)block).Fault(ex); }
    return block.Completion;
}

This solution eagerly enumerates the supplied IEnumerable and immediately sends all its elements to the ActionBlock. So it is not very suitable for enumerables with a huge number of elements. Below is a more sophisticated approach that enumerates the source lazily and sends its elements to the ActionBlock one by one:

public static async Task ForEachAsync<T>(this IEnumerable<T> source,
    Func<T, Task> action, int dop)
{
    // Arguments validation omitted
    var block = new ActionBlock<T>(action, new ExecutionDataflowBlockOptions()
    { MaxDegreeOfParallelism = dop, BoundedCapacity = dop });
    try
    {
        foreach (var item in source)
            if (!await block.SendAsync(item).ConfigureAwait(false)) break;
        block.Complete();
    }
    catch (Exception ex) { ((IDataflowBlock)block).Fault(ex); }
    try { await block.Completion.ConfigureAwait(false); }
    catch { block.Completion.Wait(); } // Propagate AggregateException
}

These two methods behave differently in case of exceptions. The first¹ propagates an AggregateException containing the exceptions directly in its InnerExceptions property. The second propagates an AggregateException that contains another AggregateException with the exceptions. Personally I find the behavior of the second method more convenient in practice, because awaiting it automatically eliminates a level of nesting, so I can simply catch (AggregateException aex) and handle the aex.InnerExceptions inside the catch block. The first method requires storing the Task before awaiting it, so that I can gain access to the task.Exception.InnerExceptions inside the catch block. For more info about propagating exceptions from async methods, look here or here.

Both implementations gracefully handle any errors that may occur during the enumeration of the source. The ForEachAsync method does not complete before all pending operations are completed. No tasks are left behind unobserved (in fire-and-forget fashion).

¹ The first implementation elides async and await.
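The "store the Task before awaiting it" pattern mentioned for the first method can be sketched as follows. This is a simplified illustration that uses Task.WhenAll in place of the ActionBlock-based ForEachAsync, since both return a task whose Exception property holds all failures directly. WorkAsync and CountFailuresAsync are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class ExceptionObservationSketch
{
    // Hypothetical operation that fails for even inputs.
    private static async Task WorkAsync(int i)
    {
        await Task.Yield();
        if (i % 2 == 0) throw new InvalidOperationException("failed: " + i);
    }

    // Runs four operations (two of which fail) and returns the number of failures observed.
    public static async Task<int> CountFailuresAsync()
    {
        // Keep a reference to the task instead of awaiting the expression directly.
        Task all = Task.WhenAll(Enumerable.Range(1, 4).Select(i => WorkAsync(i)));
        try
        {
            await all; // await rethrows only the first exception
            return 0;
        }
        catch (InvalidOperationException)
        {
            // The stored task still exposes every failure.
            return all.Exception.InnerExceptions.Count;
        }
    }
}
```

Here await surfaces a single InvalidOperationException, while all.Exception.InnerExceptions contains both failures (for inputs 2 and 4), which is why the task has to be kept in a variable.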

Easy native way without TPL:

int totalThreads = 0; int maxThreads = 3;

foreach (var item in YouList)
{
    while (totalThreads >= maxThreads) await Task.Delay(500);
    Interlocked.Increment(ref totalThreads);

    MyAsyncTask(item).ContinueWith((res) => Interlocked.Decrement(ref totalThreads));
}

You can check this solution with the following task:

async static Task MyAsyncTask(string item)
{
    await Task.Delay(2500);
    Console.WriteLine(item);
}

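Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.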

 