[英]Parallel foreach with asynchronous lambda
我想並行處理一個集合,但我在實現它時遇到了麻煩,因此我希望能得到一些幫助。
如果我想在並行循環的 lambda 中調用 C# 中標記為 async 的方法,就會出現問題。 例如:
var bag = new ConcurrentBag<object>();
Parallel.ForEach(myCollection, async item =>
{
// some pre stuff
var response = await GetData(item);
bag.Add(response);
// some post stuff
}
var count = bag.Count;
計數為 0 時會出現問題,因為創建的所有線程實際上只是后台線程,並且Parallel.ForEach
調用不會等待完成。 如果我刪除 async 關鍵字,該方法如下所示:
var bag = new ConcurrentBag<object>();
Parallel.ForEach(myCollection, item =>
{
// some pre stuff
var responseTask = await GetData(item);
responseTask.Wait();
var response = responseTask.Result;
bag.Add(response);
// some post stuff
}
var count = bag.Count;
它可以工作,但它完全禁用了等待的聰明,我必須做一些手動異常處理..(為簡潔起見已刪除)。
如何實現Parallel.ForEach
循環,在 lambda 中使用 await 關鍵字? 可能嗎?
Parallel.ForEach 方法的原型采用Action<T>
作為參數,但我希望它等待我的異步 lambda。
如果你只想要簡單的並行性,你可以這樣做:
var bag = new ConcurrentBag<object>();
var tasks = myCollection.Select(async item =>
{
// some pre stuff
var response = await GetData(item);
bag.Add(response);
// some post stuff
});
await Task.WhenAll(tasks);
var count = bag.Count;
如果您需要更復雜的東西,請查看Stephen Toub 的ForEachAsync
帖子。
您可以使用AsyncEnumerator NuGet Package中的ParallelForEachAsync
擴展方法:
using Dasync.Collections;
var bag = new ConcurrentBag<object>();
await myCollection.ParallelForEachAsync(async item =>
{
// some pre stuff
var response = await GetData(item);
bag.Add(response);
// some post stuff
}, maxDegreeOfParallelism: 10);
var count = bag.Count;
免責聲明:我是 AsyncEnumerator 庫的作者,該庫是開源的並在 MIT 下獲得許可,我發布此消息只是為了幫助社區。
新的 .NET 6 API 之一是Parallel.ForEachAsync ,這是一種安排異步工作的方法,允許您控制並行度:
var urls = new []
{
"https://dotnet.microsoft.com",
"https://www.microsoft.com",
"https://stackoverflow.com"
};
var client = new HttpClient();
var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };
await Parallel.ForEachAsync(urls, options, async (url, token) =>
{
var targetPath = Path.Combine(Path.GetTempPath(), "http_cache", url);
var response = await client.GetAsync(url);
if (response.IsSuccessStatusCode)
{
using var target = File.OpenWrite(targetPath);
await response.Content.CopyToAsync(target);
}
});
Scott Hanselman 博客中的另一個例子。
出處,供參考。
使用SemaphoreSlim
,您可以實現並行控制。
var bag = new ConcurrentBag<object>();
var maxParallel = 20;
var throttler = new SemaphoreSlim(initialCount: maxParallel);
var tasks = myCollection.Select(async item =>
{
await throttler.WaitAsync();
try
{
var response = await GetData(item);
bag.Add(response);
}
finally
{
throttler.Release();
}
});
await Task.WhenAll(tasks);
var count = bag.Count;
從其他答案和公認的 asnwer 引用的文章中編譯的最簡單的可能擴展方法:
public static async Task ParallelForEachAsync<T>(this IEnumerable<T> source, Func<T, Task> asyncAction, int maxDegreeOfParallelism)
{
var throttler = new SemaphoreSlim(initialCount: maxDegreeOfParallelism);
var tasks = source.Select(async item =>
{
await throttler.WaitAsync();
try
{
await asyncAction(item).ConfigureAwait(false);
}
finally
{
throttler.Release();
}
});
await Task.WhenAll(tasks);
}
更新:這是一個簡單的修改,它還支持評論中請求的取消令牌(未經測試)
public static async Task ParallelForEachAsync<T>(this IEnumerable<T> source, Func<T, CancellationToken, Task> asyncAction, int maxDegreeOfParallelism, CancellationToken cancellationToken)
{
var throttler = new SemaphoreSlim(initialCount: maxDegreeOfParallelism);
var tasks = source.Select(async item =>
{
await throttler.WaitAsync(cancellationToken);
if (cancellationToken.IsCancellationRequested) return;
try
{
await asyncAction(item, cancellationToken).ConfigureAwait(false);
}
finally
{
throttler.Release();
}
});
await Task.WhenAll(tasks);
}
我的 ParallelForEach 異步的輕量級實現。
特征:
public static class AsyncEx
{
public static async Task ParallelForEachAsync<T>(this IEnumerable<T> source, Func<T, Task> asyncAction, int maxDegreeOfParallelism = 10)
{
var semaphoreSlim = new SemaphoreSlim(maxDegreeOfParallelism);
var tcs = new TaskCompletionSource<object>();
var exceptions = new ConcurrentBag<Exception>();
bool addingCompleted = false;
foreach (T item in source)
{
await semaphoreSlim.WaitAsync();
asyncAction(item).ContinueWith(t =>
{
semaphoreSlim.Release();
if (t.Exception != null)
{
exceptions.Add(t.Exception);
}
if (Volatile.Read(ref addingCompleted) && semaphoreSlim.CurrentCount == maxDegreeOfParallelism)
{
tcs.TrySetResult(null);
}
});
}
Volatile.Write(ref addingCompleted, true);
await tcs.Task;
if (exceptions.Count > 0)
{
throw new AggregateException(exceptions);
}
}
}
使用示例:
await Enumerable.Range(1, 10000).ParallelForEachAsync(async (i) =>
{
var data = await GetData(i);
}, maxDegreeOfParallelism: 100);
我為此創建了一個擴展方法,它利用 SemaphoreSlim 並且還允許設置最大並行度
/// <summary>
/// Concurrently Executes async actions for each item of <see cref="IEnumerable<typeparamref name="T"/>
/// </summary>
/// <typeparam name="T">Type of IEnumerable</typeparam>
/// <param name="enumerable">instance of <see cref="IEnumerable<typeparamref name="T"/>"/></param>
/// <param name="action">an async <see cref="Action" /> to execute</param>
/// <param name="maxDegreeOfParallelism">Optional, An integer that represents the maximum degree of parallelism,
/// Must be grater than 0</param>
/// <returns>A Task representing an async operation</returns>
/// <exception cref="ArgumentOutOfRangeException">If the maxActionsToRunInParallel is less than 1</exception>
public static async Task ForEachAsyncConcurrent<T>(
this IEnumerable<T> enumerable,
Func<T, Task> action,
int? maxDegreeOfParallelism = null)
{
if (maxDegreeOfParallelism.HasValue)
{
using (var semaphoreSlim = new SemaphoreSlim(
maxDegreeOfParallelism.Value, maxDegreeOfParallelism.Value))
{
var tasksWithThrottler = new List<Task>();
foreach (var item in enumerable)
{
// Increment the number of currently running tasks and wait if they are more than limit.
await semaphoreSlim.WaitAsync();
tasksWithThrottler.Add(Task.Run(async () =>
{
await action(item).ContinueWith(res =>
{
// action is completed, so decrement the number of currently running tasks
semaphoreSlim.Release();
});
}));
}
// Wait for all tasks to complete.
await Task.WhenAll(tasksWithThrottler.ToArray());
}
}
else
{
await Task.WhenAll(enumerable.Select(item => action(item)));
}
}
樣品用法:
await enumerable.ForEachAsyncConcurrent(
async item =>
{
await SomeAsyncMethod(item);
},
5);
在接受的答案中,不需要 ConcurrentBag。 這是一個沒有它的實現:
var tasks = myCollection.Select(GetData).ToList();
await Task.WhenAll(tasks);
var results = tasks.Select(t => t.Result);
任何“// some pre stuff”和“// some post stuff”都可以進入GetData實現(或另一個調用GetData的方法)
除了更短之外,沒有使用“async void”lambda,這是一種反模式。
以下設置為與IAsyncEnumerable
一起使用,但可以通過更改類型並刪除foreach
上的“await”來修改為使用IEnumerable
。 它比創建無數並行任務然后等待它們全部更適合大量數據。
public static async Task ForEachAsyncConcurrent<T>(this IAsyncEnumerable<T> enumerable, Func<T, Task> action, int maxDegreeOfParallelism, int? boundedCapacity = null)
{
ActionBlock<T> block = new ActionBlock<T>(
action,
new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = maxDegreeOfParallelism,
BoundedCapacity = boundedCapacity ?? maxDegreeOfParallelism * 3
});
await foreach (T item in enumerable)
{
await block.SendAsync(item).ConfigureAwait(false);
}
block.Complete();
await block.Completion;
}
隨着.Net 6的推出, Parallel.ForEachAsync現已可用。
using System.Net.Http.Headers;
using System.Net.Http.Json;
var userHandlers = new []
{
"users/okyrylchuk",
"users/shanselman",
"users/jaredpar",
"users/davidfowl"
};
using HttpClient client = new()
{
BaseAddress = new Uri("https://api.github.com"),
};
client.DefaultRequestHeaders.UserAgent.Add(new ProductInfoHeaderValue("DotNet", "6"));
ParallelOptions parallelOptions = new()
{
MaxDegreeOfParallelism = 3
};
await Parallel.ForEachAsync(userHandlers, parallelOptions, async (uri, token) =>
{
var user = await client.GetFromJsonAsync<GitHubUser>(uri, token);
Console.WriteLine($"Name: {user.Name}\nBio: {user.Bio}\n");
});
public class GitHubUser
{
public string Name { get; set; }
public string Bio { get; set; }
}
github上的完整問題跟蹤以及SCOTT HANSELMAN 在此處提供的一些使用示例
對於更簡單的解決方案(不確定是否是最佳解決方案),您可以簡單地將Parallel.ForEach
嵌套在Task
中 - 因此
var options = new ParallelOptions { MaxDegreeOfParallelism = 5 }
Task.Run(() =>
{
Parallel.ForEach(myCollection, options, item =>
{
DoWork(item);
}
}
ParallelOptions
將為您執行節流,開箱即用。
我在現實世界的場景中使用它在后台運行很長時間的操作。 這些操作是通過 HTTP 調用的,它被設計為在長操作運行時不會阻塞 HTTP 調用。
這樣,CI/CD 調用不會因為長時間的 HTTP 操作而超時,而是每 x 秒循環一次狀態而不阻塞進程
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.