How to run a Parallel.ForEachAsync loop with NoBuffering?

Question

The synchronous Parallel.ForEach method has many overloads, and some of them allow the parallel loop to be configured with the EnumerablePartitionerOptions.NoBuffering option:

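"Create a partitioner that takes items from the source enumerable one at a time and does not use intermediate storage that can be accessed more efficiently by multiple threads. This option provides support for low latency (items will be processed as soon as they are available from the source) and provides partial support for dependencies between items (a thread cannot deadlock waiting for an item that the thread itself is responsible for processing)."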
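For reference, this is roughly how the option is applied to the synchronous loop. It is a minimal sketch: the NoBufferingDemo class, the Produce source, and the simulated 100 ms of work are hypothetical and only serve the illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

static class NoBufferingDemo
{
    // Hypothetical source that logs each item at the moment it is handed out.
    static IEnumerable<int> Produce(int count)
    {
        for (int i = 1; i <= count; i++)
        {
            Console.WriteLine($"Pulled {i}");
            yield return i;
        }
    }

    // Runs the loop and returns the number of items that were processed.
    public static int Run(int count)
    {
        int processed = 0;
        // Wrap the source in a partitioner that takes one item at a time.
        var partitioner = Partitioner.Create(
            Produce(count), EnumerablePartitionerOptions.NoBuffering);
        var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };
        Parallel.ForEach(partitioner, options, item =>
        {
            Thread.Sleep(100); // Simulate some work.
            Console.WriteLine($"Processed {item}");
            Interlocked.Increment(ref processed);
        });
        return processed;
    }

    static void Main() => Console.WriteLine($"Total processed: {Run(6)}");
}
```

With NoBuffering, the "Pulled" lines should appear interleaved with the "Processed" lines instead of running far ahead of them.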

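The asynchronous Parallel.ForEachAsync has no such option or overload. This is a problem for me, because I want to use this method as the consumer in a producer-consumer scenario, with a Channel<T> as the source. In my scenario it is important that the consumer bites off exactly what it can chew, and no more. I don't want the consumer to pull aggressively from the Channel<T> and then put the pulled elements in its own hidden buffer. I want the Channel<T> to be the only queue in the system, so that I can monitor it and get accurate statistics about the elements that are waiting to be processed/consumed.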
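To make the monitoring requirement concrete, here is a small sketch of reading the number of queued items from an unbounded channel created with default options. The ChannelCountDemo class is hypothetical. The count is only an accurate picture of the pending work if no consumer has moved items into a buffer of its own.

```csharp
using System;
using System.Threading.Channels;

static class ChannelCountDemo
{
    // Returns the number of queued items before and after a single read.
    public static (int Before, int After) Run()
    {
        Channel<int> channel = Channel.CreateUnbounded<int>();
        for (int i = 0; i < 5; i++) channel.Writer.TryWrite(i);

        int before = channel.Reader.Count; // Items waiting in the channel.
        channel.Reader.TryRead(out _);     // A consumer takes one item.
        int after = channel.Reader.Count;  // The taken item is no longer counted.
        return (before, after);
    }

    static void Main()
    {
        var (before, after) = Run();
        Console.WriteLine($"{before} -> {after}"); // prints "5 -> 4"
    }
}
```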

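Until recently I was under the impression that the Parallel.ForEachAsync method does not buffer by design. To be sure, I asked Microsoft on GitHub for a clarification. I got feedback very quickly, but it was not what I expected: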

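"It's an implementation detail. With Parallel.ForEach, the buffering is done to handle body delegates that might be really fast, and thus it's attempting to minimize / amortize the cost of taking the lock to access the shared enumerator. With ForEachAsync, it's expected that the body delegates will be at least a bit meatier, and thus it doesn't attempt to do such amortization. At least not today."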

Depending on an implementation detail is highly undesirable, so I have to rethink my approach.

My question is: Is it possible to configure the Parallel.ForEachAsync API so that it has guaranteed NoBuffering behavior? If yes, how?

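Clarification: I am not asking how to reinvent Parallel.ForEachAsync from scratch. I am asking for some kind of thin wrapper around the existing Parallel.ForEachAsync API that will "inject" the desired NoBuffering behavior. Something like this: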

public static Task ForEachAsync_NoBuffering<TSource>(
    IAsyncEnumerable<TSource> source,
    ParallelOptions parallelOptions,
    Func<TSource, CancellationToken, ValueTask> body)
{
    // Some magic here
    return Parallel.ForEachAsync(source, parallelOptions, body);
}

In every other respect, the wrapper should behave exactly like the Parallel.ForEachAsync method on .NET 6.

Update: Here is the basic layout of my scenario:

class Processor
{
    private readonly Channel<Item> _channel;
    private readonly Task _consumer;

    public Processor()
    {
        _channel = Channel.CreateUnbounded<Item>();
        _consumer = StartConsumer();
    }

    public int PendingItemsCount => _channel.Reader.Count;
    public Task Completion => _consumer;

    public void QueueItem(Item item) => _channel.Writer.TryWrite(item);

    private async Task StartConsumer()
    {
        ParallelOptions options = new() { MaxDegreeOfParallelism = 2 };
        await Parallel.ForEachAsync(_channel.Reader.ReadAllAsync(), options, async (item, _) =>
        {
            // Call async API
            // Persist the response of the API in an RDBMS
        });
    }
}

There might be other tools that could also be used for this purpose, but I prefer to use the smoking hot (.NET 6) Parallel.ForEachAsync API. That is the focus of this question.

Answer 1

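I think I've found a way to implement the ForEachAsync_NoBuffering method. The idea is to feed the underlying Parallel.ForEachAsync loop with a fake infinite IEnumerable<TSource>, and to do the actual enumeration of the IAsyncEnumerable<TSource> source inside the body: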

/// <summary>
/// Executes a for-each operation on an asynchronous sequence, in which iterations
/// may run in parallel. Items are taken from the source sequence one at a time,
/// and no intermediate storage is used.
/// </summary>
public static Task ForEachAsync_NoBuffering<TSource>(
    IAsyncEnumerable<TSource> source,
    ParallelOptions parallelOptions,
    Func<TSource, CancellationToken, ValueTask> body)
{
    ArgumentNullException.ThrowIfNull(source);
    ArgumentNullException.ThrowIfNull(parallelOptions);
    ArgumentNullException.ThrowIfNull(body);
    bool completed = false;
    IEnumerable<TSource> Infinite()
    {
        while (!Volatile.Read(ref completed)) yield return default;
    }
    SemaphoreSlim semaphore = new(1, 1);
    IAsyncEnumerator<TSource> enumerator = source.GetAsyncEnumerator();
    return Parallel.ForEachAsync(Infinite(), parallelOptions, async (_, ct) =>
    {
        // Take the next item in the sequence, after acquiring an exclusive lock.
        TSource item;
        await semaphore.WaitAsync(); // Continue on captured context.
        try
        {
            if (completed) return;
            if (!(await enumerator.MoveNextAsync())) // Continue on captured context.
            {
                completed = true; return;
            }
            item = enumerator.Current;
        }
        finally { semaphore.Release(); }
        // Invoke the body with the item that was taken.
        await body(item, ct).ConfigureAwait(false);
    }).ContinueWith(async t =>
    {
        // Dispose the enumerator.
        await enumerator.DisposeAsync().ConfigureAwait(false);
        semaphore.Dispose();
        return t;
    }, default, TaskContinuationOptions.DenyChildAttach |
        TaskContinuationOptions.ExecuteSynchronously, TaskScheduler.Default)
        .Unwrap().Unwrap();
}

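The final ContinueWith is needed in order to dispose the enumerator, as well as the SemaphoreSlim that is used for serializing the operations on the enumerator. The advantage of ContinueWith over the simpler await is that it propagates all the exceptions of the parallel loop.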
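One way to observe whether a given loop pulls more items than it is processing is a small timing-based probe. The sketch below is hypothetical (the BufferingProbe class and the MeasureMaxOutstandingAsync helper are not part of any library). It records the largest number of items that were pulled from the source but not yet completed. It is wired to the built-in Parallel.ForEachAsync here; passing ForEachAsync_NoBuffering instead should allow a comparison. The probe is an observation, not a proof of a guarantee.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

static class BufferingProbe
{
    // Hypothetical helper: runs `loop` over an instrumented source and returns the
    // largest number of items that were pulled from the source but not yet completed.
    public static async Task<int> MeasureMaxOutstandingAsync(
        Func<IAsyncEnumerable<int>, ParallelOptions,
            Func<int, CancellationToken, ValueTask>, Task> loop,
        int itemCount, int degreeOfParallelism)
    {
        int pulled = 0, completed = 0, maxOutstanding = 0;

        async IAsyncEnumerable<int> Source()
        {
            for (int i = 0; i < itemCount; i++)
            {
                await Task.Yield(); // Make the source genuinely asynchronous.
                int outstanding = Interlocked.Increment(ref pulled)
                    - Volatile.Read(ref completed);
                // The source is enumerated by one caller at a time,
                // so a plain comparison is enough here.
                if (outstanding > maxOutstanding) maxOutstanding = outstanding;
                yield return i;
            }
        }

        var options = new ParallelOptions { MaxDegreeOfParallelism = degreeOfParallelism };
        await loop(Source(), options, async (item, ct) =>
        {
            await Task.Delay(20, ct); // Simulate asynchronous work.
            Interlocked.Increment(ref completed);
        });
        return maxOutstanding;
    }

    static async Task Main()
    {
        int max = await MeasureMaxOutstandingAsync(
            (source, options, body) => Parallel.ForEachAsync(source, options, body),
            itemCount: 20, degreeOfParallelism: 2);
        Console.WriteLine($"Max items pulled but not completed: {max}");
    }
}
```

A result larger than the configured degree of parallelism would suggest that the loop is holding items in storage of its own.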
