How to implement an efficient WhenEach that streams an IAsyncEnumerable of task results?

I am trying to update my toolset with the new tools offered by C# 8, and one method that seems particularly useful is a version of Task.WhenAll that returns an IAsyncEnumerable. This method should stream the task results as soon as they become available, so naming it WhenAll doesn't make much sense. WhenEach sounds more appropriate. The signature of the method is:

public static IAsyncEnumerable<TResult> WhenEach<TResult>(Task<TResult>[] tasks);

This method could be used like this:

var tasks = new Task<int>[]
{
    ProcessAsync(1, 300),
    ProcessAsync(2, 500),
    ProcessAsync(3, 400),
    ProcessAsync(4, 200),
    ProcessAsync(5, 100),
};

await foreach (int result in WhenEach(tasks))
{
    Console.WriteLine($"Processed: {result}");
}

static async Task<int> ProcessAsync(int result, int delay)
{
    await Task.Delay(delay);
    return result;
}

Expected output:

Processed: 5
Processed: 4
Processed: 1
Processed: 3
Processed: 2

I managed to write a basic implementation using the method Task.WhenAny in a loop, but there is a problem with this approach:

public static async IAsyncEnumerable<TResult> WhenEach<TResult>(
    Task<TResult>[] tasks)
{
    var hashSet = new HashSet<Task<TResult>>(tasks);
    while (hashSet.Count > 0)
    {
        var task = await Task.WhenAny(hashSet).ConfigureAwait(false);
        yield return await task.ConfigureAwait(false);
        hashSet.Remove(task);
    }
}

The problem is the performance. The implementation of Task.WhenAny creates a defensive copy of the supplied list of tasks, so calling it repeatedly in a loop results in O(n²) computational complexity. My naive implementation struggles to process 10,000 tasks. The overhead is nearly 10 seconds on my machine. I would like the method to be nearly as performant as the built-in Task.WhenAll, which can handle hundreds of thousands of tasks with ease. How could I improve the WhenEach method to make it perform decently?

By using code from this article, you can implement the following:

public static Task<Task<T>>[] Interleaved<T>(IEnumerable<Task<T>> tasks)
{
   var inputTasks = tasks.ToList();

   var buckets = new TaskCompletionSource<Task<T>>[inputTasks.Count];
   var results = new Task<Task<T>>[buckets.Length];
   for (int i = 0; i < buckets.Length; i++)
   {
       buckets[i] = new TaskCompletionSource<Task<T>>();
       results[i] = buckets[i].Task;
   }

   int nextTaskIndex = -1;
   Action<Task<T>> continuation = completed =>
   {
       var bucket = buckets[Interlocked.Increment(ref nextTaskIndex)];
       bucket.TrySetResult(completed);
   };

   foreach (var inputTask in inputTasks)
       inputTask.ContinueWith(continuation, CancellationToken.None, TaskContinuationOptions.ExecuteSynchronously, TaskScheduler.Default);

   return results;
}

Then change your WhenEach to call the Interleaved code:

public static async IAsyncEnumerable<TResult> WhenEach<TResult>(Task<TResult>[] tasks)
{
    foreach (var bucket in Interleaved(tasks))
    {
        var t = await bucket;
        yield return await t;
    }
}

Then you can call your WhenEach as usual:

await foreach (int result in WhenEach(tasks))
{
    Console.WriteLine($"Processed: {result}");
}

I did some rudimentary benchmarking with 10k tasks, and this version performed 5 times better in terms of speed.

You can use a Channel as an async queue. Each task can write to the channel when it completes. Items in the channel will be returned as an IAsyncEnumerable through ChannelReader.ReadAllAsync.

IAsyncEnumerable<T> ToAsyncEnumerable<T>(IEnumerable<Task<T>> inputTasks)
{
    var channel=Channel.CreateUnbounded<T>();
    var writer=channel.Writer;
    var continuations=inputTasks.Select(t=>t.ContinueWith(x=>
                                           writer.TryWrite(x.Result)));
    _ = Task.WhenAll(continuations)
            .ContinueWith(t=>writer.Complete(t.Exception));

    return channel.Reader.ReadAllAsync();
}

When all tasks complete, writer.Complete() is called to close the channel.

To test this method, the following code produces tasks with decreasing delays. This should return the indexes in reverse order:

var tasks=Enumerable.Range(1,4)
                    .Select(async i=>
                    { 
                      await Task.Delay(300*(5-i));
                      return i;
                    });

await foreach(var i in ToAsyncEnumerable(tasks))
{
     Console.WriteLine(i);
}

Produces:

4
3
2
1
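
The channel mechanics this answer relies on can be seen in isolation in the following minimal sketch. It is not part of the original answer: the producer task and the `items` list are made up for illustration, and it assumes a C# 9+ project (top-level statements) on .NET Core 3.0 or later. The point it shows is that ReadAllAsync only finishes once the writer has been completed.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

var channel = Channel.CreateUnbounded<int>();

// Hypothetical producer: writes three items with small delays,
// then completes the writer so that the reader knows no more items will come.
var producer = Task.Run(async () =>
{
    for (int i = 1; i <= 3; i++)
    {
        await Task.Delay(50);
        channel.Writer.TryWrite(i); // always succeeds on an open unbounded channel
    }
    channel.Writer.Complete(); // without this, ReadAllAsync would never finish
});

// Consumer: the loop ends only after the writer has been completed.
var items = new List<int>();
await foreach (int item in channel.Reader.ReadAllAsync())
{
    items.Add(item);
}
await producer;

Console.WriteLine(string.Join(", ", items)); // prints "1, 2, 3"
```

In ToAsyncEnumerable the role of the producer is played by the continuations, and the Task.WhenAll continuation is what completes the writer.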

Just for the fun of it, using System.Reactive and System.Interactive.Async:

public static IAsyncEnumerable<TResult> WhenEach<TResult>(
    Task<TResult>[] tasks)
    => Observable.Merge(tasks.Select(t => t.ToObservable())).ToAsyncEnumerable();

I really liked the solution provided by Panagiotis, but still wanted exceptions to be raised as they happen, as in JohanP's solution.

To achieve that, we can slightly modify it to try closing the channel in the continuations when a task fails:

public IAsyncEnumerable<T> ToAsyncEnumerable<T>(IEnumerable<Task<T>> inputTasks)
{
    if (inputTasks == null)
    {
        throw new ArgumentNullException(nameof(inputTasks), "Task list must not be null.");
    }

    var channel = Channel.CreateUnbounded<T>();
    var channelWriter = channel.Writer;
    var inputTaskContinuations = inputTasks.Select(inputTask => inputTask.ContinueWith(completedInputTask =>
    {
        // Check whether the task succeeded or not
        if (completedInputTask.Status == TaskStatus.RanToCompletion)
        {
            // Write the result to the channel on successful completion
            channelWriter.TryWrite(completedInputTask.Result);
        }
        else
        {
            // Complete the channel on failure to immediately communicate the failure to the caller and prevent additional results from being returned
            var taskException = completedInputTask.Exception?.InnerException ?? completedInputTask?.Exception;
            channelWriter.TryComplete(taskException);
        }
    }));

    // Ensure the writer is closed after the tasks are all complete, and propagate any exceptions from the continuations
    _ = Task.WhenAll(inputTaskContinuations).ContinueWith(completedInputTaskContinuationsTask => channelWriter.TryComplete(completedInputTaskContinuationsTask.Exception));

    // Return the async enumerator of the channel so results are yielded to the caller as they're available
    return channel.Reader.ReadAllAsync();
}

The obvious downside to this is that the first error encountered will end the enumeration and prevent any other, possibly successful, results from being returned. This is a tradeoff that's acceptable for my use case, but may not be for others.
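
If that tradeoff is not acceptable, one possible variation is to stream the completed tasks themselves instead of their results, so the caller decides what to do with each failure. The following is only a sketch under that assumption: ToCompletedTasks, ProcessAsync and FailAsync are hypothetical names introduced here, and the code assumes a C# 9+ project with top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Channels;
using System.Threading.Tasks;

var log = new List<string>();
var tasks = new[] { ProcessAsync(1, 300), FailAsync(100), ProcessAsync(2, 200) };

await foreach (Task<int> task in ToCompletedTasks(tasks))
{
    // The caller inspects each task, so one failure does not end the loop.
    log.Add(task.IsCompletedSuccessfully
        ? $"Processed: {task.Result}"
        : $"Failed: {task.Exception?.InnerException?.Message}");
}
foreach (var line in log) Console.WriteLine(line);

// Hypothetical variant: writes every completed task (successful or not) to the channel.
static IAsyncEnumerable<Task<T>> ToCompletedTasks<T>(IEnumerable<Task<T>> inputTasks)
{
    var channel = Channel.CreateUnbounded<Task<T>>();
    var continuations = inputTasks
        .Select(task => task.ContinueWith(
            completed => channel.Writer.TryWrite(completed),
            TaskScheduler.Default))
        .ToArray(); // materialize, so that each continuation is attached exactly once

    // Close the channel once every input task has been written to it.
    _ = Task.WhenAll(continuations)
        .ContinueWith(allDone => channel.Writer.TryComplete(), TaskScheduler.Default);

    return channel.Reader.ReadAllAsync();
}

static async Task<int> ProcessAsync(int result, int delay)
{
    await Task.Delay(delay);
    return result;
}

static async Task<int> FailAsync(int delay)
{
    await Task.Delay(delay);
    throw new InvalidOperationException("Simulated failure.");
}
```

With these delays the failed task is reported first and the two successful results still follow it.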

I am adding one more answer to this question, because there are a couple of issues that need to be addressed.

  1. It is recommended that methods creating async-enumerable sequences have a CancellationToken parameter. This enables the WithCancellation configuration in await foreach loops.
  2. It is recommended that when an asynchronous operation attaches continuations to tasks, these continuations are cleaned up when the operation completes. So if, for example, the caller of the WhenEach method decides to exit the await foreach loop prematurely (using break, return etc), or if the loop terminates prematurely because of an exception, we don't want to leave a bunch of dead continuations hanging around, attached to the tasks. This can be particularly important if WhenEach is called repeatedly in a loop (as part of a Retry functionality, for example).
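
To illustrate point 1 in isolation, here is a minimal sketch of how the [EnumeratorCancellation] attribute and WithCancellation work together. CountAsync is a hypothetical iterator made up for this example, not part of the WhenEach implementation, and the code assumes C# 9 or later (top-level statements and attributes on local function parameters).

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

using var cts = new CancellationTokenSource();
var received = new List<int>();
bool canceled = false;

try
{
    // The token passed to WithCancellation flows into the parameter
    // marked with [EnumeratorCancellation].
    await foreach (int i in CountAsync().WithCancellation(cts.Token))
    {
        received.Add(i);
        if (i == 3) cts.Cancel(); // request cancellation from the consumer side
    }
}
catch (OperationCanceledException)
{
    canceled = true;
}

Console.WriteLine($"Received {received.Count} items, canceled: {canceled}");

// Hypothetical iterator: yields 1..100, observing the token on every delay.
static async IAsyncEnumerable<int> CountAsync(
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    for (int i = 1; i <= 100; i++)
    {
        await Task.Delay(50, cancellationToken);
        yield return i;
    }
}
```

Without the attribute, the token supplied through WithCancellation would not reach the iterator body, and the loop would run to the end.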

The implementation below addresses these two issues. It is based on a Channel<Task<TResult>>. Channels have now become an integral part of the .NET platform, so there is no reason to avoid them in favor of more complex TaskCompletionSource-based solutions.

public async static IAsyncEnumerable<TResult> WhenEach<TResult>(
    Task<TResult>[] tasks,
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    var channel = Channel.CreateUnbounded<Task<TResult>>();
    using var linkedCts = CancellationTokenSource
        .CreateLinkedTokenSource(cancellationToken);
    var continuations = new List<Task>(tasks.Length);

    try
    {
        int pendingCount = tasks.Length;
        foreach (var task in tasks)
        {
            if (task == null) throw new ArgumentException(
                $"The tasks argument included a null value.", nameof(tasks));
            continuations.Add(task.ContinueWith(t =>
            {
                var accepted = channel.Writer.TryWrite(t);
                Debug.Assert(accepted);
                if (Interlocked.Decrement(ref pendingCount) == 0)
                    channel.Writer.Complete();
            }, linkedCts.Token, TaskContinuationOptions.ExecuteSynchronously |
                TaskContinuationOptions.DenyChildAttach, TaskScheduler.Default));
        }

        await foreach (var task in channel.Reader.ReadAllAsync(cancellationToken)
            .ConfigureAwait(false))
        {
            yield return await task.ConfigureAwait(false);
            cancellationToken.ThrowIfCancellationRequested();
        }
    }
    finally
    {
        linkedCts.Cancel();
        try { await Task.WhenAll(continuations).ConfigureAwait(false); }
        catch (OperationCanceledException) { } // Ignore
    }
}

The finally block takes care of cancelling the attached continuations and awaiting their completion before exiting.

The ThrowIfCancellationRequested inside the await foreach loop might seem redundant, but it is actually required because of a by-design behavior of the ReadAllAsync method, which is explained here.
