
How to merge multiple asynchronous sequences without left-side bias?

I have a few IAsyncEnumerable<string> sequences that I would like to merge into a single IAsyncEnumerable<string>, which should contain all the elements that are emitted concurrently from those sequences. So I used the Merge operator from the System.Interactive.Async package. The problem is that this operator does not always treat all sequences as equal: in some circumstances it prefers emitting elements from the sequences on the left side of the arguments list, and neglects the sequences on the right side. Here is a minimal example that reproduces this undesirable behavior:

var sequence_A = Enumerable.Range(1, 5).Select(i => $"A{i}").ToAsyncEnumerable();
var sequence_B = Enumerable.Range(1, 5).Select(i => $"B{i}").ToAsyncEnumerable();
var sequence_C = Enumerable.Range(1, 5).Select(i => $"C{i}").ToAsyncEnumerable();
var merged = AsyncEnumerableEx.Merge(sequence_A, sequence_B, sequence_C);
await foreach (var item in merged) Console.WriteLine(item);

This code snippet also has a dependency on the System.Linq.Async package. The sequence_A emits 5 elements prefixed with "A", the sequence_B emits 5 elements prefixed with "B", and the sequence_C emits 5 elements prefixed with "C".

Output (undesirable):

A1
A2
A3
A4
A5
B1
B2
B3
B4
B5
C1
C2
C3
C4
C5

Try it on Fiddle.

The desirable output should look like this:

A1
B1
C1
A2
B2
C2
A3
B3
C3
A4
B4
C4
A5
B5
C5

In case all sequences have their next element available, the merged sequence should pull one element from each sequence, instead of pulling elements repeatedly from the left-most sequence.

How can I ensure that my sequences are merged with fairness? I am looking for a combination of operators from the official packages that has the desirable behavior, or for a custom Merge operator that does what I want.

Clarification: I am interested in the concurrent Merge functionality, where all source sequences are observed at the same time, and any emission from any of the sequences is propagated to the merged sequence. The concept of fairness applies when more than one sequence can emit an element immediately, in which case their emissions should be interleaved. Otherwise, when no element is immediately available, the rule is "first come, first served".


Update: Here is a more realistic demo that includes latency in the producer sequences and in the consuming enumeration loop. It simulates a situation where consuming the values produced by the left-most sequence takes longer than producing them.

var sequence_A = Produce("A", 200, 1, 2, 3, 4, 5);
var sequence_B = Produce("B", 150, 1, 2, 3, 4, 5);
var sequence_C = Produce("C", 100, 1, 2, 3, 4, 5);
var merged = AsyncEnumerableEx.Merge(sequence_A, sequence_B, sequence_C);
await foreach (var item in merged)
{
    Console.WriteLine(item);
    await Task.Delay(item.StartsWith("A") ? 300 : 50); // Latency
}

async IAsyncEnumerable<string> Produce(string prefix, int delay, params int[] values)
{
    foreach (var value in values)
    {
        var delayTask = Task.Delay(delay);
        yield return $"{prefix}{value}";
        await delayTask; // Latency
    }
}

The result is an undesirable bias toward the values produced by sequence_A:

A1
A2
A3
A4
A5
B1
B2
C1
B3
C2
B4
C3
C4
B5
C5

Try it on Fiddle.

Here is the final code. The algorithm has been modified to suit the OP. I have left the original code below.

This uses a greedy algorithm: the first available value is returned, and no attempt is made to take strict turns. Each time a task finishes, the next task for the same enumerator goes to the back of the list, ensuring fairness.

The algorithm is as follows:

  • The function accepts a params array of sources.
  • Early bail-out if no source enumerables are provided.
  • Create a list to hold the enumerators along with their respective tasks as tuples.
  • Get each enumerator, call MoveNextAsync and store the pair in the list.
  • In a loop, call Task.WhenAny on the whole list.
  • Take the resulting Task and find its location in the list.
  • Hold the tuple in a variable and remove it from the list.
  • If it returns true, yield the value and call MoveNextAsync again on the matching enumerator, pushing the resulting tuple to the back of the list.
  • If it returns false, dispose the enumerator with DisposeAsync.
  • Continue looping until the list is empty.
  • A finally block disposes any remaining enumerators.
  • There is also an overload that accepts a cancellation token.

There are some efficiencies to be had in terms of allocations etc. I've left that as an exercise to the reader.


 public static IAsyncEnumerable<T> Interleave<T>(params IAsyncEnumerable<T>[] sources) =>
     Interleave(default, sources);
 
 public static async IAsyncEnumerable<T> Interleave<T>([EnumeratorCancellation] CancellationToken token, IAsyncEnumerable<T>[] sources)
 {
     if(sources.Length == 0)
         yield break;
     var enumerators = new List<(IAsyncEnumerator<T> e, Task<bool> t)>(sources.Length);
     try
     {
         for(var i = 0; i < sources.Length; i++)
         {
             var e = sources[i].GetAsyncEnumerator(token);
             enumerators.Add((e, e.MoveNextAsync().AsTask()));
         }

         do
         {
             var taskResult = await Task.WhenAny(enumerators.Select(tuple => tuple.t));
             var ind = enumerators.FindIndex(tuple => tuple.t == taskResult);
             var tuple = enumerators[ind];
             enumerators.RemoveAt(ind);
             if(taskResult.Result)
             {
                 yield return tuple.e.Current;
                 enumerators.Add((tuple.e, tuple.e.MoveNextAsync().AsTask()));
             }
             else
             {
                 try
                 {
                     await tuple.e.DisposeAsync();
                 }
                 catch
                 { //
                 }
             }
         } while (enumerators.Count > 0);
     }
     finally
     {
         for(var i = 0; i < enumerators.Count; i++)
         {
             try
             {
                 await enumerators[i].e.DisposeAsync();
             }
             catch
             { //
             }
         }
     }
 }

dotnetfiddle
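One such saving, as a rough sketch (the names below are illustrative, not from the answer above): keeping a parallel list of tasks means Task.WhenAny can receive the list directly, with no per-iteration Select allocation, and the completed slot can be found with IndexOf instead of FindIndex over tuples.

```csharp
// Same greedy, fair algorithm, sketched with two parallel lists
// instead of a list of tuples: fewer allocations per iteration.
public static async IAsyncEnumerable<T> InterleaveLite<T>(
    [EnumeratorCancellation] CancellationToken token = default,
    params IAsyncEnumerable<T>[] sources)
{
    var enumerators = new List<IAsyncEnumerator<T>>(sources.Length);
    var tasks = new List<Task<bool>>(sources.Length);
    try
    {
        foreach (var source in sources)
        {
            var e = source.GetAsyncEnumerator(token);
            enumerators.Add(e);
            tasks.Add(e.MoveNextAsync().AsTask());
        }
        while (enumerators.Count > 0)
        {
            // WhenAny receives the list itself; no Select iterator is allocated.
            var winner = await Task.WhenAny(tasks);
            int i = tasks.IndexOf(winner);
            var e = enumerators[i];
            enumerators.RemoveAt(i);
            tasks.RemoveAt(i);
            if (await winner) // already completed; rethrows any failure
            {
                yield return e.Current;
                // Requeue at the back of both lists, preserving fairness.
                enumerators.Add(e);
                tasks.Add(e.MoveNextAsync().AsTask());
            }
            else
            {
                await e.DisposeAsync();
            }
        }
    }
    finally
    {
        foreach (var e in enumerators)
        {
            try { await e.DisposeAsync(); } catch { /* best effort */ }
        }
    }
}
```

The algorithm and fairness behavior are unchanged; only the bookkeeping differs.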


EDIT The below isn't quite what the OP wanted, as the OP wants whichever result is available first to be returned. I'll leave this here because it's a good demonstration of the round-robin algorithm.

Here is a full implementation of the async Interleave or Merge algorithm, known more commonly in SQL terms as a merge concatenation.

The algorithm is as follows:

  • The function accepts a params array of sources.
  • Early bail-out if no source enumerables are provided.
  • Create a list to hold the enumerators.
  • Get each enumerator and store it in the list.
  • In a loop, take each enumerator and MoveNextAsync .
  • If it returns true, yield the value and increment the loop counter, wrapping back to the start when it passes the end of the list.
  • If it returns false, dispose it and remove it from the list, without incrementing the counter.
  • Continue looping until there are no more enumerators.
  • A finally block disposes any remaining enumerators.
  • There is also an overload that accepts a cancellation token.

 public static IAsyncEnumerable<T> Interleave<T>(params IAsyncEnumerable<T>[] sources) =>
     Interleave(default, sources);
 
 public static async IAsyncEnumerable<T> Interleave<T>([EnumeratorCancellation] CancellationToken token, IAsyncEnumerable<T>[] sources)
 {
     if(sources.Length == 0)
         yield break;
     var enumerators = new List<IAsyncEnumerator<T>>(sources.Length);
     try
     {
         for(var i = 0; i < sources.Length; i++)
             enumerators.Add(sources[i].GetAsyncEnumerator(token));

         var j = 0;
         do
         {
             if(await enumerators[j].MoveNextAsync())
             {
                 yield return enumerators[j].Current;
                 j++;
                 if(j >= enumerators.Count)
                     j = 0;
             }
             else
             {
                 try
                 {
                     await enumerators[j].DisposeAsync();
                 }
                 catch
                 { //
                 }
                 enumerators.RemoveAt(j);
             }
         } while (enumerators.Count > 0);
     }
     finally
     {
         for(var i = 0; i < enumerators.Count; i++)
         {
             try
             {
                 await enumerators[i].DisposeAsync();
             }
             catch
             { //
             }
         }
     }
 }

dotnetfiddle

This can obviously be significantly simplified if you only have a fixed number of source enumerables.
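For instance, with exactly two sources the round-robin merge collapses to a short sketch (sequential like the algorithm above, awaiting the two enumerators in turn rather than concurrently; Interleave2 is an illustrative name):

```csharp
public static async IAsyncEnumerable<T> Interleave2<T>(
    IAsyncEnumerable<T> first,
    IAsyncEnumerable<T> second,
    [EnumeratorCancellation] CancellationToken token = default)
{
    await using var e1 = first.GetAsyncEnumerator(token);
    await using var e2 = second.GetAsyncEnumerator(token);
    bool has1 = true, has2 = true;
    // Alternate strictly; when one source ends, keep draining the other.
    while (has1 || has2)
    {
        if (has1 && (has1 = await e1.MoveNextAsync()))
            yield return e1.Current;
        if (has2 && (has2 = await e2.MoveNextAsync()))
            yield return e2.Current;
    }
}
```

When one source ends, its flag flips to false and the loop simply keeps pulling from the other until it ends too.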

The example is a bit contrived as all results are available immediately. If even a small delay is added, the results are mixed:

var sequence_A = AsyncEnumerable.Range(1, 5)
    .SelectAwait(async i =>{ await Task.Delay(i); return $"A{i}";});
var sequence_B = AsyncEnumerable.Range(1, 5)
    .SelectAwait(async i =>{ await Task.Delay(i); return $"B{i}";});
var sequence_C = AsyncEnumerable.Range(1, 5)
    .SelectAwait(async i =>{ await Task.Delay(i); return $"C{i}";});
var sequence_D = AsyncEnumerable.Range(1, 5)
    .SelectAwait(async i =>{ await Task.Delay(i); return $"D{i}";});

var seq = Interleave(sequence_A, sequence_B, sequence_C, sequence_D);
await foreach (var item in seq) Console.WriteLine(item);

This produces different, mixed results each time:

B1
A1
C1
D1
D2
A2
B2
C2
D3
A3
B3
C3
C4
A4
B4
D4
D5
A5
B5
C5

The comments in the current source of AsyncEnumerableEx.Merge explain that it was reimplemented to be cheaper and fairer:

//
// This new implementation of Merge differs from the original one in a few ways:
//
// - It's cheaper because:
//   - no conversion from ValueTask<bool> to Task<bool> takes place using AsTask,
//   - we don't instantiate Task.WhenAny tasks for each iteration.
// - It's fairer because:
//   - the MoveNextAsync tasks are awaited concurrently, but completions are queued,
//     instead of awaiting a new WhenAny task where "left" sources have preferential
//     treatment over "right" sources.
//
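The queued-completions idea described in these comments can be approximated in user code with System.Threading.Channels: each source is pumped into a shared channel, so the consumer observes emissions strictly in arrival order, with no positional bias. Note that this sketch (MergeViaChannel is an illustrative name) is greedy, in that each pump reads ahead of the consumer, and it omits the error propagation and cancellation handling of the real implementation:

```csharp
public static IAsyncEnumerable<T> MergeViaChannel<T>(
    params IAsyncEnumerable<T>[] sources)
{
    var channel = Channel.CreateUnbounded<T>();
    int pending = sources.Length;
    if (pending == 0) channel.Writer.Complete(); // no sources: empty sequence
    foreach (var source in sources)
    {
        _ = Task.Run(async () =>
        {
            try
            {
                // Each emission is queued in the order it arrives, so no
                // source gets preferential treatment based on its position.
                await foreach (var item in source)
                    await channel.Writer.WriteAsync(item);
            }
            finally
            {
                // Complete the channel when the last pump finishes.
                if (Interlocked.Decrement(ref pending) == 0)
                    channel.Writer.Complete();
            }
        });
    }
    return channel.Reader.ReadAllAsync();
}
```

A failed source here silently ends its pump rather than faulting the merged sequence, which is one of the trade-offs the real implementation has to handle.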

I am posting one more answer, because I noticed some other minor defects in the current¹ AsyncEnumerableEx.Merge implementation that I would like to fix:

  1. Left-side bias. This is the main issue of this question, and has already been addressed sufficiently in Charlieface's answer. In this answer I am using the same interleaving technique.
  2. Destructive merging. In case one of the source sequences fails, the merged sequence is likely to complete without propagating all the values produced by the other source sequences. This makes the current AsyncEnumerableEx.Merge implementation poorly suited to producer-consumer scenarios, where processing all the consumed elements is mandatory.
  3. Delayed completion. In case one of the sequences fails, or the enumeration of the merged sequence is abandoned, the disposal of the merged enumerator might take a lot of time, because the pending MoveNextAsync operations of the source enumerators are not canceled.
  4. Throwing on dispose. It is generally recommended that disposable resources avoid throwing on Dispose or DisposeAsync. Nevertheless, the AsyncEnumerableEx.Merge implementation propagates normal operational errors (errors thrown by MoveNextAsync) from its finally block.

The MergeEx implementation below is an attempt to fix these problems. It is a concurrent and non-destructive implementation, that propagates all the consumed values. All the errors that are caught are preserved, and are propagated in an AggregateException .

/// <summary>
/// Merges elements from all source sequences, into a single interleaved sequence.
/// </summary>
public static IAsyncEnumerable<TSource> MergeEx<TSource>(
    params IAsyncEnumerable<TSource>[] sources)
{
    ArgumentNullException.ThrowIfNull(sources);
    sources = sources.ToArray(); // Defensive copy.
    if (sources.Any(s => s is null)) throw new ArgumentException(
        $"The {nameof(sources)} argument included a null value.", nameof(sources));
    return Implementation();

    async IAsyncEnumerable<TSource> Implementation(
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        if (sources.Length == 0) yield break;
        cancellationToken.ThrowIfCancellationRequested();
        using var linkedCts = CancellationTokenSource
            .CreateLinkedTokenSource(cancellationToken);
        List<(IAsyncEnumerator<TSource>, Task<bool> MoveNext)> state = new();
        List<Exception> errors = new();

        try
        {
            // Create enumerators and initial MoveNextAsync tasks.
            foreach (var source in sources)
            {
                IAsyncEnumerator<TSource> enumerator;
                Task<bool> moveNext;
                try { enumerator = source.GetAsyncEnumerator(linkedCts.Token); }
                catch (Exception ex) { errors.Add(ex); break; }
                try { moveNext = enumerator.MoveNextAsync().AsTask(); }
                catch (Exception ex) { moveNext = Task.FromException<bool>(ex); }
                state.Add((enumerator, moveNext));
            }

            bool cancelationOccurred = false;

            // Loop until all enumerators are completed.
            while (state.Count > 0)
            {
                int completedIndex = -1;
                for (int i = 0; i < state.Count; i++)
                {
                    var status = state[i].MoveNext.Status;
                    if (status == TaskStatus.Faulted || status == TaskStatus.Canceled)
                    {
                        // Handle errors with priority.
                        completedIndex = i;
                        break;
                    }
                    else if (status == TaskStatus.RanToCompletion)
                    {
                        // Handle completion in order.
                        if (completedIndex == -1) completedIndex = i;
                        continue;
                    }
                }

                if (completedIndex == -1)
                {
                    // All MoveNextAsync tasks are currently in-flight.
                    await Task.WhenAny(state.Select(e => e.MoveNext))
                        .ConfigureAwait(false);
                    continue;
                }

                var (enumerator, moveNext) = state[completedIndex];
                Debug.Assert(moveNext.IsCompleted);
                (TSource Value, bool HasValue) item;
                try
                {
                    bool moved = await moveNext.ConfigureAwait(false);
                    item = moved ? (enumerator.Current, true) : default;
                }
                catch (OperationCanceledException)
                    when (linkedCts.IsCancellationRequested)
                {
                    // Cancellation from the linked token source is not an error.
                    item = default; cancelationOccurred = true;
                }
                catch (Exception ex)
                {
                    errors.Add(ex); linkedCts.Cancel();
                    item = default;
                }

                if (item.HasValue)
                    yield return item.Value;

                if (item.HasValue && errors.Count == 0)
                {
                    try { moveNext = enumerator.MoveNextAsync().AsTask(); }
                    catch (Exception ex) { moveNext = Task.FromException<bool>(ex); }
                    // Deprioritize the selected enumerator.
                    state.RemoveAt(completedIndex);
                    state.Add((enumerator, moveNext));
                }
                else
                {
                    // The selected enumerator has completed or an error has occurred.
                    state.RemoveAt(completedIndex);
                    try { await enumerator.DisposeAsync().ConfigureAwait(false); }
                    catch (Exception ex) { errors.Add(ex); linkedCts.Cancel(); }
                }
            }

            if (errors.Count > 0)
                throw new AggregateException(errors);

            // Propagate cancellation only if it occurred during the loop.
            if (cancelationOccurred)
                cancellationToken.ThrowIfCancellationRequested();
        }
        finally
        {
            // The finally runs when an enumerator created by this method is disposed.
            // Cancel any active enumerators, for more responsive completion.
            // Prevent fire-and-forget, otherwise the DisposeAsync() might throw.
            // Suppress MoveNextAsync errors, but propagate DisposeAsync errors.
            errors.Clear();
            try { linkedCts.Cancel(); }
            catch (Exception ex) { errors.Add(ex); }
            foreach (var (enumerator, moveNext) in state)
            {
                if (!moveNext.IsCompleted)
                {
                    try { await moveNext.ConfigureAwait(false); } catch { }
                }
                try { await enumerator.DisposeAsync().ConfigureAwait(false); }
                catch (Exception ex) { errors.Add(ex); }
            }
            if (errors.Count > 0)
                throw new AggregateException(errors);
        }
    }
}

¹ System.Interactive.Async version 6.0.1
