
How does GroupBy work with IAsyncEnumerable?

I've implemented IAsyncEnumerable for my paginated HttpClient requests, but I also need to group the results with GroupBy. So I've written code like the following:

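using System;
using System.Collections.Generic;
using System.Linq;              // GroupBy, OrderByDescendingAwait and CountAsync come from the System.Linq.Async NuGet package
using System.Threading.Tasks;

public class Item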
{
   public int Id {get; set;}
   public string Name {get; set;}
}

public async IAsyncEnumerable<Item> GetItems()
{
   while(hasMorePage)
   {
       // ... get paginated items

       foreach(var item in paginatedItems)
       {
         yield return item;
       }
   }
}

// Should find the 10 most repeated Ids and how many times each one occurs.
public async Task GroupItems()
{
  IAsyncEnumerable<Item> items = GetItems();

  // GroupBy yields IAsyncGrouping<int, Item> values
  await foreach (var item in items.GroupBy(i => i.Id)
                                  .OrderByDescendingAwait(i => i.CountAsync())
                                  .Take(10))
  {
    Console.WriteLine($"{item.Key}: {await item.CountAsync()}");
  }
}

This code works exactly as I expected. But I'd like to understand how GroupBy works here: to group by Id it must have all the items, so is there something I'm missing? Or is there anything I can refactor for performance?

First of all, the ALinq repo linked in the comments has nothing to do with .NET's IAsyncEnumerable or System.Linq.Async. It's an 8-year-old repo that doesn't even target .NET Core. System.Linq.Async is maintained by the same team that built Reactive Extensions for .NET, and its code lives in the same GitHub repository.

Second, it's unclear what behavior needs to be explained.

  • Does GroupBy block? No, it doesn't; it awaits the source asynchronously.
  • Does GroupBy have to consume the entire source before producing any results? Yes, it does.
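To make the second point concrete, here is a minimal sketch of the same two-phase behaviour using only the base class library. `GroupByBuffered` is a hypothetical helper written for illustration, not the System.Linq.Async implementation: it drains the whole source into a list, then builds a lookup with `ToLookup` and yields its groups, so the first group cannot appear before the source has finished.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration; it is NOT how System.Linq.Async is written,
// it only reproduces the "consume everything first, then yield" behaviour.
public static class BufferingGroupBySketch
{
    public static async IAsyncEnumerable<IGrouping<TKey, TSource>> GroupByBuffered<TSource, TKey>(
        this IAsyncEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        // Phase 1: await every item of the source. No group is yielded while
        // the source is still producing (e.g. while pages are still being fetched).
        var buffer = new List<TSource>();
        await foreach (var item in source.WithCancellation(cancellationToken).ConfigureAwait(false))
        {
            buffer.Add(item);
        }

        // Phase 2: the groups are now complete in memory; hand them out one by one.
        // ToLookup keeps keys in the order they were first seen.
        foreach (var group in buffer.ToLookup(keySelector))
        {
            yield return group;
        }
    }
}
```

With `GetItems()` from the question, `await foreach (var g in GetItems().GroupByBuffered(i => i.Id))` would only enter the loop body after the last page had been fetched.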

If you have a long-running stream of events, you'll have to wait until the stream ends to get any results. That's because GroupBy calculates the groupings in its allocation phase, then returns them in its iteration phase:

protected override async ValueTask<bool> MoveNextCore()
{
    switch (_state)
    {
        case AsyncIteratorState.Allocated:
            // First MoveNextAsync call: consume the whole source into an in-memory lookup.
            _lookup = await Internal.Lookup<TKey, TSource>.CreateAsync(_source, _keySelector, _comparer, _cancellationToken).ConfigureAwait(false);
            _enumerator = _lookup.ApplyResultSelector(_resultSelector).GetEnumerator();
            _state = AsyncIteratorState.Iterating;
            goto case AsyncIteratorState.Iterating;

        case AsyncIteratorState.Iterating:
            // Later calls: just walk the already-built groups synchronously.
            if (_enumerator!.MoveNext())
            {
                _current = _enumerator.Current;
                return true;
            }

            await DisposeAsync().ConfigureAwait(false);
            break;
    }

    return false;
}
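Because the lookup is built before the first group is returned, finding the most repeated Ids will always require reading every page. If the groups themselves are never used, though, storing every `Item` is unnecessary. A minimal sketch, assuming only the per-Id counts are needed: `TopByCountAsync` (a hypothetical helper, not part of System.Linq.Async) keeps one running count per key, so memory grows with the number of distinct Ids rather than the number of items, and each count is computed once instead of once in `OrderByDescendingAwait` and again in the `Console.WriteLine` call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class TopCountSketch
{
    // Hypothetical helper: one pass over the source, one counter per distinct key.
    // It still has to read the entire source before it can know the top counts.
    public static async Task<List<KeyValuePair<TKey, int>>> TopByCountAsync<TSource, TKey>(
        IAsyncEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        int take,
        CancellationToken cancellationToken = default)
        where TKey : notnull
    {
        var counts = new Dictionary<TKey, int>();
        await foreach (var item in source.WithCancellation(cancellationToken).ConfigureAwait(false))
        {
            var key = keySelector(item);
            counts[key] = counts.TryGetValue(key, out var n) ? n + 1 : 1;
        }

        // Sort the (usually much smaller) set of counters, not the items themselves.
        // The relative order of keys with equal counts is not specified here.
        return counts.OrderByDescending(kv => kv.Value).Take(take).ToList();
    }
}
```

Called as `var top = await TopCountSketch.TopByCountAsync(GetItems(), i => i.Id, 10);`, each entry's `Key` is an Id and its `Value` is the number of times that Id occurred.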

If you want to process streams of events, you should look at Rx.NET, which was built by the same team that created System.Linq.Async. In Rx.NET, GroupBy emits a new group stream whenever it encounters a new key value:

[Figure: Reactive Extensions GroupBy illustration]

Notice that Rx.NET's GroupBy actually partitions the event stream by the grouping key and emits streams, not groupings. Subscribers subscribe to those streams and process their events. This Aggregation example demonstrates it:

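using System;
using System.Reactive.Linq; // System.Reactive NuGet package

// Ten values, one every 100 ms, split into three groups by i % 3.
var source = Observable.Interval(TimeSpan.FromSeconds(0.1)).Take(10);
var group = source.GroupBy(i => i % 3);
group.Subscribe(
  grp => 
    // Min() emits only once this group's stream completes.
    grp.Min().Subscribe(
      minValue => 
        Console.WriteLine("{0} min value = {1}", grp.Key, minValue)),
  () => Console.WriteLine("Completed"));

Console.ReadLine(); // Interval runs on a background scheduler; keep the process alive.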

If you need to process a long-running IAsyncEnumerable<T> stream this way, you can convert it to an observable with ToObservable.
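If taking a dependency on Rx.NET is not an option, the core idea of per-key results that arrive while the stream is still running can be sketched with plain `IAsyncEnumerable<T>`. This is a swapped-in, much simpler technique, not Rx's `GroupBy` and not `ToObservable`: the hypothetical `RunningCountsBy` helper yields `(Key, CountSoFar)` as soon as each item arrives instead of waiting for the source to end.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

public static class RunningCountSketch
{
    // Hypothetical helper: emits (key, countSoFar) for every item as soon as it
    // arrives, so a consumer sees results while the source is still running.
    public static async IAsyncEnumerable<(TKey Key, int CountSoFar)> RunningCountsBy<TSource, TKey>(
        this IAsyncEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
        where TKey : notnull
    {
        // One counter per distinct key; no items are buffered.
        var counts = new Dictionary<TKey, int>();
        await foreach (var item in source.WithCancellation(cancellationToken).ConfigureAwait(false))
        {
            var key = keySelector(item);
            var n = counts.TryGetValue(key, out var c) ? c + 1 : 1;
            counts[key] = n;
            yield return (key, n);
        }
    }
}
```

For example, `await foreach (var r in GetItems().RunningCountsBy(i => i.Id))` would see an update for each item as its page is fetched, rather than a single result after the last page.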
