Multiple async-await chaining inside Parallel.ForEach

I have a Parallel.ForEach loop which loops through a collection. Inside the loop, I make multiple network I/O calls. I used Task.ContinueWith and nested the subsequent async-await calls. The order of the processing doesn't matter, but the data from each async call should be processed in a synchronized way. Meaning: for each iteration, the data retrieved from the first async call should be passed to the second async call. After the second async call finishes, the data from both async calls should be processed together.

Parallel.ForEach(someCollection, parallelOptions, async (item, state) =>
{
    Task<Country> countryTask = Task.Run(() => GetCountry(item.ID));

    //this is my first async call
    await countryTask.ContinueWith((countryData) =>
    {
        countries.Add(countryData.Result);

        Task<State> stateTask = Task.Run(() => GetState(countryData.Result.CountryID));

        //based on the data I receive in 'stateTask', I make another async call
        stateTask.ContinueWith((stateData) =>
        {
            states.Add(stateData.Result);

            // use the data from both async calls: pass it to the function below for some calculation
            // in a synchronized way (for a country, its corresponding state should be passed)

            myCollection.ConcurrentAddRange(SomeCalculation(countryData.Result, stateData.Result));
        });
    });
});

I tried the above without ContinueWith and await, but it did not work in a synchronized way. Now the code above runs to completion, but no records get processed.

Any help with this, please? Let me know if I should add more details.

As your methods involve I/O, they should be written to be truly asynchronous, not just run synchronously on the thread pool using Task.Run.

Then you could use Task.WhenAll in combination with Enumerable.Select:

var tasks = someCollection.Select(async item =>
{
    var country = await GetCountryAsync(item.Id);
    var state = await GetStateAsync(country.CountryID);
    var calculation = SomeCalculation(country, state);

    return (country, state, calculation);
});

foreach (var tuple in await Task.WhenAll(tasks))
{
    countries.Add(tuple.country);
    states.Add(tuple.state);
    myCollection.AddRange(tuple.calculation);
}

This ensures that, for each item, country > state > calculation happens sequentially, while the items themselves are processed concurrently and asynchronously.
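For reference, here is a minimal self-contained sketch of the same Task.WhenAll + Select approach, written as a .NET 6+ console program with top-level statements. GetCountryAsync, GetStateAsync and SomeCalculation are hypothetical stand-ins (Task.Delay simulates network latency), and tuples replace the Country and State types, which the question does not show.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-ins for the real I/O calls; Task.Delay simulates network
// latency without blocking a thread-pool thread.
async Task<(int CountryId, string Name)> GetCountryAsync(int id)
{
    await Task.Delay(50);
    return (id * 10, $"Country{id}");
}

async Task<string> GetStateAsync(int countryId)
{
    await Task.Delay(50);
    return $"StateOf{countryId}";
}

// Hypothetical calculation that combines a country with its own state.
string SomeCalculation((int CountryId, string Name) country, string state) =>
    $"{country.Name}/{state}";

var ids = Enumerable.Range(1, 20).ToList();

// Inside each lambda the steps run in order (country, then state, then calculation);
// Task.WhenAll lets the 20 lambdas overlap with each other.
var tasks = ids.Select(async id =>
{
    var country = await GetCountryAsync(id);
    var state = await GetStateAsync(country.CountryId);
    return (country, state, calculation: SomeCalculation(country, state));
});

// WhenAll returns the results in the same order as the input sequence.
var results = await Task.WhenAll(tasks);

// This loop runs on one thread, so plain List<T> is safe here.
var countries = new List<(int CountryId, string Name)>();
var states = new List<string>();
var calculations = new List<string>();
foreach (var r in results)
{
    countries.Add(r.country);
    states.Add(r.state);
    calculations.Add(r.calculation);
}

Console.WriteLine($"{calculations.Count} results, first: {calculations[0]}");
```

Because each country and state are locals of the same lambda invocation, a country can never be paired with another item's state, and only the final loop touches the shared lists.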


Update as per comment

using var semaphore = new SemaphoreSlim(2);
using var cts = new CancellationTokenSource();

int failures = 0;

var tasks = someCollection.Select(async item =>
{
    await semaphore.WaitAsync(cts.Token);
    
    try
    {
        var country = await GetCountryAsync(item.Id);
        var state = await GetStateAsync(country.CountryID);
        var calculation = SomeCalculation(country, state);

        Interlocked.Exchange(ref failures, 0);

        return (country, state, calculation);
    }
    catch
    {
        if (Interlocked.Increment(ref failures) >= 10)
        {
            cts.Cancel();
        }
        throw;
    }
    finally
    {
        semaphore.Release();
    }
});

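The semaphore ensures a maximum of 2 concurrent async operations, and the cancellation token will cancel all outstanding tasks after 10 consecutive exceptions. Because the token is only passed to WaitAsync, it cancels the tasks still waiting for the semaphore; requests already in flight are not interrupted.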

The Interlocked methods ensure that failures is accessed in a thread-safe manner.
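A small sketch to check the throttling claim, assuming a hypothetical FakeRequestAsync in place of the real HTTP calls: it counts how many items are between WaitAsync and Release at any moment (with Interlocked, as above) and records the peak, which should never exceed the semaphore's initial count of 2. The failure counting and cancellation are left out to keep it short.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

using var semaphore = new SemaphoreSlim(2); // at most 2 items past WaitAsync at once
int inFlight = 0;    // items currently between WaitAsync and Release
int maxInFlight = 0; // highest value of inFlight observed

// Hypothetical I/O call; Task.Delay stands in for a network request.
async Task<int> FakeRequestAsync(int id)
{
    int now = Interlocked.Increment(ref inFlight);

    // Record the peak with a compare-and-swap loop so concurrent updates are not lost.
    int seen;
    while (now > (seen = Volatile.Read(ref maxInFlight)) &&
           Interlocked.CompareExchange(ref maxInFlight, now, seen) != seen)
    {
    }

    await Task.Delay(20);
    Interlocked.Decrement(ref inFlight);
    return id * 2;
}

var tasks = Enumerable.Range(1, 10).Select(async id =>
{
    await semaphore.WaitAsync();
    try
    {
        return await FakeRequestAsync(id);
    }
    finally
    {
        semaphore.Release();
    }
});

int[] results = await Task.WhenAll(tasks);
Console.WriteLine($"{results.Length} results, peak concurrency {maxInFlight}");
```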


Further update

It may be even more efficient to use 2 semaphores, so the results don't have to be iterated over a second time.

Encapsulate all the list additions in a single method:

void AddToLists(Country country, State state, Calculation calculation)
{
    countries.Add(country);
    states.Add(state);
    myCollection.AddRange(calculation);
}

Then you could allow 2 operations to make the HTTP requests concurrently, and only 1 at a time to perform the adds, making that step thread-safe:

using var httpSemaphore = new SemaphoreSlim(2);
using var listAddSemaphore = new SemaphoreSlim(1);
using var cts = new CancellationTokenSource();

int failures = 0;

await Task.WhenAll(someCollection.Select(async item =>
{
    await httpSemaphore.WaitAsync(cts.Token);
    
    try
    {
        var country = await GetCountryAsync(item.Id);
        var state = await GetStateAsync(country.CountryID);
        var calculation = SomeCalculation(country, state);

        await listAddSemaphore.WaitAsync(cts.Token);
        try
        {
            AddToLists(country, state, calculation);
        }
        finally
        {
            // release only the lock that was actually acquired above
            listAddSemaphore.Release();
        }

        Interlocked.Exchange(ref failures, 0);
    }
    catch
    {
        if (Interlocked.Increment(ref failures) >= 10)
        {
            cts.Cancel();
        }
        throw;
    }
    finally
    {
        httpSemaphore.Release();
    }
}));
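One detail worth checking with this pattern: new SemaphoreSlim(1) is created with a maximum count of int.MaxValue, so a Release() with no matching WaitAsync() (for example, in a finally block reached after an exception thrown before the wait) raises the count to 2, and the semaphore no longer guarantees exclusive access. Passing an explicit maxCount turns that mistake into a SemaphoreFullException. A minimal demonstration:

```csharp
using System;
using System.Threading;

// new SemaphoreSlim(1) has maxCount = int.MaxValue, so an unmatched Release()
// silently raises the count and the semaphore stops acting as a lock.
using var unbounded = new SemaphoreSlim(1);
unbounded.Release();                       // no matching Wait/WaitAsync
Console.WriteLine(unbounded.CurrentCount); // two callers could now enter at once

// With an explicit maxCount of 1, the same mistake fails loudly instead.
using var bounded = new SemaphoreSlim(1, 1);
bool threw = false;
try
{
    bounded.Release();
}
catch (SemaphoreFullException)
{
    threw = true;
}
Console.WriteLine($"bounded Release threw: {threw}");
```

Releasing each semaphore only in a finally block that belongs to its own successful wait avoids the problem altogether.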

I think you're overcomplicating this: inside the Parallel.ForEach you're already on the thread pool, so there is really no benefit to creating lots of additional tasks inside it. How to do this really depends on whether GetState etc. are synchronous or asynchronous. If we assume synchronous, then something like:

Parallel.ForEach(someCollection, parallelOptions, (item, _) =>
{
    var country = GetCountry(item.Id);

    countries.Add(country); // warning: may need to synchronize

    var state = GetState(country.CountryID);

    states.Add(state); // warning: may need to synchronize

    // use the data from both calls: pass it to the function below for some calculation
    // in a synchronized way (for a country, its corresponding state should be passed)
    myCollection.ConcurrentAddRange(SomeCalculation(country, state));
});
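If countries and states are ordinary List<T> instances, the "may need to synchronize" warnings matter: List<T> is not safe for concurrent Add calls. A minimal sketch that uses ConcurrentBag<T> instead of locks, assuming hypothetical synchronous GetCountry/GetState stand-ins (Thread.Sleep simulates blocking I/O) and an arbitrary MaxDegreeOfParallelism of 4:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical synchronous I/O stand-ins; Thread.Sleep simulates a blocking call.
(int CountryId, string Name) GetCountry(int id)
{
    Thread.Sleep(10);
    return (id * 10, $"Country{id}");
}

string GetState(int countryId)
{
    Thread.Sleep(10);
    return $"StateOf{countryId}";
}

// ConcurrentBag<T> is safe to add to from several loop bodies at once.
var countries = new ConcurrentBag<(int CountryId, string Name)>();
var states = new ConcurrentBag<string>();
var calculations = new ConcurrentBag<string>();

var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
Parallel.ForEach(Enumerable.Range(1, 20), options, id =>
{
    var country = GetCountry(id);
    countries.Add(country);

    var state = GetState(country.CountryId);
    states.Add(state);

    // country and state are locals of this iteration, so they always belong together.
    calculations.Add($"{country.Name}/{state}");
});

Console.WriteLine($"{calculations.Count} calculations");
```

ConcurrentBag<T> does not preserve insertion order; if order matters, a lock around List<T>.Add is the simpler alternative.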

If they are async, it gets more awkward; it would be nice if we could do something like:

// WARNING: DANGEROUS CODE - DO NOT COPY
Parallel.ForEach(someCollection, parallelOptions, async (item, _) =>
{
    var country = await GetCountryAsync(item.Id);

    countries.Add(country); // warning: may need to synchronize

    var state = await GetStateAsync(country.CountryID);

    states.Add(state); // warning: may need to synchronize

    // use the data from both async calls: pass it to the function below for some calculation
    // in a synchronized way (for a country, its corresponding state should be passed)
    myCollection.ConcurrentAddRange(SomeCalculation(country, state));
});

but the problem here is that none of the callbacks in Parallel.ForEach are "awaitable", which means we have silently created an async void callback here, and that is very bad. It means Parallel.ForEach will think an iteration has "finished" as soon as the first incomplete await happens, which means:

  1. we have no idea when all the work has actually finished
  2. you could be doing a lot more concurrently than you intended (max-dop, i.e. MaxDegreeOfParallelism, cannot be respected)
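A short sketch of the first point. A single-parameter async lambda binds to the Action<int> overload of Parallel.ForEach, so it compiles but becomes async void, and the loop returns while every Task.Delay (a stand-in for real I/O) is still pending. The 500 ms and 2 s delays are arbitrary values chosen for the demo.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

int completed = 0;

// The only matching overload takes Action<int>, so this lambda is async void:
// Parallel.ForEach cannot await it, and each iteration "finishes" as soon as
// it reaches its first incomplete await.
Parallel.ForEach(Enumerable.Range(1, 10), async id =>
{
    await Task.Delay(500); // stand-in for real async I/O
    Interlocked.Increment(ref completed);
});

int completedWhenForEachReturned = Volatile.Read(ref completed);
Console.WriteLine($"Parallel.ForEach returned with {completedWhenForEachReturned} of 10 items done");

// Demo only: wait long enough for the orphaned continuations to run.
await Task.Delay(2000);
Console.WriteLine($"Two seconds later: {Volatile.Read(ref completed)} of 10 items done");
```

The final Task.Delay exists only so the demo can show the orphaned work finishing later; real code has no reliable way to know when that point is reached.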

There doesn't currently seem to be any good API to avoid this.
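Since .NET 6, the framework includes Parallel.ForEachAsync. Its body is a Func<TSource, CancellationToken, ValueTask> that the loop actually awaits, so the returned task completes only after every item has finished, and MaxDegreeOfParallelism is honored. A sketch using hypothetical GetCountryAsync/GetStateAsync stand-ins (here taking a CancellationToken) and an arbitrary limit of 3, tracking peak concurrency to show the limit holds:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical async I/O stand-ins; Task.Delay simulates latency.
async Task<(int CountryId, string Name)> GetCountryAsync(int id, CancellationToken ct)
{
    await Task.Delay(20, ct);
    return (id * 10, $"Country{id}");
}

async Task<string> GetStateAsync(int countryId, CancellationToken ct)
{
    await Task.Delay(20, ct);
    return $"StateOf{countryId}";
}

var calculations = new ConcurrentBag<string>();
int inFlight = 0, maxInFlight = 0;

var options = new ParallelOptions { MaxDegreeOfParallelism = 3 };

// The async body is awaited by the loop, so this await completes only when all items are done.
await Parallel.ForEachAsync(Enumerable.Range(1, 12), options, async (id, ct) =>
{
    // Track how many bodies are running at once (compare-and-swap keeps the peak accurate).
    int now = Interlocked.Increment(ref inFlight);
    int seen;
    while (now > (seen = Volatile.Read(ref maxInFlight)) &&
           Interlocked.CompareExchange(ref maxInFlight, now, seen) != seen)
    {
    }

    var country = await GetCountryAsync(id, ct);
    var state = await GetStateAsync(country.CountryId, ct);
    calculations.Add($"{country.Name}/{state}");

    Interlocked.Decrement(ref inFlight);
});

Console.WriteLine($"{calculations.Count} calculations, peak concurrency {maxInFlight}");
```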

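Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.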

 
© 2020-2024 STACKOOM.COM