What is the best way to loop over an async method?

I'm wondering what the best way to loop over an async method is. Let's say I have a method:

public async Task<bool> DownloadThenWriteThenReturnResult(string id)
{
    // async/await stuff....
}

I want to call this method 10,000 times, assuming I already have a list of 10,000 strings called "_myStrings" to use as parameters. I want at most 4 threads to share this work (in production I'd use ProcessorCount - 1). I want to be able to cancel everything. And finally, I want the result of each call. I'd like to know what the differences are between the following approaches, which one is best, and why:

*1 -

var allTasks = _myStrings.Select(st => DownloadThenWriteThenReturnResult(st));
bool[] syncSuccs = await Task.WhenAll(allTasks);

*2 -

await Task.Run(() =>
{
    var result = new ConcurrentQueue<bool>();
    var po = new ParallelOptions(){MaxDegreeOfParallelism = 4};
    Parallel.ForEach(_myStrings, po, (st) =>
    {
        result.Enqueue(DownloadThenWriteThenReturnResult(st).Result);
        po.CancellationToken.ThrowIfCancellationRequested();
    });
});

*3 -

using (SemaphoreSlim throttler = new SemaphoreSlim(initialCount: 4))
{
    var results = new List<bool>();
    var allTasks = new List<Task>();
    foreach (var st in _myStrings)
    {
        await throttler.WaitAsync();
        allTasks.Add(Task.Run(async () =>
        {
            try
            {
                results.Add(await DownloadThenWriteThenReturnResult(st));
            }
            finally
            {
                throttler.Release();
            }
        }));
    }
    await Task.WhenAll(allTasks);
}

*4 -

var block = new TransformBlock<string, bool>(
async st =>
{
    return await DownloadThenWriteThenReturnResult(st);
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4});

foreach (var st in _myStrings)
{
    await block.SendAsync(st);
}

var results = new List<bool>();
foreach (var st in _myStrings)
{
    results.Add(await block.ReceiveAsync());
}

Is there another way? These 4 gave me similar results, although only *2, *3 and *4 use 4 threads. Can you confirm that:

  • *1 creates 10,000 tasks on the thread pool, but they will all be executed on only one thread

  • *2 will create 4 threads T1, T2, T3 and T4. It uses .Result, so it is not async all the way (should I avoid that here?). Since DownloadThenWriteThenReturnResult is executed on one of the 4 threads T1, T2, T3 or T4, where are the nested tasks placed (by nested tasks I mean what every async method returns when awaited)? On a dedicated thread pool thread (let's say T11, T21, T31 and T41)?

  • Same question for *3 and *4

*4 seems to be my best shot. It's easy to understand what's going on, and I'll be able to create new blocks and link them if needed. It also seems completely async. But I'd like to understand where the nested tasks from all the async/await code within DownloadThenWriteThenReturnResult are executed, and whether this is the best way to do it.

Thanks for any hints!

I will try to answer all your questions.

My proposal

First, this is what I would do. I tried to minimize the number of tasks and to keep the code simple.

Your problem looks like a kind of producer/consumer case. I would go with something simple like this:

public async Task Work(ConcurrentQueue<string> input, ConcurrentQueue<bool> output)
{
    string current;
    while (input.TryDequeue(out current))
    {
        output.Enqueue(await DownloadThenWriteThenReturnResult(current));
    }
}

var nbThread = 4;
var input = new ConcurrentQueue<string>(_myStrings);
var output = new ConcurrentQueue<bool>();

var workers = new List<Task>(nbThread);

for (int i = 0; i < nbThread; i++)
{
    workers.Add(Task.Run(async () => await this.Work(input, output)));
}

await Task.WhenAll(workers);

I am not sure the number of threads should be tied to the number of processors. That would be true if you were dealing with CPU-bound operations. In such cases, you should run as synchronously as possible, because the overhead the system introduces when switching from one context to another is heavy. So in that case, one operation per thread is the way to go.

But in your case, since you spend most of the time waiting for I/O (network for the HTTP call, disk for the write, etc.), you could probably start more tasks in parallel. Each time a task is waiting for I/O, the system can pause it and switch to another task. The overhead here is not wasted, because the thread would otherwise be waiting and doing nothing.

You should benchmark with 4, 5, 6, etc. tasks and find which count is the most efficient.
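
A minimal sketch of such a benchmark, assuming a hypothetical FakeDownload stand-in (a fixed 20 ms delay) in place of the real DownloadThenWriteThenReturnResult. The class and method names are made up for illustration, and the timings it produces only mean something once the stand-in is replaced by the real call:

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class WorkerCountBenchmark
{
    // Hypothetical stand-in for the real I/O-bound method.
    private static async Task<bool> FakeDownload(string id)
    {
        await Task.Delay(20);
        return true;
    }

    // Runs every id through `workerCount` queue-draining workers
    // and returns how long the whole batch took.
    public static async Task<TimeSpan> MeasureAsync(string[] ids, int workerCount)
    {
        var input = new ConcurrentQueue<string>(ids);
        var output = new ConcurrentQueue<bool>();
        var stopwatch = Stopwatch.StartNew();

        var workers = Enumerable.Range(0, workerCount).Select(async _ =>
        {
            while (input.TryDequeue(out var current))
            {
                output.Enqueue(await FakeDownload(current));
            }
        }).ToArray();

        await Task.WhenAll(workers);
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    // Prints one timing line per candidate worker count.
    public static async Task CompareAsync()
    {
        var ids = Enumerable.Range(0, 200).Select(i => "id" + i).ToArray();
        foreach (var workerCount in new[] { 4, 8, 16, 32 })
        {
            var elapsed = await MeasureAsync(ids, workerCount);
            Console.WriteLine($"{workerCount} workers: {elapsed.TotalMilliseconds:F0} ms");
        }
    }
}
```

Calling `await WorkerCountBenchmark.CompareAsync();` prints one line per worker count. Run it several times against the real workload, since network and disk timings vary between runs.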

One issue I can see here is that you don't know which input produced which output. You could use a ConcurrentDictionary instead of two ConcurrentQueues, but then there can't be duplicates in _myStrings.
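
As a rough sketch of that idea, the workers can write into a ConcurrentDictionary keyed by the input string, and a CancellationToken can cover the "cancel everything" requirement. The two-parameter DownloadThenWriteThenReturnResult below is a hypothetical stand-in (the real method in the question takes only the id), and the KeyedWorkers class name is made up for illustration:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class KeyedWorkers
{
    // Hypothetical stand-in for the real method: simulates I/O with a delay.
    public static async Task<bool> DownloadThenWriteThenReturnResult(string id, CancellationToken ct)
    {
        await Task.Delay(10, ct);
        return id.Length % 2 == 0;
    }

    // Each worker drains the shared queue and stores the result under its input.
    private static async Task Work(ConcurrentQueue<string> input,
                                   ConcurrentDictionary<string, bool> output,
                                   CancellationToken ct)
    {
        while (input.TryDequeue(out var current))
        {
            ct.ThrowIfCancellationRequested();
            // A duplicate input would silently overwrite the earlier entry.
            output[current] = await DownloadThenWriteThenReturnResult(current, ct);
        }
    }

    public static async Task<ConcurrentDictionary<string, bool>> RunAsync(
        IEnumerable<string> ids, int workerCount, CancellationToken ct)
    {
        var input = new ConcurrentQueue<string>(ids);
        var output = new ConcurrentDictionary<string, bool>();

        var workers = Enumerable.Range(0, workerCount)
            .Select(_ => Task.Run(() => Work(input, output, ct), ct))
            .ToArray();

        await Task.WhenAll(workers);
        return output;
    }
}
```

Cancelling the token's CancellationTokenSource makes the awaited Task.WhenAll throw an OperationCanceledException, while the results already stored stay in the dictionary.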

Your solutions

Here is what I think of your solutions.

Solution *1

As you said, it is going to create 10,000 tasks. As far as I know (but I am not an expert in that field), the system will share the ThreadPool threads among the tasks, applying some round-robin algorithm. I think the same task can even start its execution on one thread, be paused by the system, and finish its execution on a second thread. This will introduce more overhead than necessary and make the overall runtime slower.

I think this must absolutely be avoided!

Solution *2

I read that the Parallel API does not work well with asynchronous operations. I have also read plenty of times that you don't want to call .Result on a task unless absolutely necessary.

So I would avoid this solution too.
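
For readers on .NET 6 or later, Parallel.ForEachAsync was added for this situation: it accepts an async body, so no .Result call is needed, and it honours both MaxDegreeOfParallelism and the CancellationToken set on ParallelOptions. This is a sketch under that version assumption, not something the question's original framework necessarily offered. FakeDownload is a hypothetical stand-in for the real method:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ForEachAsyncSketch
{
    // Hypothetical stand-in for the real I/O-bound method.
    private static async Task<bool> FakeDownload(string id, CancellationToken ct)
    {
        await Task.Delay(10, ct);
        return true;
    }

    public static async Task<ConcurrentDictionary<string, bool>> RunAsync(
        IEnumerable<string> ids, CancellationToken ct)
    {
        var results = new ConcurrentDictionary<string, bool>();
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = 4, // at most 4 bodies in flight at once
            CancellationToken = ct
        };

        // The body is awaited rather than blocked on, so no thread sits in .Result.
        await Parallel.ForEachAsync(ids, options, async (id, token) =>
        {
            results[id] = await FakeDownload(id, token);
        });

        return results;
    }
}
```

As with any dictionary keyed by input, this assumes the ids are unique.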

Solution *3

Honestly, I can't picture exactly what this will do ^^. This may be a good solution, since you are not creating all the tasks at once. Still, you are going to create 10,000 tasks overall, so I would avoid it.
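
One thing worth noting about *3 as written is that several tasks call results.Add on a plain List<bool> concurrently, and List<T> is not safe for concurrent writes. A possible variant, sketched below with a hypothetical FakeDownload stand-in and a made-up ThrottledRunner class, lets each task return its own result so that Task.WhenAll collects them in input order, and passes a CancellationToken through:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledRunner
{
    // Hypothetical stand-in: ids ending in "0" are reported as failures.
    private static async Task<bool> FakeDownload(string id, CancellationToken ct)
    {
        await Task.Delay(10, ct);
        return !id.EndsWith("0");
    }

    public static async Task<bool[]> RunAsync(
        IReadOnlyList<string> ids, int maxConcurrency, CancellationToken ct)
    {
        // Deliberately not wrapped in `using`: after a cancellation, calls that
        // are still in flight may release the semaphore after this method exits.
        var throttler = new SemaphoreSlim(maxConcurrency);
        var tasks = new List<Task<bool>>(ids.Count);

        foreach (var id in ids)
        {
            // Waits until one of the maxConcurrency slots is free.
            await throttler.WaitAsync(ct);
            tasks.Add(RunOneAsync(id, throttler, ct));
        }

        // Results come back in the same order as `ids`.
        return await Task.WhenAll(tasks);
    }

    private static async Task<bool> RunOneAsync(
        string id, SemaphoreSlim throttler, CancellationToken ct)
    {
        try
        {
            return await FakeDownload(id, ct);
        }
        finally
        {
            throttler.Release();
        }
    }
}
```

No Task.Run is used here because the stand-in is purely asynchronous. If the real method does significant synchronous work before its first await, wrapping the call in Task.Run as in the question would keep the loop from being blocked.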

Solution *4

Honestly, I didn't even know about this API, so I cannot really comment on it. But since it involves a third-party library, I would avoid it if possible.
