Using TPL with parallel blocking IO operations

Preface: I'm aware that using the ThreadPool (either via TPL or directly) for IO operations is generally frowned upon because IO is necessarily sequential; however, my problem relates to "parallel IO" with blocking calls that don't expose an Async method.

I'm writing a GUI tool that fetches information about computers on the network. It does this (simplified code):

String[] computerNames = { "foo", "bar", "baz" };
foreach(String computerName in computerNames) {

    Task.Factory
        .StartNew( GetComputerInfo, computerName )
        .ContinueWith( ShowOutputInGui, RunOnGuiThread );

}

private ComputerInfo GetComputerInfo(String machineName) {

    Task<Int64>     pingTime  = Task.Factory.StartNew( () => GetPingTime( machineName ) );
    Task<Process[]> processes = Task.Factory.StartNew( () => System.Diagnostics.Process.GetProcesses( machineName ) );
    // and loads more

    Task.WaitAll( pingTime, processes, etc );

    return new ComputerInfo( pingTime.Result, processes.Result, etc );
}

When I run this code, it takes a surprisingly long time to run compared to the old sequential code I had.

Note that each task in the GetComputerInfo method is entirely independent of the others around it (e.g. the ping time can be computed separately from GetProcesses), yet when I inserted some Stopwatch timing calls, I discovered that individual sub-tasks, such as the GetProcesses call, were only being started up to 3000 ms after GetComputerInfo had been called, so there is a large delay somewhere.

I noticed that when I reduce the number of outer parallel calls into GetComputerInfo (by reducing the size of the computerNames array), the first results are returned almost immediately. Some of the computer names are for computers that are turned off, so calls to GetProcesses and GetPingTime take a very long time before timing out (my real code catches the exceptions). This is probably because the offline computers are blocking Tasks from being run, and the TPL naturally restricts concurrency to my CPU hardware thread count (8).

Is there a way to tell TPL not to let the inner tasks (e.g. GetProcesses) block the outer tasks (GetComputerInfo)?

(I've tried "Parent/Child" task attachment, but it doesn't apply to my situation, as I never explicitly attach child tasks to parent tasks, and the parent task naturally waits with Task.WaitAll anyway.)

I assume that your foreach loop is in some event handler, so the first thing you should do is mark that handler as async so that you can call your other methods asynchronously. After that, you should make GetComputerInfo async all the way down.

There are additional pitfalls in your code: StartNew is dangerous, as it uses the Current scheduler for tasks rather than Default, so you need a different overload. Unfortunately, that overload takes more parameters, so the code will not be as simple. The good news is that you need that overload anyway, to tell the thread pool that your tasks are long-running so it should use a dedicated thread for them:

TaskCreationOptions.LongRunning

Specifies that a task will be a long-running, coarse-grained operation involving fewer, larger components than fine-grained systems. It provides a hint to the TaskScheduler that oversubscription may be warranted.

Oversubscription lets you create more threads than the available number of hardware threads. It also provides a hint to the task scheduler that an additional thread might be required for the task so that it does not block the forward progress of other threads or work items on the local thread-pool queue.
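To get a feel for what the hint changes, here is a minimal sketch that times a batch of blocking tasks. The class and method names (LongRunningDemo, RunBlockingTasksAsync) are made up for illustration, and Thread.Sleep is only a stand-in for a blocking call such as Process.GetProcesses against an offline machine:

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class LongRunningDemo
{
    // Starts `count` tasks that each block for `blockMs` milliseconds and
    // returns the wall-clock time until all of them have finished.
    public static async Task<TimeSpan> RunBlockingTasksAsync(
        int count, int blockMs, TaskCreationOptions options)
    {
        var stopwatch = Stopwatch.StartNew();

        Task[] tasks = Enumerable.Range(0, count)
            .Select(_ => Task.Factory.StartNew(
                () => Thread.Sleep(blockMs),   // assumption: stand-in for blocking IO
                CancellationToken.None,
                options,                       // None or LongRunning
                TaskScheduler.Default))        // never rely on TaskScheduler.Current
            .ToArray();

        await Task.WhenAll(tasks);
        return stopwatch.Elapsed;
    }
}
```

Calling RunBlockingTasksAsync(64, 500, TaskCreationOptions.None) and then the same with TaskCreationOptions.LongRunning should show the difference on a machine with few cores: without the hint the blocked pool threads hold up queued work until the pool slowly adds threads, while with the hint each task typically gets its own thread straight away. The exact numbers depend on the machine and the runtime's thread-pool settings.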

You should also avoid the WaitAll method: it is a blocking operation, so you have one thread fewer to do the actual work. You probably want to use WhenAll instead.

And finally, to return your ComputerInfo result you can either await the tasks or use a continuation with TaskCompletionSource, so your code could look something like this (cancellation logic also added):

using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// handle event in fire-and-forget manner
async void btn_Click(object sender, EventArgs e)
{
    string[] computerNames = { "foo", "bar", "baz" };
    foreach (string computerName in computerNames)
    {
        var compCancelSource = new CancellationTokenSource();

        // asynchronously wait for the next computer info
        var compInfo = await GetComputerInfo(computerName, compCancelSource.Token);
        // We are back in the UI context here, so the GUI can be updated directly
        ShowOutputInGui(compInfo);
    }
}

// async is required here because the method body uses await
private async Task<ComputerInfo> GetComputerInfo(String machineName, CancellationToken token)
{
    var pingTime = Task.Factory.StartNew(
        // action to run
        () => GetPingTime(machineName),
        // token to cancel
        token,
        // notify the thread pool that this task could take a long time to run,
        // so a new thread will probably be used for it
        TaskCreationOptions.LongRunning,
        // execute the whole job on the thread pool scheduler
        TaskScheduler.Default);

    // Task.Run has no overload that accepts TaskCreationOptions or a scheduler,
    // so StartNew is used here as well
    var processes = Task.Factory.StartNew(
        () => Process.GetProcesses(machineName),
        token,
        TaskCreationOptions.LongRunning,
        TaskScheduler.Default);
    // and loads more

    await Task.WhenAll(pingTime, processes, etc);
    return new ComputerInfo(pingTime.Result, processes.Result, etc);

    // Alternative without async/await, using TaskCompletionSource:
    //var tcs = new TaskCompletionSource<ComputerInfo>();
    //Task.WhenAll(pingTime, processes, etc)
    //    .ContinueWith(aggregateTask =>
    //    {
    //        if (aggregateTask.Status == TaskStatus.RanToCompletion)
    //        {
    //            tcs.SetResult(new ComputerInfo(pingTime.Result, processes.Result, etc));
    //        }
    //        else
    //        {
    //            // cancel or error handling
    //        }
    //    });

    // return the awaitable task
    //return tcs.Task;
}
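Note that the loop in btn_Click awaits each computer before it starts the next one, so a machine that is offline delays every machine after it. If you want all queries in flight at once and each result shown as soon as it arrives, one possible sketch is below. FanOutSketch, QueryAllAsync and the string-returning GetComputerInfoAsync are hypothetical stand-ins for illustration, not part of the code above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class FanOutSketch
{
    // Hypothetical stand-in for GetComputerInfo: blocks on a dedicated
    // thread and returns a string instead of a ComputerInfo.
    public static Task<string> GetComputerInfoAsync(string machineName, CancellationToken token)
    {
        return Task.Factory.StartNew(
            () =>
            {
                Thread.Sleep(machineName.Length * 50); // simulated blocking IO
                return machineName + ": ok";
            },
            token,
            TaskCreationOptions.LongRunning,
            TaskScheduler.Default);
    }

    // Starts every query up front, then passes each result to `onResult`
    // as soon as its task finishes, regardless of the input order.
    public static async Task QueryAllAsync(
        IEnumerable<string> computerNames, Action<string> onResult, CancellationToken token)
    {
        List<Task<string>> pending = computerNames
            .Select(name => GetComputerInfoAsync(name, token))
            .ToList();

        while (pending.Count > 0)
        {
            Task<string> finished = await Task.WhenAny(pending);
            pending.Remove(finished);
            onResult(await finished); // rethrows here if that query failed
        }
    }
}
```

When QueryAllAsync is awaited from a UI event handler, the code after each await should resume on the UI context, so onResult can play the role of ShowOutputInGui. In real code you would probably wrap the `await finished` in a try/catch so that one unreachable machine does not abort the remaining results.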
