
Potential benefit of async and Parallel.ForEach for IO operations

Original 2011-11-14 20:15:30 3 3 c#/ multithreading/ asynchronous/ task-parallel-library

I am developing and maintaining a .NET 3.5 tool at work, and I am wondering whether its performance could be improved by using .NET 4's new TPL, or even the new async features which are still in CTP.

The tool's work can be roughly described as:

1. Retrieve a list of container files (currently .MSI files), a few dozen of them, roughly 50-70.
2. Iterate over each file, and construct a runtime object representing it.
3. For each runtime object created, perform some queries on its contents (compare its contents with some files on the system).

Items #2 and #3 are the lengthy ones, and I would like some opinions on how much the execution time (a few minutes right now) could be improved by using Parallel.ForEach or other methods of executing this work in parallel.

The potential improvements I foresee are:

- Making use of multiple CPUs/cores.
- Keeping the app free to do something else while IO operations (like reading files) are in progress.

Would you think this kind of application can benefit from these, before jumping into development?

3 answers

This may well see some improvement from using the TPL, which is available now in .NET 4.

All three steps could potentially be designed to run in parallel.

That being said, it's difficult, given the above, to know how much improvement you would see. The main issue is the heavy file I/O. Even if you take advantage of multiple cores, the disk I/O will likely become a bottleneck, and trying to run this in parallel may actually slow down those portions of the code.
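As a rough illustration of steps 2 and 3 running in parallel, here is a minimal sketch using Parallel.ForEach with a capped degree of parallelism, so the disk is not hit by too many concurrent readers. The names ContainerScan, LoadContainer, QueryContents and ProcessAll are hypothetical stand-ins, not anything from the tool in question: LoadContainer simply reads the file's bytes where the real tool would open the MSI, and QueryContents just sums them where the real tool would run its comparisons.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class ContainerScan
{
    // Hypothetical stand-in for step 2: build a runtime object from one file.
    // Here it only reads the raw bytes; the real tool would open the MSI.
    public static byte[] LoadContainer(string path)
    {
        return File.ReadAllBytes(path);
    }

    // Hypothetical stand-in for step 3: query the object's contents.
    // Here it only sums the bytes; the real tool would compare against files.
    public static long QueryContents(byte[] container)
    {
        long sum = 0;
        foreach (byte b in container)
        {
            sum += b;
        }
        return sum;
    }

    // Runs steps 2 and 3 for every path, using at most maxParallel workers.
    public static IDictionary<string, long> ProcessAll(IEnumerable<string> paths, int maxParallel)
    {
        // Thread-safe collection, since several iterations write at once.
        var results = new ConcurrentDictionary<string, long>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.ForEach(paths, options, path =>
        {
            byte[] container = LoadContainer(path);
            results[path] = QueryContents(container);
        });

        return results;
    }
}
```

Calling ContainerScan.ProcessAll(paths, 2) and then again with 1 or 4 is a cheap way to see whether extra workers help or whether the disk is already the limit; the right cap is something to find by measuring, not a fixed number.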

If you're doing a huge amount of IO in relation to the queries/computations, then you may not get a very large performance benefit just by running the routines in parallel.

I would run a profiler to see where your application is spending its time and then decide. If you find it is waiting for I/O completion, then you may benefit from using the Asynchronous Programming Model. If you find you are compute bound, then, depending on your anticipated runtime environment (multi-core/single core), you may find multi-threaded computation to be of benefit. Of course, you may find that both cases apply.
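Short of a full profiler run, a crude first check is to compare wall-clock time with the CPU time the process consumed over the same stretch of work. The sketch below assumes a hypothetical helper called RoughProfile.Measure; it uses only Stopwatch and Process.TotalProcessorTime. If CPU time is a small fraction of wall time, the code was probably mostly waiting (likely on I/O); if the two are close on a single thread, it was probably compute bound. It is a coarse indicator only and does not replace a profiler.

```csharp
using System;
using System.Diagnostics;

public static class RoughProfile
{
    // Wall-clock time and process CPU time spent while the work ran.
    public struct Sample
    {
        public TimeSpan Wall;
        public TimeSpan Cpu;
    }

    // Runs the work once and reports both timings.
    // CPU time is for the whole process, so other active threads inflate it.
    public static Sample Measure(Action work)
    {
        Process self = Process.GetCurrentProcess();
        TimeSpan cpuBefore = self.TotalProcessorTime;
        Stopwatch watch = Stopwatch.StartNew();

        work();

        watch.Stop();
        self.Refresh(); // drop cached process info before reading again
        TimeSpan cpuAfter = self.TotalProcessorTime;

        return new Sample { Wall = watch.Elapsed, Cpu = cpuAfter - cpuBefore };
    }
}
```

Wrapping the existing per-file loop in RoughProfile.Measure(() => ...) and printing Wall and Cpu for steps 2 and 3 separately should give a first hint as to which of the two cases applies to each step.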

Incidentally, you can also use many of the .NET 4 threading features in .NET 3.5 by using Reactive Extensions. I am currently using this in a production .NET 3.5 application.

Would you think this kind of application can benefit from these, before jumping into development?

Not very much. You describe a 3-stage system in which every stage is heavily I/O bound.

I assume you have only one disk, which means running in parallel could even slow it down (more seek operations).

On the other hand, stages 2 and 3 could be CPU-intensive enough to see some improvement.

You will have to measure, as usual.