
How do I pick the best number of threads for hyperthreading/multicore?

I have some embarrassingly parallelizable work in a .NET 3.5 console app, and I want to take advantage of hyperthreading and multi-core processors. How do I pick the number of worker threads that makes the best use of either on an arbitrary system? For example, on a dual core I would want 2 threads; on a quad core, 4 threads. What I'm ultimately after is determining the processor characteristics so I know how many threads to create.

I'm not asking how to split up the work or how to do threading; I'm asking how to determine the "optimal" number of threads on whatever machine this console app runs on.

I'd suggest that you don't try to determine it yourself. Use the ThreadPool and let .NET manage the threads for you.

You can use Environment.ProcessorCount if that's the only thing you're after. But usually the ThreadPool is indeed the better option.
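The thread contains no code, so here is a Python sketch of the same two options. `os.cpu_count()` plays the role of `Environment.ProcessorCount` (it counts logical CPUs, hyperthreads included, and may return `None`), and `concurrent.futures.ThreadPoolExecutor` plays the role of the ThreadPool. When `max_workers` is omitted, the executor picks its own size (since Python 3.8, roughly `min(32, CPU count + 4)`) rather than you hard-coding one. In CPython, CPU-bound threads are serialized by the GIL, so for number crunching `ProcessPoolExecutor`, which has the same interface, is the closer analogue.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def logical_processor_count() -> int:
    # Logical CPUs (hyperthreads included), like Environment.ProcessorCount.
    # os.cpu_count() can return None, so fall back to 1.
    return os.cpu_count() or 1


def square(x: int) -> int:
    # Stand-in for one independent work item.
    return x * x


def run_with_default_pool(items):
    # No max_workers argument: the executor chooses the pool size itself,
    # in the spirit of "let the pool manage the threads for you".
    with ThreadPoolExecutor() as pool:
        return list(pool.map(square, items))


if __name__ == "__main__":
    print("logical processors:", logical_processor_count())
    print(run_with_default_pool(range(5)))
```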

The .NET thread pool can also allocate more threads than you have cores, to maximize throughput in scenarios where many threads are waiting for I/O to finish.

The correct number is obviously 42.

Now, on a serious note: just use the thread pool, always.

1) If you have a lengthy processing task (i.e. CPU-intensive) that can be partitioned into multiple pieces of work, you should partition the task and submit all the individual work items to the ThreadPool. The thread pool will pick up work items and churn through them dynamically: it monitors itself, starts new threads as needed, and can be configured at deployment time by administrators according to the site's requirements, instead of you precomputing the numbers at development time. While it is true that the proper partition size for your processing task can take into account the number of available CPUs, the right answer depends so much on the nature of the task and the data that it is not even worth discussing at this stage (and besides, your primary concerns should be your NUMA nodes, memory locality, and interlocked cache contention, and only after that the number of cores).
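Point 1 can be sketched in Python as: split the job into independent work items and hand all of them to a pool, rather than creating a fixed number of threads yourself. `process_chunk` is a hypothetical stand-in for the real CPU-bound work, and the default chunk size of 1000 is an arbitrary assumption; as the answer says, the right size depends on the task and data. In CPython you would swap in `ProcessPoolExecutor` to get real CPU-bound speedups.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def partition(data, chunk_size):
    # Split the input into contiguous work items of at most chunk_size elements.
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def process_chunk(chunk):
    # Hypothetical CPU-bound work item: sum of squares over one chunk.
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, chunk_size=1000):
    chunks = partition(data, chunk_size)
    total = 0
    with ThreadPoolExecutor() as pool:
        # Submit every work item up front; the pool decides how many run at once.
        futures = [pool.submit(process_chunk, c) for c in chunks]
        # Combine partial results in whatever order they finish.
        for future in as_completed(futures):
            total += future.result()
    return total
```

Because the work items are independent, the pool is free to schedule them on however many threads it considers best, and the chunk size becomes the one tuning knob you own.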

2) If you're doing I/O (including DB calls), you should use asynchronous I/O and complete the calls in completion routines invoked on the ThreadPool.
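A rough Python analogue of point 2 is a future with a completion callback: `add_done_callback` runs the callback on the pool thread that finished the call (or immediately, if the future is already done). This is only an approximation, since each call here still blocks a pool thread while it waits; truly asynchronous I/O (I/O completion ports on Windows, `asyncio` in Python) avoids that. `fake_db_call` is a hypothetical stand-in for a blocking database call, and `max_workers=8` is an arbitrary assumption.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def fake_db_call(query_id: int) -> str:
    # Hypothetical stand-in for a blocking I/O call (network, disk, database).
    time.sleep(0.05)
    return f"result-{query_id}"


def run_queries(query_ids):
    results = {}
    lock = threading.Lock()

    def on_complete(future, query_id):
        # Completion callback: runs once the I/O for query_id has finished.
        with lock:
            results[query_id] = future.result()

    with ThreadPoolExecutor(max_workers=8) as pool:
        for qid in query_ids:
            future = pool.submit(fake_db_call, qid)
            # Bind qid as a default argument so each callback sees its own id.
            future.add_done_callback(lambda f, qid=qid: on_complete(f, qid))
    # Leaving the with-block waits for all work and callbacks to finish.
    return results
```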

These two are the only valid reasons to have multiple threads, and both are best handled by using the ThreadPool. Anything else, including starting a thread per 'request' or 'connection', is in fact an anti-pattern in the Win32 API world (fork is a valid pattern on *nix, but definitely not on Windows).

For a more specialized and far more detailed discussion of the topic, I can only recommend Rick Vicik's papers on the subject:

A good rule of thumb, given that you're completely CPU-bound, is processorCount+1.

It's +1 because some tasks will always be starting, stopping, or getting interrupted, so n tasks will almost never completely fill up n processors.
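The processorCount+1 rule of thumb is simple enough to express directly. This Python sketch uses `os.cpu_count()` in place of `Environment.ProcessorCount`; the optional `processor_count` parameter is an addition for illustration so the rule can be checked on any machine.

```python
import os


def cpu_bound_thread_count(processor_count=None) -> int:
    # Rule of thumb for purely CPU-bound work: one thread per logical CPU,
    # plus one spare to fill the gaps left when another thread is starting,
    # stopping, or interrupted.
    if processor_count is None:
        processor_count = os.cpu_count() or 1
    if processor_count < 1:
        raise ValueError("processor_count must be at least 1")
    return processor_count + 1
```

For example, `cpu_bound_thread_count(4)` gives 5 on a quad core. Whether the spare thread helps is something to confirm by measurement on the target machine.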

The optimal number would just be the processor count. Ideally you would always have exactly one thread running on each CPU (logical or physical), to minimize context switches and the overhead that comes with them.

Whether that is the right number depends (as everyone has said) on what you are doing. The thread pool (if I understand it correctly) tries to use as few threads as possible but spins up another one each time a thread blocks.

Blocking is never optimal, but if you are doing any form of blocking, the answer changes dramatically.

The simplest way to get good (not necessarily optimal) behaviour is to use the thread pool. In my opinion it's really hard to do better than the thread pool, so that's simply the best place to start; only think about something else if you can demonstrate why it is not good enough.

The only way is a combination of data and code analysis based on performance data.

Different CPU families and speeds, memory speed, and other activity on the system will all make the tuning different.

Some self-tuning is potentially possible, but it means having some form of live performance tuning and self-adjustment.

Or, even better than the ThreadPool, use .NET 4.0 Task instances from the TPL. The Task Parallel Library is built on a foundation in the .NET 4.0 framework that determines the optimal number of threads for performing the tasks as efficiently as possible for you.

I read something on this recently (see the accepted answer to this question, for example).

The simple answer is that you let the operating system decide. It can do a far better job of deciding what's optimal than you can.

There are a number of questions on a similar theme; searching for "optimal number threads" (without the quotes) gives you a couple of pages of results.

I would say it also depends on what you are doing. If you're making a server application, then using everything you can get out of the CPUs, via either Environment.ProcessorCount or a thread pool, is a good idea. But if this is running on a desktop, or on a machine that is not dedicated to this task, you might want to leave some CPU idle so the machine still "functions" for the user.
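The server-versus-desktop distinction can be captured in a small sizing helper. In this Python sketch, reserving exactly one CPU for the user by default is an assumption, as is the optional `cpus` parameter (added so the logic can be checked without depending on the current machine).

```python
import os
from typing import Optional


def worker_count(dedicated: bool, reserved_for_user: int = 1,
                 cpus: Optional[int] = None) -> int:
    if cpus is None:
        cpus = os.cpu_count() or 1
    if dedicated:
        # Server or dedicated machine: use every logical processor.
        return cpus
    # Shared desktop: hold back some CPUs so the machine stays responsive,
    # but always keep at least one worker.
    return max(1, cpus - reserved_for_user)
```

Lowering the worker threads' scheduling priority is another way to keep a desktop responsive, but it is independent of how many workers you start.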

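Arguably, the real way to choose the best number of threads is to have the application profile itself and adaptively change its threading behavior based on what gives the best performance.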

I wrote a simple number-crunching app that used multiple threads and found that on my quad-core system it completed the most work in a fixed period using 6 threads.

I think the only real way to determine it is through trialling or profiling.
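A minimal trial harness in the spirit of these answers: time a fixed batch of work items at several candidate pool sizes and keep the one with the highest throughput. `work` is a hypothetical zero-argument callable standing in for one unit of your real work; the candidate list, batch size, and best-of-N repeats are assumptions. In CPython, CPU-bound threads are serialized by the GIL, so for number crunching you would benchmark a `ProcessPoolExecutor` the same way. The result only holds for the machine and load it was measured on.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_throughput(work, n_threads, n_items):
    # Run n_items calls of `work` on a pool of n_threads; return items/second.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(lambda _: work(), range(n_items)))
    elapsed = time.perf_counter() - start
    return n_items / elapsed


def pick_thread_count(work, candidates, n_items=64, repeats=3):
    # Try each candidate pool size several times and keep its best run,
    # which damps noise from other activity on the machine.
    best_count, best_rate = None, 0.0
    for n in candidates:
        rate = max(measure_throughput(work, n, n_items) for _ in range(repeats))
        if rate > best_rate:
            best_count, best_rate = n, rate
    return best_count, best_rate
```

Run at startup (or periodically), this becomes the crude form of the self-profiling, adaptive approach described above; a range such as 1 to twice the logical CPU count is a reasonable set of candidates to start from.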

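In addition to the processor count, you may also want to take the process's processor affinity into account, by counting the bits in the affinity mask returned by the GetProcessAffinityMask function.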
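Counting the set bits of an affinity mask is a one-liner. This Python sketch takes the mask as a plain integer (on Windows it would come from `GetProcessAffinityMask`; in .NET the same mask is exposed as `Process.ProcessorAffinity`). Calling the Win32 API itself is not shown. As a portable stand-in, `usable_cpu_count` uses `os.sched_getaffinity`, which exists only on some platforms (e.g. Linux), and falls back to `os.cpu_count()`. Python 3.13 also adds `os.process_cpu_count()` for the same purpose.

```python
import os


def count_mask_bits(mask: int) -> int:
    # Each set bit in an affinity mask is one CPU the process may run on.
    if mask < 0:
        raise ValueError("affinity mask must be non-negative")
    return bin(mask).count("1")


def usable_cpu_count() -> int:
    # Where available, sched_getaffinity(0) returns the set of CPUs this
    # process is allowed to use; otherwise fall back to all logical CPUs.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1
```

For example, a mask of `0b1011` allows CPUs 0, 1 and 3, so `count_mask_bits(0b1011)` is 3 even on a machine with more cores.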

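If the threads run without much I/O processing or many system calls, the number of threads (excluding the main thread) is usually equal to the number of processors/cores in the system; otherwise, you can try increasing the number of threads, guided by testing.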
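One common starting point for that testing is the sizing heuristic popularized by Brian Goetz's *Java Concurrency in Practice*: threads ≈ CPUs × target utilization × (1 + wait time / compute time). It is not from this thread, and the wait and compute times are estimates you would measure for your own tasks. This Python sketch reduces to one thread per core when tasks never block, and suggests more threads as blocking grows.

```python
import math
import os


def io_adjusted_thread_count(wait_time, compute_time,
                             target_utilization=1.0, cpus=None):
    # Heuristic starting point: cpus * utilization * (1 + wait/compute).
    # wait_time == 0 (purely CPU-bound) gives one thread per core; the more
    # time each task spends blocked, the more threads it suggests.
    if cpus is None:
        cpus = os.cpu_count() or 1
    if compute_time <= 0:
        raise ValueError("compute_time must be positive")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    estimate = cpus * target_utilization * (1 + wait_time / compute_time)
    return max(1, math.ceil(estimate))
```

On 4 cores, tasks that wait 30 ms for every 10 ms of computation give 4 × (1 + 3) = 16 threads as a first guess, to be confirmed or adjusted by measurement.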
