Why does hyper-threading benefit my algorithm?

Original: 2013-10-17 06:55:29. Tags: c#, multithreading, hyperthreading

I have a dual-core machine with 4 logical processors thanks to hyper-threading. I am running a SHA1 pre-image brute-force test in C#. In each thread I basically have a for loop that computes a SHA1 hash and then compares it to the hash I am looking for. I made sure that all threads execute in complete isolation: no memory is shared between them, except for one variable, long count, which I increment in each thread using:

System.Threading.Interlocked.Increment(ref count);

I get about 1 million SHA1 hashes/s with 2 threads and about 1.3 million hashes/s with 4 threads. I fail to see why I get a 30% boost from HT in this case. Both cores should already be busy doing their work, so increasing the number of threads beyond 2 should not give me any benefit. Can anyone explain why?
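The setup described in the question can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the poster's code: the candidate encoding (the 8-byte form of a counter), the striding scheme, and the names `PreimageSearch`, `Search`, and `Encode` are all hypothetical.

```csharp
using System;
using System.Security.Cryptography;
using System.Threading;

static class PreimageSearch
{
    // Hypothetical candidate encoding: the 8-byte form of a counter
    // (machine byte order, as produced by BitConverter).
    public static byte[] Encode(long candidate) => BitConverter.GetBytes(candidate);

    // Tests candidates 0 .. limit-1 on threadCount threads, looking for one whose
    // SHA-1 hash equals target. Returns the match (or -1) and the number of hashes computed.
    public static (long Found, long Hashes) Search(byte[] target, long limit, int threadCount)
    {
        long found = -1;
        long hashes = 0;
        var threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++)
        {
            int id = t; // per-thread copy of the loop variable
            threads[t] = new Thread(() =>
            {
                // SHA1 instances are not thread-safe, so each thread owns one.
                using var sha1 = SHA1.Create();
                long local = 0;
                // Thread `id` tests id, id + threadCount, id + 2 * threadCount, ...
                // and stops early once any thread has recorded a match.
                for (long c = id; c < limit && Interlocked.Read(ref found) < 0; c += threadCount)
                {
                    byte[] hash = sha1.ComputeHash(Encode(c));
                    local++;
                    if (hash.AsSpan().SequenceEqual(target))
                        Interlocked.CompareExchange(ref found, c, -1);
                }
                // Publish this thread's count once, instead of once per hash.
                Interlocked.Add(ref hashes, local);
            });
            threads[t].Start();
        }
        foreach (var thread in threads) thread.Join();
        return (found, hashes);
    }
}
```

For example, `PreimageSearch.Search(target, 100_000, 4).Found` is 12345 when `target` is the SHA-1 of `Encode(12345)`. One design choice differs from the question: each thread counts locally and calls `Interlocked.Add` once at the end, because an `Interlocked.Increment` per hash makes every thread write to the same shared cache line on every iteration.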

3 answers

Hyperthreading effectively gives you more cores for integer operations: it allows two streams of integer operations to run in parallel on a single physical core. As far as I'm aware it doesn't help floating-point operations, but presumably the SHA-1 code consists primarily of integer operations, hence the speedup.
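To make the "primarily integer operations" point concrete, here is a minimal, unoptimized SHA-1 written out by hand following the FIPS 180-4 description. Every step is a 32-bit addition, rotate, or bitwise AND/OR/XOR/NOT; no floating point appears anywhere. It is an illustration only: `Sha1Integer` is a hypothetical name, and real code should keep using `SHA1.Create()` as the question does (the platform implementation may be structured quite differently).

```csharp
using System;
using System.Numerics;

static class Sha1Integer
{
    // SHA-1 using only 32-bit integer arithmetic: additions, rotates,
    // and bitwise AND/OR/XOR/NOT. Educational sketch, not optimized.
    public static byte[] Hash(byte[] message)
    {
        uint h0 = 0x67452301, h1 = 0xEFCDAB89, h2 = 0x98BADCFE, h3 = 0x10325476, h4 = 0xC3D2E1F0;

        // Padding: append 0x80, zeros, then the 64-bit big-endian bit length,
        // so the total length is a multiple of 64 bytes.
        int paddedLength = ((message.Length + 8) / 64 + 1) * 64;
        var data = new byte[paddedLength];
        Array.Copy(message, data, message.Length);
        data[message.Length] = 0x80;
        ulong bitLength = (ulong)message.Length * 8;
        for (int i = 0; i < 8; i++)
            data[paddedLength - 1 - i] = (byte)(bitLength >> (8 * i));

        var w = new uint[80];
        for (int block = 0; block < paddedLength; block += 64)
        {
            // Message schedule: 16 big-endian words expanded to 80.
            for (int t = 0; t < 16; t++)
                w[t] = (uint)(data[block + 4 * t] << 24 | data[block + 4 * t + 1] << 16
                            | data[block + 4 * t + 2] << 8 | data[block + 4 * t + 3]);
            for (int t = 16; t < 80; t++)
                w[t] = BitOperations.RotateLeft(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1);

            // 80 rounds: the hot loop of a brute-force search.
            uint a = h0, b = h1, c = h2, d = h3, e = h4;
            for (int t = 0; t < 80; t++)
            {
                uint f, k;
                if (t < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
                else if (t < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
                else if (t < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
                else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
                uint temp = BitOperations.RotateLeft(a, 5) + f + e + k + w[t];
                e = d; d = c; c = BitOperations.RotateLeft(b, 30); b = a; a = temp;
            }
            h0 += a; h1 += b; h2 += c; h3 += d; h4 += e;
        }

        // Serialize the five state words big-endian into the 20-byte digest.
        var digest = new byte[20];
        uint[] hs = { h0, h1, h2, h3, h4 };
        for (int i = 0; i < 5; i++)
            for (int j = 0; j < 4; j++)
                digest[4 * i + j] = (byte)(hs[i] >> (24 - 8 * j));
        return digest;
    }
}
```

In this sketch the 80-round loop is where a brute-force thread spends nearly all of its time, and it consists entirely of integer arithmetic on a few local variables and the 320-byte `w` array.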

It's not as good as having 4 real physical cores, of course, but it does allow for a bit more parallelism.

Disable HT in the BIOS and run the test again with 2 threads. HT gives a small speedup only when one logical core is executing ordinary (integer) CPU instructions while the other executes instructions that use the FPU registers.
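The comparison suggested above needs a repeatable throughput measurement. A minimal sketch of such a harness follows; `HashThroughput` and `Measure` are hypothetical names, and the batch size of 1000 and the 8-byte inputs are arbitrary choices rather than anything from the question.

```csharp
using System;
using System.Diagnostics;
using System.Security.Cryptography;
using System.Threading;

static class HashThroughput
{
    // Runs threadCount threads that compute SHA-1 hashes until `duration` has
    // elapsed, and returns the combined throughput in hashes per second.
    public static double Measure(int threadCount, TimeSpan duration)
    {
        long total = 0;
        // Shared deadline, checked with the static Stopwatch.GetTimestamp.
        long deadline = Stopwatch.GetTimestamp()
                        + (long)(duration.TotalSeconds * Stopwatch.Frequency);
        var clock = Stopwatch.StartNew();
        var threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++)
        {
            int id = t; // per-thread copy of the loop variable
            threads[t] = new Thread(() =>
            {
                using var sha1 = SHA1.Create(); // one instance per thread
                var input = new byte[8];
                long local = 0;
                while (Stopwatch.GetTimestamp() < deadline)
                {
                    // Hash in batches so the clock is not read on every iteration.
                    for (int i = 0; i < 1000; i++)
                    {
                        // Distinct input per thread and per iteration.
                        BitConverter.TryWriteBytes(input, ((long)id << 40) + local);
                        sha1.ComputeHash(input);
                        local++;
                    }
                }
                Interlocked.Add(ref total, local);
            });
            threads[t].Start();
        }
        foreach (var thread in threads) thread.Join();
        return total / clock.Elapsed.TotalSeconds;
    }
}
```

Running `Measure(2, TimeSpan.FromSeconds(5))` with HT enabled, again with HT disabled in the BIOS, and `Measure(Environment.ProcessorCount, TimeSpan.FromSeconds(5))` with HT enabled gives the three numbers needed to see how much of the difference comes from the extra logical processors.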

SMT/hyper-threading allows multiple threads (usually two) to execute on the same physical core. Typically one thread waits for the other to encounter a stall, and then the core switches to executing the waiting thread.

Stalls happen mostly because of cache misses. Even if the threads are not traversing the same memory, there's no guarantee that the memory will already be in the cache (so accessing it causes a stall), or that it will not map to the same cache line that another thread is mapping its memory to.
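A workload dominated by cache-miss stalls can be sketched as a pointer chase: each load's address depends on the result of the previous load, so on an array much larger than the cache nearly every step waits on memory. This illustrates the kind of code the answer describes, not the question's SHA-1 loop; `PointerChase`, the seed, and all sizes are assumptions. Sattolo's algorithm is used so that following `next` from any start visits every element before returning.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class PointerChase
{
    // Builds a single random cycle over [0, n) with Sattolo's algorithm:
    // next[i] is the index visited after i.
    public static int[] BuildCycle(int n, int seed)
    {
        var next = new int[n];
        for (int i = 0; i < n; i++) next[i] = i;
        var rng = new Random(seed);
        for (int i = n - 1; i > 0; i--)
        {
            int j = rng.Next(i); // j in [0, i), never i itself
            (next[i], next[j]) = (next[j], next[i]);
        }
        return next;
    }

    // Each step's load address depends on the previous load, so the CPU
    // cannot overlap the misses; the thread mostly waits on memory.
    public static int Chase(int[] next, int start, long steps)
    {
        int p = start;
        for (long s = 0; s < steps; s++) p = next[p];
        return p;
    }

    // Runs threadCount chasers over the same read-only array, each starting at a
    // different index, and returns the combined rate in steps per second.
    public static double Measure(int[] next, int threadCount, long stepsPerThread)
    {
        var threads = new Thread[threadCount];
        var clock = Stopwatch.StartNew();
        for (int t = 0; t < threadCount; t++)
        {
            int start = t * (next.Length / threadCount);
            threads[t] = new Thread(() => Chase(next, start, stepsPerThread));
            threads[t].Start();
        }
        foreach (var thread in threads) thread.Join();
        return threadCount * stepsPerThread / clock.Elapsed.TotalSeconds;
    }
}
```

For example, `BuildCycle(16 << 20, 1)` allocates 64 MB of `int`s, an assumed size meant to exceed the last-level cache; comparing `Measure(next, 2, steps)` with `Measure(next, 4, steps)` on a 2-core/4-thread machine would show how much a second logical thread per core gains when the first spends most of its time waiting on memory.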

Thus, two threads will almost always benefit from SMT/hyper-threading, unless the data they traverse is already present in the cache. That's actually an unusual scenario: an algorithm would typically need to prefetch its data, not use more than the cache can hold, and not overwrite (evict) memory that other threads are trying to keep in the cache, which requires knowledge of the other threads on the core. That's not usually possible, because the OS abstracts it away.

Most algorithms are not tuned to that extent, particularly since it's usually only console-exclusive games, or other hardware-exclusive applications, that can guarantee a certain minimum cache specification and, more importantly, have intimate knowledge of the other threads running concurrently on the same core. This is also one of the major reasons larger caches benefit modern CPU performance.