Hyper-threading Performance Comparison

I have written a project that uses some basic OpenSSL functions such as RAND_bytes and des_ecb_encrypt.

My computer has an i7-2600 (4 cores, 8 logical CPUs). When I run my project with 4 threads, it takes 10 seconds. When I run it with 8 threads, it also takes 10 seconds.

In other words, hyper-threading doesn't give me any performance improvement. On Linux, the result is the same.

One post I found (here) says that hyper-threading brings no improvement in some situations. Another (here) gives some intuitive results.

However, I have tried to write some simple tests, and to find simple examples, that show hyper-threading giving no apparent improvement. Sadly, I haven't found any.

So my question is whether there are simple tests that show hyper-threading giving no performance improvement.

You may find that hyperthreading helps more with code that uses large amounts of memory, where the processor is regularly stalled waiting on fetches from memory.

In my experience, it's quite hard to find "simple code" that shows benefits from hyperthreading. It tends to be the more complex examples that show the benefit. Even then, the benefit will most likely not be 2x over "no hyperthreading". Count on getting perhaps a 20-30% improvement.
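A pointer-chasing loop is one common way to build the memory-stalled workload described above. The following is only a sketch: the names `make_cycle` and `chase` are made up for illustration, and it assumes you allocate an array much larger than the last-level cache so that most loads actually miss.

```c
#include <stddef.h>
#include <stdlib.h>

/* Fill next[] with one random cycle that visits all n slots
 * (Sattolo's algorithm). rand() returns at most RAND_MAX, which is
 * small on some platforms; use a wider generator for very large arrays. */
void make_cycle(size_t *next, size_t n, unsigned seed)
{
    size_t i;

    for (i = 0; i < n; i++)
        next[i] = i;
    if (n < 2)
        return;
    srand(seed);
    for (i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;   /* j in [0, i) keeps it a single cycle */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Follow the chain for `steps` hops. Each load depends on the one
 * before it, so the core cannot overlap the misses by itself. */
size_t chase(const size_t *next, size_t start, size_t steps)
{
    size_t p = start;

    while (steps--)
        p = next[p];
    return p;
}
```

Give each thread its own array and time `chase` with 4 and then 8 threads. Because every hop waits on memory, this is the kind of code where a second hardware thread on the same core may find idle execution time to use.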

Hyper-threading takes advantage of the fact that the CPU has many components; without hyper-threading, while one of them is in use the others just sit idle. You can try writing two types of threads, one doing integer calculations (which will hopefully use the ALU) and one doing floating-point arithmetic (which will hopefully use the FPU).

I did not try this myself, but it seems that in such a scenario hyper-threading should improve performance.

To show the opposite, use only one type of thread (either threads doing only integer operations or threads doing only floating-point operations).

It may also be that your test is flawed, but in order to know whether that is the case we'll need more information about that test.
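The integer versus floating-point experiment suggested above could be sketched roughly as follows. This is an untested-on-your-hardware illustration, not a definitive benchmark: `run_workers`, `int_worker`, `fp_worker`, `ITERS` and `MAX_THREADS` are all hypothetical names, the loop bodies are arbitrary stand-ins for "integer work" and "floating-point work", and the operating system decides which logical CPU each thread lands on.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdint.h>
#include <time.h>

#define MAX_THREADS 64
#define ITERS 20000000UL   /* assumed per-thread work; tune for your machine */

union slot { uint64_t u; double d; };   /* per-thread result sink */

/* Integer-only work: a xorshift generator (shifts and XORs). */
static void *int_worker(void *arg)
{
    uint64_t x = 88172645463325252ULL;
    unsigned long i;

    for (i = 0; i < ITERS; i++) {
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
    }
    ((union slot *)arg)->u = x;   /* store the result so the loop is kept */
    return NULL;
}

/* Floating-point-only work: a dependent multiply-add chain. */
static void *fp_worker(void *arg)
{
    double y = 1.0;
    unsigned long i;

    for (i = 0; i < ITERS; i++)
        y = y * 1.0000001 + 0.0000001;
    ((union slot *)arg)->d = y;
    return NULL;
}

/* Start nthreads workers and return the wall-clock seconds until all
 * have finished, or -1.0 on error. If mixed is nonzero, odd-numbered
 * threads do floating-point work and even-numbered ones integer work;
 * otherwise every thread does integer work. */
double run_workers(int nthreads, int mixed)
{
    pthread_t tid[MAX_THREADS];
    union slot out[MAX_THREADS];
    struct timespec t0, t1;
    int i, started = 0, failed = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < nthreads; i++) {
        void *(*fn)(void *) = (mixed && (i & 1)) ? fp_worker : int_worker;
        if (pthread_create(&tid[i], NULL, fn, &out[i]) != 0) {
            failed = 1;
            break;
        }
        started++;
    }
    for (i = 0; i < started; i++)   /* always join what was started */
        pthread_join(tid[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (failed)
        return -1.0;
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Call `run_workers(4, 0)`, `run_workers(8, 0)` and `run_workers(8, 1)` from a `main` function, building with something like `cc -O2 -pthread`. Each thread does a fixed amount of work, so on a 4-core CPU an 8-thread time close to the 4-thread time means the extra threads ran almost for free, while a time close to double means hyper-threading added nothing. Modern cores have several integer and floating-point units, so the difference between the mixed and integer-only runs may turn out smaller than the reasoning above suggests.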

I have written a project, which uses some basic functions in openssl such as RAND_bytes and des_ecb_encrypt... My computer has i7-2600 (4 cores and 8 logical CPUs). When I run my project with 4 threads, it takes 10 seconds. When I run it with 8 threads, it also takes 10 seconds.

When using RDRAND (which RAND_bytes will do in this case), the bus is the limiting factor. You should peak at around 800 MB/sec. It does not matter how many threads you have; the bus cannot transfer data fast enough. See "Intel rdrand instruction revisited".

If you used AES, you might see a better speedup than in the DES/3DES observations. Your i7-2600 has AES-NI, which can achieve almost 1.3 cycles/byte, and that should be about two to three times faster than AES in software. To ensure you are using the AES-NI instructions, you have to use the EVP_* interfaces.


I found a post (here) that says hyper-threading brings no improvement in some situations. I also found one (here) that gives some intuitive results.

I think @selalerer and @Mats Petersson answered your question. The problem does not scale linearly, and there's a maximum speedup you will encounter. Intel states it's about 30%.

Intel's newest architecture favors out-of-order execution over hyper-threading because it's supposed to be more efficient. Read about the Silvermont processor cores.

But if you want a formal deep dive, then see a book on computer engineering. Here's the book we used when I studied it in college: Computer Organization and Design (it's probably a bit dated now).


However, I have tried to write some simple tests and to find some simple examples which show that hyper-threading won't give me apparent improvement.

OpenSSL also has a benchmarking app. See the source code in <openssl source>/apps/speed.c.

Also, benchmarking apps have their own personalities. An encryption stress test may not reveal the differences as prominently as you hope to see them. See, for example, Benchmarking Tools.

Following are details and results of my MP benchmarks for Linux and Windows, which can behave differently. There is not much on HT, but the Linux tests include an Atom (1 core, 2 threads) and the Windows tests include Core i7 results (4+4).

http://www.roylongbottom.org.uk/linux%20multithreading%20benchmarks.htm

http://www.roylongbottom.org.uk/quad%20core%208%20thread.htm

Take your pick, depending on whether you want to prove that HT provides better or worse performance. Following are RandMem results on the i7 (Linux seems to do better on this test). For CPUs such as the i7, you also need to consider Turbo Boost, which might give a lower clock speed with multiple threads.

             CPUs          MBytes Per Second Using Threads        Gain At Threads
             /HTs         1       2       4       6       8     2     4     6     8
 Serial RD
 Core i7     4/8 L1   11458   22661   37039   43717   46374   2.0   3.2   3.8   4.0
 930             L2   10380   20832   32853   41711   42839   2.0   3.2   4.0   4.1
 #### MHz        L3    8828   17743   29610   38414   40330   2.0   3.4   4.4   4.6
 Win 764        RAM    4266    8712   17347   24946   25589   2.0   4.1   5.8   6.0

 Serial RW
 Core i7     4/8 L1   15282   13724   16240   16209   18379   0.9   1.1   1.1   1.2
 930             L2   12223   18216   25326   28104   27047   1.5   2.1   2.3   2.2
 #### MHz        L3   10234   19266   21931   24450   26351   1.9   2.1   2.4   2.6
 Win 764        RAM    4533    7656   13876   14543   13390   1.7   3.1   3.2   3.0

 Random RD
 Core i7     4/8 L1   11266   22548   38174   45592   47141   2.0   3.4   4.0   4.2
 930             L2    6233   12463   20059   24986   25667   2.0   3.2   4.0   4.1
 #### MHz        L3    3499    6915    9211   10002    9531   2.0   2.6   2.9   2.7
 Win 764        RAM     459     909    1241    1398    1364   2.0   2.7   3.0   3.0

 Random RW
 Core i7     4/8 L1   14375    3027    2780    2901    3297   0.2   0.2   0.2   0.2
 930             L2    5887    4555    6117    6693    7281   0.8   1.0   1.1   1.2
 #### MHz        L3    3104    4604    4721    5047    4933   1.5   1.5   1.6   1.6
 Win 764        RAM     428     860     899     948    1026   2.0   2.1   2.2   2.4

 #### 2.8 GHz running at up to 3.06 GHz via Turbo Boost, dual channel 1066 MHz DDR3 RAM 
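
To read the RandMem table with hyper-threading in mind, it may help to separate the gain over one thread (the "Gain At Threads" columns) from the extra gain of 8 threads over 4, which is the part hyper-threading can contribute on a 4-core CPU. The helper names `gain` and `print_row` below are hypothetical, and the 8-over-4 ratio is my own derived figure, not a column from the original results.

```c
#include <stdio.h>

/* Ratio of two throughputs, e.g. MB/s at 8 threads over MB/s at 1 thread. */
double gain(double base, double value)
{
    return value / base;
}

/* Print the gain over 1 thread and the 8-over-4 ratio for one table row.
 * mbps[] holds the 1, 2, 4, 6 and 8 thread columns, in that order. */
void print_row(const char *label, const double mbps[5])
{
    printf("%-14s x%.1f at 8 threads, x%.2f going from 4 to 8\n",
           label, gain(mbps[0], mbps[4]), gain(mbps[2], mbps[4]));
}
```

For the Serial RD L1 row (11458, 22661, 37039, 43717, 46374) this gives 4.0 and 1.25, so the second set of four threads adds about 25%. For the Serial RW RAM row (4533, 7656, 13876, 14543, 13390) it gives 3.0 and 0.96, so 8 threads are slightly slower than 4.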

Then there is the MP Whetstone benchmark, which shows real gains:

                      MWIPS  MFLOP  MFLOP  MFLOP   COS    EXP   FIXPT   IF    EQUAL
CPU              MHz            1      2      3    MOPS   MOPS   MOPS   MOPS   MOPS

Core i7 1 Thrd  ####   3115   1065    886    738   79.3   39.7   2447   2936   1154

Core i7 Win7    ####  21690   8676   7621   5844    531    291  16643  12027   5034
Quad Core Thread 1            1091   1027    728   66.4   36.5   2050   1501    629
Plus HT   Thread 2            1089   1037    742   66.0   36.5   2090   1507    630
          Thread 3            1090    946    742   66.8   36.5   2069   1534    631
          Thread 4            1092   1037    727   66.6   36.6   2031   1501    630
          Thread 5            1042    959    736   66.4   36.5   1912   1483    630
          Thread 6            1091    874    723   66.6   36.1   2049   1507    629
          Thread 7            1090    867    725   65.6   36.3   2094   1516    631
          Thread 8            1091    874    722   66.3   36.3   2350   1476    624

Gain %                  696    815    860    792    670    733    680    410    436
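
The numbers in the Whetstone table are consistent with two simple relations: the "Core i7 Win7" totals equal the sum of the eight per-thread scores, and "Gain %" equals the total divided by the single-thread score, times 100. This reading is inferred from the figures and is not stated in the source; the function names below are hypothetical.

```c
/* Sum of the per-thread scores for one column. */
double total_score(const double *per_thread, int n)
{
    double sum = 0.0;
    int i;

    for (i = 0; i < n; i++)
        sum += per_thread[i];
    return sum;
}

/* Multi-threaded score as a percentage of the single-threaded score. */
double gain_percent(double single, double multi)
{
    return 100.0 * multi / single;
}
```

For the MFLOP 1 column, the eight per-thread values (1091, 1089, 1090, 1092, 1042, 1091, 1090, 1091) sum to 8676, and 8676 against the single-thread score of 1065 gives about 815%, matching the table. For MWIPS, 21690 against 3115 gives about 696%.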
