Hyper-threading Performance Comparison
I have written a project that uses some basic OpenSSL functions, such as RAND_bytes and des_ecb_encrypt.
My computer has an i7-2600 (4 cores and 8 logical CPUs). When I run my project with 4 threads, it takes 10 seconds. When I run it with 8 threads, it also takes 10 seconds.
In other words, hyper-threading doesn't give me any performance improvement. On Linux, the result is the same.
I found a post here which says that hyper-threading gives no improvement in some situations. I also found another here that gives some intuitive results.
However, I have tried to write some simple tests, and to find some simple examples, showing that hyper-threading gives no apparent improvement. Sadly, I haven't found any.
So my question is: are there simple tests which show that hyper-threading gives no performance improvement?
You may find that hyperthreading helps more on code that uses large amounts of memory, so that the processor is regularly blocked on fetching from memory.
In my experience, it's quite hard to find "simple code" that shows a benefit from hyperthreading. It tends to be the more complex examples that show the benefit. Even then, the benefit will most likely not be 2x that of "no hyperthreading". Count on getting perhaps a 20-30% improvement.
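To put that range next to the numbers in the question, here is a small piece of arithmetic. It assumes the 10-second run at 4 threads is the baseline and that "improvement" means extra throughput. The function name is only for illustration.

```python
def expected_time(baseline_seconds: float, throughput_gain: float) -> float:
    """Run time after a fractional throughput gain (0.25 means 25% more work per second)."""
    return baseline_seconds / (1.0 + throughput_gain)


# Baseline taken from the question: 10 seconds with 4 threads.
for gain in (0.20, 0.30):
    print(f"{gain:.0%} gain: {expected_time(10.0, gain):.2f} s")
# 20% gain: 8.33 s
# 30% gain: 7.69 s
```

So even in a favourable case the 8-thread run would be expected to land somewhere around 8 seconds, not 5.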
Hyper-threading takes advantage of the fact that the CPU has many components; without hyper-threading, when one of them is in use, the others just sit idle. You can try writing two types of threads, one doing integer calculations (which will hopefully use the ALU) and one doing floating-point arithmetic (which will hopefully use the FPU).
I did not try this myself, but it seems that in such a scenario hyper-threading should improve performance.
To show the opposite, you can use only one type of thread (either threads doing only integer operations or threads doing only floating-point operations).
It may also be that your test is flawed, but to know whether that is the case we'll need more information about the test.
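A minimal sketch of such an experiment follows, written in Python with worker processes instead of threads (an assumption made here so that the workers can actually run in parallel). The function names, the loop bodies and the iteration count are all made up for illustration. In Python the interpreter overhead dominates both loops, so this shows the shape of the measurement more than a clean ALU versus FPU split; the same structure in C with real threads would be closer to what is described above.

```python
import time
from concurrent.futures import ProcessPoolExecutor


def int_work(n: int) -> int:
    """Integer-only loop (a simple linear congruential generator)."""
    x = 1
    for _ in range(n):
        x = (x * 1103515245 + 12345) % 2147483648
    return x


def float_work(n: int) -> float:
    """Floating-point-only loop (iterating the logistic map)."""
    x = 0.5
    for _ in range(n):
        x = 3.9 * x * (1.0 - x)
    return x


def run_batch(tasks, workers: int) -> float:
    """Run (function, n) tasks on `workers` processes and return elapsed seconds.

    The time includes process start-up, so keep n large enough to dominate it.
    """
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(func, n) for func, n in tasks]
        for future in futures:
            future.result()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 500_000  # arbitrary; raise it for more stable timings
    mixed = [(int_work, n), (float_work, n)] * 4  # 8 tasks, half of each kind
    int_only = [(int_work, n)] * 8                # 8 tasks, all integer
    for name, tasks in (("mixed", mixed), ("int only", int_only)):
        t4 = run_batch(tasks, 4)
        t8 = run_batch(tasks, 8)
        print(f"{name}: 4 workers {t4:.2f} s, 8 workers {t8:.2f} s, gain {t4 / t8:.2f}x")
```

A gain close to 1.00x for a batch means that going from 4 to 8 workers bought nothing for that workload, which is the situation described in the question.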
"I have written a project, which use some basic functions in openssl such as RAND_bytes and des_ecb_encrypt... My computer has i7-2600(4 cores and 8 logic CPU). When I run my project with 4 threads, it will costs 10 seconds. When I run it with 8 threads, it also costs 10 seconds."
When using RDRAND (which RAND_bytes will do in this case), the bus is the limiting factor. You should peak at around 800 MB/sec. It does not matter how many threads you have: the bus cannot transfer data fast enough. See "Intel rdrand instruction revisited".
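A toy model of that ceiling can make the point concrete. Only the 800 MB/sec cap comes from the text above; the per-thread rate of 250 MB/sec is a made-up number for illustration, and the function is hypothetical.

```python
def aggregate_rate(threads: int, per_thread_mb_s: float, cap_mb_s: float = 800.0) -> float:
    """Total throughput when every thread shares one capped resource."""
    return min(threads * per_thread_mb_s, cap_mb_s)


PER_THREAD_MB_S = 250.0  # hypothetical per-thread rate, not a measurement

for threads in (1, 2, 4, 8):
    print(f"{threads} threads: {aggregate_rate(threads, PER_THREAD_MB_S):.0f} MB/sec")
# 1 threads: 250 MB/sec
# 2 threads: 500 MB/sec
# 4 threads: 800 MB/sec
# 8 threads: 800 MB/sec
```

Once the shared limit is reached, 4 threads and 8 threads produce the same total, which would match the identical 10-second timings.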
If you used AES, then you might see a better speedup than in the DES/3DES observations. Your i7-2600 has AES-NI, and it can achieve almost 1.3 cycles/byte, which should be about two to three times as fast as AES in software. To ensure you are using the AES-NI instructions, you have to use the EVP_* interfaces.
"I found here tells me that hyper-threading doesn't give me some improvement in some situations. Also, I found here give me some intuitive results."
I think @selalerer and @Mats Petersson answered your question. The problem does not scale linearly, and there is a maximum speedup you will encounter. Intel states it's about 30%.
Intel's newest architecture favors out-of-order execution over hyper-threading because it's supposed to be more efficient. Read about the Silvermont processor cores.
But if you want a formal deep dive, then see a book on computer engineering. Here's the book we used when I studied it in college: Computer Organization and Design (it's probably a bit dated now).
"However, I have tried to write some simple tests and found some simple examples which will show hyper-threading won't give me apparent improvement."
OpenSSL also has a benchmarking app. See the source code in <openssl source>/apps/speed.c.
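That benchmarking app is what runs behind the openssl speed command. As a sketch, the snippet below only builds and prints two command lines to compare, one with 4 parallel runs and one with 8. The helper name and the choice of aes-128-cbc are illustrative assumptions. As far as I know, -multi starts separate processes (not threads) and is not available on Windows builds.

```python
import shlex


def speed_command(processes: int, cipher: str = "aes-128-cbc") -> list[str]:
    """Build an `openssl speed` command line that runs `processes` benchmarks in parallel."""
    return ["openssl", "speed", "-multi", str(processes), "-evp", cipher]


# Print the commands instead of running them; each run takes a while.
for processes in (4, 8):
    print(shlex.join(speed_command(processes)))
# openssl speed -multi 4 -evp aes-128-cbc
# openssl speed -multi 8 -evp aes-128-cbc
```

Comparing the throughput reported by the two runs gives a quick indication of whether the extra 4 logical CPUs help for that cipher.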
Also, benchmarking apps have their own personalities. An encryption stress test may not reveal the differences as prominently as you hope to see them. See, for example, Benchmarking Tools.
Following are details and results of my MP benchmarks for Linux and Windows, which can behave differently. There is not much on HT, but the Linux tests include an Atom (1 core, 2 threads) and the Windows tests have Core i7 results (4 cores + 4 hyperthreads).
http://www.roylongbottom.org.uk/linux%20multithreading%20benchmarks.htm
http://www.roylongbottom.org.uk/quad%20core%208%20thread.htm
Take your pick, depending on whether you want to prove that HT provides better or worse performance. Following are RandMem results on an i7 (Linux seems to do better on this test). For CPUs such as the i7, you also need to consider Turbo Boost, which might give a lower clock speed with multiple threads.
RandMem, Core i7 930 (4 CPUs / 8 HTs), #### MHz, Win 7 64

                     MBytes per second using threads     Gain at threads
Test       Level      1      2      4      6      8      2    4    6    8

Serial RD  L1      11458  22661  37039  43717  46374    2.0  3.2  3.8  4.0
           L2      10380  20832  32853  41711  42839    2.0  3.2  4.0  4.1
           L3       8828  17743  29610  38414  40330    2.0  3.4  4.4  4.6
           RAM      4266   8712  17347  24946  25589    2.0  4.1  5.8  6.0

Serial RW  L1      15282  13724  16240  16209  18379    0.9  1.1  1.1  1.2
           L2      12223  18216  25326  28104  27047    1.5  2.1  2.3  2.2
           L3      10234  19266  21931  24450  26351    1.9  2.1  2.4  2.6
           RAM      4533   7656  13876  14543  13390    1.7  3.1  3.2  3.0

Random RD  L1      11266  22548  38174  45592  47141    2.0  3.4  4.0  4.2
           L2       6233  12463  20059  24986  25667    2.0  3.2  4.0  4.1
           L3       3499   6915   9211  10002   9531    2.0  2.6  2.9  2.7
           RAM       459    909   1241   1398   1364    2.0  2.7  3.0  3.0

Random RW  L1      14375   3027   2780   2901   3297    0.2  0.2  0.2  0.2
           L2       5887   4555   6117   6693   7281    0.8  1.0  1.1  1.2
           L3       3104   4604   4721   5047   4933    1.5  1.5  1.6  1.6
           RAM       428    860    899    948   1026    2.0  2.1  2.2  2.4

#### 2.8 GHz running at up to 3.06 GHz via Turbo Boost, dual channel 1066 MHz DDR3 RAM
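To read the HT effect out of these RandMem figures, one option is to compare the 8-thread column with the 4-thread column. The sketch below does that for a few rows copied from the table. It assumes that 4 threads already occupy the four physical cores, which depends on the scheduler, so treat the ratios as indicative only.

```python
# MBytes/s at 4 and 8 threads, copied from the RandMem table above.
results = {
    ("Serial RD", "L1"): (37039, 46374),
    ("Serial RD", "RAM"): (17347, 25589),
    ("Random RD", "L3"): (9211, 9531),
    ("Random RW", "L1"): (2780, 3297),
}


def ht_ratio(at_4_threads: float, at_8_threads: float) -> float:
    """Throughput at 8 threads relative to 4 threads."""
    return at_8_threads / at_4_threads


for (test, level), (t4, t8) in results.items():
    print(f"{test:10} {level:4} {ht_ratio(t4, t8):.2f}x")
```

For this selection the ratio runs from about 1.03x (Random RD in L3) to about 1.48x (Serial RD from RAM), so whether HT "helps" depends heavily on the access pattern.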
Then there is the MP Whetstone benchmark, which shows real gains:
                           MWIPS  MFLOP  MFLOP  MFLOP    COS    EXP  FIXPT     IF  EQUAL
CPU                   MHz             1      2      3   MOPS   MOPS   MOPS   MOPS   MOPS

Core i7 1 Thrd       ####   3115   1065    886    738   79.3   39.7   2447   2936   1154
Core i7 Win7         ####  21690   8676   7621   5844    531    291  16643  12027   5034

Quad Core Thread 1                 1091   1027    728   66.4   36.5   2050   1501    629
Plus HT   Thread 2                 1089   1037    742   66.0   36.5   2090   1507    630
          Thread 3                 1090    946    742   66.8   36.5   2069   1534    631
          Thread 4                 1092   1037    727   66.6   36.6   2031   1501    630
          Thread 5                 1042    959    736   66.4   36.5   1912   1483    630
          Thread 6                 1091    874    723   66.6   36.1   2049   1507    629
          Thread 7                 1090    867    725   65.6   36.3   2094   1516    631
          Thread 8                 1091    874    722   66.3   36.3   2350   1476    624

Gain %                       696    815    860    792    670    733    680    410    436
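The Gain % row can be reproduced from the two Core i7 rows: it is the 8-thread total divided by the single-thread result, times 100. The sketch below recomputes it and, as an added reading of the numbers, also relates each gain to the 400% that four cores alone would ideally give. The column names MFLOP1 to MFLOP3 are just labels for the three MFLOP columns.

```python
columns = ["MWIPS", "MFLOP1", "MFLOP2", "MFLOP3", "COS", "EXP", "FIXPT", "IF", "EQUAL"]
one_thread = [3115, 1065, 886, 738, 79.3, 39.7, 2447, 2936, 1154]         # "Core i7 1 Thrd" row
eight_threads = [21690, 8676, 7621, 5844, 531, 291, 16643, 12027, 5034]  # "Core i7 Win7" row


def gain_percent(single: float, multi: float) -> int:
    """Multi-thread result as a percentage of the single-thread result."""
    return round(100 * multi / single)


gains = {name: gain_percent(s, m) for name, s, m in zip(columns, one_thread, eight_threads)}
for name, gain in gains.items():
    print(f"{name:7} {gain:4d}%  ({gain / 400:.2f}x an ideal 4-core result)")
```

Every column comes out above 400%, so the eight threads together deliver more than four cores could without HT, with the IF and EQUAL tests only just above that line.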