
Compiling with gcc -O3 technically increases my cache miss rate

I've been profiling a bit with cachegrind and noticed something. When I compile with -O3 I get fewer data fetches but the same number of cache misses, which results in a higher miss rate. That's fine, but it seems like an odd thing to me and I'd like to know what's going on behind the scenes. The only other relevant compiler option I have turned on is -march=native. For comparison,

Without -O3

==16951== D   refs:        923,170,681  (817,941,424 rd   + 105,229,257 wr)
==16951== D1  misses:        9,477,102  (  8,115,150 rd   +   1,361,952 wr)
==16951== LLd misses:          647,219  (    262,227 rd   +     384,992 wr)
==16951== D1  miss rate:           1.0% (        1.0%     +         1.3%  )
==16951== LLd miss rate:           0.1% (        0.0%     +         0.4%  )

With -O3

==16978== D   refs:      218,804,125  (205,979,405 rd   + 12,824,720 wr)
==16978== D1  misses:      9,372,533  (  8,016,083 rd   +  1,356,450 wr)
==16978== LLd misses:        647,195  (    262,191 rd   +    385,004 wr)
==16978== D1  miss rate:         4.3% (        3.9%     +       10.6%  )
==16978== LLd miss rate:         0.3% (        0.1%     +        3.0%  )

It's most likely due to vectorization:

-O3

Optimize yet more. -O3 turns on all optimizations specified by -O2
and also turns on ... -ftree-vectorize and -fipa-cp-clone options. 

(from the GCC manpage).
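To make the arithmetic concrete: cachegrind counts one D ref for every data access executed, but charges a miss per cache line it has to bring in. Optimized, vectorized code reads the same bytes with far fewer, wider load/store instructions, so the D-ref denominator shrinks (923M down to 219M in the runs above) while the cache lines touched, and therefore the misses, stay essentially the same (about 9.4M D1 and 647K LLd misses in both runs). The ratio, i.e. the miss rate, goes up even though the program is not using the cache any worse. Below is a minimal sketch of the kind of loop where this shows up; the file name, array size, loop body and build commands are illustrative assumptions, not the asker's actual code.

/* Illustrative only -- built and profiled, for example, as:
 *   gcc -march=native -o vec vec.c        (baseline, like the first run)
 *   gcc -O3 -march=native -o vec vec.c    (like the second run)
 *   valgrind --tool=cachegrind ./vec
 */
#include <stddef.h>
#include <stdio.h>

#define N (1 << 22)                 /* 4M floats per array, ~16 MB each:
                                       far bigger than the last-level cache */

static float a[N], b[N], c[N];

int main(void)
{
    /* Unoptimized, each iteration issues scalar 4-byte loads/stores, so
     * cachegrind counts one D ref per element per array.  With
     * -O3 -march=native GCC typically turns this into 32-byte AVX
     * loads/stores that cover 8 elements at a time, cutting the counted
     * refs by roughly 8x -- but the 64-byte cache lines that must be
     * fetched (and hence the misses) are exactly the same. */
    for (size_t i = 0; i < N; i++)
        c[i] = a[i] * b[i] + 1.0f;

    printf("%f\n", c[N - 1]);       /* keep the result observable */
    return 0;
}

With 4-byte floats and 64-byte cache lines, the loop incurs one compulsory miss per 16 elements of each array no matter how many instructions read them, which is why the absolute miss counts barely move between the two builds while the rate jumps.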
